How Apache PredictionIO makes it easier to serve Predictions from Machine Learning



Apache PredictionIO is the open source version of a project originally developed by a subsidiary of Salesforce. Built on the Spark and Hadoop frameworks, it serves predictions from data using customizable templates for common tasks.

Apps send data to PredictionIO's event server to train a model, then query the engine for predictions based on that model.
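As a rough illustration of that flow, here is a minimal Python sketch that builds the two HTTP requests an app would make: one that posts an event to the event server and one that queries a deployed engine. The host names, ports, access key, and the `rate` event with `user`/`num` query fields are assumptions based on a typical recommendation-style setup, not values taken from this article; the helper names are hypothetical, and a real template may expect different fields.

```python
import json
import urllib.request

# Assumed addresses for a local install; adjust to your own deployment.
EVENT_SERVER_URL = "http://localhost:7070"
ENGINE_URL = "http://localhost:8000"
ACCESS_KEY = "YOUR_ACCESS_KEY"  # placeholder, issued when an app is created


def build_event_request(user_id, item_id, rating):
    """Build a POST request that records a 'rate' event (hypothetical schema)."""
    payload = {
        "event": "rate",
        "entityType": "user",
        "entityId": user_id,
        "targetEntityType": "item",
        "targetEntityId": item_id,
        "properties": {"rating": rating},
    }
    return urllib.request.Request(
        url=f"{EVENT_SERVER_URL}/events.json?accessKey={ACCESS_KEY}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_query_request(user_id, num):
    """Build a POST request asking the engine for `num` predictions for a user."""
    payload = {"user": user_id, "num": num}
    return urllib.request.Request(
        url=f"{ENGINE_URL}/queries.json",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send a prepared request and decode the JSON reply (needs a running server)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With an event server and a trained engine running, `send(build_event_request("u1", "i1", 4.0))` would record one rating and `send(build_query_request("u1", 4))` would ask for four predictions. Building the requests separately from sending them keeps the payloads easy to inspect without a live server.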

PredictionIO's template system is perhaps its most notable advantage, as it reduces the heavy lifting needed to set up the system to serve specific kinds of predictions.

Some templates also integrate other machine learning products. In addition, PredictionIO can automatically evaluate a prediction engine to determine the best hyperparameters to use.

The developer still needs to define the metrics used for this evaluation, but there’s generally less work involved than in tuning hyperparameters manually.
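To make the idea concrete, the sketch below shows the general pattern of metric-driven hyperparameter selection: try each candidate parameter set, score it with a developer-supplied metric, and keep the best. This is a conceptual illustration in plain Python, not PredictionIO's own evaluation API; every function name here is hypothetical, and the toy threshold classifier stands in for a real engine.

```python
from itertools import product


def accuracy(predict, labelled_examples):
    """Metric chosen by the developer: fraction of examples predicted correctly."""
    correct = sum(1 for features, label in labelled_examples if predict(features) == label)
    return correct / len(labelled_examples)


def pick_best_params(train, param_grid, training_set, validation_set, metric):
    """Train once per parameter combination and return the best-scoring one."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        predict = train(training_set, **params)
        score = metric(predict, validation_set)
        if score > best_score:  # ties keep the earliest candidate
            best_params, best_score = params, score
    return best_params, best_score


def train_threshold_classifier(training_set, threshold):
    """Toy stand-in for an engine: ignores the data, classifies by a fixed cutoff."""
    return lambda value: 1 if value >= threshold else 0


if __name__ == "__main__":
    data = [(0.1, 0), (0.4, 0), (0.6, 1), (0.9, 1)]
    grid = {"threshold": [0.2, 0.5, 0.8]}
    print(pick_best_params(train_threshold_classifier, grid, data, data, accuracy))
```

The developer's work is concentrated in two places, the metric and the grid of candidates; the search loop itself is mechanical, which is the part an automated evaluation can take over.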

Apache also offers supported SDKs for working in Java, Python, PHP, and Ruby. MLlib, HBase, Spray, and Elasticsearch all come bundled with PredictionIO, and data can be stored in a variety of back ends: Elasticsearch, JDBC, HBase, HDFS, and the local file system are all supported out of the box.

PredictionIO can serve predictions singly or as a batch. Batched predictions are automatically parallelized across a Spark cluster, since the default algorithms used in batch predictions are all serializable.
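For the batch case, a common way to hand many queries to a batch job is a file with one JSON query per line. The Python sketch below prepares such a file. The one-query-per-line layout, the `user`/`num` fields, and the helper names are assumptions for illustration; check the batch prediction documentation for your PredictionIO version and template before relying on this format.

```python
import json


def format_batch_queries(queries):
    """Serialize queries as one JSON object per line (assumed input layout)."""
    return "".join(json.dumps(query, sort_keys=True) + "\n" for query in queries)


def write_batch_queries(path, queries):
    """Write the serialized queries to `path` for a batch prediction run."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(format_batch_queries(queries))


if __name__ == "__main__":
    # Hypothetical queries for a recommendation-style engine.
    sample = [{"user": "u1", "num": 4}, {"user": "u2", "num": 4}]
    print(format_batch_queries(sample), end="")
```

Because each line is an independent query, a cluster can split the file into partitions and score them in parallel without any coordination between queries.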