Microsoft Machine Learning for Apache Spark (MMLSpark) is an open-source toolset that expands the Apache Spark distributed computing framework with deep learning and data science tools, including seamless integration with Microsoft Cognitive Toolkit.
These tools enable powerful and scalable predictive and analytical models for a variety of data sources, and they also bring new networking capabilities to the Spark ecosystem.
The MMLSpark project has been substantially reworked to integrate better with deep learning and data science tools, including seamless integration of Spark Machine Learning pipelines with Microsoft Cognitive Toolkit (CNTK), LightGBM, and OpenCV.
The integration with CNTK, LightGBM, and third-party projects such as OpenCV may also turn Spark into a service: Spark computations, such as machine learning predictions, can be served over the web, and Spark can interact with third-party services via HTTP.
For instance, LIME on Spark can annotate the predictions served by an image classifier, which offers an immediate way to check whether the classifier is working correctly.
In the same vein, MMLSpark will provide easy-to-use SparkML transformers for a wide variety of Microsoft Cognitive Services. For production-grade deployment, the Spark Serving project will enable high-throughput, sub-millisecond-latency web services backed by a Spark cluster.
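To make the LIME idea mentioned above more concrete, the following is a minimal single-machine sketch in plain Python. It is not MMLSpark's API and does not use Spark. The names `explain`, `solve_linear_system`, and `toy_classifier` are hypothetical, and the toy classifier stands in for a real image model whose input regions (for example, superpixels) can be switched on or off. The sketch samples random on/off masks, queries the black-box model, and fits a proximity-weighted linear surrogate whose coefficients act as per-region importances.

```python
import math
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's inputs are not mutated.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def explain(predict, num_features, num_samples=500,
            kernel_width=0.75, ridge=1e-6, seed=0):
    """Return one importance weight per feature for a black-box `predict`.

    `predict` takes a list of 0/1 flags (feature kept or removed) and
    returns a score. All parameter names and defaults are assumptions
    made for this illustration.
    """
    rng = random.Random(seed)
    rows, targets, weights = [], [], []
    for _ in range(num_samples):
        mask = [rng.randint(0, 1) for _ in range(num_features)]
        # Distance from the unperturbed instance (all features kept).
        distance = 1.0 - sum(mask) / num_features
        weights.append(math.exp(-(distance ** 2) / kernel_width ** 2))
        rows.append([1.0] + [float(v) for v in mask])  # 1.0 = intercept
        targets.append(predict(mask))

    # Weighted ridge regression via the normal equations:
    # (X^T W X + ridge * I) beta = X^T W y
    p = num_features + 1
    xtwx = [[ridge if i == j else 0.0 for j in range(p)] for i in range(p)]
    xtwy = [0.0] * p
    for row, y, w in zip(rows, targets, weights):
        for i in range(p):
            xtwy[i] += w * row[i] * y
            for j in range(p):
                xtwx[i][j] += w * row[i] * row[j]
    beta = solve_linear_system(xtwx, xtwy)
    return beta[1:]  # drop the intercept


def toy_classifier(mask):
    # Hypothetical stand-in for an image classifier: the score depends
    # strongly on region 0, weakly on region 2, and not on regions 1 or 3.
    return 0.7 * mask[0] + 0.2 * mask[2]


importances = explain(toy_classifier, num_features=4)
for index, weight in enumerate(importances):
    print(f"region {index}: importance {weight:.3f}")
```

In this toy setup region 0 receives the largest weight and regions 1 and 3 receive weights close to zero, which is the kind of annotation that lets a reader judge whether a classifier attends to sensible parts of its input. A distributed version would parallelize the sampling and model queries across a cluster; that part is outside this sketch.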