
Why MapR Data Science Refinery?

Last updated: 2018/05/15

Enable More Accurate Insights with Access to All Data

The MapR Data Science Refinery is the only data science offering with secured access to all data. It connects out of the box with:

MapR-XD: for files and containers

• Globally distributed data store

• High-scale and reliable

MapR-DB: a highly scalable, multi-model, NoSQL database management system

• Supports multiple data models, including wide-column, document, key-value, and time-series

MapR-ES: global publish-subscribe event streaming system

• The first big data-scale streaming system built into a converged data platform

• The only big data streaming system to support global event replication reliably at IoT scale

Create Real-Time Machine Learning Pipelines

A core component of the MapR Platform, MapR-ES is a global publish-subscribe event streaming system for big data. With native integration between MapR-ES and machine learning libraries, organizations can now create real-time machine learning pipelines, allowing them to apply ML models to real-time data.
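To make the idea of applying a model to real-time data concrete, here is a minimal sketch in plain Python. It does not use the MapR-ES client or any MapR API: `event_stream` is a hypothetical stand-in for a stream consumer, and the weights, field names, and threshold are invented for illustration. It only shows the shape of such a pipeline, where each event is scored as it arrives rather than in a later batch job.

```python
from typing import Dict, Iterable, Iterator

# Hypothetical pre-trained linear model (illustrative values, not from MapR).
WEIGHTS = {"temperature": 0.5, "vibration": 2.0}
BIAS = -100.0


def score(event: Dict[str, float]) -> float:
    """Return the linear model score for one event."""
    return BIAS + sum(WEIGHTS[name] * event[name] for name in WEIGHTS)


def event_stream(events: Iterable[Dict[str, float]]) -> Iterator[Dict[str, float]]:
    """Stand-in for a stream consumer: yields events one at a time.

    In a real deployment this would read from an event streaming topic.
    """
    for event in events:
        yield event


def run_pipeline(stream: Iterable[Dict[str, float]],
                 threshold: float = 0.0) -> Iterator[dict]:
    """Score each event as it arrives and flag those above the threshold."""
    for event in stream:
        s = score(event)
        yield {**event, "score": s, "alert": s > threshold}


if __name__ == "__main__":
    sample = [
        {"temperature": 70.0, "vibration": 10.0},
        {"temperature": 100.0, "vibration": 30.0},
    ]
    for result in run_pipeline(event_stream(sample)):
        print(result)
```

Because `run_pipeline` is a generator, results are produced one event at a time, which is the property a streaming pipeline relies on.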

Increase Data Science Productivity with Broad Language and Library Support

The MapR Data Science Refinery offers the Apache Zeppelin Data Science Notebook to provide the ability to work across many engines in one visual space:

• Distributed Compute and ML programming with Apache Spark & Python

• Batch and Interactive SQL with Apache Hive and Drill

• Scripting support for Apache Pig

• Shell access to MapR-FS

• Programmatic access to MapR-DB and MapR-ES, using Spark

Easy Deployment with Persistent and Stateful Containers

Easy To Deploy

• A Docker image is available on Docker Hub.

• The image includes everything required, and nothing more, to use MapR as a persistent data store for your containerized applications.

Secure

• Authentication occurs at the container level, so containerized applications can access only the data for which they are authorized.

• Communications are encrypted to ensure privacy when accessing data in MapR.

Extensible

• A Dockerfile will also be available on GitHub, allowing you to further customize the image as needed to support your specific application needs.

Persistent

• Containers can easily use all the MapR Platform services (MapR-FS, MapR-DB, MapR-ES) as a persistent data store.

Provide Robust Visualization Support to Data Scientists

The MapR Data Science Refinery comes with eight out-of-the-box visualization libraries, including Matplotlib and ggplot2. Apache Zeppelin provides a pluggable visualization framework to enable:

• Common visualization libraries available in the NPM Registry

• The ability to easily create and load custom visualizations

Enable Notebook/Model Collaboration, Sharing, and Mirroring

The MapR Converged Data Platform is ideal for storing model and notebook repositories. Organizations can leverage the MapR Platform’s global namespace and superior replication capability. The MapR Platform also offers immutable snapshots to persist and deploy various versions of the same model, making it possible for data scientists to compare the performance and accuracy of each version of the model.
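Comparing versions of a model can be sketched as follows. This is an illustration under stated assumptions, not a MapR feature or API: each version is represented by a hypothetical prediction function, and loading a model from a snapshot is left out entirely. The function names `accuracy` and `compare_versions` and the sample data are invented for this example.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Predictor = Callable[[float], int]


def accuracy(predict: Predictor,
             features: Sequence[float],
             labels: Sequence[int]) -> float:
    """Fraction of examples the model version labels correctly."""
    correct = sum(1 for x, y in zip(features, labels) if predict(x) == y)
    return correct / len(labels)


def compare_versions(versions: Dict[str, Predictor],
                     features: Sequence[float],
                     labels: Sequence[int]) -> List[Tuple[str, float]]:
    """Evaluate every version on the same held-out data, best first."""
    scores = {name: accuracy(p, features, labels) for name, p in versions.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Two hypothetical versions of a threshold classifier.
    versions = {
        "v1": lambda x: int(x > 0.5),
        "v2": lambda x: int(x > 0.3),
    }
    features = [0.2, 0.4, 0.6, 0.8]
    labels = [0, 1, 1, 1]
    for name, acc in compare_versions(versions, features, labels):
        print(name, acc)
```

Evaluating every version against the same held-out data is what makes the accuracy figures comparable; an immutable copy of each version ensures the comparison can be repeated later.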
