Why MapR Data Science Refinery?
Enable More Accurate Insights with Access to All Data
The MapR Data Science Refinery is the only data science offering with secured access to all data. It connects out of the box with:
MapR-XD: for files and containers
• Globally distributed data store
• High-scale and reliable
MapR-DB: a highly scalable, multi-model, NoSQL database management system
• Supports multiple data models, including wide-column, document, key-value, and time-series
MapR-ES: a global publish-subscribe event streaming system
• The first big data-scale streaming system built into a converged data platform
• The only big data streaming system to support global event replication reliably at IoT scale
Create Real-Time Machine Learning Pipelines
A core component of the MapR Platform, MapR-ES is a global publish-subscribe event streaming system for big data. With native integration between MapR-ES and machine learning libraries, organizations can now create real-time machine learning pipelines, allowing them to apply ML models to real-time data.
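The shape of such a pipeline can be illustrated with a minimal Python sketch. It is not MapR code: the event source is a plain in-memory list standing in for a consumer subscribed to a MapR-ES topic, and `score_stream`, `toy_model`, the field names, and the 0.5 threshold are all hypothetical names and values chosen for illustration.

```python
from typing import Callable, Dict, Iterable, Iterator

Event = Dict[str, float]


def score_stream(
    events: Iterable[Event],
    model: Callable[[Event], float],
    threshold: float = 0.5,
) -> Iterator[Dict[str, object]]:
    """Apply the model to each event as it arrives and yield a scored record."""
    for event in events:
        score = model(event)
        yield {"event": event, "score": score, "alert": score >= threshold}


def toy_model(event: Event) -> float:
    # Hypothetical stand-in for a trained model: a linear score clipped to [0, 1].
    raw = 0.01 * event["temperature"] + 0.5 * event["vibration"]
    return max(0.0, min(1.0, raw))


# Assumption: in a real deployment this list would be replaced by an
# iterator over messages read from a stream topic.
sample_events = [
    {"temperature": 20.0, "vibration": 0.1},
    {"temperature": 80.0, "vibration": 0.9},
]

results = list(score_stream(sample_events, toy_model))
for record in results:
    print(record)
```

Because `score_stream` is a generator, each event is scored as soon as it is read rather than after a batch has accumulated, which is the property a real-time pipeline depends on.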
Increase Data Science Productivity with Broad Language and Library Support
The MapR Data Science Refinery includes the Apache Zeppelin notebook, which lets data scientists work across many engines in one visual space:
• Distributed Compute and ML programming with Apache Spark & Python
• Batch and Interactive SQL with Apache Hive and Drill
• Scripting support for Apache Pig
• Shell access to MapR-FS
• Programmatic access to MapR-DB and MapR-ES, using Spark
Easy Deployment with Persistent and Stateful Containers
Easy To Deploy
• A Docker image is available on Docker Hub.
• The image includes only what is required to use MapR as a persistent data store for your containerized applications.
Secure
• Authentication occurs at the container level, so containerized applications can access only the data they are authorized to use.
• Communications are encrypted to ensure privacy when accessing data in MapR.
Extensible
• A Dockerfile will also be available on GitHub, allowing you to further customize the image as needed to support your specific application needs.
Persistent
• Containers can use all MapR Platform services (MapR-FS, MapR-DB, MapR-ES) as a persistent data store.
Provide Robust Visualization Support to Data Scientists
The MapR Data Science Refinery comes with eight out-of-the-box visualization libraries, including Matplotlib and ggplot2. Apache Zeppelin provides a pluggable visualization framework to enable:
• Common visualization libraries available in the NPM Registry
• The ability to easily create and load custom visualizations
Enable Notebook/Model Collaboration, Sharing, and Mirroring
The MapR Converged Data Platform is ideal for storing model and notebook repositories. Organizations can leverage the MapR Platform’s global namespace and superior replication capability. The MapR Platform also offers immutable snapshots to persist and deploy various versions of the same model, making it possible for data scientists to compare the performance and accuracy of each version of the model.
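As a rough illustration of comparing model versions, the Python sketch below evaluates two versions against the same held-out data. It assumes each version has already been loaded into memory as a callable; in practice each would be read from its own snapshot path. The names `accuracy`, `compare_versions`, `v1`, and `v2`, and the data itself, are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[float, int]  # (feature value, true label)
Model = Callable[[float], int]


def accuracy(model: Model, holdout: List[Example]) -> float:
    """Fraction of held-out examples the model labels correctly."""
    correct = sum(1 for x, y in holdout if model(x) == y)
    return correct / len(holdout)


def compare_versions(
    versions: Dict[str, Model], holdout: List[Example]
) -> Dict[str, float]:
    """Score every model version against the same held-out data."""
    return {name: accuracy(model, holdout) for name, model in versions.items()}


# Illustrative data and models only; real versions would be deserialized
# from the snapshot that preserved them.
holdout = [(0.1, 0), (0.4, 0), (0.6, 1), (0.9, 1)]
versions = {
    "v1": lambda x: int(x > 0.3),
    "v2": lambda x: int(x > 0.5),
}

report = compare_versions(versions, holdout)
best = max(report, key=report.get)
print(report, best)
```

Scoring every version against one fixed held-out set keeps the comparison fair, since differences in the report then reflect the models rather than the data.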