
Open Source Engines and Tools: An Integrated Data Platform with MapR Monitoring and Management

Latest update: 2018/05/15

Open Source Engines
MapR packages a broad set of Apache open source ecosystem projects that enable big data applications. The goal is to provide you with an open platform that lets you choose the right tool for the job. MapR tests and integrates open source ecosystem projects such as Hive™, Pig™, Apache™ HBase™ and Mahout, among others. The MapR Converged Data Platform and the open source projects are tied together through an advanced management console that is used to monitor and manage the system.
The MapR Ecosystem Pack (MEP) gives customers quick access to the latest innovations from the open source community while ensuring that all ecosystem projects in a given MEP release interoperate. MapR pioneered the decoupling of platform versions from project versions, and MEP is the next evolution of that process. This decoupling lets customers choose when to upgrade their environment, and MEP ensures that the resulting deployment is fully compatible.
MapR also makes available Developer Previews for new features and technologies that are still under development.
Core Hadoop

Apache Hadoop was born out of a need to process an avalanche of big data. The web was generating more and more information every day, and it was becoming very difficult to index over one billion pages of content. Hadoop has moved far beyond its beginnings in web indexing and is now used in many industries for a wide range of tasks that share a common theme: high variety, volume, and velocity of data, both structured and unstructured.
Batch

Apache MapReduce is a powerful framework for processing large, distributed sets of structured or unstructured data on a Hadoop cluster. The key feature of MapReduce is its ability to spread processing across an entire cluster of nodes, with each node processing its local data. This data locality can make MapReduce orders of magnitude faster than legacy methods of processing big data, which often relied on a single node accessing and processing data located in remote SAN or NAS devices.
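The map, shuffle, and reduce steps can be illustrated with a minimal single-process sketch in plain Python. It is not the Hadoop API: the names NODE_BLOCKS, map_phase, shuffle, reduce_phase, and word_count are hypothetical, and each inner list merely stands in for the data that one cluster node would hold locally.

```python
from collections import defaultdict

# Hypothetical input: each inner list stands for the lines stored on one node.
NODE_BLOCKS = [
    ["big data big cluster"],
    ["data moves less when code moves to data"],
]


def map_phase(lines):
    """Emit (word, 1) pairs for the lines held by a single node."""
    return [(word, 1) for line in lines for word in line.split()]


def shuffle(mapped_outputs):
    """Group the values by key across the map output of every node."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the grouped values to get one count per word."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(node_blocks):
    # One map task per node, each reading only its own block.
    mapped = [map_phase(block) for block in node_blocks]
    return reduce_phase(shuffle(mapped))


counts = word_count(NODE_BLOCKS)
print(counts)
```

In a real cluster the map tasks run in parallel on the nodes that store each block, and only the much smaller intermediate pairs travel over the network during the shuffle.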
Interactive SQL

Apache Drill is a distributed system for interactive analysis of large-scale datasets. Drill is similar to Google’s Dremel, with the additional flexibility needed to support a broader range of query languages, data sources and data formats, including nested, self-describing data.
NoSQL

Apache HBase is a database that runs on a Hadoop cluster. Clients can access HBase data through a native Java API or through a Thrift or REST gateway, which makes the data accessible from virtually any language.
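As an illustration of the non-Java access path, the sketch below decodes a row in the JSON form that the HBase REST gateway is understood to return, where row keys, column names, and values are base64-encoded. The payload here is built by hand rather than captured from a server, and the names decode_row, SAMPLE, and _b64 are hypothetical, so treat the exact response shape as an assumption to check against your gateway version.

```python
import base64
import json


def _b64(text):
    """Base64-encode a string the way the gateway is assumed to encode fields."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def decode_row(payload):
    """Turn a REST-style JSON row document into {row_key: {column: value}}."""
    doc = json.loads(payload)
    rows = {}
    for row in doc["Row"]:
        key = base64.b64decode(row["key"]).decode("utf-8")
        cells = {}
        for cell in row["Cell"]:
            column = base64.b64decode(cell["column"]).decode("utf-8")
            # "$" is assumed to hold the base64-encoded cell value.
            cells[column] = base64.b64decode(cell["$"]).decode("utf-8")
        rows[key] = cells
    return rows


# Hand-built sample standing in for a gateway response (an assumption).
SAMPLE = json.dumps({
    "Row": [{
        "key": _b64("user1"),
        "Cell": [{"column": _b64("info:name"), "timestamp": 1, "$": _b64("Ada")}],
    }]
})

decoded = decode_row(SAMPLE)
print(decoded)
```

Because the gateway speaks HTTP and JSON, the same decoding logic can be written in any language with a base64 and JSON library.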
Graph

GraphX is a graph library that runs on top of Apache Spark. Developers can use the languages and tools they already use with Spark to implement new types of algorithms that require modeling the relationships between objects.
Machine Learning

Apache Mahout is a powerful, scalable, machine-learning library that runs on top of Hadoop MapReduce. Machine learning is a discipline of artificial intelligence that enables systems to learn based on data alone, continuously improving performance as more data is processed. Machine learning is the basis for many technologies that are part of our everyday lives.
Streaming

Spark Streaming: When Hadoop first emerged, it provided a platform to store petabytes of data and to run batch queries on that data to gather insights. This model works well for many use cases, such as analyzing vast amounts of customer data for interesting patterns. However, not all data can wait for a batch query to be performed. Spark Streaming addresses that case by processing data streams as they arrive, in small batches, on top of Apache Spark.
Data Tools

HttpFS is one of several tools available to interact with the MapR distributed file system. Some differentiating features of HttpFS include programmatic access, version independence, and remote access.
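To make "programmatic access" concrete, here is a minimal sketch that builds an HttpFS request URL. It assumes the gateway exposes the WebHDFS-style REST path /webhdfs/v1 and listens on port 14000, which is the usual Apache HttpFS default and may differ in a given installation. The function name httpfs_url, the host gateway.example.com, and the user alice are hypothetical.

```python
from urllib.parse import quote, urlencode


def httpfs_url(host, path, op, user, port=14000, **params):
    """Build a WebHDFS-style URL for an HttpFS gateway (port is an assumption)."""
    query = urlencode({"op": op, "user.name": user, **params})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


# Hypothetical host, path, and user; no request is actually sent.
url = httpfs_url("gateway.example.com", "/user/alice/logs", "LISTSTATUS", "alice")
print(url)
```

Any HTTP client can then issue a GET against such a URL from outside the cluster, which is where the remote access and version independence mentioned above come from.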
Coordination

Apache Oozie is a valuable tool for Hadoop users to automate commonly performed tasks in order to save time and prevent user error. With Oozie, users can describe workflows to be performed on a Hadoop cluster, schedule those workflows to execute under a specified condition, and even combine multiple workflows and schedules together into a package to manage their full lifecycle.
GUI, Configuration, Monitoring

Hue (Hadoop User Experience) offers Hadoop users a web GUI that simplifies creating, maintaining, and running many types of Hadoop jobs. Hue is made up of several applications that interact with Hadoop components, and it has an open SDK that allows new applications to be created.
Administrator
When applications go from idea to reality, MapR provides the only production-ready platform for Hadoop, Spark and related technologies.
Enterprise Architect
The design of the patented MapR Converged Data Platform speaks directly to Enterprise Architects who know best that architecture matters.
Developer
MapR provides developers the widest variety of popular open source projects for developing data applications.
