
Enter Apache Ambari – the Elephant Rider…

by vamsi_cz5cgo

As an aficionado of Hadoop technology & a devoted student of its gradual progression from a Big Data processing framework to an enterprise application & data ecosystem (more on that in follow-up posts), one of the big gaps I saw with Generation 1 was the lack of an open source, enterprise-grade operations console.

This gap essentially meant that operations & admin teams had to be reasonably well educated in Hadoop semantics & internals just to cover common use cases – provisioning (i.e. cluster creation & replication), management & monitoring at scale, operational scripting, and resource allocation monitoring.

This challenge is exacerbated by the fact that Big Data clusters do not live in isolation in your datacenter. You need to integrate them with existing assets like data warehouses & RDBMSs, as well as extend them with whatever configuration & automation technology is already in place, e.g. Puppet.

Another significant issue is that a leading Hadoop distro like Hortonworks ships 22+ technology frameworks in addition to the two core Hadoop components – HDFS & YARN. These include services like Hive, Pig, Spark, Storm etc.

You use the right framework for the job depending on the business problem you are looking to solve – for instance, Hive for interactive SQL, Storm for stream processing, MapReduce for batch etc. This means that not all nodes run the same set of services, since they serve different workloads.

The challenge from an operational perspective, then, is to install specific services on the nodes or clusters within the overall infrastructure that need to provide batch, interactive or streaming support to the applications accessing data within them.

Enter Apache Ambari, thanks to work in the Apache Software Foundation led by Hortonworks and many other community members…

What Ambari Does

Ambari enables system administrators to perform all the above use cases from a provisioning, management, monitoring & integration standpoint.

The key use cases where Ambari shines –

1. Provision a Hadoop Cluster

No matter the size of your Hadoop cluster, the deployment and maintenance of hosts is simplified using Ambari.

Provisioning of clusters, services and components can be accomplished in one of two ways. One can leverage a well-designed UI, or use the Blueprints API capability to automate cluster installs without any manual intervention. Ambari Blueprints are a great feature where the operator provides a declarative definition of a Hadoop cluster. One can specify a Stack, the Component layout within the stack and the necessary custom Configuration(s) so that a simple REST API call to the Ambari server results in on-the-fly creation of the Hadoop cluster.

Blueprints also enable users to create their own custom Hadoop configurations depending on the business capability that they need to achieve. Thus clusters are built using the layout, components and stack defined in the Blueprints.
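
As a rough illustration, here is a minimal sketch of what registering such a Blueprint with the Ambari server could look like, using Python's requests library. The server address, the default admin/admin credentials, the HDP 2.2 stack version and the single-node component layout are all assumptions to adapt to your own environment.

```python
# Minimal sketch: registering an Ambari Blueprint via the REST API.
# Assumes an Ambari server at ambari.example.com:8080 with default admin
# credentials and an HDP 2.2 stack -- adjust these to your setup.
import json
import requests

AMBARI = "http://ambari.example.com:8080/api/v1"
AUTH = ("admin", "admin")
HEADERS = {"X-Requested-By": "ambari"}  # required by Ambari for write calls

# A tiny single-master blueprint: the stack definition plus one host group
# listing the components that hosts in that group should run.
blueprint = {
    "Blueprints": {"stack_name": "HDP", "stack_version": "2.2"},
    "host_groups": [
        {
            "name": "master",
            "cardinality": "1",
            "components": [
                {"name": "NAMENODE"},
                {"name": "SECONDARY_NAMENODE"},
                {"name": "DATANODE"},
                {"name": "RESOURCEMANAGER"},
                {"name": "NODEMANAGER"},
                {"name": "ZOOKEEPER_SERVER"},
            ],
        }
    ],
}

# POST the blueprint; Ambari stores it under the given name ("single-node")
# so that later cluster-create calls can reference it.
resp = requests.post(
    f"{AMBARI}/blueprints/single-node",
    auth=AUTH,
    headers=HEADERS,
    data=json.dumps(blueprint),
)
resp.raise_for_status()
print("Blueprint registered:", resp.status_code)
```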

The architecture of an Ambari install is described below, but in short the web UI is a client-side JS application which accesses the server via a REST API. The server thus acts as a full-fledged REST provider, which enables easy integration with other management tooling. A rich set of permissions is available based on the user's role, and the Ambari server can optionally plug into your LDAP/RDBMS authentication repository so that one does not need to duplicate permissions in the Ambari database.

2. Manage a Hadoop cluster

Ambari provides tools to simplify cluster management. The Web interface & the REST API allow a sys admin or an enterprise systems mgmt console to control the lifecycle of Hadoop services and components, modify configurations and manage the ongoing growth of your cluster.
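
For example, a sketch of driving the service lifecycle through the REST API might look like the following; the cluster name "mycluster", the server address and the admin credentials are placeholder assumptions.

```python
# Minimal sketch: stopping and restarting a service through the Ambari REST API.
# Cluster name "mycluster" and default admin credentials are assumptions.
import json
import requests

AMBARI = "http://ambari.example.com:8080/api/v1"
AUTH = ("admin", "admin")
HEADERS = {"X-Requested-By": "ambari"}

def set_service_state(cluster, service, state, context):
    """Ask Ambari to move a service to the given state (STARTED, or INSTALLED i.e. stopped)."""
    body = {
        "RequestInfo": {"context": context},
        "Body": {"ServiceInfo": {"state": state}},
    }
    resp = requests.put(
        f"{AMBARI}/clusters/{cluster}/services/{service}",
        auth=AUTH,
        headers=HEADERS,
        data=json.dumps(body),
    )
    resp.raise_for_status()
    return resp

# Stop HDFS, then start it again; Ambari queues each call as an asynchronous request.
set_service_state("mycluster", "HDFS", "INSTALLED", "Stop HDFS via REST")
set_service_state("mycluster", "HDFS", "STARTED", "Start HDFS via REST")
```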

Ambari leverages two other open source projects, Nagios and Ganglia, which are preconfigured and installed along with the Ambari server & agents.

[Figure: Ambari metrics dashboard]

3. Monitor a Hadoop cluster

Ambari can monitor the health and status of a Hadoop cluster right down to granular detail for metric display on the web UI. Ganglia integration provides not just a wealth of Hadoop-based metrics (e.g. HDFS Disk Usage, Links, DataNode info etc.) but also nice histograms for understanding usage patterns from a threshold & trend perspective.
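
As an illustration, the same REST API that feeds the web UI can be queried directly for metrics. The sketch below assumes a cluster named "mycluster" and a couple of NameNode capacity metrics; the exact metric fields available will depend on the services installed in your cluster.

```python
# Minimal sketch: pulling a few HDFS metrics over the Ambari REST API.
# Cluster name, credentials and the specific metric paths are assumptions.
import requests

AMBARI = "http://ambari.example.com:8080/api/v1"
AUTH = ("admin", "admin")

# Ask only for the NameNode's capacity metrics using the "fields" filter.
resp = requests.get(
    f"{AMBARI}/clusters/mycluster/services/HDFS/components/NAMENODE",
    auth=AUTH,
    params={"fields": "metrics/dfs/FSNamesystem/CapacityUsed,"
                      "metrics/dfs/FSNamesystem/CapacityTotal"},
)
resp.raise_for_status()
fs = resp.json().get("metrics", {}).get("dfs", {}).get("FSNamesystem", {})
print("HDFS capacity used/total:", fs.get("CapacityUsed"), "/", fs.get("CapacityTotal"))
```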

[Figure: cluster-wide metrics]

4. Integrate Hadoop with Enterprise Systems Mgmt Disciplines

No Hadoop cluster lives in isolation from a management perspective in an enterprise datacenter. While Ambari shines at Hadoop management & monitoring, it is also fairly easy to hook it up with other enterprise mgmt consoles as well as with any systems mgmt products like Puppet, Chef and Ansible etc.

Ambari accomplishes this in three primary ways –

a) the REST API, which enables integration with existing tools. Certifications have been established for common operations team tooling; major examples include Microsoft System Center Operations Manager, HP Operations Manager and Teradata Viewpoint.

b) the Stacks feature, which enables one to define a group of services (e.g. HDFS, ZooKeeper, Hive, Pig, Spark) that can be installed as a unit, along with the locations of the repositories in your infrastructure where these can be found. Stacks can also be versioned and support a definition hierarchy: base stacks can be created and inherited from to build higher-order stacks.

c) Blueprints, which are a declarative way of creating a live, running cluster from scratch. Blueprints are JSON documents with two main parts: a list of host groups (node information) and allied configurations (which define the services that go onto those nodes).

Blueprints use specific versions of stacks.

[Figure: Blueprints workflow]

Blueprints are submitted to the Ambari REST API; this workflow is covered in the visual above. I will cover these in more detail when we talk about CloudBreak, a Hadoop-as-a-Service offering from HDP (Hortonworks Data Platform).
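
To round out the Blueprint workflow sketched earlier, a cluster creation template maps the blueprint's host groups onto real hosts and is POSTed to the clusters endpoint. The blueprint name "single-node", the cluster name and the host FQDN below are placeholder assumptions carried over from the earlier sketch.

```python
# Minimal sketch: creating a cluster from a previously registered blueprint.
# Blueprint name, cluster name and host FQDN are assumptions.
import json
import requests

AMBARI = "http://ambari.example.com:8080/api/v1"
AUTH = ("admin", "admin")
HEADERS = {"X-Requested-By": "ambari"}

# The cluster creation template: which blueprint to instantiate, and which
# concrete hosts fill each of the blueprint's host groups.
cluster_template = {
    "blueprint": "single-node",
    "host_groups": [
        {"name": "master", "hosts": [{"fqdn": "node1.example.com"}]}
    ],
}

# One POST to /clusters/<name> kicks off installation and startup of every
# component defined in the blueprint; progress can be tracked via the
# asynchronous request resources the call returns.
resp = requests.post(
    f"{AMBARI}/clusters/mycluster",
    auth=AUTH,
    headers=HEADERS,
    data=json.dumps(cluster_template),
)
resp.raise_for_status()
print("Cluster create request accepted:", resp.status_code)
```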

Architecture –

The Ambari architecture operates on a client-server model. The Ambari Server serves as the collection point for data from across the cluster and exposes a REST API; the web UI leverages this REST API to interface with the server.
Each node in the cluster runs a copy of the Ambari Agent, which enables the server to perform provisioning, inventory, config mgmt. and operational activities on the cluster workloads.

[Figure: Ambari architecture]
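
As a small illustration of the agent model, the server's REST API can be asked which hosts have registered agents and what their current status is. The cluster name, credentials and the specific host fields requested below are assumptions, as in the earlier sketches.

```python
# Minimal sketch: listing hosts whose Ambari Agents have registered with the
# server, along with their health status and last heartbeat time.
import requests

AMBARI = "http://ambari.example.com:8080/api/v1"
AUTH = ("admin", "admin")

resp = requests.get(
    f"{AMBARI}/clusters/mycluster/hosts",
    auth=AUTH,
    params={"fields": "Hosts/host_status,Hosts/last_heartbeat_time"},
)
resp.raise_for_status()
for item in resp.json().get("items", []):
    host = item.get("Hosts", {})
    print(host.get("host_name"), host.get("host_status"), host.get("last_heartbeat_time"))
```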

As mentioned above, Ambari also comes pre-packaged with two other popular open source monitoring technologies – Ganglia and Nagios. Each node in the cluster also runs an instance of the Ganglia Monitor daemon (gmond). gmond collects metric information, which is passed to the Ganglia Meta daemon (gmetad) and then on to the Ambari Server for display on the web UI.

The other component is Nagios, which is used for alerting against threshold conditions.

Ambari Views

Ambari Views is a newer project that offers a systematic way to add new tools to Ambari using a plug-in model. Any newly added services or proprietary tools can be integrated into the Ambari web user interface using Views.

These new tools might include such things as a Pig query editor, a Hive query editor, a workflow design tool, a Tez visualization tool, or a Hadoop distributed file system browser.

To sum up…

The top-level Apache Ambari project has been moving at a rapid clip in terms of innovation – it had 13 releases in 2013 and 6 releases in 2014.

The latest community version (2.0), released in March 2015, supports automated rolling upgrades for components such as ZooKeeper, Ranger and the core YARN & HDFS services. Further, Hortonworks is putting in place certification processes as well as proven methodologies to aid enterprise customers with this new capability. This handles potentially vexing issues like component interdependencies and switching to the correct software versions in a rolling fashion.

More information on this new capability is available at the link below –

http://hortonworks.com/blog/introducing-automated-rolling-upgrades-with-apache-ambari-2-0/

For anyone planning, deploying or running a Hadoop cluster of any size, Ambari can make the operations angle a breeze.

For more information, please check out the link below –

http://hortonworks.com/hadoop/ambari/
