Understanding Yarn: Business Analogy
- Bhupendra patni
- Jun 22, 2015
This blog post is aimed at folks working with Hadoop who want to better understand the Yarn framework, remember its roles and processes easily, and learn to develop applications with it.
The post correlates the Yarn framework with a business process that requires optimal resource allocation. The analogy will help you understand and remember all the roles, how they communicate, and the purpose each serves in resource allocation with Yarn.
Yarn in one Sentence
Yarn (“Yet Another Resource Negotiator”) manages the cluster resources for Apache Hadoop and shares those resources among compute frameworks like MapReduce, Spark and Impala.
Why this blog?
To make it simpler to understand the Yarn framework, build applications on it, and answer questions about its terminology, such as:
What is a Resource Manager, Application Master and Node Manager?
What is a Yarn Application object?
What is an ApplicationMaster object?
Why do I need to instantiate certain objects?
How do they communicate with each other?
There were many more questions like these to answer before I could understand the entire framework, its components and their communication.
Business Scenario & Analogy
Let's take the example of a company that owns factories and resources at multiple locations, makes multiple products, and allocates factory resources based on product orders coming in through Sales Representatives.
There are many ways to allocate factory resources to a Sales Rep, e.g. on a first come, first served basis, by location, or by choosing the factory where the raw material is available.
Defining Success Criteria
However, the most important question is: what would you need to run this business optimally?
Let us list the criteria for running the business successfully:
Find a factory location that is available to work on new orders.
Allocate factory resources at the location(s) where most of the raw material required for the order is readily available, which saves time moving raw material.
Complete orders on time even when there are failures at one factory location.
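The three criteria above can be sketched in a few lines of code. This is only a toy model to make the idea concrete, not how any real scheduler works: the `Factory` record, its fields, the `pick_factory` helper and the factory names "A", "B" and "C" are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Factory:
    """Hypothetical factory record; every field is illustrative."""
    name: str
    free_capacity: int                            # work the factory can still take on
    materials: set = field(default_factory=set)   # raw materials stocked on site
    healthy: bool = True                          # False simulates a failure


def pick_factory(factories, needed_capacity, needed_materials):
    """Return the healthy factory with enough capacity that already stocks
    the most required materials, or None if no factory qualifies."""
    candidates = [
        f for f in factories
        if f.healthy and f.free_capacity >= needed_capacity
    ]
    if not candidates:
        return None
    # Prefer the location holding the most required materials (less moving).
    return max(candidates, key=lambda f: len(f.materials & needed_materials))


factories = [
    Factory("A", free_capacity=5, materials={"steel", "rubber"}),
    Factory("B", free_capacity=8, materials={"steel"}),
    Factory("C", free_capacity=0, materials={"steel", "rubber", "glass"}),
]
order_materials = {"steel", "rubber"}

first_choice = pick_factory(factories, 3, order_materials)
print(first_choice.name)  # A: it has capacity and stocks both materials

# Simulate a failure at the chosen location and re-route the order.
first_choice.healthy = False
fallback = pick_factory(factories, 3, order_materials)
print(fallback.name)  # B: the only remaining factory with capacity
```

Factory "C" stocks everything but has no free capacity, so it is never picked. Availability is checked first and closeness to the raw material only breaks ties among available factories.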
System and Process required to be successful.
Let us define the locations and roles required to run the company successfully, use factory resources optimally and complete client orders.
Head Office - Communicates with each factory location and manages utilization and scheduling
Company Manager - Manages resources like capacity and raw material availability / accessibility
Scheduler - Manages the schedule for all factory locations
Orders Manager - Provides availability information for all factory locations and resources. It also works as a fail-safe mechanism: it tracks the status of all orders across all factory locations, ensures all orders are completed on time, and re-routes an order to another factory in case of failure.
Local to each factory - The manager or team responsible for managing the resources at a factory location and executing a specific order.
Factory Manager - Makes the factory's resources and raw material available for the order
Order Manager - Works with Order Executors to complete an order across multiple locations and provides updates to the Sales Rep / Head Office on the status of each order
Order Executor - Executes an order at one factory location
Business Process
Let us try to define the process and how each role interfaces with others while completing an order for the client.
Sales Rep contacts Company Manager and submits the order request.
Company Manager in response provides the confirmation.
Sales Rep provides order details with resources, priority and instructions on processing an order.
Company Manager contacts one of the available Factory Managers to start an Order Manager for this order in the order's first Order Executor.
Order Manager registers with the Orders Manager to learn factory capacity and to communicate the order's resource requirements.
Order Manager works with the Orders Manager to get availability and resource information for other factories.
Order Manager then works with Order Executor(s) to execute the order across multiple factory locations and provides updates back to the Sales Rep and Head Office.
If you understand the roles and process for factory resource management, it should now be easy to understand the roles, processes and components of Yarn.
Yarn Roles and Processes
Now, let’s correlate our business analogy with Yarn and see how each business role maps to a role in Yarn:
Sales Rep is a YarnClient in Yarn
Head Office is the Resource Manager in Yarn
Company Manager is an Applications Manager in Yarn
Scheduler is a Scheduler in Yarn
Orders Manager is an Application Master Service
Factory Manager is a Node Manager
Order Manager is an Application Master
Order Executor is a container
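One way to check that the mapping above holds together is to treat it as a lookup table and translate a business sentence into Yarn terms. This is a small sketch for memorization only; `ANALOGY` and `to_yarn` are hypothetical names, and nothing here touches the Hadoop API.

```python
import re

# Business role -> Yarn role, taken from the list above.
ANALOGY = {
    "Sales Rep": "YarnClient",
    "Company Manager": "Applications Manager",
    "Scheduler": "Scheduler",
    "Orders Manager": "Application Master Service",
    "Factory Manager": "Node Manager",
    "Order Manager": "Application Master",
    "Order Executor": "Container",
}

# Match all role names in a single pass, longest names first, so that
# replaced text is never scanned (and replaced) a second time.
_ROLE_PATTERN = re.compile(
    "|".join(re.escape(role) for role in sorted(ANALOGY, key=len, reverse=True))
)


def to_yarn(sentence):
    """Rewrite a business-process sentence using the Yarn role names."""
    return _ROLE_PATTERN.sub(lambda match: ANALOGY[match.group(0)], sentence)


print(to_yarn("Sales Rep contacts Company Manager and submits the order request."))
# YarnClient contacts Applications Manager and submits the order request.
```

Only role names are translated; nouns such as "order" stay as they are, so the output still needs "order" read as "application".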
Let us review the same process using the Yarn roles:
Yarn Client contacts Applications Manager and submits the Application request.
Applications Manager responds with an Application ID as confirmation.
Client provides application details with resources, priority and instructions on running an application.
Applications Manager contacts one of the available Node Managers to launch the Application Master in the application's first container.
Application Master registers with the Application Master Service to learn cluster capacity and to communicate the application's resource requirements.
Application Master works with the Application Master Service to get information about other Node Managers and their resources.
Application Master then works with Container(s) to execute the application across multiple nodes and provides updates back to the Client and Resource Manager.
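The seven steps above, numbered in order, can be walked through with a toy simulation. This is a rough sketch to make the sequence easier to remember and is not the Hadoop API: the classes below only borrow the Yarn role names, and every method name, the `app_1` ID format and the slot-counting logic are assumptions made for illustration.

```python
class NodeManager:
    """Toy Node Manager: tracks free container slots on one node."""

    def __init__(self, name, slots):
        self.name = name
        self.free_slots = slots

    def launch(self, label):
        # Occupy one slot and return a label identifying the container.
        if self.free_slots < 1:
            raise RuntimeError(f"no free slot on {self.name}")
        self.free_slots -= 1
        return f"{label}@{self.name}"


class ApplicationMaster:
    """Toy Application Master: one instance per application."""

    def __init__(self, app_id, containers_needed, rm):
        self.app_id = app_id
        self.containers_needed = containers_needed
        self.rm = rm
        self.container = None  # set once the master itself is launched

    def run(self):
        capacity = self.rm.register(self.app_id)           # step 5
        if capacity < self.containers_needed:
            raise RuntimeError("cluster too small for this application")
        nodes = self.rm.allocate(self.containers_needed)   # step 6
        # Step 7: run one container per granted slot, then report back.
        launched = [
            node.launch(f"{self.app_id}/task{i}") for i, node in enumerate(nodes)
        ]
        self.rm.status[self.app_id] = "FINISHED"
        return launched


class ResourceManager:
    """Toy Resource Manager combining the Applications Manager and
    Application Master Service roles."""

    def __init__(self, nodes):
        self.nodes = nodes
        self.next_id = 1
        self.status = {}

    def new_application(self):
        # Steps 1-2: accept the request and confirm with an Application ID.
        app_id = f"app_{self.next_id}"
        self.next_id += 1
        self.status[app_id] = "NEW"
        return app_id

    def submit(self, app_id, containers_needed):
        # Steps 3-4: take the application details, then launch the
        # Application Master in the application's first container.
        node = self.allocate(1)[0]
        master = ApplicationMaster(app_id, containers_needed, self)
        master.container = node.launch(f"{app_id}/master")
        self.status[app_id] = "RUNNING"
        return master

    def register(self, app_id):
        # Step 5: report how many container slots are currently free.
        return sum(node.free_slots for node in self.nodes)

    def allocate(self, count):
        # Step 6: grant nodes with free slots, one list entry per container.
        grants = []
        for node in self.nodes:
            grants.extend([node] * node.free_slots)
        if len(grants) < count:
            raise RuntimeError("not enough free slots in the cluster")
        return grants[:count]


# The client side: ask for an ID, submit, then let the master do its work.
nodes = [NodeManager("node1", 2), NodeManager("node2", 2)]
rm = ResourceManager(nodes)
app_id = rm.new_application()
master = rm.submit(app_id, containers_needed=3)
containers = master.run()
print(app_id, master.container, containers, rm.status[app_id])
```

In this sketch the Application Master always calls the Resource Manager and never the other way round, and the master occupies a container slot of its own before any task container is launched.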
I hope this post makes it easy to remember the Yarn process and the roles involved in resource allocation for Yarn applications.
Thank you for reading. I would love your feedback.