Copyright notice: this is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/yangguosb/article/details/87300801
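Overview
YARN is a core component of the Hadoop ecosystem that provides general-purpose resource management. It was originally created to separate resource management from job scheduling/monitoring in MapReduce V1; because of its generality, it later became the resource management layer for the whole Hadoop ecosystem, as shown in the figure below:
Architecture Diagram
YARN follows a typical Master/Slave architecture: the Resource Manager (RM) is the Master and the Node Manager (NM) is the Slave. The RM is the control node: it accepts jobs and assigns them resources and NMs. The NMs are the nodes that actually run the work: each one executes tasks and periodically reports its resource usage to the RM.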
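- RM: the resource control node, which centrally manages the resources of all NM nodes;
- NM: the task execution node (one per machine); tasks run on the NM as Containers, and the NM periodically reports resource usage to the RM;
- AM: the ApplicationMaster, the task scheduler/monitor (one per application); it schedules tasks onto Containers and monitors their execution;
- Container: the process launched on an NM once a task has been granted resources; all such processes are called Containers.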
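To make the division of roles concrete, here is a minimal Java sketch of the periodic NM-to-RM report. It is a toy in-memory model under stated assumptions, not Hadoop's API: ClusterSketch, NodeReport, ResourceManager.heartbeat and the node IDs are all hypothetical names chosen for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy in-memory model of the RM/NM relationship.
// Every name here is hypothetical; none of this is Hadoop's API.
class ClusterSketch {

    // One heartbeat from an NM: total and used resources on that node.
    static final class NodeReport {
        final int totalMemMb;
        final int usedMemMb;
        final int totalVcores;
        final int usedVcores;

        NodeReport(int totalMemMb, int usedMemMb, int totalVcores, int usedVcores) {
            this.totalMemMb = totalMemMb;
            this.usedMemMb = usedMemMb;
            this.totalVcores = totalVcores;
            this.usedVcores = usedVcores;
        }
    }

    // RM side: keeps only the most recent report from each NM.
    static final class ResourceManager {
        private final Map<String, NodeReport> latest = new LinkedHashMap<>();

        // Called periodically by each NM; a newer report replaces the older one.
        void heartbeat(String nodeId, NodeReport report) {
            latest.put(nodeId, report);
        }

        // Free memory across the cluster, according to the latest reports.
        int freeMemMb() {
            int free = 0;
            for (NodeReport r : latest.values()) {
                free += r.totalMemMb - r.usedMemMb;
            }
            return free;
        }

        // Free vcores across the cluster, according to the latest reports.
        int freeVcores() {
            int free = 0;
            for (NodeReport r : latest.values()) {
                free += r.totalVcores - r.usedVcores;
            }
            return free;
        }
    }

    public static void main(String[] args) {
        ResourceManager rm = new ResourceManager();
        rm.heartbeat("nm-1", new NodeReport(8192, 2048, 8, 2));
        rm.heartbeat("nm-2", new NodeReport(8192, 4096, 8, 4));
        System.out.println(rm.freeMemMb() + " MB and " + rm.freeVcores() + " vcores free");
    }
}
```

The point of the sketch is that the RM never inspects a node directly: its view of the cluster is whatever the NMs last reported.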
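Workflow
- The user submits a job to the RM through a client, together with the queue name and the amount of resources requested;
- On receiving the request, the RM picks an NM and launches the AM on it;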
- The AM requests resources from the RM; once the RM grants them, the AM asks the assigned NMs to launch the Containers;
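- Once started, the Containers run their tasks and report progress to the AM, while the NMs report resource usage to the RM;
- When all Container tasks have finished, the AM unregisters from the RM and exits, and the RM tells the NMs to stop the Containers.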
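The steps above can be traced with a small Java sketch. It only records the order of messages for one job and does no real scheduling; WorkflowSketch, run, and the message strings are hypothetical names for illustration, not part of Hadoop.

```java
import java.util.ArrayList;
import java.util.List;

// Records the message sequence of one job, following the workflow steps.
// Purely illustrative: all names and strings are hypothetical.
class WorkflowSketch {

    // Returns the ordered event log for a job that uses `containers` Containers.
    static List<String> run(String queue, int containers) {
        List<String> log = new ArrayList<>();
        log.add("client -> RM: submit job to queue '" + queue + "'");
        log.add("RM: pick an NM and launch the AM there");
        for (int i = 1; i <= containers; i++) {
            log.add("AM -> RM: request resources for container " + i);
            log.add("NM: container " + i + " started");
        }
        for (int i = 1; i <= containers; i++) {
            log.add("container " + i + " -> AM: task finished");
        }
        log.add("AM -> RM: unregister and exit");
        log.add("RM -> NM: stop containers");
        return log;
    }

    public static void main(String[] args) {
        for (String line : run("default", 2)) {
            System.out.println(line);
        }
    }
}
```

Running main prints the ten events for a two-Container job in order, which makes it easy to see that the AM sits between the RM and the Containers for the whole lifetime of the job.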
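Glossary
- Resource: YARN's resource model currently covers only two resource types, CPU and memory, configured through the Vcore and Mem parameters. The Javadoc of Hadoop's Resource class describes the model: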
/**
* <p><code>Resource</code> models a set of computer resources in the
* cluster.</p>
*
* <p>Currently it models both <em>memory</em> and <em>CPU</em>.</p>
*
* <p>The unit for memory is megabytes. CPU is modeled with virtual cores
* (vcores), a unit for expressing parallelism. A node's capacity should
* be configured with virtual cores equal to its number of physical cores. A
* container should be requested with the number of cores it can saturate, i.e.
* the average number of threads it expects to have runnable at a time.</p>
*
* <p>Virtual cores take integer values and thus currently CPU-scheduling is
* very coarse. A complementary axis for CPU requests that represents processing
* power will likely be added in the future to enable finer-grained resource
* configuration.</p>
*
* <p>Typically, applications request <code>Resource</code> of suitable
* capability to run their component tasks.</p>
*
* @see ResourceRequest
* @see ApplicationMasterProtocol#allocate(org.apache.hadoop.yarn.api.protocolrecords.AllocateRequest)
*/
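As a rough illustration of the model that Javadoc describes (memory in megabytes, CPU in integer vcores), here is a self-contained Java sketch. It deliberately does not use Hadoop's Resource class, so it runs without Hadoop on the classpath; ResourceSketch, Res, fitsIn and minus are hypothetical names.

```java
// Minimal stand-in for a (memory, vcores) resource pair.
// Hypothetical names; this is not org.apache.hadoop.yarn.api.records.Resource.
class ResourceSketch {

    static final class Res {
        final long memoryMb; // memory in megabytes
        final int vcores;    // virtual cores, integer-valued

        Res(long memoryMb, int vcores) {
            if (memoryMb < 0 || vcores < 0) {
                throw new IllegalArgumentException("resources must be non-negative");
            }
            this.memoryMb = memoryMb;
            this.vcores = vcores;
        }

        // A request fits only if BOTH dimensions fit.
        boolean fitsIn(Res available) {
            return memoryMb <= available.memoryMb && vcores <= available.vcores;
        }

        // Remaining resources after granting `other`; throws if `other` does not fit.
        Res minus(Res other) {
            return new Res(memoryMb - other.memoryMb, vcores - other.vcores);
        }
    }

    public static void main(String[] args) {
        Res node = new Res(8192, 8);    // capacity of one node
        Res request = new Res(2048, 2); // one Container request
        if (request.fitsIn(node)) {
            Res left = node.minus(request);
            System.out.println(left.memoryMb + " MB, " + left.vcores + " vcores left");
        }
    }
}
```

Because vcores are integers, the smallest CPU share a Container can ask for in this model is one vcore, which is the coarseness the Javadoc mentions.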
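- Queue: can be thought of as a dynamic resource pool with a minimum and a maximum. The minimum is the amount of resources YARN guarantees to the programs running in the queue; the maximum is the upper limit on the resources the queue can use.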
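The minimum/maximum behaviour of a queue can be sketched in Java as follows. This is a simplification under stated assumptions, not how any particular YARN scheduler is implemented: memory is the only dimension, usage above the minimum is assumed to be allowed only when the rest of the cluster has idle memory, and QueueSketch, Queue and tryAllocate are hypothetical names.

```java
// Toy queue with a guaranteed minimum and a hard maximum (memory only).
// Hypothetical names and simplified rules; not a real YARN scheduler.
class QueueSketch {

    static final class Queue {
        final long minMb; // guaranteed share
        final long maxMb; // upper limit
        long usedMb;

        Queue(long minMb, long maxMb) {
            if (minMb < 0 || maxMb < minMb) {
                throw new IllegalArgumentException("need 0 <= min <= max");
            }
            this.minMb = minMb;
            this.maxMb = maxMb;
        }

        // Tries to allocate requestMb; idleClusterMb is memory no other queue is using.
        boolean tryAllocate(long requestMb, long idleClusterMb) {
            long after = usedMb + requestMb;
            if (after > maxMb) {
                return false; // never exceed the maximum
            }
            boolean withinGuarantee = after <= minMb;
            if (!withinGuarantee && requestMb > idleClusterMb) {
                return false; // above the minimum, only idle memory may be borrowed (assumption)
            }
            usedMb = after;
            return true;
        }
    }

    public static void main(String[] args) {
        Queue q = new Queue(4096, 8192);
        System.out.println(q.tryAllocate(4096, 0));      // within the guaranteed minimum
        System.out.println(q.tryAllocate(2048, 0));      // above minimum, nothing idle
        System.out.println(q.tryAllocate(2048, 4096));   // above minimum, idle memory exists
        System.out.println(q.tryAllocate(4096, 100000)); // would exceed the maximum
    }
}
```

The four calls in main exercise the three regimes in turn: inside the guarantee, between minimum and maximum, and beyond the maximum.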