Dr. Elephant: Environment Setup and Usage Guide

Dr. Elephant is a performance monitoring and tuning tool for Hadoop, Hive, and Spark jobs. It was open-sourced by the LinkedIn team in 2016.

I. Environment Setup

Overall environment: Dr. Elephant 2.0.13, Hadoop 2.6.5, Spark 2.2.3

1. Install JDK 8

2. Install Play Framework

1) Download and extract Play Framework

Download Play: the link to the installation package is at the bottom of the download page.

$wget https://downloads.typesafe.com/play/2.2.6/play-2.2.6.zip
$unzip play-2.2.6.zip

2) Configure the Play Framework environment variables

$vim ~/.bash_profile:
export JAVA_HOME=/opt/java
export PLAY_HOME=/opt/play-2.2.6
export PATH=$PATH:$PLAY_HOME:$JAVA_HOME/bin
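After reloading the profile (`source ~/.bash_profile`), it can help to confirm that both variables point at real directories before moving on. The following is a minimal sketch; `check_dirs` is a hypothetical helper name, not part of Play or Dr. Elephant.

```shell
# Report whether each named environment variable is set and points at an
# existing directory. Returns non-zero if any check fails.
check_dirs() {
  status=0
  for var in "$@"; do
    # Indirect lookup of the variable whose name is stored in $var.
    eval "dir=\${$var:-}"
    if [ -z "$dir" ]; then
      echo "$var is not set"
      status=1
    elif [ ! -d "$dir" ]; then
      echo "$var points at a missing directory: $dir"
      status=1
    else
      echo "$var=$dir"
    fi
  done
  return $status
}

# Example: check_dirs JAVA_HOME PLAY_HOME
```

If a variable is reported as unset or missing, correct the path in ~/.bash_profile and reload it.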

3) Test Play Framework
Create a new application/project:

play new helloworld

3. Build Dr. Elephant

1) Download dr-elephant-2.0.13 from github.com

$wget https://github.com/linkedin/dr-elephant/archive/v2.0.13.zip
$unzip v2.0.13.zip

2) Edit the configuration file compile.conf

$cd ~/dr-elephant-2.0.13
$vim compile.conf
hadoop_version=2.6.5
spark_version=2.2.3
play_opts="-Dsbt.repository.config=app-conf/resolver.conf"
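Because the build can take a while, it may be worth confirming that the versions in compile.conf match the cluster before starting. A minimal sketch follows; `conf_get` is a hypothetical helper, and it assumes the simple `key=value` layout shown above.

```shell
# Print the value of a key=value entry in a conf file, stripping one pair
# of surrounding double quotes. If the key appears more than once, the
# last occurrence wins.
conf_get() {
  key=$1
  file=$2
  sed -n "s/^${key}=//p" "$file" | sed 's/^"\(.*\)"$/\1/' | tail -n 1
}

# Example:
#   conf_get hadoop_version compile.conf
#   conf_get spark_version compile.conf
```

The printed values can then be compared by hand against the Hadoop and Spark versions actually running on the cluster.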

3) Build the distribution package

$cd ~/dr-elephant-2.0.13
# If the build below fails, retry it, or switch to a different version, until it succeeds.
$./compile.sh ./compile.conf
$ls dist
dr-elephant-2.0.13.zip

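When the build finishes, a SUCCESS message is printed. The source directory now contains a new dist directory, which holds the zip package dr-elephant-2.0.13.zip. Extracting this zip yields the dr-elephant-2.0.13 distribution, which is ready to deploy.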

4. Deploy Dr. Elephant

1) Deploy Dr. Elephant

$cd ~/dr-elephant-2.0.13/dist
$mv dr-elephant-2.0.13.zip /opt/
$cd /opt/;unzip dr-elephant-2.0.13.zip

2) Fix the bug in the SQL file

$cd /opt/dr-elephant-2.0.13
$vim conf/evolutions/default/1.sql
Replace lines 49-51, from
create index yarn_app_result_i4 on yarn_app_result (flow_exec_id);
create index yarn_app_result_i5 on yarn_app_result (job_def_id);
create index yarn_app_result_i6 on yarn_app_result (flow_def_id);
to
create index yarn_app_result_i4 on yarn_app_result (flow_exec_id(191));
create index yarn_app_result_i5 on yarn_app_result (job_def_id(191));
create index yarn_app_result_i6 on yarn_app_result (flow_def_id(191));

This resolves the following error:
[error] c.j.b.ConnectionHandle - Database access problem. Killing off this connection and all remaining connections in the connection pool. SQL State = HY000

The MySQL character set must be set to UTF8.
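The index fix above can also be applied without opening an editor. The following is a sketch using sed; `patch_indexes` is a hypothetical helper name, and it assumes the three `create index` lines appear in the file exactly as quoted above. It matches on the index names rather than on line numbers 49-51, so it is safe to run twice.

```shell
# Add a 191-character prefix length to the three indexes named in the fix.
# Lines that already carry a prefix length do not match and are left
# unchanged. The original file is kept as <file>.bak.
patch_indexes() {
  sed -E -i.bak \
    's/^(create index yarn_app_result_i[456] on yarn_app_result \((flow_exec_id|job_def_id|flow_def_id))\);/\1(191));/' \
    "$1"
}

# Example: patch_indexes /opt/dr-elephant-2.0.13/conf/evolutions/default/1.sql
```

Comparing 1.sql with 1.sql.bak afterwards (for example with diff) shows exactly which lines were changed.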

3) Update the configuration files

Set the MySQL connection details: vim app-conf/elephant.conf
Adjust the number of analysis threads and the time intervals: vim app-conf/GeneralConf.xml

<configuration>
  <property>
    <name>drelephant.analysis.thread.count</name>
    <value>15</value>
    <description>Number of threads to analyze the completed jobs</description>
  </property>
  <property>
    <name>drelephant.analysis.fetch.interval</name>
    <value>60000</value>
    <description>Interval between fetches in milliseconds</description>
  </property>
  <property>
    <name>drelephant.analysis.retry.interval</name>
    <value>60000</value>
    <description>Interval between retries in milliseconds</description>
  </property>
  <property>
    <name>drelephant.application.search.match.partial</name>
    <value>true</value>
    <description>If this property is "false", search will only make exact matches</description>
  </property>
</configuration>

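Change drelephant.analysis.thread.count: the default is 3, and raising it to 15 is recommended. With 3 threads, reading from the JobHistoryServer is too slow; above 15, reads become fast enough to put heavy load on the JobHistoryServer. The next two properties are the fetch interval (drelephant.analysis.fetch.interval) and the retry interval (drelephant.analysis.retry.interval), both in milliseconds.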
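For repeatable deployments, the same change can be scripted rather than made in vim. The sketch below uses awk; `set_property` is a hypothetical helper, and it assumes each `<value>` element sits on its own line directly after its `<name>` element, as in the listing above.

```shell
# Set the <value> that follows a given <name> in a Hadoop-style XML
# configuration file. The name is matched literally, not as a regex.
set_property() {
  name=$1
  value=$2
  file=$3
  awk -v name="$name" -v value="$value" '
    index($0, "<name>" name "</name>") { found = 1; print; next }
    found && /<value>/ {
      sub(/<value>.*<\/value>/, "<value>" value "</value>")
      found = 0
    }
    { print }
  ' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Example: set_property drelephant.analysis.thread.count 15 app-conf/GeneralConf.xml
```

Values containing `&` or backslashes would need escaping before being passed in, since awk treats them specially in a replacement string; plain numbers such as 15 or 60000 are unaffected.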

4) Start the process

$cd /opt/dr-elephant-2.0.13
$sh -x bin/start.sh app-conf/

5) Open the web UI in a browser

http://$ip:8080