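I. Installing and Configuring Azkaban
1.1 Environment preparation
1.1.1 Database preparation
- Upload the installation packages to /opt/software/azkaban
- Extract them
- Extract the db archive; it contains an SQL script with "all" in its name
Import that SQL script into the database.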
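The import step can be sketched roughly as follows. The file name pattern `create-all-sql-*.sql`, the `root` MySQL account, and the helper names are assumptions rather than details from this post; the database name `azkaban` matches the `mysql.database` value used later, and the database is assumed to exist already.

```shell
# Sketch only. Assumptions (not stated in the post): the script is named
# create-all-sql-<version>.sql, the local mysql client can reach the server
# as root, and the target database (default: azkaban) already exists.

# Print the path of the "all" SQL script found under directory $1.
find_all_sql() {
    find "$1" -type f -name 'create-all-sql-*.sql' | sort | head -n 1
}

# Import that script into database $2; mysql prompts for the password.
import_azkaban_schema() {
    sql_file=$(find_all_sql "$1")
    if [ -z "$sql_file" ]; then
        echo "no create-all-sql-*.sql found under $1" >&2
        return 1
    fi
    mysql -u root -p "${2:-azkaban}" < "$sql_file"
}
```

Calling `import_azkaban_schema /opt/software/azkaban` would then search the upload directory and load the first matching script.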
1.1.2 Configuring the Azkaban executor server
- Extract the azkaban-exec archive to:
- Edit the azkaban.properties file
# Azkaban Personalization Settings
azkaban.name=Test
azkaban.label=My Local Azkaban
azkaban.color=#FF3601
azkaban.default.servlet.path=/index
web.resource.dir=web/
default.timezone.id=Asia/Shanghai
# Azkaban UserManager class
user.manager.class=azkaban.user.XmlUserManager
user.manager.xml.file=conf/azkaban-users.xml
# Loader for projects
executor.global.properties=conf/global.properties
azkaban.project.dir=projects
# Velocity dev mode
velocity.dev.mode=false
# Azkaban Jetty server properties.
jetty.use.ssl=false
jetty.maxThreads=25
jetty.port=8081
# Where the Azkaban web server is located
azkaban.webserver.url=http://hadoop102:8081
# mail settings
mail.sender=
mail.host=
# User facing web server configurations used to construct the user facing server URLs. They are useful when there is a reverse proxy between Azkaban web servers and users.
# enduser -> myazkabanhost:443 -> proxy -> localhost:8081
# when this parameters set then these parameters are used to generate email links.
# if these parameters are not set then jetty.hostname, and jetty.port(if ssl configured jetty.ssl.port) are used.
# azkaban.webserver.external_hostname=myazkabanhost.com
# azkaban.webserver.external_ssl_port=443
# azkaban.webserver.external_port=8081
job.failure.email=
job.success.email=
lockdown.create.projects=false
cache.directory=cache
# JMX stats
jetty.connector.stats=true
executor.connector.stats=true
# Azkaban plugin settings
azkaban.jobtype.plugin.dir=plugins/jobtypes
# Azkaban mysql settings by default. Users should configure their own username and password.
database.type=mysql
mysql.port=3306
mysql.host=192.168.109.135
mysql.database=azkaban
mysql.user=azkaban
mysql.password=azkaban
mysql.numconnections=100
# Azkaban Executor settings
executor.maxThreads=50
executor.flow.threads=30
executor.port=12321
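Change into the exec installation directory (many paths in its configuration files are relative) and start the executor:
bin/start-exec.sh
Note: if MySQL is version 8 or later, delete the bundled 5.1.28 MySQL driver from the lib directory and add a version 8 driver yourself.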
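The driver swap described in the note could look something like this. The jar name `mysql-connector-java-5.1.28.jar` and the helper name are assumptions; the 8.x Connector/J jar is one you download yourself and pass in by path.

```shell
# Sketch only. Assumptions: the bundled driver jar is named
# mysql-connector-java-5.1.28.jar, and a Connector/J 8.x jar has already
# been downloaded (its path is the second argument).

# $1 = Azkaban lib directory, $2 = path to the 8.x driver jar
swap_mysql_driver() {
    lib_dir=$1
    new_jar=$2
    [ -f "$new_jar" ] || { echo "driver jar not found: $new_jar" >&2; return 1; }
    # Remove the old 5.1.28 driver so only one driver is on the classpath.
    rm -f "$lib_dir/mysql-connector-java-5.1.28.jar"
    cp "$new_jar" "$lib_dir/"
}
```

The function refuses to delete anything when the new jar is missing, so a typo in the path does not leave lib without a driver.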
- Activate the executor:
curl -G "hadoop102:12321/executor?action=activate" && echo
- After activation, check the executor's status in the database:
0: not activated
1: activated
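A rough sketch of activating an executor and then reading the flag back from MySQL. The connection values come from the azkaban.properties above; the `executors` table with an `active` column and the `{"status":"success"}` reply are assumptions about the Azkaban release in use, and the helper names are made up for illustration.

```shell
# Sketch only. Host, port and MySQL credentials mirror azkaban.properties.
# Assumptions: the reply contains "success" when activation works, and the
# flag lives in the "active" column of the "executors" table.

# Activate the executor at $1:$2; succeed only if the reply reports success.
activate_executor() {
    reply=$(curl -s -G "$1:$2/executor?action=activate") || return 1
    echo "$reply"
    case $reply in
        *success*) return 0 ;;
        *)         return 1 ;;
    esac
}

# Show each executor and its activation flag (0 = not activated, 1 = activated).
show_executors() {
    mysql -h "${MYSQL_HOST:-192.168.109.135}" -P 3306 -u azkaban -pazkaban azkaban \
        -e 'SELECT id, host, port, active FROM executors;'
}
```

For example, `activate_executor hadoop102 12321 && show_executors` activates the local executor and prints the table only when the activation call reported success.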
1.1.3 Configuring the Azkaban web server
- Extract it into the same parent directory as the executor server
- Again, edit azkaban.properties
# Azkaban Personalization Settings
azkaban.name=Test
azkaban.label=My Local Azkaban
azkaban.color=#FF3601
azkaban.default.servlet.path=/index
web.resource.dir=web/
default.timezone.id=Asia/Shanghai
# Azkaban UserManager class
user.manager.class=azkaban.user.XmlUserManager
user.manager.xml.file=conf/azkaban-users.xml
# Loader for projects
executor.global.properties=conf/global.properties
azkaban.project.dir=projects
# Velocity dev mode
velocity.dev.mode=false
# Azkaban Jetty server properties.
jetty.use.ssl=false
jetty.maxThreads=25
jetty.port=8081
# Azkaban Executor settings
# mail settings
mail.sender=<your-email-address>
mail.host=smtp.qq.com
mail.user=<your-email-address>
mail.password=xxxxxxxxxxxx
# User facing web server configurations used to construct the user facing server URLs. They are useful when there is a reverse proxy between Azkaban web servers and users.
# enduser -> myazkabanhost:443 -> proxy -> localhost:8081
# when this parameters set then these parameters are used to generate email links.
# if these parameters are not set then jetty.hostname, and jetty.port(if ssl configured jetty.ssl.port) are used.
# azkaban.webserver.external_hostname=myazkabanhost.com
# azkaban.webserver.external_ssl_port=443
# azkaban.webserver.external_port=8081
job.failure.email=
job.success.email=
lockdown.create.projects=false
cache.directory=cache
# JMX stats
jetty.connector.stats=true
executor.connector.stats=true
# Azkaban mysql settings by default. Users should configure their own username and password.
database.type=mysql
mysql.port=3306
mysql.host=192.168.109.135
mysql.database=azkaban
mysql.user=azkaban
mysql.password=azkaban
mysql.numconnections=100
#Multiple Executor
azkaban.use.multiple.executors=true
azkaban.executorselector.filters=StaticRemainingFlowSize,CpuStatus
azkaban.executorselector.comparator.NumberOfAssignedFlowComparator=1
azkaban.executorselector.comparator.Memory=1
azkaban.executorselector.comparator.LastDispatched=1
azkaban.executorselector.comparator.CpuUsage=1
- Edit the user accounts:
vim azkaban-users.xml
<azkaban-users>
<user groups="azkaban" password="azkaban" roles="admin" username="azkaban"/>
<user password="metrics" roles="metrics" username="metrics"/>
<user password="root" roles="admin" username="root"/>
<role name="admin" permissions="ADMIN"/>
<role name="metrics" permissions="METRICS"/>
</azkaban-users>
II. Basic Usage of Azkaban
2.1 Writing the test files
first.project
```project
azkaban-flow-version: 2.0
```
first.flow
```yaml
nodes:
  - name: jobA
    type: command
    config:
      command: echo "Hello World"
```
2.2 Compress the two files into a single .zip archive
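As a minimal sketch, the two files from 2.1 can be written and packaged from the command line. The archive name first.zip and the helper names are arbitrary choices, and the `zip` tool is assumed to be installed.

```shell
# Sketch only. Assumptions: the zip command-line tool is installed, and the
# archive name first.zip is arbitrary.

# Write first.project and first.flow into directory $1.
write_first_flow() {
    mkdir -p "$1"
    printf 'azkaban-flow-version: 2.0\n' > "$1/first.project"
    cat > "$1/first.flow" <<'EOF'
nodes:
  - name: jobA
    type: command
    config:
      command: echo "Hello World"
EOF
}

# Put both files at the top level of $1/first.zip.
package_first_flow() {
    ( cd "$1" && zip -q first.zip first.project first.flow )
}
```

Running `zip` from inside the directory keeps the two files at the root of the archive rather than nested under a folder.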
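Then create a project:
Upload the zip archive
Then click execute
That covers the simple case.
A more complex flow, with dependencies between jobs, looks like this: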
nodes:
  - name: jobA
    type: command
    config:
      command: echo "Hello World AAA"

  - name: jobB
    type: command
    config:
      command: echo "Hello World BBB"

  - name: jobC
    type: command
    config:
      command: echo "Hello World CCC"
    dependsOn:
      - jobA
      - jobB
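To make the ordering constraint concrete: jobC lists jobA and jobB under dependsOn, so it can only run after both. The sketch below uses the standard `tsort` utility purely as an illustration of that constraint; Azkaban does its own scheduling, and the function name is made up.

```shell
# Sketch only: tsort merely illustrates the ordering implied by dependsOn.

# Each input line is "prerequisite dependent", mirroring jobC's dependsOn list.
flow_order() {
    printf '%s\n' 'jobA jobC' 'jobB jobC' | tsort
}
```

Whatever order `tsort` picks for jobA and jobB, jobC always comes out last, which is the guarantee the flow file asks for.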
Result after executing the flow:
2.3 Azkaban email alerts
- Configure the mail account
Set the following in the web server's azkaban.properties file:
# mail settings
mail.sender=<your-email-address>
mail.host=smtp.qq.com
mail.user=<your-email-address>
mail.password=xxxxxxxxxxxx
- Restart the Azkaban web server, then configure the email addresses to notify:
2.4 Azkaban start/stop script
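#!/bin/bash
# Start/stop helper for the Azkaban web server and the executors.
# Usage: pass one of start-exec, a-exec, stop-exec, start-web, stop-web.

# Start the web server on hadoop102.
start-web(){
    for i in hadoop102; do
        ssh $i "cd /opt/module/azkaban/azkaban-web/ ; bin/start-web.sh"
    done
}

# Stop the web server on hadoop102.
stop-web(){
    for i in hadoop102; do
        ssh $i "cd /opt/module/azkaban/azkaban-web/ ; bin/shutdown-web.sh"
    done
}

# Start an executor on every node.
start-exec(){
    for i in hadoop102 hadoop103 hadoop104; do
        ssh $i "cd /opt/module/azkaban/azkaban-exec/ ; bin/start-exec.sh"
    done
}

# Activate the executor on every node.
activate-exec(){
    for i in hadoop102 hadoop103 hadoop104; do
        ssh $i "curl -G '$i:12321/executor?action=activate' && echo"
    done
}

# Stop the executor on every node.
stop-exec(){
    for i in hadoop102 hadoop103 hadoop104; do
        ssh $i "/opt/module/azkaban/azkaban-exec/bin/shutdown-exec.sh"
    done
}

case $1 in
    start-exec )
        start-exec
        ;;
    a-exec )
        activate-exec
        ;;
    stop-exec )
        stop-exec
        ;;
    start-web )
        start-web
        ;;
    stop-web )
        stop-web
        ;;
    * )
        echo "Usage: $0 {start-exec|a-exec|stop-exec|start-web|stop-web}"
        ;;
esac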
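One plausible way to bring everything up with the script above is executors first, then activation, then the web server, mirroring the order of sections 1.1.2 and 1.1.3. The wrapper name, the script path argument, and the fixed pause are all assumptions for illustration.

```shell
# Sketch only. Assumptions: the script above is saved under a name of your
# choice (passed as $1), and a short pause is enough for the executors to
# start listening before they are activated.

# $1 = path to the start/stop script, $2 = seconds to wait (default 10)
azkaban_start_all() {
    script=$1
    delay=${2:-10}
    "$script" start-exec || return 1
    sleep "$delay"
    "$script" a-exec || return 1      # activate executors before the web server starts
    "$script" start-web
}
```

The delay is a blunt instrument; on a slow cluster a larger value, or a retry around the activation step, may be needed.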
III. End-to-End Scheduling with Azkaban
3.1 Preparing the data
3.1.1 Log data
(1) Edit application.properties under /opt/module/applog
# business date
mock.date=2020-06-20
Note: distribute the file to the other nodes that need to generate data
[root@hadoop102 applog]$ xsync application.properties
(2) Generate the data
[root@hadoop102 bin]$ lg.sh
Note: after generating the data, remember to check that it actually exists in HDFS!
(3) Check whether the HDFS path /origin_data/gmall/log/topic_log/2020-06-26 contains data
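Steps (1) and (3) above can be scripted along these lines. The helper names are hypothetical; the sketch assumes `mock.date` sits on a line of its own and that the `hadoop` client is on the PATH.

```shell
# Sketch only. Assumptions: mock.date sits on a line of its own in
# application.properties, and the hadoop client is on the PATH.

# Set mock.date in properties file $1 to date $2 (YYYY-MM-DD).
set_mock_date() {
    sed "s/^mock\.date=.*/mock.date=$2/" "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Succeed and list the contents if HDFS path $1 exists.
check_hdfs_path() {
    hadoop fs -test -e "$1" || { echo "missing: $1" >&2; return 1; }
    hadoop fs -ls "$1"
}
```

For instance, `set_mock_date /opt/module/applog/application.properties 2020-06-20` updates the date before distribution, and `check_hdfs_path /origin_data/gmall/log/topic_log/2020-06-26` performs the check from step (3).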
3.1.2 Business data
(1) Edit application.properties under /opt/module/db_log
mock.date=2020-06-20
(2) Generate the data
[root@hadoop102 db_log]$ java -jar gmall2020-mock-db-2020-04-01.jar
(3) In SQLyog, check that the operate_time column of the order_info table contains rows dated 2020-06-26
3.2 Running the schedule
3.2.1 Writing the configuration files
gmall.project
azkaban-flow-version: 2.0
gmall.flow
nodes:
  - name: mysql_to_hdfs
    type: command
    config:
      command: /usr/bin/mysql_to_hdfs.sh all ${dt}

  - name: hdfs_to_ods_log
    type: command
    config:
      command: /usr/bin/hdfs_to_ods_log.sh ${dt}

  - name: hdfs_to_ods_db
    type: command
    dependsOn:
      - mysql_to_hdfs
    config:
      command: /usr/bin/hdfs_to_ods_db.sh all ${dt}

  - name: ods_to_dwd_log
    type: command
    dependsOn:
      - hdfs_to_ods_log
    config:
      command: /usr/bin/ods_to_dwd_log.sh ${dt}

  - name: ods_to_dwd_db
    type: command
    dependsOn:
      - hdfs_to_ods_db
    config:
      command: /usr/bin/ods_to_dwd_db.sh all ${dt}

  - name: dwd_to_dws
    type: command
    dependsOn:
      - ods_to_dwd_log
      - ods_to_dwd_db
    config:
      command: /usr/bin/dwd_to_dws.sh ${dt}

  - name: dws_to_dwt
    type: command
    dependsOn:
      - dwd_to_dws
    config:
      command: /usr/bin/dws_to_dwt.sh ${dt}

  - name: dwt_to_ads
    type: command
    dependsOn:
      - dws_to_dwt
    config:
      command: /usr/bin/dwt_to_ads.sh ${dt}

  - name: hdfs_to_mysql
    type: command
    dependsOn:
      - dwt_to_ads
    config:
      command: /usr/bin/hdfs_to_mysql.sh all
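With this many nodes, a misspelled name under dependsOn is easy to miss. The following is a small, hypothetical pre-upload check, not an Azkaban tool: it assumes the flow file follows the simple layout used above (one `- name:` line per node, dependencies listed as `- <name>` lines directly under `dependsOn:`).

```shell
# Sketch only. Assumption: the flow file uses the simple layout shown above;
# this is a line-based check, not a real YAML parser.

# Report every dependsOn entry in flow file $1 that is not a defined node name.
check_flow_deps() {
    awk '
        /^[ \t]*-[ \t]*name:/       { names[$NF] = 1; in_deps = 0; next }
        /^[ \t]*dependsOn:/         { in_deps = 1; next }
        in_deps && /^[ \t]*-[ \t]/  { deps[$NF] = 1; next }
                                    { in_deps = 0 }
        END {
            bad = 0
            for (d in deps) if (!(d in names)) { print "undefined dependency: " d; bad = 1 }
            exit bad
        }
    ' "$1"
}
```

`check_flow_deps gmall.flow` prints nothing and returns 0 when every dependency refers to a node defined in the file.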
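Then compress the two files into gmall.zip and upload it.
3.2.2 Running the flow from the web UI
All scheduled jobs can be viewed here
The whole flow has then been scheduled: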
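Summary
Thanks for reading; let's learn from each other.
Thanks to 尚硅谷 for the learning materials.
If you have questions, leave a comment or send an email.
gitee: many code repositories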