airflow version: 2.4.3
airflow-helm chart version: 1.8.0
k8s version: 1.20
Date: 2023/03/30
Note: the full workflow has been verified in a test environment; it has not yet been used in production, but a production rollout is planned.
Final result
Only two containers, webserver and scheduler. Logs are mounted on paths on the VM (from which they are automatically collected into the ELK logging platform), using LocalExecutor.
Common commands:
helm install my-airflow --namespace <namespace> ./airflow1.8.0   # install
helm uninstall my-airflow --namespace <namespace>                # uninstall (takes only the release name, no chart path)
helm upgrade my-airflow --namespace <namespace> ./airflow1.8.0   # upgrade
Detailed steps
0. Create the k8s secret
kubectl create secret generic my-webserver-secret --from-literal="webserver-secret-key=$(python3 -c 'import secrets; print(secrets.token_hex(16))')" -n <namespace>
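The secret value embedded in that command is just 16 random bytes rendered as hex; the webserver uses it to sign session cookies. A minimal sketch of what the inline `python3 -c` snippet produces:

```python
import secrets

# 16 random bytes rendered as a 32-character hex string.
key = secrets.token_hex(16)
print(key)  # different on every run

assert len(key) == 32
assert all(c in "0123456789abcdef" for c in key)
```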
1. Modify values.yaml
#1. Secret key
webserverSecretKeySecretName: my-webserver-secret
#2. postgresql
postgresql:
  enabled: false
pgbouncer:
  enabled: false
#3. redis
redis:
  enabled: false
#4. Executor configuration
executor: "LocalExecutor"
#5. Enable the example DAGs
extraEnv: |
  - name: AIRFLOW__CORE__LOAD_EXAMPLES
    value: 'True'
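AIRFLOW__CORE__LOAD_EXAMPLES follows Airflow's standard environment-variable naming: AIRFLOW__{SECTION}__{KEY}, upper-cased, with double underscores between the parts. A small sketch of that mapping (the helper name is mine, not an Airflow API):

```python
def airflow_env_var(section: str, key: str) -> str:
    """Map an airflow.cfg [section] key to its environment-variable name."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"

print(airflow_env_var("core", "load_examples"))  # AIRFLOW__CORE__LOAD_EXAMPLES
```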
#6. Node selector and tolerations; replace * with your actual values
# Select certain nodes for airflow pods.
nodeSelector:
  *: *
affinity: {}
tolerations:
  - effect: NoSchedule
    key: *
    value: *
#7. Parameter changes
# Image configuration
# Default airflow tag to deploy
defaultAirflowTag: "2.3.4-python3.7"
# Airflow version (used to make some decisions based on the Airflow version being deployed)
airflowVersion: "2.3.4"
#8. Image configuration; replace * with your actual values
images:
  airflow:
    repository: *
    tag: *
    pullPolicy: IfNotPresent
#9. Image pull secret; replace * with your actual value
registry:
  secretName: *
#10. Ingress configuration (turn on the switch and set the host); change host to your actual value
# Ingress configuration
ingress:
  # Enable all ingress resources (deprecated - use ingress.web.enabled and ingress.flower.enabled)
  enabled: ~
  # Configs for the Ingress of the web Service
  web:
    # Enable web ingress resource
    enabled: true  # change this
    # Annotations for the web Ingress
    annotations: {}
    # The path for the web Ingress
    path: "/"
    # The pathType for the above path (used only with Kubernetes v1.19 and above)
    pathType: "ImplementationSpecific"
    # The hostname for the web Ingress (Deprecated - renamed to `ingress.web.hosts`)
    host: "airflow-web.<namespace>.svc.za"  # change to your actual host
    # The hostnames or hosts configuration for the web Ingress
    hosts: []
#11. Logs change 1
logs:
  persistence:
    # Enable persistent volume for storing logs
    enabled: false
    # Volume size for logs
    size: 0Gi
#12. Logs change 2
logGroomerSidecar:
  # Whether to deploy the Airflow scheduler log groomer sidecar.
  enabled: false
#13. Logs change 3: set workers.persistence.enabled to false
workers:
  persistence:
    enabled: false
#14. Disable the triggerer service
# Airflow Triggerer Config
triggerer:
  enabled: false
#15. Configure mysql; replace * with your actual values
data:
  metadataSecretName: ~
  resultBackendSecretName: ~
  brokerUrlSecretName: ~
  metadataConnection:
    user: airflow
    pass: *
    protocol: mysql
    host: *
    port: 3306
    db: airflow_k8s_test
    sslmode: disable
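The metadataConnection fields above get assembled by the chart into a SQLAlchemy-style connection URI of the shape protocol://user:pass@host:port/db. A sketch of that assembly, using hypothetical credentials in place of the * placeholders:

```python
# Hypothetical values standing in for the * placeholders above.
protocol = "mysql"
user = "airflow"
password = "example-password"          # placeholder, not a real credential
host = "mysql.example.internal"        # placeholder host
port = 3306
db = "airflow_k8s_test"

uri = f"{protocol}://{user}:{password}@{host}:{port}/{db}"
print(uri)  # mysql://airflow:example-password@mysql.example.internal:3306/airflow_k8s_test
```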
#16. Disable statsd (debatable)
statsd:
  enabled: false
#17. Timezone configuration
# Volumes for all airflow containers
volumes:
  - hostPath:
      path: /etc/localtime
    name: vm-localtime
  - hostPath:
      path: /etc/timezone
    name: vm-timezone
#18. VolumeMounts for all airflow containers
volumeMounts:
  - mountPath: /etc/localtime
    name: vm-localtime
    readOnly: true
  - mountPath: /etc/timezone
    name: vm-timezone
    readOnly: true
#19. Container user and group configuration, as needed
# User and group of airflow user
uid: 0
gid: 0
#20. Adjust the anti-affinity of the relevant services as needed; for multiple services, apply the same change to each
# Select certain nodes for airflow scheduler pods.
nodeSelector: {}
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - podAffinityTerm:
          labelSelector:
            matchLabels:
              component: airflow-scheduler  # change as needed
          topologyKey: kubernetes.io/zone  # zone-level anti-affinity
        weight: 100
#21. Set container resource limits as needed
Skipped here.
#22. Replica counts
A replica count of 1 is recommended for both the webserver and the scheduler.
2. Modify the templates
#1. Mount logs to the VM
In the scheduler Deployment template, comment out emptyDir and add the following:
{{- else if not $stateful }}
- name: logs
  hostPath:
    path: /tmp/airflow-scheduler-log  # the directory on the VM where logs are mounted; change as needed
  #emptyDir: {}
{{- else }}
#2. Customize pod names, service names, etc.
Modify each template file as needed.
3. Modify the filebeat log-collection config
#If filebeat collects the logs, be sure to update filebeat's log-collection paths; see the reference below.
#The scheduler logs live in directories nested at several depths.
#Configure log cleanup as needed.
- /tmp/airflow-scheduler-log/*/*.log
- /tmp/airflow-scheduler-log/*/*/*.log
- /tmp/airflow-scheduler-log/*/*/*/*.log
- /tmp/airflow-scheduler-log/*/*/*/*/*.log
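Each glob pattern above matches log files at exactly one directory depth, which is why four patterns are needed. A quick sketch demonstrating this with Python's glob, using a hypothetical file layout mirroring the scheduler's dag_id/run_id/task_id nesting:

```python
import glob
import os
import tempfile

root = tempfile.mkdtemp()
# A log file three directories deep, like the scheduler produces:
deep = os.path.join(root, "dag_id=example", "run_id=manual", "task_id=runme_1")
os.makedirs(deep)
open(os.path.join(deep, "attempt=1.log"), "w").close()

# The depth-2 pattern (*/*.log) misses it; the depth-4 pattern finds it.
assert glob.glob(os.path.join(root, "*", "*.log")) == []
assert len(glob.glob(os.path.join(root, "*", "*", "*", "*.log"))) == 1
```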
4. Rebuild the image to fix the log bug
#Reference: https://blog.csdn.net/weixin_40861707/article/details/119918467
#Symptom: logs cannot be viewed in the web UI; the messages below appear (my take: the second line should fetch from the Service, but it fetches from the pod)
*** Log file does not exist: /opt/airflow/logs/dag_id=example_bash_operator/run_id=manual__2023-03-30T07:00:55.479560+00:00/task_id=runme_1/attempt=1.log
*** Fetching from: http://airflow-scheduler-fljs8998-fsj4873:8793/log/dag_id=example_bash_operator/run_id=manual__2023-03-30T07:00:55.479560+00:00/task_id=runme_1/attempt=1.log
*** Failed to fetch log file from worker. [Errno 111] Connection refused
#Fix
Step 1: modify file_task_handler.py from the original image, around line 190
--- before ---
url = os.path.join("http://{ti.hostname}:{worker_log_server_port}/log", log_relative_path).format(
    ti=ti, worker_log_server_port=conf.get('logging', 'WORKER_LOG_SERVER_PORT')
)
--- before ---
The key change is to replace the pod name with the service name.
--- after ---
service_host = "http://svc-" + "-".join(ti.hostname.split("-")[:-2]) + ":{worker_log_server_port}/log"
url = os.path.join(service_host, log_relative_path).format(
    ti=ti, worker_log_server_port=conf.get('logging', 'WORKER_LOG_SERVER_PORT')
)
--- after ---
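The replacement derives the Service name from the pod name by dropping the last two generated suffixes (the ReplicaSet hash and the pod hash) and prefixing svc-. A standalone sketch using the pod name from the error message earlier; note the svc- prefix matches a locally customized Service name from step 2, not the chart's default:

```python
def pod_to_service(pod_name: str) -> str:
    # Drop the ReplicaSet hash and pod hash, then prefix "svc-":
    # "airflow-scheduler-fljs8998-fsj4873" -> "svc-airflow-scheduler"
    return "svc-" + "-".join(pod_name.split("-")[:-2])

print(pod_to_service("airflow-scheduler-fljs8998-fsj4873"))  # svc-airflow-scheduler
```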
Step 2: write the Dockerfile
FROM airflow:2.3.4-python3.7
RUN rm -f /home/airflow/.local/lib/python3.7/site-packages/airflow/utils/log/file_task_handler.py && rm -f /home/airflow/.local/lib/python3.7/site-packages/airflow/utils/log/__pycache__/file_task_handler.cpython-37.pyc || echo 123
COPY --chown=airflow:root file_task_handler.py /home/airflow/.local/lib/python3.7/site-packages/airflow/utils/log/file_task_handler.py
Step 3: build and push the image
Step 4: update the image in values.yaml and run helm upgrade