Spark任务启动后,我们通常都是通过跳板机去Spark UI界面查看对应任务的信息,一旦任务多了之后,这将会是让人头疼的问题。如果能将所有任务信息集中起来监控,那将会是很完美的事情。
通过Spark官网指导文档,发现Spark只支持以下sink
Each instance can report to zero or more sinks. Sinks are contained in the org.apache.spark.metrics.sink
package:
ConsoleSink
: Logs metrics information to the console.CSVSink
: Exports metrics data to CSV files at regular intervals.JmxSink
: Registers metrics for viewing in a JMX console.MetricsServlet
: Adds a servlet within the existing Spark UI to serve metrics data as JSON data.GraphiteSink
: Sends metrics to a Graphite node.Slf4jSink
: Sends metrics to slf4j as log entries.StatsdSink
: Sends metrics to a StatsD node.
没有比较常用的Influxdb和Prometheus ~~~
谷歌一把发现要支持influxdb需要使用第三方包,比较有参考意义的是这篇,Monitoring Spark Streaming with InfluxDB and Grafana ,在提交任务的时候增加file和配置文件,但成功永远不会这么轻松。。。
写入influxdb的数据都是以application_id命名的,类似这种application_1533838659288_1030_1_jvm_heap_usage,也就是说每个任务的指标都是在单独的表,最终我们展示在grafana不还得一个一个配置么?
显然这个不是我想要的结果,最终目的就是:一次配置后每提交一个任务自动会在监控上看到。
谷歌是治愈一切的良药,终究找到一个比较完美的解决方案,就是通过graphite_exporter中转数据后接入Prometheus,再通过grafana展示出来。
所以,目前已经实践可行的方案有两个
方案一:
监控数据直接写入influxdb,再通过grafana读取数据做展示,步骤如下:
1.在spark下 conf/metrics.properties 加入以下配置
master.source.jvm.class=org.apache.spark.metrics.source.JvmSource
worker.source.jvm.class=org.apache.spark.metrics.source.JvmSource
driver.source.jvm.class=org.apache.spark.metrics.source.JvmSourc
executor.source.jvm.class=org.apache.spark.metrics.source.JvmSource
*.sink.influx.class=org.apache.spark.metrics.sink.InfluxDbSink
*.sink.influx.protocol=http
*.sink.influx.host=xx.xx.xx.xx
*.sink.influx.port=8086
*.sink.influx.database=sparkonyarn
*.sink.influx.auth=admin:admin
2.在提交任务的时候增加以下配置,并确保以下jar存在
--files /spark/conf/metrics.properties \
--conf spark.metrics.conf=metrics.properties \
--jars /spark/jars/metrics-influxdb-1.1.8.jar,/spark/jars/spark-influx-sink-0.4.0.jar \
--conf spark.driver.extraClassPath=metrics-influxdb-1.1.8.jar:spark-influx-sink-0.4.0.jar \
--conf spark.executor.extraClassPath=metrics-influxdb-1.1.8.jar:spark-influx-sink-0.4.0.jar
缺点:application_id发生变化需要重新配置grafana
方案二(目前在用的):
通过graphite_exporter将原生数据通过映射文件转化为有 label 维度的 Prometheus 数据
1.下载graphite_exporter,解压后执行以下命令,其中graphite_exporter_mapping需要我们自己创建,内容为数据映射文件
nohup ./graphite_exporter --graphite.mapping-config=graphite_exporter_mapping &
例如
mappings:
- match: '*.*.jvm.*.*'
name: jvm_memory_usage
labels:
application: $1
executor_id: $2
mem_type: $3 qty: $4
会将数据转化成 metric name
为 jvm_memory_usage
,label
为 application
,executor_id
,mem_type
,qty
的格式。
application_1533838659288_1030_1_jvm_heap_usage
-> jvm_memory_usage{application="application_1533838659288_1030",executor_id="driver",mem_type="heap",qty="usage"}
2.配置 Prometheus 从 graphite_exporter 获取数据,重启prometheus
/path/to/prometheus/prometheus.yml
scrape_configs:
- job_name: 'spark'
static_configs:
- targets: ['localhost:9108']
3.在spark下 conf/metrics.properties 加入以下配置
master.source.jvm.class=org.apache.spark.metrics.source.JvmSource
worker.source.jvm.class=org.apache.spark.metrics.source.JvmSource
driver.source.jvm.class=org.apache.spark.metrics.source.JvmSourc
executor.source.jvm.class=org.apache.spark.metrics.source.JvmSource
*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
*.sink.graphite.protocol=tcp
*.sink.graphite.host=xx.xx.xx.xx
*.sink.graphite.port=9109
*.sink.graphite.period=5
*.sink.graphite.unit=seconds
4.提交spark任务的时候增加 --files /spark/conf/metrics.properties
5.最后在grafana创建prometheus数据源,创建需要的指标,最终效果如下,有新提交的任务不需要再配置监控,直接选择application_id就可以看对应的信息
需要用到的jar包
https://repo1.maven.org/maven2/com/izettle/metrics-influxdb/1.1.8/metrics-influxdb-1.1.8.jar
https://mvnrepository.com/artifact/com.palantir.spark.influx/spark-influx-sink
模板
mappings:
- match: '*.*.executor.filesystem.*.*'
name: filesystem_usage
labels:
application: $1
executor_id: $2
fs_type: $3
qty: $4
- match: '*.*.executor.threadpool.*'
name: executor_tasks
labels:
application: $1
executor_id: $2
qty: $3
- match: '*.*.executor.jvmGCTime.count'
name: jvm_gcTime_count
labels:
application: $1
executor_id: $2
- match: '*.*.executor.*.*'
name: executor_info
labels:
application: $1
executor_id: $2
type: $3
qty: $4
- match: '*.*.jvm.*.*'
name: jvm_memory_usage
labels:
application: $1
executor_id: $2
mem_type: $3
qty: $4
- match: '*.*.jvm.pools.*.*'
name: jvm_memory_pools
labels:
application: $1
executor_id: $2
mem_type: $3
qty: $4
- match: '*.*.BlockManager.*.*'
name: block_manager
labels:
application: $1
executor_id: $2
type: $3
qty: $4
- match: '*.driver.DAGScheduler.*.*'
name: DAG_scheduler
labels:
application: $1
type: $2
qty: $3
- match: '*.driver.*.*.*.*'
name: task_info
labels:
application: $1
task: $2
type1: $3
type2: $4
qty: $5
{
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": "-- Grafana --",
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"type": "dashboard"
}
]
},
"editable": true,
"gnetId": null,
"graphTooltip": 0,
"id": 6,
"iteration": 1566958793172,
"links": [],
"panels": [
{
"collapsed": false,
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 0
},
"id": 42,
"panels": [],
"title": "Batch Info",
"type": "row"
},
{
"cacheTimeout": null,
"colorBackground": false,
"colorPostfix": false,
"colorPrefix": false,
"colorValue": true,
"colors": [
"#73BF69",
"rgba(237, 129, 40, 0.89)",
"#d44a3a"
],
"datasource": "SparkFromGraphite",
"format": "none",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
"h": 6,
"w": 8,
"x": 0,
"y": 1
},
"id": 44,
"interval": "30s",
"links": [],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"options": {},
"pluginVersion": "6.2.5",
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"tableColumn": "",
"targets": [
{
"expr": "task_info{type1=\"StreamingMetrics\",type2=\"streaming\", application=\"$application_ID\",qty=\"totalCompletedBatches\"}",
"format": "time_series",
"instant": false,
"interval": "",
"intervalFactor": 1,
"legendFormat": "{{totalCompletedBatches}}",
"refId": "A"
}
],
"thresholds": "#73BF69",
"timeFrom": null,
"timeShift": null,
"title": "CompletedBatches",
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "current"
},
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": true,
"colors": [
"#299c46",
"rgba(237, 129, 40, 0.89)",
"#d44a3a"
],
"datasource": "SparkFromGraphite",
"format": "none",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
"h": 6,
"w": 8,
"x": 8,
"y": 1
},
"id": 48,
"interval": null,
"links": [],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"options": {},
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"tableColumn": "",
"targets": [
{
"expr": "task_info{type1=\"StreamingMetrics\",type2=\"streaming\", application=\"$application_ID\",qty=\"totalProcessedRecords\"}",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "{{qty}}",
"refId": "A"
}
],
"thresholds": "#299c46",
"timeFrom": null,
"timeShift": null,
"title": "ProcessedRecords",
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "avg"
},
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": true,
"colors": [
"#299c46",
"rgba(237, 129, 40, 0.89)",
"#d44a3a"
],
"datasource": "SparkFromGraphite",
"format": "none",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
"h": 6,
"w": 8,
"x": 16,
"y": 1
},
"id": 46,
"interval": null,
"links": [],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"options": {},
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"tableColumn": "",
"targets": [
{
"expr": "task_info{type1=\"StreamingMetrics\",type2=\"streaming\", application=\"$application_ID\",qty=\"waitingBatches\"}",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "{{qty}}",
"refId": "A"
}
],
"thresholds": "",
"timeFrom": null,
"timeShift": null,
"title": "waitingBatches",
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "avg"
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "SparkFromGraphite",
"fill": 1,
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 7
},
"id": 52,
"interval": "10s",
"legend": {
"avg": true,
"current": true,
"max": true,
"min": true,
"show": true,
"total": false,
"values": true
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "null",
"options": {},
"percentage": false,
"pointradius": 2,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "task_info{type1=\"StreamingMetrics\",type2=\"streaming\", application=\"$application_ID\",qty=\"lastReceivedBatch_records\"}",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "{{qty}}",
"refId": "A"
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Received Records",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "SparkFromGraphite",
"fill": 1,
"gridPos": {
"h": 8,
"w": 12,
"x": 12,
"y": 7
},
"id": 50,
"legend": {
"avg": false,
"current": true,
"max": true,
"min": false,
"show": true,
"total": false,
"values": true
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "null",
"options": {},
"percentage": false,
"pointradius": 2,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "task_info{type1=\"StreamingMetrics\",type2=\"streaming\", application=\"$application_ID\",qty=~\"lastCompletedBatch_totalDelay|lastCompletedBatch_processingDelay|lastCompletedBatch_schedulingDelay\"}",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "{{qty}}",
"refId": "A"
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Batch_Delay",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"format": "ms",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"collapsed": false,
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 15
},
"id": 34,
"panels": [],
"title": "Executor Memory",
"type": "row"
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "SparkFromGraphite",
"fill": 1,
"gridPos": {
"h": 7,
"w": 12,
"x": 0,
"y": 16
},
"id": 14,
"legend": {
"avg": false,
"current": true,
"max": false,
"min": false,
"show": true,
"total": false,
"values": true
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "null",
"options": {},
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
{
"alias": "heap_max",
"yaxis": 1
}
],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "jvm_memory_usage{mem_type=\"heap\", qty=\"used\", application=\"$application_ID\", executor_id!~\"driver\"}",
"format": "time_series",
"hide": false,
"intervalFactor": 1,
"legendFormat": "{{executor_id}}",
"refId": "A"
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Executor Heap Used",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"format": "decbytes",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "SparkFromGraphite",
"decimals": 2,
"fill": 1,
"gridPos": {
"h": 7,
"w": 12,
"x": 12,
"y": 16
},
"id": 18,
"legend": {
"avg": false,
"current": true,
"max": true,
"min": false,
"show": true,
"total": false,
"values": true
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "null",
"options": {},
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "max(jvm_memory_usage{mem_type=\"heap\", qty=\"usage\", application=\"$application_ID\", executor_id!~\"driver\"})",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "max_usage",
"refId": "A"
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Executor Max Heap Usage",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "SparkFromGraphite",
"fill": 1,
"gridPos": {
"h": 7,
"w": 8,
"x": 0,
"y": 23
},
"id": 16,
"legend": {
"avg": false,
"current": true,
"max": false,
"min": false,
"show": true,
"total": false,
"values": true
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "null",
"options": {},
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "jvm_gcTime_count{application=\"$application_ID\"} / 1000",
"format": "time_series",
"hide": false,
"intervalFactor": 1,
"legendFormat": "{{executor_id}} ",
"refId": "A"
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Executor GCT (s)",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "SparkFromGraphite",
"editable": true,
"error": false,
"fill": 0,
"grid": {},
"gridPos": {
"h": 7,
"w": 8,
"x": 8,
"y": 23
},
"id": 7,
"legend": {
"avg": false,
"current": true,
"max": true,
"min": false,
"show": false,
"total": false,
"values": true
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "connected",
"options": {},
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "jvm_memory_pools{executor_id!~\"driver\", application=\"$application_ID\",qty=\"usage\", mem_type=\"PS-Eden-Space\"}",
"format": "time_series",
"hide": false,
"intervalFactor": 2,
"legendFormat": "{{executor_id}}",
"metric": "jvm_memory_pools",
"refId": "A",
"step": 2,
"target": ""
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Executor Eden-Space",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "cumulative"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"format": "short",
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "SparkFromGraphite",
"editable": true,
"error": false,
"fill": 0,
"grid": {},
"gridPos": {
"h": 7,
"w": 8,
"x": 16,
"y": 23
},
"id": 9,
"legend": {
"avg": false,
"current": true,
"max": true,
"min": false,
"show": true,
"total": false,
"values": true
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "connected",
"options": {},
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "jvm_memory_pools{executor_id!~\"driver\", application=\"$application_ID\",qty=\"usage\", mem_type=\"PS-Old-Gen\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{executor_id}}",
"metric": "jvm_memory_pools",
"refId": "A",
"step": 2,
"target": ""
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Executor Old-Gen",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "cumulative"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"format": "short",
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"collapsed": true,
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 30
},
"id": 32,
"panels": [
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "SparkFromGraphite",
"decimals": null,
"fill": 1,
"gridPos": {
"h": 7,
"w": 12,
"x": 0,
"y": 2
},
"id": 12,
"legend": {
"avg": false,
"current": true,
"hideEmpty": false,
"hideZero": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": true
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "null",
"options": {},
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
{
"alias": "max",
"yaxis": 1
},
{
"alias": "max_usage",
"yaxis": 2
},
{
"alias": "usage",
"yaxis": 2
}
],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "jvm_memory_usage{mem_type=\"heap\", qty=\"used\", application=\"$application_ID\", executor_id=\"driver\"}",
"format": "time_series",
"hide": false,
"instant": false,
"intervalFactor": 1,
"legendFormat": "{{qty}}",
"refId": "A"
},
{
"expr": "jvm_memory_usage{mem_type=\"heap\", qty=\"max\", application=\"$application_ID\", executor_id=\"driver\"}",
"format": "time_series",
"hide": false,
"instant": false,
"interval": "",
"intervalFactor": 1,
"legendFormat": "{{qty}}",
"refId": "B"
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Driver Heap Usage",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"format": "decbytes",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "SparkFromGraphite",
"editable": true,
"error": false,
"fill": 1,
"grid": {},
"gridPos": {
"h": 7,
"w": 12,
"x": 12,
"y": 2
},
"id": 5,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 2,
"links": [],
"nullPointMode": "connected",
"options": {},
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "jvm_memory_pools{executor_id=\"driver\", application=\"$application_ID\",qty=\"used\"}",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{mem_type}}",
"metric": "jvm_memory_pools",
"refId": "A",
"step": 2,
"target": ""
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Driver JVM Memory Pools",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "cumulative"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"format": "decbytes",
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
}
],
"title": "Driver Memory",
"type": "row"
},
{
"collapsed": true,
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 31
},
"id": 36,
"panels": [
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "SparkFromGraphite",
"fill": 1,
"gridPos": {
"h": 7,
"w": 12,
"x": 0,
"y": 18
},
"id": 38,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "null",
"options": {},
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "executor_tasks{application=\"$application_ID\",qty=\"completeTasks\"}",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "{{executor_id}}",
"refId": "A"
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Completed Tasks Per Executor",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "SparkFromGraphite",
"fill": 1,
"gridPos": {
"h": 7,
"w": 12,
"x": 12,
"y": 18
},
"id": 40,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "null",
"options": {},
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "rate(executor_tasks{application=\"$application_ID\",qty=\"completeTasks\"}[1m])",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "{{executor_id}}",
"refId": "A"
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Completed Tasks Per Minute",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
}
],
"title": "Task Metrics",
"type": "row"
},
{
"collapsed": true,
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 32
},
"id": 28,
"panels": [
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "SparkFromGraphite",
"editable": true,
"error": false,
"fill": 0,
"grid": {},
"gridPos": {
"h": 7,
"w": 12,
"x": 0,
"y": 24
},
"id": 3,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "connected",
"options": {},
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
{
"alias": "total",
"linewidth": 4
}
],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "rate(filesystem_usage{qty=\"read_bytes\", fs_type=\"hdfs\", application=\"$application_ID\"}[1m])",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "{{executor_id}}",
"metric": "filesystem_usage",
"refId": "A",
"step": 1,
"target": ""
},
{
"expr": "sum(rate(filesystem_usage{qty=\"read_bytes\", fs_type=\"hdfs\", application=\"$application_ID\"}[1m]))",
"format": "time_series",
"hide": false,
"intervalFactor": 1,
"legendFormat": "total",
"metric": "filesystem_usage",
"refId": "B",
"step": 1,
"target": ""
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "HDFS Read Rate",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "cumulative"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"format": "Bps",
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "SparkFromGraphite",
"editable": true,
"error": false,
"fill": 0,
"grid": {},
"gridPos": {
"h": 7,
"w": 12,
"x": 12,
"y": 24
},
"id": 30,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "connected",
"options": {},
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
{
"alias": "total",
"linewidth": 4,
"yaxis": 1
}
],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "rate(filesystem_usage{qty=\"write_bytes\", fs_type=\"hdfs\", application=\"$application_ID\"}[1m])",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "{{executor_id}}",
"metric": "filesystem_usage",
"refId": "A",
"step": 1,
"target": ""
},
{
"expr": "sum(rate(filesystem_usage{qty=\"write_bytes\", fs_type=\"hdfs\", application=\"$application_ID\"}[1m]))",
"format": "time_series",
"hide": false,
"intervalFactor": 1,
"legendFormat": "total",
"metric": "filesystem_usage",
"refId": "B",
"step": 1,
"target": ""
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "HDFS Write Rate",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "cumulative"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"format": "Bps",
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
}
],
"title": "HDFS Read/Write Byte Rate",
"type": "row"
},
{
"collapsed": true,
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 33
},
"id": 22,
"panels": [
{
"aliasColors": {
"total": "#6ed0e0"
},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "SparkFromGraphite",
"editable": true,
"error": false,
"fill": 1,
"grid": {},
"gridPos": {
"h": 7,
"w": 12,
"x": 0,
"y": 25
},
"id": 1,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "connected",
"options": {},
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
{
"alias": "total",
"linewidth": 4
}
],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "filesystem_usage{qty=\"read_bytes\", fs_type=\"hdfs\", application=\"$application_ID\"}",
"format": "time_series",
"hide": false,
"instant": false,
"intervalFactor": 1,
"legendFormat": "{{executor_id}}",
"metric": "filesystem_usage",
"refId": "A",
"step": 2,
"target": ""
},
{
"expr": "sum(filesystem_usage{qty=\"read_bytes\", fs_type=\"hdfs\", application=\"$application_ID\"})",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "total",
"refId": "B"
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Executor HDFS Reads (Stacked)",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "cumulative"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"format": "decbytes",
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "SparkFromGraphite",
"editable": true,
"error": false,
"fill": 1,
"grid": {},
"gridPos": {
"h": 7,
"w": 12,
"x": 12,
"y": 25
},
"id": 26,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "connected",
"options": {},
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
{
"alias": "total",
"linewidth": 4,
"yaxis": 1
}
],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "filesystem_usage{qty=\"write_bytes\", fs_type=\"hdfs\", application=\"$application_ID\"}",
"format": "time_series",
"hide": false,
"intervalFactor": 2,
"legendFormat": "{{executor_id}}",
"metric": "filesystem_usage",
"refId": "A",
"step": 2,
"target": ""
},
{
"expr": "sum(filesystem_usage{qty=\"write_bytes\", fs_type=\"hdfs\", application=\"$application_ID\"})",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "total",
"refId": "B"
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Executor HDFS Writes (Stacked)",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "cumulative"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"format": "decbytes",
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
}
],
"title": "HDFS Bytes Reads/Writes Per Executor",
"type": "row"
}
],
"refresh": false,
"schemaVersion": 18,
"style": "dark",
"tags": [],
"templating": {
"list": [
{
"allFormat": "glob",
"allValue": null,
"current": {
"text": "application_1566481621886_138866",
"value": "application_1566481621886_138866"
},
"datasource": "SparkFromGraphite",
"definition": "label_values(application)",
"hide": 0,
"includeAll": false,
"label": "",
"multi": false,
"multiFormat": "glob",
"name": "application_ID",
"options": [],
"query": "label_values(application)",
"refresh": 1,
"refresh_on_load": false,
"regex": "",
"skipUrlSync": false,
"sort": 2,
"tagValuesQuery": "",
"tags": [],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-12h",
"to": "now"
},
"timepicker": {
"now": true,
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "browser",
"title": "Spark-on-Yarn",
"uid": "yJ1JPHTiz",
"version": 40
}
参考资料
https://github.com/palantir/spark-influx-sink
https://spark.apache.org/docs/latest/monitoring.html
https://www.linkedin.com/pulse/monitoring-spark-streaming-influxdb-grafana-christian-g%C3%BCgi
https://github.com/prometheus/prometheus/wiki/Default-port-allocations
https://github.com/prometheus/graphite_exporter
https://prometheus.io/download/
https://rokroskar.github.io/monitoring-spark-on-hadoop-with-prometheus-and-grafana.html
https://blog.csdn.net/lsshlsw/article/details/82670508
https://www.jianshu.com/p/274380bb0974