Recently I needed to analyze GC data from several machines at once, computing the following daily statistics from their GC logs: minor GC count, maximum pause time, minimum pause time, average pause time, total stop-the-world time caused by minor GC, major GC count, and total stop-the-world time caused by major GC.
Raw data:
2012-10-25T13:00:22.244+0800: 234273.762: [GC 234273.762: [ParNew: 1418872K->53180K(1501888K), 0.0337000 secs] 3512438K->2148338K(4057792K), 0.0340400 secs] [Times: user=0.14 sys=0.00, real=0.03 secs]
2012-10-25T13:00:37.762+0800: 234289.280: [GC 234289.280: [ParNew: 1418556K->44716K(1501888K), 0.0383930 secs] 3513714K->2143690K(4057792K), 0.0387180 secs] [Times: user=0.17 sys=0.00, real=0.04 secs]
2012-10-25T13:00:37.804+0800: 234289.322: [GC [1 CMS-initial-mark: 2098974K(2555904K)] 2154135K(4057792K), 0.0379380 secs] [Times: user=0.04 sys=0.00, real=0.04 secs]
2012-10-25T13:00:37.843+0800: 234289.360: [CMS-concurrent-mark-start]
2012-10-25T13:00:38.410+0800: 234289.928: [CMS-concurrent-mark: 0.568/0.568 secs] [Times: user=1.39 sys=0.10, real=0.57 secs]
2012-10-25T13:00:38.410+0800: 234289.928: [CMS-concurrent-preclean-start]
2012-10-25T13:00:38.431+0800: 234289.949: [CMS-concurrent-preclean: 0.018/0.021 secs] [Times: user=0.02 sys=0.00, real=0.02 secs]
2012-10-25T13:00:38.431+0800: 234289.949: [CMS-concurrent-abortable-preclean-start]
 CMS: abort preclean due to time 2012-10-25T13:00:43.774+0800: 234295.292: [CMS-concurrent-abortable-preclean: 5.312/5.343 secs] [Times: user=7.37 sys=0.54, real=5.34 secs]
2012-10-25T13:00:43.776+0800: 234295.294: [GC[YG occupancy: 579432 K (1501888 K)]234295.294: [Rescan (parallel) , 0.1664420 secs]234295.460: [weak refs processing, 0.0121300 secs]234295.473: [class unloading, 0.0363220 secs]234295.509: [scrub symbol & string tables, 0.0235950 secs] [1 CMS-remark: 2098974K(2555904K)]2678406K(4057792K), 0.2485670 secs] [Times: user=0.78 sys=0.00, real=0.24 secs]
……
2012-10-25T13:00:53.494+0800: 234305.012: [GC 234305.012: [ParNew: 1410092K->50641K(1501888K), 0.0385890 secs] 1704007K->349502K(4057792K), 0.0389030 secs] [Times: user=0.14 sys=0.00, real=0.04 secs]
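The awk commands used later address these log lines by whitespace-separated field number. A quick sketch of which fields matter on a ParNew line (the field numbering here is my reading of the layout above):

```shell
# One ParNew entry from the log above, split on whitespace:
# $10 is this collection's total pause in seconds, $15 is the "real=..." wall-clock time.
LINE='2012-10-25T13:00:22.244+0800: 234273.762: [GC 234273.762: [ParNew: 1418872K->53180K(1501888K), 0.0337000 secs] 3512438K->2148338K(4057792K), 0.0340400 secs] [Times: user=0.14 sys=0.00, real=0.03 secs]'

# pause time of this collection
echo "$LINE" | awk '{print $10}'                            # 0.0340400

# wall-clock (stop-the-world) time, extracted by splitting "real=0.03" on '='
echo "$LINE" | awk '{print $15}' | awk -F'=' '{print $2}'   # 0.03
```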
To automate the collection I use expect, a tool for scripting interactive sessions, and on the bash side I fork multiple child processes as Map tasks to speed things up. The tool lives at http://expect.sourceforge.net/
Applying the MapReduce model: each Map here is a collection child process, responsible for gathering data from one host. Code:
#map task, executed in parallel; collects statistics from one host using 'expect'
map(){
    local TMP_FILE=$RESULT_FILE"_"$HOST"_"$1
    local TMP_MAP_RESULT_FILE=$TMP_FILE"_mapresult"
    echo "$TMP_MAP_RESULT_FILE"
    {
        #use the expect script to collect the data automatically; the 'call' script is discussed below
        /usr/bin/expect -f ./call $USER $HOST $DATE > $TMP_FILE
        echo $HOST"_"$DATE > $TMP_MAP_RESULT_FILE
        grep -v "grep" $TMP_FILE | grep "Total="         >> $TMP_MAP_RESULT_FILE
        grep -v "grep" $TMP_FILE | grep "Average="       >> $TMP_MAP_RESULT_FILE
        grep -v "grep" $TMP_FILE | grep "Min="           >> $TMP_MAP_RESULT_FILE
        grep -v "grep" $TMP_FILE | grep "Max="           >> $TMP_MAP_RESULT_FILE
        grep -v "grep" $TMP_FILE | grep "MajorCount="    >> $TMP_MAP_RESULT_FILE
        grep -v "grep" $TMP_FILE | grep "TotalInitMark=" >> $TMP_MAP_RESULT_FILE
        grep -v "grep" $TMP_FILE | grep "TotalRemark="   >> $TMP_MAP_RESULT_FILE
    } & #the child process plays the Map task
}
Reduce merges the data collected by those child processes. Since only one output file is needed in the end, there is a single Reduce, run by the main process.
#reduce task: just merge the map result files
reduce(){
    #iterate over the temporary files produced by the Map phase
    for FILE in ${MAP_RESULT[@]}
    do
        echo "--------------------------"
        #written straight to stdout here; could be redirected to a file instead
        cat $FILE
        echo "--------------------------"
    done
}
The main process controls the flow. Code:
echo "start stat job"
………………
#array storing the map result files' names
declare -a MAP_RESULT
echo "map start"
for HOST in ${SERVER_ARRAY[@]}
do
    MAP_RESULT[$COUNT]=`map $COUNT`
    ((COUNT=$COUNT+1))
done
#wait for the map tasks to end
#the main process blocks here until every Map task finishes
wait
echo "map end"
echo "reduce start"
reduce
echo "reduce end"
echo "start clear"
clear
echo "end clear"
The core 'call' script is an expect script. It logs in to the target host automatically, collects the data, and writes the result to a local file. Code:
#!/usr/bin/expect
set USER [lindex $argv 0]
set HOST [lindex $argv 1]
set DATE [lindex $argv 2]
set LOG_PATH "/home/admin/logs/gc.log"
set timeout 30
spawn /usr/bin/ssh $USER@$HOST
expect ".*id_dsa':"
send "\r"
expect "password:"
#xxxxxxxx is the gateway machine's password
send "xxxxxxxx\r"
sleep 1
send "grep '$DATE' $LOG_PATH |grep 'ParNew' |awk '{print \$15}'|awk -F'=' '{print \$2}'|awk 'BEGIN{total=0}{total+=\$1}END{print \"Total=\"total}'\r"
sleep 1
send "grep '$DATE' $LOG_PATH |grep 'ParNew' |awk '{print \$10}'|awk '{sum+=\$1} END {print \"Average=\"sum/NR}'\r"
sleep 1
send "grep '$DATE' $LOG_PATH |grep 'ParNew' |awk '{print \$10}'|awk 'BEGIN {min = 1999999} {if (\$1<min) min=\$1} END {print \"Min=\"min}'\r"
sleep 1
send "grep '$DATE' $LOG_PATH |grep 'ParNew' |awk '{print \$10}'|awk 'BEGIN {max = 0} {if (\$1>max) max=\$1} END {print \"Max=\"max}'\r"
sleep 1
send "echo \"MajorCount=\"`grep '$DATE' $LOG_PATH |grep 'CMS-remark'|wc -l`\r"
sleep 1
send "grep '$DATE' $LOG_PATH |grep 'CMS-init' |awk -F'real=' '{print \$2}'|awk '{print \$1}'|awk 'BEGIN{total=0}{total+=\$1}END{print \"TotalInitMark=\"total}'\r"
sleep 1
send "grep '$DATE' $LOG_PATH |grep 'CMS-remark'|awk -F'real=' '{print \$2}'|awk '{print \$1}'|awk 'BEGIN{total=0}{total+=\$1}END{print \"TotalRemark=\"total}'\r"
send "exit\r"
expect eof
The code is simple: simulate a login, analyze gc.log with grep+awk, then exit the target host.
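Pipelines like these are easy to check locally against a small sample file before wiring them into expect. The sample below is fabricated for illustration (three ParNew lines with pauses of 0.02, 0.04 and 0.03 seconds; field $10 is the pause, as in the log above):

```shell
# Local sanity check of the grep+awk pipelines on a fabricated gc.log
SAMPLE=$(mktemp)
cat > "$SAMPLE" <<'EOF'
2012-11-07T10:00:00.000+0800: 100.000: [GC 100.000: [ParNew: 1418872K->53180K(1501888K), 0.0199000 secs] 3512438K->2148338K(4057792K), 0.0200000 secs] [Times: user=0.05 sys=0.00, real=0.02 secs]
2012-11-07T10:00:10.000+0800: 110.000: [GC 110.000: [ParNew: 1418556K->44716K(1501888K), 0.0398000 secs] 3513714K->2143690K(4057792K), 0.0400000 secs] [Times: user=0.09 sys=0.00, real=0.04 secs]
2012-11-07T10:00:20.000+0800: 120.000: [GC 120.000: [ParNew: 1410092K->50641K(1501888K), 0.0299000 secs] 1704007K->349502K(4057792K), 0.0300000 secs] [Times: user=0.07 sys=0.00, real=0.03 secs]
EOF

TOTAL=$(grep 'ParNew' "$SAMPLE" | awk '{print $10}' | awk 'BEGIN{total=0}{total+=$1}END{print "Total="total}')
AVG=$(grep 'ParNew' "$SAMPLE" | awk '{print $10}' | awk '{sum+=$1} END {print "Average="sum/NR}')
MIN=$(grep 'ParNew' "$SAMPLE" | awk '{print $10}' | awk 'BEGIN{min=1999999} {if ($1<min) min=$1} END {print "Min="min}')
MAX=$(grep 'ParNew' "$SAMPLE" | awk '{print $10}' | awk 'BEGIN{max=0} {if ($1>max) max=$1} END {print "Max="max}')
echo "$TOTAL"   # Total=0.09
echo "$AVG"     # Average=0.03
echo "$MIN"     # Min=0.0200000
echo "$MAX"     # Max=0.0400000
rm -f "$SAMPLE"
```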
Run on the gateway machine:
./statgcParallel.sh -u xiaodong.langxd -s slave1:slave2 -d 2012-11-07
Output:
start stat job
map start
map end
reduce start
--------------------------
slave1_2012-11-07
Total=30.57
Average=0.0340765
Min=0.0104980
Max=0.1478140
MajorCount=1
TotalInitMark=0.05
TotalRemark=0.25
--------------------------
--------------------------
slave2_2012-11-07
Total=165.37
Average=0.0268371
Min=0.0160940
Max=0.0812770
MajorCount=4
TotalInitMark=0.09
TotalRemark=1.08
--------------------------
reduce end
start clear
end clear
Hope this raises a smile~~~