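Suppose you know the application server has a problem and need to inspect files on that server to track it down, but you lack the permissions to do so. What then?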
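When a performance problem like this comes up, the administrators are usually at a loss and go straight to the developers; the developers lack permissions and, like a cook with no rice, cannot do much. Nothing moves until the matter becomes urgent, escalates, and reaches management: the developers email their manager, the manager signs off, and only then do the administrators grant the developers access.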
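I ran into exactly this recently. The application had a performance problem that I suspected was caused by Full GC, but the admin refused to let me log in to the server remotely and run the relevant commands, citing insufficient permissions. All I could see were the JBoss log files.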
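In the end I came up with a crude workaround: scan the JBoss log for silent windows (stretches in which nothing is logged), then compare them against the Apache log. That was enough to roughly pin down which stage was at fault.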
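The script is simple: it scans the JBoss log file, picks out the log fragments on either side of each silent window (a gap of 120 seconds or more between timestamps), and collects those fragments in a separate text file.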
serverLogAnalysis.py:
import sys

import util

# Use the log file given on the command line, or fall back to serverTest.log.
if len(sys.argv) < 2:
    serverLogFile = open('serverTest.log')
else:
    serverLogFile = open(sys.argv[1])
print("serverLogFile:" + serverLogFile.name + "\r\n")

# Mode "w" starts from an empty result file on every run.
resultFile = open("result.log", "w")
util.process(serverLogFile, resultFile)
serverLogFile.close()
resultFile.close()
util.py:
import datetime
import re
from time import mktime


def process(serverLogFile, resultFile):
    # Compare each timestamped line with the line before it and report gaps.
    previousLine = None
    lineCounter = 0
    for currentLine in serverLogFile:
        lineCounter += 1
        if lineCounter % 100000 == 0:
            print('line scanned:' + str(lineCounter))
        currentStamp = getTimestamp(currentLine)
        if currentStamp is not None:
            previousStamp = getTimestamp(previousLine)
            if previousStamp is not None:
                timeGap = getTimeDiffInSeconds(previousStamp, currentStamp)
                # Report a silent window of two minutes or more.
                if timeGap >= 120:
                    log = ('TIMESTAMP gap ' + str(timeGap) + 's at line '
                           + str(lineCounter) + '\r\n\t\t' + previousLine
                           + '\t\t' + currentLine)
                    print(log)
                    resultFile.write(log + "\r\n")
        previousLine = currentLine


def getTimeDiffInSeconds(timestamp1, timestamp2):
    dt1 = datetime.datetime.strptime(timestamp1, '%Y-%m-%d %H:%M:%S,%f')
    dt2 = datetime.datetime.strptime(timestamp2, '%Y-%m-%d %H:%M:%S,%f')
    return toSeconds(dt2) - toSeconds(dt1)


def getTimestamp(line):
    # Return the leading "YYYY-MM-DD HH:MM:SS,mmm" stamp, or None if absent.
    if line is None:
        return None
    match = re.match(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}", line)
    if match is None:
        return None
    return match.group(0)


def toSeconds(dt):
    # Whole seconds since the epoch in local time; milliseconds are dropped.
    return mktime(dt.timetuple())
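If `previousLine` is updated on every iteration, the loop compares only adjacent lines, so a gap that follows a multi-line entry such as a stack trace goes unreported, because the continuation lines carry no timestamp. The following is a minimal sketch of a variant that remembers the last timestamped line instead. It assumes Python 3 and the same timestamp layout; `find_gaps` and the sample log lines are hypothetical and not part of the original script.

```python
import re
from datetime import datetime

# Same layout the script matches, e.g. "2013-05-06 10:00:00,123".
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}")
FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def find_gaps(lines, threshold_seconds=120):
    """Yield (gap_seconds, line_number, previous_line, current_line)."""
    last_stamp = None
    last_line = None
    for number, line in enumerate(lines, start=1):
        match = TIMESTAMP.match(line)
        if match is None:
            continue  # continuation line: keep the last known timestamp
        stamp = datetime.strptime(match.group(0), FORMAT)
        if last_stamp is not None:
            gap = (stamp - last_stamp).total_seconds()
            if gap >= threshold_seconds:
                yield gap, number, last_line, line
        last_stamp, last_line = stamp, line


# Hypothetical sample: a stack-trace line sits between the two stamps.
sample = [
    "2013-05-06 10:00:00,000 INFO  [hypothetical.Service] request handled\n",
    "2013-05-06 10:00:01,500 ERROR [hypothetical.Service] failure\n",
    "\tat hypothetical.Service.call(Service.java:42)\n",
    "2013-05-06 10:05:01,500 INFO  [hypothetical.Service] request handled\n",
]
gaps = list(find_gaps(sample))
for gap, number, previous, current in gaps:
    print("gap of %.1fs before line %d" % (gap, number))
```

Subtracting the `datetime` objects directly keeps the milliseconds, whereas going through `mktime` rounds to whole seconds; for a 120-second threshold either is likely good enough.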
start.sh:
#!/bin/sh
python ./serverLogAnalysis.py ./serverTest.log
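The comparison against the Apache log was done by hand, but it could be sketched roughly as follows. This assumes Python 3, an access log whose time field follows the Common Log Format (for example `[06/May/2013:10:02:30 +0800]`), English month abbreviations, and both logs written in the same local time zone, so the offset is ignored. `requests_in_window` and the sample lines are hypothetical.

```python
import re
from datetime import datetime

# Assumed Common Log Format time field: [06/May/2013:10:02:30 +0800]
APACHE_TIME = re.compile(
    r"\[(\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2}) [+-]\d{4}\]")


def requests_in_window(apache_lines, start, end):
    """Return the access-log lines whose timestamp lies in [start, end]."""
    hits = []
    for line in apache_lines:
        match = APACHE_TIME.search(line)
        if match is None:
            continue  # not an access-log entry in the assumed format
        stamp = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S")
        if start <= stamp <= end:
            hits.append(line)
    return hits


# Hypothetical access-log lines around one silent window in the JBoss log.
access_log = [
    '192.0.2.1 - - [06/May/2013:10:00:01 +0800] "GET /app/a HTTP/1.1" 200 512\n',
    '192.0.2.1 - - [06/May/2013:10:02:30 +0800] "GET /app/b HTTP/1.1" 200 512\n',
    '192.0.2.1 - - [06/May/2013:10:06:00 +0800] "GET /app/c HTTP/1.1" 200 512\n',
]
gap_start = datetime(2013, 5, 6, 10, 0, 2)
gap_end = datetime(2013, 5, 6, 10, 5, 1)
hits = requests_in_window(access_log, gap_start, gap_end)
print("%d request(s) reached Apache during the gap" % len(hits))
```

One way to read the result, offered as an assumption rather than a rule: if Apache kept receiving requests while JBoss logged nothing, the stall is more likely inside the application server than in front of it.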