Analyzing operating system logs with Python using big data techniques and pipes

1 Code

Map code (Hadoop_Map.py)
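import os
import re
import sys

def Map(sourceFile):
    # Emit one "date,1" record per log line that contains a date
    if not os.path.exists(sourceFile):
        # Report errors on stderr so they are not piped into the reducer
        print(sourceFile, 'does not exist.', file=sys.stderr)
        return
    # Matches dates such as 07/10/2013 (M/D/YYYY or MM/DD/YYYY)
    pattern = re.compile(r'[0-9]{1,2}/[0-9]{1,2}/[0-9]{4}')
    with open(sourceFile, 'r') as srcFile:
        for dataLine in srcFile:
            r = pattern.search(dataLine)  # first date on the line, if any
            if r:
                print(r.group(0), 1, sep=',')  # e.g. "07/10/2013,1"

if __name__ == '__main__':
    # The log file name comes from the command line; default to test.txt
    Map(sys.argv[1] if len(sys.argv) > 1 else 'test.txt')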
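You can check the mapper's output without a log file by applying the same regular expression to a few in-memory lines. The sample lines below are hypothetical and do not come from the author's test.txt. The helper name `map_lines` is also illustrative. It uses `re.search`, which returns only the first date on a line. That is equivalent to taking `r[0]` from `findall`.

```python
import re

# Same date pattern as the mapper: M/D/YYYY or MM/DD/YYYY
DATE_PATTERN = re.compile(r'[0-9]{1,2}/[0-9]{1,2}/[0-9]{4}')

def map_lines(lines):
    """Yield 'date,1' for every line that contains a date (first match only)."""
    for line in lines:
        m = DATE_PATTERN.search(line)
        if m:
            yield m.group(0) + ',1'

# Hypothetical log lines, for illustration only
sample = [
    'Information 07/10/2013 09:15:02 Service Control Manager started',
    'no date on this line',
    'Warning 07/10/2013 09:16:40 disk nearly full, retry 12/31/2099',
]

for record in map_lines(sample):
    print(record)
# prints:
# 07/10/2013,1
# 07/10/2013,1
```

The line without a date produces no record. The third line contains two dates, but only the first one is emitted.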
 
Reduce code (Hadoop_Reduce.py)
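import sys

def Reduce(targetFile):
    # riqi = date, shuliang = count, as emitted by the map script
    result = {}
    for line in sys.stdin:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        riqi, shuliang = line.split(',')
        riqi = riqi.strip()
        result[riqi] = result.get(riqi, 0) + int(shuliang)
    # Write one "date:count" line per date to the target file
    with open(targetFile, 'w') as fp:
        for k, v in result.items():
            fp.write(k + ':' + str(v) + '\n')

if __name__ == '__main__':
    Reduce('result.txt')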
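The same aggregation can also be written with `collections.Counter`. The sketch below reads from any iterable of lines, so it can be tested without standard input. It adds the count field instead of assuming every record counts as 1. The function name `reduce_lines` is illustrative; in a real script you would call `reduce_lines(sys.stdin)`.

```python
import io
from collections import Counter

def reduce_lines(lines):
    """Sum the count field of 'date,count' records, keyed by date."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        date, count = line.split(',')
        counts[date.strip()] += int(count)
    return counts

# Simulate the text a mapper would write to the pipe
piped = io.StringIO('07/10/2013,1\n07/10/2013,1\n07/11/2013,1\n')
for date, total in reduce_lines(piped).items():
    print(date + ':' + str(total))
# prints:
# 07/10/2013:2
# 07/11/2013:1
```

`Counter` is a dict subclass, so dates are listed in the order they first appear, just as with the plain dictionary in the original reducer.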
 
2 Results
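Run the following command at the command prompt. The pipe (|) feeds the map script's output into the reduce script, which writes the number of log lines for each date to result.txt:
E:\python\python可以这样学\第11章 大数据处理\code>python Hadoop_Map.py test.txt | python Hadoop_Reduce.py
Contents of result.txt: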
07/10/2013 :4635
07/11/2013 :1
07/16/2013 :51
08/15/2013 :3958
10/09/2013 :733
12/11/2013 :564
02/12/2014 :4102
05/14/2014 :737
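In this setup the pipe passes mapper output straight to the reducer without sorting. That is why the reducer keeps every date in a dictionary. Hadoop Streaming works differently: the framework sorts mapper output by key before the reducer sees it, so a reducer can sum one run of identical keys at a time.

The sketch below imitates that behavior locally:
- `sorted()` stands in for the sort phase.
- It sorts by the parsed date rather than the raw text, so the output is in calendar order. A plain text sort would put 02/12/2014 before 07/10/2013.
- It assumes MM/DD/YYYY dates, which the 07/16/2013 entry above implies.
- The names `date_key` and `streaming_reduce` are illustrative.

```python
from datetime import datetime
from itertools import groupby

def date_key(line):
    """Parse the date field of a 'date,count' record (assumed MM/DD/YYYY)."""
    return datetime.strptime(line.split(',')[0].strip(), '%m/%d/%Y')

def streaming_reduce(sorted_lines):
    """Sum counts over runs of identical dates; input must already be sorted by date."""
    for day, group in groupby(sorted_lines, key=date_key):
        total = sum(int(line.split(',')[1]) for line in group)
        yield day.strftime('%m/%d/%Y'), total

# Mapper output in arbitrary order; sorted() stands in for the sort phase
mapper_output = ['02/12/2014,1', '07/10/2013,1', '02/12/2014,1', '07/10/2013,1', '07/10/2013,1']
for day, total in streaming_reduce(sorted(mapper_output, key=date_key)):
    print(day + ':' + str(total))
# prints:
# 07/10/2013:3
# 02/12/2014:2
```

Because `groupby` only merges adjacent equal keys, the reducer needs memory for a single date at a time. This is why sorted input matters.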


Reposted from cakin24.iteye.com/blog/2384913