Name: 曹月
Student ID: 2017035107003
Gitee repository: https://gitee.com/caoyue1/third_assignment/tree/master
Program analysis
1. Read the file into a buffer
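def process_file(dst):
    # Open the file at path dst; report the error and give up if that fails
    try:
        p = open(dst, 'r')
    except IOError as s:
        print(s)
        return None
    # Read the whole file into the buffer; close the file whether or not the read succeeds
    try:
        bvffer = p.read()
    except Exception:
        print("Read File Error!")
        return None
    finally:
        p.close()
    return bvffer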
2. Add code that processes the buffer bvffer, counts how often each word occurs, and stores the counts in the dictionary word_freq
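def process_buffer(bvffer):
    if bvffer:
        word_freq = {}
        # Lowercase the text so that counting is case-insensitive
        bvffer = bvffer.lower()
        # Replace punctuation with spaces so it does not stick to words
        for fh in ',.!?+-_':
            bvffer = bvffer.replace(fh, " ")
        words = bvffer.strip().split()
        # Count each word, starting from 0 the first time it is seen
        for word in words:
            word_freq[word] = word_freq.get(word, 0) + 1
        return word_freq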
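As an alternative illustration of the counting step above, the same idea can be sketched with collections.Counter and a regular expression from the standard library. This is not the assignment's code: the name count_words and the tokenizing pattern are assumptions, and unlike process_buffer this pattern treats every character other than letters, digits, and apostrophes as a separator.

```python
import re
from collections import Counter


def count_words(text):
    """Hypothetical helper: count lowercase words in text."""
    # Assumption: a word is a run of letters, digits, or apostrophes
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(words)


if __name__ == "__main__":
    freq = count_words("The cat and the hat. The end!")
    # Show the three most frequent words with their counts
    print(freq.most_common(3))
```

Counter is a dict subclass, so the result can be passed to code that expects a plain dictionary such as word_freq.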
3. Print the top 10 words
def output_result(word_freq):
    if word_freq:
        # Sort the (word, count) pairs by count, highest first
        sorted_word_freq = sorted(word_freq.items(), key=lambda v: v[1], reverse=True)
        for item in sorted_word_freq[:10]:  # print the top 10 words
            print(item)
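Sorting the whole dictionary works, but only the first 10 entries are needed. A minimal sketch of the same selection with heapq.nlargest from the standard library follows; the helper name top_words and its parameter n are assumptions made for illustration.

```python
import heapq


def top_words(word_freq, n=10):
    """Hypothetical helper: return the n most frequent (word, count) pairs."""
    # nlargest selects the n highest counts without sorting every entry
    return heapq.nlargest(n, word_freq.items(), key=lambda v: v[1])


if __name__ == "__main__":
    sample = {"the": 3, "cat": 1, "hat": 2}
    for item in top_words(sample, n=2):
        print(item)
```

For a small vocabulary the difference is negligible; the approach may matter more when the dictionary holds many distinct words.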
4. Run the program
if __name__ == "__main__":
    import argparse
    # Take the path of the file to analyse as the only command-line argument
    parser = argparse.ArgumentParser()
    parser.add_argument('dst')
    args = parser.parse_args()
    dst = args.dst
    bvffer = process_file(dst)
    word_freq = process_buffer(bvffer)
    output_result(word_freq)
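Step 1 reads the entire file into memory before counting. For large inputs, one possible variation is to count line by line instead. The sketch below assumes the same punctuation set as process_buffer; the name count_words_streaming is hypothetical and not part of the assignment.

```python
import os
import tempfile
from collections import Counter


def count_words_streaming(path):
    """Hypothetical helper: count words one line at a time."""
    word_freq = Counter()
    # Map the same punctuation set used by process_buffer to spaces
    table = str.maketrans({ch: " " for ch in ",.!?+-_"})
    with open(path, "r") as f:
        for line in f:
            # Only the current line is held in memory
            word_freq.update(line.lower().translate(table).split())
    return word_freq


if __name__ == "__main__":
    # Write a small temporary file so the sketch runs on its own
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("The cat.\nThe hat!\n")
    print(count_words_streaming(tmp.name).most_common(1))
    os.remove(tmp.name)
```

Because line breaks are whitespace, splitting each line separately yields the same words as splitting the whole buffer at once.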
Program screenshots
1. Running on large and small files
2. Results