Software Engineering Assignment 3: Individual Programming Exercise

I. Development Environment

　　MyEclipse 2017, Python 3.7.0

II. Program Analysis

　　(1) Reading the file into a buffer (process_file(dst))

def process_file(dst):  # Read the file into a buffer
    try:  # Open the file; dst is the path to the text file
        file = open(dst, 'r')
    except IOError as s:
        print(s)
        return None
    try:  # Read the whole file into the buffer
        bvffer = file.read()
    except (IOError, UnicodeDecodeError):
        print("Read File Error!")
        return None
    finally:
        file.close()  # Close the file even when the read fails
    return bvffer
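A variant of the same step can use a with statement, which closes the file automatically even if reading fails. This is a sketch, not the original program: the name read_text and the UTF-8 default encoding are assumptions. In Python 3, IOError is an alias of OSError, and a decoding failure raises UnicodeDecodeError, so both are caught.

```python
import os
import tempfile


def read_text(path, encoding="utf-8"):
    """Return the file's contents, or None if it cannot be opened or decoded.

    Hypothetical alternative to process_file(); the encoding is an assumption.
    """
    try:
        # The with block closes the file even if read() raises
        with open(path, "r", encoding=encoding) as file:
            return file.read()
    except (OSError, UnicodeDecodeError) as err:
        print(err)
        return None


if __name__ == "__main__":
    # Usage: write a small temporary file, read it back, then delete it
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write("Frankly, my dear")
    print(read_text(path))
    os.remove(path)
```

Passing the encoding explicitly avoids depending on the platform default, which on a Chinese-locale Windows machine is usually not UTF-8.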

　　(2) Processing the buffer bvffer: counting each word's frequency and storing it in the dictionary word_freq (process_buffer(bvffer))

def process_buffer(bvffer):  # Process the buffer; return a dict word -> frequency
    if bvffer:
        word_freq = {}
        # Convert the whole text to lowercase
        bvffer = bvffer.lower()
        # Replace the listed Chinese and English punctuation marks with spaces
        for ch in '“‘!;,.?”':
            bvffer = bvffer.replace(ch, " ")
        # strip() trims leading/trailing whitespace; split() with no argument
        # splits on any run of whitespace (spaces, '\n', '\r', '\t')
        words = bvffer.strip().split()
        for word in words:
            word_freq[word] = word_freq.get(word, 0) + 1
        return word_freq
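The replace loop above scans the whole text once per punctuation mark. A sketch of the same cleanup done in a single pass with str.maketrans and str.translate; process_buffer_translate is a hypothetical name, and unlike the original it returns an empty dict (rather than None) for empty input.

```python
# Same punctuation marks as in process_buffer(); extend the string as needed
PUNCTUATION = '“‘!;,.?”'

# Translation table built once: each punctuation mark maps to a space
_TO_SPACE = str.maketrans(PUNCTUATION, " " * len(PUNCTUATION))


def process_buffer_translate(bvffer):
    """Return a dict mapping each lowercase word to its frequency."""
    if not bvffer:
        return {}
    # lower() and translate() each walk the text only once
    words = bvffer.lower().translate(_TO_SPACE).split()
    word_freq = {}
    for word in words:
        word_freq[word] = word_freq.get(word, 0) + 1
    return word_freq


if __name__ == "__main__":
    print(process_buffer_translate("“Yes,” she said. Yes!"))
    # {'yes': 2, 'she': 1, 'said': 1}
```

The word counts match the replace-based version; only the number of passes over the text changes.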

　　(3) Printing the top 10 words (output_result(word_freq))

def output_result(word_freq):  # Sort words by frequency and print the top 10
    if word_freq:
        sorted_word_freq = sorted(word_freq.items(), key=lambda v: v[1],
                                  reverse=True)
        for item in sorted_word_freq[:10]:  # Print the top 10 words
            print("Word: %s  Frequency: %d" % (item[0], item[1]))
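Sorting every distinct word just to print ten of them does more work than necessary. A sketch using heapq.nlargest, which the Python documentation describes as equivalent to sorted(iterable, key=key, reverse=True)[:n]; the helper name top_words is hypothetical.

```python
import heapq


def top_words(word_freq, n=10):
    """Return the n (word, frequency) pairs with the highest frequencies."""
    # nlargest keeps a heap of at most n items instead of sorting all words
    return heapq.nlargest(n, word_freq.items(), key=lambda item: item[1])


if __name__ == "__main__":
    sample = {"the": 5, "wind": 1, "scarlett": 3}
    for word, freq in top_words(sample, n=2):
        print("Word: %s  Frequency: %d" % (word, freq))
```

With N distinct words this costs roughly O(N log n) instead of O(N log N) for a full sort, which matters little for ten words but keeps the intent explicit.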

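　　(4) Main program (the if __name__ == "__main__": block)

if __name__ == "__main__":
    import argparse
    # Read the text file path from the command line, e.g.
    # python word_freq.py Gone_with_the_wind.txt
    parser = argparse.ArgumentParser(description="Word frequency count")
    parser.add_argument('dst', nargs='?',
                        default=r"F:\Python\Gone_with_the_wind.txt",
                        help="path to the text file")
    dst = parser.parse_args().dst

    bvffer = process_file(dst)
    word_freq = process_buffer(bvffer)
    output_result(word_freq)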

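III. Code Style Notes

　　(1) Put each import statement on its own line

import cProfile
import pstats

　　(2) Indent with 4 spaces

def process_file(dst):
    try:
        file = open(dst, 'r')

　　(3) Keep each line within 79 characters where possible, the limit recommended by PEP 8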

IV. Run Commands and Screenshots

　　(1) After writing word_freq.py, run it in a DOS (Command Prompt) window

　　　　Gone with the Wind:

　　　　Command: python word_freq.py Gone_with_the_wind.txt

　　　　Screenshot:

　　　　A Tale of Two Cities:

　　　　Command: python word_freq.py A_Tale_of_Two_Cities.txt

　　　　Screenshot:

　　(2) Running directly in MyEclipse 2017

　　　　Gone with the Wind

　　　　A Tale of Two Cities

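V. Performance Analysis and Improvements

　　Note on the cProfile output columns:
　　ncalls: the number of calls to the function;
　　tottime: the total time spent in the function itself, excluding time spent in the sub-functions it calls;
　　percall (the first one): tottime / ncalls;
　　cumtime: the cumulative time spent in the function and all the sub-functions it calls;
　　percall (the second one): the average time per call, i.e. cumtime divided by the number of primitive (non-recursive) calls, which equals cumtime / ncalls when the function is not recursive;
　　filename:lineno(function): the file, line number and name of each function.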

　　(1) The most time-consuming code

for ch in 'ABCDEFGHIJKLMNOPQRSTUVWXYZ':
    bvffer = bvffer.lower()  # lowercases the whole text again, 26 times

　　(2) The most frequently executed code

for word in words:
    word_freq[word] = word_freq.get(word, 0) + 1  # runs once per word
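Because this loop body runs once per word, it is the natural target for collections.Counter, which performs the same tally. A sketch comparing the two; both function names are hypothetical, and whether Counter is actually faster on a given machine should be checked with cProfile or timeit rather than assumed.

```python
from collections import Counter


def count_with_dict(words):
    # The loop from the original code: one lookup and one store per word
    word_freq = {}
    for word in words:
        word_freq[word] = word_freq.get(word, 0) + 1
    return word_freq


def count_with_counter(words):
    # Same tally; in CPython the counting loop behind Counter runs in C
    return Counter(words)


if __name__ == "__main__":
    words = "the wind and the rain and the sun".split()
    # Counter is a dict subclass, so the two results compare equal
    assert count_with_dict(words) == count_with_counter(words)
    print(count_with_counter(words).most_common(2))
    # [('the', 3), ('and', 2)]
```

Counter.most_common also covers the top-10 step, so the two stages could share one object.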

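　　(3) Profiling with cProfile

　　Profiling the word count for Gone with the Wind:

python -m cProfile word_freq.py Gone_with_the_wind.txt | grep word_freq.py

　　grep requires a Unix-style shell; in a plain Windows command prompt, findstr word_freq.py does the same filtering.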
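The same report can also be produced from inside Python with cProfile and pstats, which removes the dependency on grep: a string passed to pstats.Stats.print_stats is treated as a regular expression, and only entries whose filename:lineno(function) text matches it are printed. A sketch; the helper profile_call, its pattern parameter (defaulting to an assumed module name word_freq) and the stand-in workload count_letters are all hypothetical.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, pattern="word_freq"):
    """Profile func(*args); return (result, report text filtered by pattern).

    The default pattern assumes the profiled code lives in word_freq.py.
    """
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args)
    profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumtime, then keep only entries whose
    # "filename:lineno(function)" text matches the regex pattern
    stats.sort_stats("cumulative").print_stats(pattern)
    return result, stream.getvalue()


def count_letters(text):
    # Small stand-in workload for this example
    return sum(1 for ch in text if ch.isalpha())


if __name__ == "__main__":
    result, report = profile_call(count_letters, "Gone with the wind" * 1000,
                                  pattern="count_letters")
    print(result)
    print(report)
```

Calling profiler.dump_stats("result.out") on the same object would write a file in the format gprof2dot reads with -f pstats.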

　　(4) Visualization

　　　　Tools: Graphviz, gprof2dot

　　　　Visualizing the profile data in result.out:

F:\通大教学网\软件工程>python -m cProfile -o result.out -s cumulative word_freq.py Gone_with_the_wind.txt
F:\通大教学网\软件工程>python gprof2dot.py -f pstats result.out | dot -Tpng -o result.png

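　　　　The resulting graph:

　　(5) Improving the code

　　　　Analysis: the code can be sped up by reducing the number of function calls and by choosing a better algorithm.

　　　　Implementation: convert the text to lowercase once instead of once per uppercase letter. The original loop calls lower() on the whole text 26 times, although a single call already converts every letter.

　　　　Original code:

for ch in 'ABCDEFGHIJKLMNOPQRSTUVWXYZ':
    bvffer = bvffer.lower()

　　　　Improved code:

bvffer = bvffer.lower()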

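　　　　Comparison:

　　　　Output of the original code:

　　　　Output of the improved code:

　　　　Conclusion: the measurements show that the improved code runs 1.903 s faster than the original.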
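The comparison can be repeated without screenshots using timeit. A sketch that times both versions on a synthetic text; the sample string and the helper names lower_26_times and lower_once are assumptions, so the absolute timings will differ from the 1.903 s measured on the full novel.

```python
import timeit

# Synthetic stand-in for the novel's text (an assumption, not the real data)
SAMPLE = "Frankly, My Dear, I Don't Give A Damn. " * 20000


def lower_26_times(bvffer):
    # Original version: lower() runs once per uppercase letter, 26 times
    for _ in 'ABCDEFGHIJKLMNOPQRSTUVWXYZ':
        bvffer = bvffer.lower()
    return bvffer


def lower_once(bvffer):
    # Improved version: one lower() call already converts every letter
    return bvffer.lower()


if __name__ == "__main__":
    # Both versions must produce the same text before timing them
    assert lower_26_times(SAMPLE) == lower_once(SAMPLE)
    t_old = timeit.timeit(lambda: lower_26_times(SAMPLE), number=5)
    t_new = timeit.timeit(lambda: lower_once(SAMPLE), number=5)
    print("26 x lower(): %.4f s   1 x lower(): %.4f s" % (t_old, t_new))
```

Checking that both versions return identical text first confirms the optimization changes only the running time, not the word counts.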
