Experiment setup: create 120 files, test1.txt through test120.txt, each about 4 MB.
Environment: 4-core / 8-thread CPU, Windows 10.
1 Single process, single thread
# -*- coding:utf-8 -*-
import os
import time

def FuncReplace(fileName, oldStr, newStr):
    # Read the whole file, run 1000 replace passes, write the result to a swap file.
    with open(fileName) as read_f, open("t.swap.txt", "w") as write_f:
        line = read_f.read()
        for i in range(1000):
            line = line.replace(oldStr, "%s,%d" % (newStr, i))
        write_f.write(line)
    # Both files are closed here, so the swap file can take the original's place.
    os.remove(fileName)
    os.rename("t.swap.txt", fileName)

if __name__ == "__main__":
    print("Main Start")
    t0 = time.time()
    for i in range(1, 121):
        FuncReplace("test%d.txt" % i, "QQQQQQQQQ", "aaaaaa")
    t1 = time.time()
    print("Main end,time=%d" % (t1 - t0))
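This took about 200 seconds, with CPU utilization around 13% during the run. Only one CPU was working, which matches one of the eight logical CPUs being saturated (1/8 = 12.5%).
2 Multiple processes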
# -*- coding:utf-8 -*-
import os
import time
from multiprocessing import Process

def FuncReplace(fileName, oldStr, newStr):
    # Each process writes to its own swap file ("t.<name>") to avoid collisions.
    with open(fileName) as read_f, open("t.%s" % fileName, "w") as write_f:
        line = read_f.read()
        for i in range(1000):
            line = line.replace(oldStr, "%s,%d" % (newStr, i))
        write_f.write(line)
    # Both files are closed here, so the swap file can take the original's place.
    os.remove(fileName)
    os.rename("t.%s" % fileName, fileName)

if __name__ == "__main__":
    print("Main Start")
    t0 = time.time()
    lst = []
    # Start one process per file (120 in total), then wait for all of them.
    for n in range(1, 121):
        p = Process(target=FuncReplace, args=("test%d.txt" % n, "QQQQQQQQQ", "aaaaaa"))
        lst.append(p)
        p.start()
    for p in lst:
        p.join()
    t1 = time.time()
    print("Main end,time=%d" % (t1 - t0))
Execution time was 23 seconds. One process was started per file, 120 in total; CPU utilization was 100% throughout, and the speedup was large.
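Starting one process per file works here, but a pool sized to the CPU count is a common alternative when there are many more files than cores. The following is only a sketch under assumptions: `replace_in_file` and `replace_in_files` are hypothetical helper names, the replace is done in a single pass rather than the 1000 passes used in the experiment, and no timing was measured for it.

```python
import os
from multiprocessing import Pool

def replace_in_file(path, old, new):
    """Rewrite `path` in place with `old` replaced by `new`; returns `path`."""
    with open(path) as src:
        text = src.read()
    tmp = path + ".swap"  # hypothetical swap-file naming, one per input file
    with open(tmp, "w") as dst:
        dst.write(text.replace(old, new))
    os.replace(tmp, path)  # overwrites the original, also on Windows
    return path

def replace_in_files(paths, old, new, workers=None):
    """Process all paths with a pool; workers=None means os.cpu_count() processes."""
    with Pool(processes=workers) as pool:
        return pool.starmap(replace_in_file, [(p, old, new) for p in paths])
```

A call such as `replace_in_files(["test%d.txt" % n for n in range(1, 121)], "QQQQQQQQQ", "aaaaaa")` would go under the `if __name__ == "__main__":` guard, which Windows needs because child processes re-import the main module.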
3 Multiple threads
# -*- coding:utf-8 -*-
import os
import time
from threading import Thread

def FuncReplace(fileName, oldStr, newStr):
    # Each thread writes to its own swap file ("t.<name>") to avoid collisions.
    with open(fileName) as read_f, open("t.%s" % fileName, "w") as write_f:
        line = read_f.read()
        for i in range(1000):
            line = line.replace(oldStr, "%s,%d" % (newStr, i))
        write_f.write(line)
    # Both files are closed here, so the swap file can take the original's place.
    os.remove(fileName)
    os.rename("t.%s" % fileName, fileName)

if __name__ == "__main__":
    print("Main Start")
    t0 = time.time()
    lst = []
    # Start one thread per file (120 in total), then wait for all of them.
    for n in range(1, 121):
        t = Thread(target=FuncReplace, args=("test%d.txt" % n, "QQQQQQQQQ", "aaaaaa"))
        lst.append(t)
        t.start()
    for t in lst:
        t.join()
    t1 = time.time()
    print("Main end,time=%d" % (t1 - t0))
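Execution time: 219 seconds. We started 120 threads (one per file) to handle the tasks; CPU utilization stayed around 17%, so apparently only one core was doing the work. The reason is the GIL (global interpreter lock), which keeps multiple threads from executing at the same time.
4 Coroutines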
# -*- coding:utf-8 -*-
import asyncio
import time
import os

async def MyReplace(fileName, oldStr, newStr):
    # There is no await in here, so this coroutine runs to completion without yielding.
    with open(fileName) as read_f, open("t.%s" % fileName, "w") as write_f:
        line = read_f.read()
        for i in range(1000):
            line = line.replace(oldStr, "%s,%d" % (newStr, i))
        write_f.write(line)

async def FuncReplace(fileName, oldStr, newStr):
    print("Replace Running...")
    await MyReplace(fileName, oldStr, newStr)
    return fileName

def callback(task):
    # Runs once the task is done, when both files are already closed.
    print("callback running...")
    os.remove(task.result())
    os.rename("t.%s" % task.result(), task.result())

if __name__ == "__main__":
    print("Main Start")
    t0 = time.time()
    tasks = []
    # Create the loop explicitly so the tasks can be scheduled on it.
    loop = asyncio.new_event_loop()
    coroutines = [FuncReplace("test%d.txt" % i, "QQQQQQQQQ", "aaaaaa") for i in range(1, 121)]
    for c in coroutines:
        t = loop.create_task(c)
        t.add_done_callback(callback)
        tasks.append(t)
    loop.run_until_complete(asyncio.wait(tasks))
    loop.close()
    t1 = time.time()
    print("Main end,time=%d" % (t1 - t0))
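Execution time was 230 seconds; CPU utilization was similar to the multithreaded run, 15%~20%, with a single core working. Coroutines are non-blocking processing inside one thread, and since MyReplace never awaits anything, the 120 coroutines simply run one after another. Performance is similar to multithreading.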
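One reason the coroutine version gains nothing here is that `MyReplace` contains no `await`, so each coroutine runs to completion before the next one starts. A minimal sketch of that effect follows; `worker` and `run_workers` are hypothetical names, and a shared log list stands in for the file processing.

```python
import asyncio

async def worker(name, steps, log, cooperative):
    """Append (name, step) to log; optionally yield to the event loop after each step."""
    for i in range(steps):
        log.append((name, i))
        if cooperative:
            await asyncio.sleep(0)  # hands control back to the event loop

async def run_workers(cooperative):
    log = []
    await asyncio.gather(
        worker("a", 2, log, cooperative),
        worker("b", 2, log, cooperative),
    )
    return log

if __name__ == "__main__":
    # Without an await point, "a" finishes before "b" starts.
    print(asyncio.run(run_workers(False)))  # [('a', 0), ('a', 1), ('b', 0), ('b', 1)]
    # With an await point, the two workers interleave.
    print(asyncio.run(run_workers(True)))   # [('a', 0), ('b', 0), ('a', 1), ('b', 1)]
```

Even in the interleaved case everything still runs on one thread, so CPU-bound work such as repeated `str.replace` would not get faster.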
5 Processes + coroutines
# -*- coding:utf-8 -*-
import os
import time
from multiprocessing import Process
import asyncio

async def MyReplace(fileName, oldStr, newStr):
    # There is no await in here, so this coroutine runs to completion without yielding.
    with open(fileName) as read_f, open("t.%s" % fileName, "w") as write_f:
        line = read_f.read()
        for i in range(5000):
            line = line.replace(oldStr, "%s,%d" % (newStr, i))
        write_f.write(line)

async def AsyFuncReplace(fileName, oldStr, newStr):
    print("AsyReplace Running...")
    await MyReplace(fileName, oldStr, newStr)
    return fileName

def callback(task):
    # Runs once the task is done, when both files are already closed.
    print("callback running...")
    os.remove(task.result())
    os.rename("t.%s" % task.result(), task.result())

def FuncReplace(procId, oldStr, newStr):
    # Process procId (1..8) handles files (procId-1)*15+1 .. procId*15.
    tasks = []
    # Each child process creates its own event loop.
    loop = asyncio.new_event_loop()
    coroutines = [AsyFuncReplace("test%d.txt" % ((procId - 1) * 15 + i), oldStr, newStr) for i in range(1, 16)]
    for c in coroutines:
        t = loop.create_task(c)
        t.add_done_callback(callback)
        tasks.append(t)
    loop.run_until_complete(asyncio.wait(tasks))
    loop.close()

if __name__ == "__main__":
    print("Main Start")
    t0 = time.time()
    lst = []
    # Eight processes, one per logical CPU.
    for n in range(1, 9):
        p = Process(target=FuncReplace, args=(n, "QQQQQQQQQ", "aaaaaa"))
        lst.append(p)
        p.start()
    for p in lst:
        p.join()
    t1 = time.time()
    print("Main end,time=%d" % (t1 - t0))
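Execution time was 86 seconds. Because the test machine's CPU has four cores and eight threads, I started eight processes with 15 coroutines in each; CPU utilization was again 100%, fully loaded. Note that this listing runs 5000 replace passes per file rather than the 1000 used in the earlier sections, so the 86 seconds is not directly comparable with the timings above.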
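The index arithmetic `(procId-1)*15+i` in the listing above is what splits the 120 files across the eight processes. As a quick check that every file is assigned exactly once, here is a small sketch; `files_for_process` is a hypothetical helper, not part of the original script.

```python
def files_for_process(proc_id, per_process=15):
    """File names handled by worker `proc_id` (1-based), using the same formula."""
    start = (proc_id - 1) * per_process
    return ["test%d.txt" % (start + i) for i in range(1, per_process + 1)]

if __name__ == "__main__":
    assigned = [name for p in range(1, 9) for name in files_for_process(p)]
    expected = ["test%d.txt" % n for n in range(1, 121)]
    print(files_for_process(1)[0], files_for_process(8)[-1])  # test1.txt test120.txt
    print(assigned == expected)  # True
```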
6 Summary of my understanding
What follows is my own understanding; if anything is wrong, corrections are very welcome:
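① On a multi-core machine, Python only gains processing performance for CPU-bound work like this by using multiple processes;
② Why threads and coroutines exist anyway: because of the GIL, threads cannot run in parallel and so do not speed up this kind of processing; threads and coroutines are there to deal with blocking, for example keeping a program responsive to the user while it waits.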
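To illustrate point ②, the sketch below shows the kind of blocking case where threads do help: the waits overlap instead of adding up. It is an assumption-laden toy, not a rerun of the experiment: `blocking_call` uses `time.sleep` as a stand-in for waiting on disk or network, and all function names are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_call(delay):
    """Stand-in for a blocking wait; time.sleep releases the GIL while waiting."""
    time.sleep(delay)
    return delay

def timed(func):
    """Run func() and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func()
    return result, time.perf_counter() - start

def run_sequential(delays):
    return [blocking_call(d) for d in delays]

def run_threaded(delays):
    # One thread per call, so the waits overlap.
    with ThreadPoolExecutor(max_workers=len(delays)) as pool:
        return list(pool.map(blocking_call, delays))

if __name__ == "__main__":
    delays = [0.2] * 4
    _, seq = timed(lambda: run_sequential(delays))
    _, thr = timed(lambda: run_threaded(delays))
    print("sequential %.2fs, threaded %.2fs" % (seq, thr))
```

The sequential run should take roughly the sum of the delays, while the threaded run should take roughly the longest single delay; exact numbers depend on the machine.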