A Comparison of Python Concurrency Approaches (Feedback Welcome)

Test setup: create test1.txt through test120.txt, 120 files of roughly 4 MB each.

Environment: a 4-core / 8-thread CPU, Windows 10.
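In case you want to reproduce the runs, here is a hypothetical way to generate that corpus; the filler line and its QQQQQQQQQ marker are my own assumptions, chosen only so that the replacement code below has something to find.

# Hypothetical setup helper, not part of the original post: builds 120 files of
# roughly 4 MB, each sprinkled with the "QQQQQQQQQ" marker the experiments replace.
import os

LINE = "some filler text QQQQQQQQQ more filler text\n"
LINES_PER_FILE = (4 * 1024 * 1024) // len(LINE)   # ~4 MB per file

for n in range(1, 121):
    name = "test%d.txt" % n
    with open(name, "w") as f:
        f.write(LINE * LINES_PER_FILE)
    print("wrote %s, %d bytes" % (name, os.path.getsize(name)))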

1 Single process, single thread

# -*- coding:utf-8 -*-
import os
import time

def FuncReplace(fileName,oldStr,newStr):
    # Read the whole file, run 1000 replacement passes, then swap the
    # temporary file in for the original.
    with open(fileName) as read_f,open("t.swap.txt","w") as write_f:
        line=read_f.read()
        for i in range(1000):
            line=line.replace(oldStr,"%s,%d"%(newStr,i))
        write_f.write(line)
    os.remove(fileName)
    os.rename("t.swap.txt",fileName)

if __name__=="__main__":
    print("Main Start")
    t0=time.time()
    for i in range(1,121):
        FuncReplace("test%d.txt"%i,"QQQQQQQQQ","aaaaaa")
    t1=time.time()
    print("Main end,time=%d"%(t1-t0))

This took about 200 seconds, with CPU utilization around 13% throughout the run; only one core was doing any work.

2 Multiprocessing

 

# -*- coding:utf-8 -*-
import os
import time
from multiprocessing import Process

def FuncReplace(fileName,oldStr,newStr):
    with open(fileName) as read_f,open("t.%s"%fileName,"w") as write_f:
        line=read_f.read()
        for i in range(1000):
            line=line.replace(oldStr,"%s,%d"%(newStr,i))
        write_f.write(line)
    os.remove(fileName)
    os.rename("t.%s"%fileName,fileName)

if __name__=="__main__":
    print("Main Start")
    t0=time.time()
    lst=[]
    for n in range(1,121):
        p=Process(target=FuncReplace,args=("test%d.txt"%(n), "QQQQQQQQQ","aaaaaa"))
        lst.append(p)
        p.start()
    for p in lst:
        p.join()
    t1=time.time()
    print("Main end,time=%d"%(t1-t0))

Execution time: 23 seconds. We spawned one process per file, 120 processes in total; CPU utilization sat at 100% during the run, and the speedup over the single-threaded version was substantial.
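Spawning one process per file works here, but a more common pattern is a fixed-size pool matched to the core count, which avoids launching 120 interpreters at once. The following is only a sketch of an alternative __main__ block for the script above (it assumes the same FuncReplace defined there), not the version that was measured.

# Sketch: drive the same FuncReplace with a bounded worker pool instead of
# one Process per file. Assumes FuncReplace from the listing above is defined
# at module level in this file.
from multiprocessing import Pool, cpu_count

if __name__=="__main__":
    args=[("test%d.txt"%n, "QQQQQQQQQ","aaaaaa") for n in range(1,121)]
    with Pool(processes=cpu_count()) as pool:   # 8 workers on this machine
        pool.starmap(FuncReplace, args)         # blocks until every file is done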

 

3 Multithreading

# -*- coding:utf-8 -*-
import os
import time
from threading import Thread

def FuncReplace(fileName,oldStr,newStr):
    with open(fileName) as read_f,open("t.%s"%fileName,"w") as write_f:
        line=read_f.read()
        for i in range(1000):
            line=line.replace(oldStr,"%s,%d"%(newStr,i))
        write_f.write(line)
    os.remove(fileName)
    os.rename("t.%s"%fileName,fileName)

if __name__=="__main__":
    print("Main Start")
    t0=time.time()
    lst=[]
    for n in range(1,121):
        t=Thread(target=FuncReplace,args=("test%d.txt"%(n), "QQQQQQQQQ","aaaaaa"))
        lst.append(t)
        t.start()
    for t in lst:
        t.join()
    t1=time.time()
    print("Main end,time=%d"%(t1-t0))

Execution time: 219 seconds. We spawned 120 threads to work through the files, yet CPU utilization hovered around 17% and, in effect, only a single core was busy. The cause is the GIL: for CPU-bound work like this, Python threads cannot execute bytecode in parallel.
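For reference, the same driver can be written with concurrent.futures.ThreadPoolExecutor instead of 120 raw Thread objects. This is only a sketch of the more idiomatic form, again assuming the FuncReplace from the listing above; because the work is CPU-bound it is still serialized by the GIL, so no speedup should be expected.

# Sketch: thread-pool variant of the __main__ block above. Still GIL-bound for
# this CPU-heavy workload, so timings should resemble the 120-thread version.
from concurrent.futures import ThreadPoolExecutor

if __name__=="__main__":
    with ThreadPoolExecutor(max_workers=8) as pool:
        futures=[pool.submit(FuncReplace, "test%d.txt"%n, "QQQQQQQQQ","aaaaaa")
                 for n in range(1,121)]
        for f in futures:
            f.result()    # re-raises any exception from the worker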

 

4 Coroutines

 

# -*- coding:utf-8 -*-
import asyncio
import time
import os

async def MyReplace(fileName,oldStr,newStr):
    # The file I/O and the replace loop are plain blocking calls; this
    # coroutine never awaits a pending operation, so the tasks run one by one.
    with open(fileName) as read_f,open("t.%s"%fileName,"w") as write_f:
        line=read_f.read()
        for i in range(1000):
            line=line.replace(oldStr,"%s,%d"%(newStr,i))
        write_f.write(line)

async def FuncReplace(fileName,oldStr,newStr):
    print("Replace Running...")
    await MyReplace(fileName,oldStr,newStr)
    return fileName

def callback(task):
    # Done-callback: swap the temporary file in for the original.
    print("callback running...")
    os.remove(task.result())
    os.rename("t.%s"%task.result(),task.result())

if __name__=="__main__":
    print("Main Start")
    t0=time.time()
    tasks=[]
    coroutines=[FuncReplace("test%d.txt"%(i), "QQQQQQQQQ","aaaaaa") for i in range(1,121)]
    for c in coroutines:
        t=asyncio.ensure_future(c)
        t.add_done_callback(callback)
        tasks.append(t)
    loop = asyncio.get_event_loop()
    loop.run_until_complete(asyncio.wait(tasks))
    loop.close()
    t1=time.time()
    print("Main end,time=%d"%(t1-t0))

Execution time: 230 seconds, with CPU utilization much like the multithreaded run (15%~20%) and only one core busy. Coroutines are cooperative, non-blocking scheduling inside a single thread; since the replacement work here is pure CPU and never awaits a genuinely pending operation, the tasks effectively run one after another, so the result is similar to multithreading.
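If the goal is to keep asyncio but still use every core for CPU-bound work, the usual approach is to hand the blocking function to a process pool through run_in_executor. The sketch below is my own illustration, not the author's code; sync_replace is a hypothetical name for a synchronous helper mirroring the earlier FuncReplace.

# Sketch: push the CPU-bound replace onto a process pool so asyncio can use
# all cores while the event loop itself never blocks.
import asyncio
import os
from concurrent.futures import ProcessPoolExecutor

def sync_replace(fileName,oldStr,newStr):
    # Same blocking replace as in the earlier sections.
    with open(fileName) as read_f,open("t.%s"%fileName,"w") as write_f:
        line=read_f.read()
        for i in range(1000):
            line=line.replace(oldStr,"%s,%d"%(newStr,i))
        write_f.write(line)
    os.remove(fileName)
    os.rename("t.%s"%fileName,fileName)

async def main():
    loop=asyncio.get_running_loop()
    with ProcessPoolExecutor(max_workers=8) as pool:
        jobs=[loop.run_in_executor(pool, sync_replace, "test%d.txt"%n, "QQQQQQQQQ","aaaaaa")
              for n in range(1,121)]
        await asyncio.gather(*jobs)

if __name__=="__main__":
    asyncio.get_event_loop().run_until_complete(main())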

5 Processes + coroutines

# -*- coding:utf-8 -*-
import os
import time
from multiprocessing import Process
import asyncio

async def MyReplace(fileName,oldStr,newStr):
    with open(fileName) as read_f,open("t.%s"%fileName,"w") as write_f:
        line=read_f.read()
        for i in range(5000):
            line=line.replace(oldStr,"%s,%d"%(newStr,i))
        write_f.write(line)

async def AsyFuncReplace(fileName,oldStr,newStr):
    print("AsyReplace Running...")
    await MyReplace(fileName,oldStr,newStr)
    return fileName

def callback(task):
    print("callback running...")
    os.remove(task.result())
    os.rename("t.%s"%task.result(),task.result())

def FuncReplace(procId,oldStr,newStr):
    # Each process handles its own slice of 15 files: (procId-1)*15+1 .. procId*15.
    tasks=[]
    coroutines=[AsyFuncReplace("test%d.txt"%((procId-1)*15+i), oldStr,newStr) for i in range(1,16)]
    for c in coroutines:
        t=asyncio.ensure_future(c)
        t.add_done_callback(callback)
        tasks.append(t)
    loop = asyncio.get_event_loop()
    loop.run_until_complete(asyncio.wait(tasks))
    loop.close()

if __name__=="__main__":
    print("Main Start")
    t0=time.time()
    lst=[]
    for n in range(1,9):
        p=Process(target=FuncReplace,args=(n, "QQQQQQQQQ","aaaaaa"))
        lst.append(p)
        p.start()
    for p in lst:
        p.join()
    t1=time.time()
    print("Main end,time=%d"%(t1-t0))

Execution time: 86 seconds. Since the test machine has a 4-core / 8-thread CPU, I spawned eight processes and ran 15 coroutines inside each one; CPU utilization again reached 100%, fully loaded. (Note that this version's inner loop does 5000 replacement passes rather than 1000, so the 86 seconds is not directly comparable with the 23 seconds from Section 2.)

6 Personal takeaways

What follows is my own understanding; if anything is wrong, corrections are very welcome:

① On a multi-core machine, Python can only raise throughput for CPU-bound work by running multiple processes;

② Because of the GIL, multiple threads cannot execute Python bytecode in parallel, so neither threads nor coroutines speed up CPU-bound processing; they exist to deal with blocking, for example keeping a program responsive while it waits on I/O (see the sketch below).
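To make ② concrete, here is a toy sketch of my own where the tasks spend their time waiting instead of computing: 120 one-second waits overlap inside a single thread and finish in roughly one second of wall-clock time.

# Toy illustration, not from the original post, of where coroutines do help:
# when tasks wait rather than compute, the waits overlap almost perfectly.
import asyncio
import time

async def fake_io(n):
    await asyncio.sleep(1)      # stands in for a network or disk wait
    return n

async def main():
    results = await asyncio.gather(*(fake_io(n) for n in range(120)))
    return len(results)

if __name__=="__main__":
    t0=time.time()
    count=asyncio.get_event_loop().run_until_complete(main())
    print("finished %d waits in %.1f s" % (count, time.time()-t0))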
