Processes (4): Coroutines Plus Processes

 

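  • What is a coroutine

A coroutine is also known as a micro-thread.

Coroutines are cooperative: scheduling is non-preemptive, so a coroutine keeps running until it gives up control itself.

Like threads, coroutines are mainly used for I/O-bound work.

Under the hood, all coroutines run inside a single thread.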

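Advantages of coroutines:

  1. No thread-switching overhead
  2. No need for locks
  • Advantage 1: very high execution efficiency. Switching between subroutines is controlled by the program itself rather than by the thread scheduler, so there is no thread-switching overhead. Compared with multithreading, the more threads there would be, the more pronounced the advantage of coroutines becomes.
  • Advantage 2: no multithreaded locking mechanism is needed. Because there is only one thread, two coroutines never write a variable at the same time. Shared resources are accessed without locks and only their state needs to be checked, so execution is much more efficient than with multiple threads.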
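To make the "no locks" point concrete, here is a minimal sketch using plain generators. The names `worker` and `run_all` are made up for illustration and are not part of any library. Three tasks update a shared counter with an unprotected read-modify-write, and the result is still exact because control can only change hands at the explicit `yield`.

```python
from collections import deque

counter = {"value": 0}  # shared state touched by every task


def worker(n_steps):
    # Each step is a read-modify-write on shared state. No lock is needed:
    # control can only leave this function at the explicit yield below.
    for _ in range(n_steps):
        current = counter["value"]
        counter["value"] = current + 1
        yield  # hand control back to the scheduler


def run_all(tasks):
    # Round-robin scheduler: resume each task in turn until all have finished.
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)
            queue.append(task)
        except StopIteration:
            pass  # this task is done, drop it


run_all([worker(1000) for _ in range(3)])
print(counter["value"])
```

With preemptive threads the same update would normally need a lock, since a thread could be interrupted between the read and the write.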

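How can coroutines use multiple cores? Coroutines plus processes make a very good concurrency solution!

Because all coroutines run in a single thread, they cannot use a multi-core CPU on their own. The simplest approach is multiple processes plus coroutines, which makes full use of the cores while keeping the efficiency of coroutines, and can deliver very high performance.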
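The idea can be sketched with the standard library alone. Note the swap: this sketch uses `asyncio` coroutines and `concurrent.futures.ProcessPoolExecutor` instead of the gevent and `multiprocessing.Pool` combination used later in this post, and `fake_fetch`, `run_batch` and `run_all` are hypothetical names, with a short sleep standing in for real network I/O.

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor


async def fake_fetch(item):
    # Stand-in for an I/O-bound call; the sleep yields to other coroutines.
    await asyncio.sleep(0.01)
    return item * 2


async def _gather_batch(batch):
    # Run every item of the batch concurrently inside one thread.
    return await asyncio.gather(*(fake_fetch(x) for x in batch))


def run_batch(batch):
    # Entry point for one worker process: one event loop per process.
    return asyncio.run(_gather_batch(batch))


def run_all(batches, processes=2):
    # One batch per task; the pool spreads the tasks over the processes.
    with ProcessPoolExecutor(max_workers=processes) as pool:
        return list(pool.map(run_batch, batches))


if __name__ == "__main__":
    print(run_all([[1, 2, 3], [4, 5, 6]]))
```

Each process gets its own event loop, so the coroutines inside one batch overlap their waiting time while the batches themselves run on separate cores.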

  • Several ways to implement coroutines

1. Implementing the producer-consumer pattern with yield



# Producer-consumer pattern implemented with coroutines (generators)
import time


def consumer(name):
    print("----->ready to eat baozi...")
    while True:
        # Pause here until the producer sends the next baozi
        new_baozi = yield
        print("[%s] is eating baozi %s" % (name, new_baozi))


def producer():
    # Prime both generators so that each one runs up to its first yield
    next(con)
    next(con2)
    n = 0
    while True:
        time.sleep(1)
        print("making baozi %s and %s" % (n, n + 1))
        con.send(n)
        con2.send(n + 1)
        n += 2


if __name__ == "__main__":
    con = consumer("lian")
    con2 = consumer("zong")
    producer()  # runs until interrupted
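The listing above loops forever, which makes it hard to see what `send()` returns. As a rough variation (the `log` list, the `rounds` parameter and the acknowledgement strings are additions for illustration, not part of the original), the consumer below yields an acknowledgement back to the producer, and the producer stops after a fixed number of rounds and closes the generators.

```python
def consumer(name, log):
    # Receives a value through send() and yields an acknowledgement back.
    ack = None
    while True:
        baozi = yield ack
        log.append("%s ate baozi %s" % (name, baozi))
        ack = "%s:%s" % (name, baozi)


def producer(consumers, rounds):
    acks = []
    for c in consumers:
        next(c)  # prime: run each generator up to its first yield
    n = 0
    for _ in range(rounds):
        for c in consumers:
            acks.append(c.send(n))  # send() returns the value the consumer yields
            n += 1
    for c in consumers:
        c.close()  # raises GeneratorExit inside the suspended consumer
    return acks


log = []
acks = producer([consumer("lian", log), consumer("zong", log)], rounds=2)
print(acks)
print(log)
```

The first `next()` call is what makes the later `send()` legal: a generator that has not started yet can only be sent `None`.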

2. Implementing coroutines with greenlet

# Implementing coroutines with greenlet
from greenlet import greenlet


def test1():
    print(12)
    gr2.switch()  # switch to test2
    print(34)
    gr2.switch()  # switch to test2 again


def test2():
    print(56)
    gr1.switch()  # switch back to test1
    print(78)


gr1 = greenlet(test1)
gr2 = greenlet(test2)
# gr2.switch()
gr1.switch()  # start test1; the output is 12, 56, 34, 78
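The switching order in the greenlet example can be hard to follow at first. The sketch below imitates it with plain generators and a small driver, so it runs without installing greenlet. This is only an analogy: `run` and the `tasks` dictionary are invented here, and a real greenlet switches inside the function call rather than through a driver loop.

```python
def test1():
    print(12)
    yield "test2"  # plays the role of gr2.switch()
    print(34)
    yield "test2"


def test2():
    print(56)
    yield "test1"  # plays the role of gr1.switch()
    print(78)


def run(start, tasks):
    # Resume whichever task was named last, recording the order of control.
    order = []
    current = start
    while current is not None:
        order.append(current)
        try:
            current = next(tasks[current])
        except StopIteration:
            current = None  # a finished task hands control back to the caller
    return order


tasks = {"test1": test1(), "test2": test2()}
order = run("test1", tasks)
print(order)
```

As in the greenlet version, `test1` is left suspended after its second switch, because `test2` finishes first and control returns to the main program.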

3. Implementing coroutines with gevent

from gevent import monkey
monkey.patch_all()  # make blocking sockets cooperative; must run before requests is imported

import time
import gevent
import requests

start = time.time()


def f(url):
    print("GET: %s" % url)
    resp = requests.get(url)
    data = resp.text
    print("%d bytes received from %s" % (len(data), url))


url = "https://blog.csdn.net/Lzs1998/article/details/87858525"
# Spawn five greenlets that fetch the page concurrently, then wait for all of them
gevent.joinall([gevent.spawn(f, url) for i in range(5)])
print("total time:", time.time() - start)

Next, let's see how to crawl 50 web pages with coroutines plus processes!

Method 1: the ordinary sequential approach

import time
import requests


def f(url):
    resp = requests.get(url)
    data = resp.text
    print("success get: %s" % url)
    print("%d bytes received from %s\n" % (len(data), url))


if __name__ == "__main__":
    start = time.time()
    # Fetch the page 50 times, one request after another
    for i in range(50):
        f(url="https://blog.csdn.net/Lzs1998/article/details/87858525")
    end = time.time()
    print("total time:", end - start)

Elapsed time:

total time: 35.31624388694763
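All three methods in this post repeat the same start/end timing lines. One possible way to factor that out is a small context manager. The helper name `timed` and the `results` dictionary are hypothetical, and it uses `time.perf_counter()`, which suits interval measurement, in place of `time.time()`.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    # Store the wall-clock duration of the with-block under the given label.
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


results = {}
with timed("sequential", results):
    time.sleep(0.05)  # stand-in for the work being measured
print("total time:", results["sequential"])
```

Collecting the durations in one dictionary makes it easy to print the three methods side by side at the end of a run.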

Method 2: coroutines

from gevent import monkey
monkey.patch_all()  # make blocking sockets cooperative; must run before requests is imported

import time
import requests
from gevent.pool import Pool as GP


def f(url):
    resp = requests.get(url)
    data = resp.text
    print("success get: %s" % url)
    print("%d bytes received from %s\n" % (len(data), url))


# Run the fetches concurrently with a gevent coroutine pool
def html(url):
    p = GP(5)  # at most 5 greenlets run at the same time
    # Pool.map calls f once for every item of the iterable;
    # the pool size, not the length of the iterable, limits the concurrency.
    p.map(f, [url for i in range(50)])
    # Without a concurrency limit, this does the same work:
    # gevent.joinall([gevent.spawn(f, url) for i in range(50)])


if __name__ == "__main__":
    start = time.time()
    url = "https://blog.csdn.net/Lzs1998/article/details/87858525"
    html(url)
    end = time.time()
    print("total time:", end - start)

Elapsed time:

total time: 29.46183705329895

Method 3: coroutines plus processes

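import time
import gevent
from multiprocessing import Pool, Lock


def init_worker(shared_lock):
    # Runs once in every worker process: patch the standard library before
    # requests is imported so that its sockets cooperate with gevent, and
    # keep the shared lock in a global so that f() can reach it.
    global lock, requests
    from gevent import monkey
    monkey.patch_all()
    import requests
    lock = shared_lock


def f(url):
    resp = requests.get(url)
    data = resp.text
    with lock:  # keep the two lines of output together across processes
        print("success get: %s" % url)
        print("%d bytes received from %s\n" % (len(data), url))


# Coroutine fan-out inside one worker process
def html(url):
    # Spawn 50 greenlets that each fetch the URL, then wait for all of them
    gevent.joinall([gevent.spawn(f, url) for i in range(50)])


def dohtml(num):
    lock = Lock()
    # 5 worker processes; the lock is handed to each one through the initializer
    pool = Pool(5, initializer=init_worker, initargs=(lock,))
    url = "https://blog.csdn.net/Lzs1998/article/details/87858525"
    for i in range(num):
        pool.apply_async(func=html, args=(url,))
    pool.close()
    pool.join()


if __name__ == "__main__":
    start = time.time()
    dohtml(10)  # 10 tasks, each spawning 50 greenlets
    end = time.time()
    print("total time:", end - start)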

Elapsed time:

total time: 6.0761730670928955

In these runs, coroutines plus processes far outperformed the other two approaches!
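Method 3 sends the same URL to every task. If the goal is 50 distinct pages, one possible way to divide the work is to split the URL list into one batch per process, so that each process runs its own batch with coroutines. This is only a sketch: `split_batches` is a hypothetical helper, and the `example.com` URLs are placeholders, not real pages.

```python
def split_batches(items, n_batches):
    # Deal the items round-robin so that batch sizes differ by at most one.
    return [items[i::n_batches] for i in range(n_batches)]


# Placeholder URLs for illustration only
urls = ["https://example.com/page/%d" % i for i in range(50)]
batches = split_batches(urls, 5)
print([len(b) for b in batches])
```

Each batch could then be submitted with `pool.apply_async`, with the worker function spawning one greenlet per URL in its batch.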

Reposted from blog.csdn.net/Lzs1998/article/details/87863269