-
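What is a coroutine?
A coroutine is sometimes called a micro-thread.
Coroutines are cooperative, that is, non-preemptive: a coroutine keeps running until it voluntarily hands control to another one.
Like threads, coroutines are mainly used for I/O-bound work.
All coroutines in a program run inside a single thread.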
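Advantages of coroutines:
- No thread-switching overhead
- No locks
- Advantage 1: very high execution efficiency. Switching between coroutines is not a thread switch; it is controlled by the program itself, so there is no thread context-switch cost. Compared with multithreading, the more concurrent tasks there are, the more noticeable the advantage of coroutines becomes.
- Advantage 2: no multithreaded locking. Because there is only one thread, two coroutines can never write to a variable at the same moment. Shared resources can be used without locks; checking state is enough, which is another reason coroutines can be considerably more efficient than threads.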
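How can coroutines use multiple cores? Coroutines plus processes is a very good answer for concurrency.
Because all coroutines run in one thread, they cannot use more than one CPU core on their own. The simplest approach is multiple processes plus coroutines: this makes full use of every core while keeping the efficiency of coroutines, and can deliver very high performance.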
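The two advantages above (switching under program control, no locks) can be illustrated with plain generators. This is only a minimal sketch: `worker` and `run_round_robin` are hypothetical names made up for this example, and the scheduler is a toy, not how any real coroutine library is implemented.

```python
def worker(name, shared, steps):
    """A generator-based coroutine that increments a shared counter."""
    for _ in range(steps):
        # Read-modify-write with no lock: nothing else can run between
        # these lines, because control only changes hands at `yield`.
        current = shared["count"]
        shared["count"] = current + 1
        shared["log"].append(name)
        yield  # hand control back to the scheduler


def run_round_robin(tasks):
    """Toy scheduler: resume each task in turn until all have finished."""
    pending = list(tasks)
    while pending:
        task = pending.pop(0)
        try:
            next(task)
        except StopIteration:
            continue  # this task is done; do not reschedule it
        pending.append(task)


shared = {"count": 0, "log": []}
run_round_robin([worker("A", shared, 3), worker("B", shared, 3)])
print(shared["count"])  # 6
print(shared["log"])    # ['A', 'B', 'A', 'B', 'A', 'B']
```

The switch points are exactly the `yield` statements, so the counter update is never interrupted halfway. With preemptive threads the same update would need a lock.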
-
Several ways to write coroutines
1. Producer-consumer pattern with yield
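# Producer-consumer pattern implemented with generator-based coroutines
import time


def consumer(name):
    print("----->ready to eat baozi...")
    while True:
        # Pause here until the producer calls send(); the sent value becomes new_baozi.
        new_baozi = yield
        print("[%s] is eating baozi %s" % (name, new_baozi))


def producer():
    # Prime both generators so that they run up to their first `yield`.
    next(con)
    next(con2)
    n = 0
    while True:  # runs until interrupted with Ctrl+C
        time.sleep(1)
        print("making baozi %s and %s" % (n, n + 1))
        con.send(n)
        con2.send(n + 1)
        n += 2


if __name__ == "__main__":
    con = consumer("lian")
    con2 = consumer("zong")
    producer()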
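The example above loops forever, which makes it hard to try out quickly. Here is a bounded variant as a rough sketch: the `eaten` list and the `batches` parameter are additions for illustration and are not part of the original code. It also shows `close()`, which ends a generator that is paused at `yield`.

```python
def consumer(name, eaten):
    """Receive baozi via send() and record them in the `eaten` list."""
    while True:
        baozi = yield
        eaten.append((name, baozi))


def producer(consumers, batches):
    """Prime each consumer, send a fixed number of batches, then close them."""
    for c in consumers:
        next(c)  # advance to the first `yield` so that send() is legal
    n = 0
    for _ in range(batches):
        for c in consumers:
            c.send(n)
            n += 1
    for c in consumers:
        c.close()  # raises GeneratorExit inside the consumer at its `yield`


eaten = []
producer([consumer("lian", eaten), consumer("zong", eaten)], batches=2)
print(eaten)  # [('lian', 0), ('zong', 1), ('lian', 2), ('zong', 3)]
```

Passing the consumers in as arguments, instead of reading module-level names, also makes the producer easier to reuse.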
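2. Coroutines with greenlet
# Explicit switching between two greenlets
from greenlet import greenlet


def test1():
    print(12)
    gr2.switch()  # switch to test2
    print(34)
    gr2.switch()  # switch to test2 again; it resumes after its own switch() call


def test2():
    print(56)
    gr1.switch()  # switch to test1
    print(78)


gr1 = greenlet(test1)
gr2 = greenlet(test2)
gr1.switch()  # start test1; the program prints 12, 56, 34, 78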
3. Coroutines with gevent
# Patch the standard library first, so that the blocking socket calls made by
# requests yield to other greenlets instead of blocking the whole thread.
from gevent import monkey
monkey.patch_all()

import time

import gevent
import requests

URL = "https://blog.csdn.net/Lzs1998/article/details/87858525"


def f(url):
    print("GET:%s" % url)
    resp = requests.get(url)
    data = resp.text
    print("%d bytes received from %s" % (len(data), url))


start = time.time()
# Spawn five greenlets for the same URL and wait for all of them to finish.
gevent.joinall([gevent.spawn(f, URL) for _ in range(5)])
print("total time:", time.time() - start)
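To see why cooperative switching pays off for I/O, here is a toy model with a simulated clock. It is only a sketch under stated assumptions: `fetch` and `run` are hypothetical names, the "network wait" is just a number handed to the scheduler, and nothing here reflects how gevent is actually implemented.

```python
import heapq


def fetch(name, delay, log):
    """Pretend download: announce the start, 'wait' for delay seconds, finish."""
    log.append("start " + name)
    yield delay  # tell the scheduler how long this task wants to wait
    log.append("done " + name)


def run(tasks):
    """Toy scheduler with a simulated clock; returns the total simulated time."""
    now = 0.0
    ready = []  # heap of (wake_time, sequence_number, task)
    for seq, task in enumerate(tasks):
        heapq.heappush(ready, (0.0, seq, task))
    seq = len(tasks)
    while ready:
        wake, _, task = heapq.heappop(ready)
        now = max(now, wake)
        try:
            delay = next(task)
        except StopIteration:
            continue  # task finished
        heapq.heappush(ready, (now + delay, seq, task))
        seq += 1
    return now


log = []
total = run([fetch("a", 3, log), fetch("b", 1, log), fetch("c", 2, log)])
print(total)  # 3.0
print(log)    # ['start a', 'start b', 'start c', 'done b', 'done c', 'done a']
```

All three tasks start before any of them finishes, so the simulated total is the longest wait (3) instead of the sum (6). That overlap is what a coroutine library aims for with real sockets.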
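Next, let's see how to fetch 50 web pages with coroutines plus processes, compared against two simpler approaches.
Method 1: the ordinary approach (sequential requests)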
import time

import requests


def f(url):
    resp = requests.get(url)
    data = resp.text
    print("success get :%s" % url)
    print("%d bytes received from %s\n" % (len(data), url))


if __name__ == "__main__":
    start = time.time()
    # Fetch the same page 50 times, one request after another.
    for i in range(50):
        f(url="https://blog.csdn.net/Lzs1998/article/details/87858525")
    end = time.time()
    print("total time:", end - start)
Elapsed time:
total time: 35.31624388694763
Method 2: coroutines
# Note: requests is not monkey-patched in this script, so each requests.get()
# blocks the thread while it waits. Calling gevent.monkey.patch_all() before
# importing requests would let the greenlets overlap their network waits.
import time

import requests
from gevent.pool import Pool as GP


def f(url):
    resp = requests.get(url)
    data = resp.text
    print("success get :%s" % url)
    print("%d bytes received from %s\n" % (len(data), url))


# Run the downloads through a gevent greenlet pool.
def html(url):
    p = GP(5)  # at most 5 greenlets active at a time
    # Pool.map takes the function to run and an iterable of arguments;
    # the length of the iterable is the number of calls (50 pages here).
    p.map(f, [url for i in range(50)])
    # Without the pool's size limit this is similar to:
    # gevent.joinall([gevent.spawn(f, url) for i in range(50)])


if __name__ == "__main__":
    start = time.time()
    url = "https://blog.csdn.net/Lzs1998/article/details/87858525"
    html(url)
    end = time.time()
    print("total time:", end - start)
Elapsed time:
total time: 29.46183705329895
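Method 3: coroutines + processes
# Note: requests is not monkey-patched in this script, so the downloads inside
# one process still run one after another; the parallelism here comes from the
# 5 worker processes.
import time

import gevent
import requests
from multiprocessing import Pool, Lock


def init_worker(shared_lock):
    # Runs once in every worker process: keep the lock in a module-level name
    # so that f() can use it whatever the process start method is.
    global lock
    lock = shared_lock


def f(url):
    resp = requests.get(url)
    data = resp.text
    # Hold the lock while printing so output from different processes does not interleave.
    with lock:
        print("success get :%s" % url)
        print("%d bytes received from %s\n" % (len(data), url))


# Each task runs a batch of 5 downloads as greenlets.
def html(url):
    gevent.joinall([gevent.spawn(f, url) for i in range(5)])


def dohtml(num):
    # 5 worker processes; num tasks x 5 downloads each (10 x 5 = 50 pages).
    pool = Pool(5, initializer=init_worker, initargs=(Lock(),))
    url = "https://blog.csdn.net/Lzs1998/article/details/87858525"
    results = [pool.apply_async(func=html, args=(url,)) for i in range(num)]
    pool.close()
    pool.join()
    for r in results:
        r.get()  # re-raise any exception that happened in a worker


if __name__ == "__main__":
    start = time.time()
    dohtml(10)
    end = time.time()
    print("total time:", end - start)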
Elapsed time:
total time: 6.0761730670928955
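In these runs, coroutines plus processes were far faster than the other two approaches: about 6 seconds, against about 29 and 35 seconds.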
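When processes and coroutines are combined, it helps to decide up front which pages each process task is responsible for, so that the total is exactly 50 and not "tasks times greenlets" by accident. Here is a minimal sketch; `split_into_batches` is a hypothetical helper written for this example and is not part of gevent or multiprocessing.

```python
def split_into_batches(items, batch_size):
    """Split `items` into consecutive lists of at most `batch_size` elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


# Assumed workload: 50 copies of the test URL used in this article.
URL = "https://blog.csdn.net/Lzs1998/article/details/87858525"
urls = [URL] * 50

batches = split_into_batches(urls, batch_size=5)
print(len(batches))                  # 10
print(sum(len(b) for b in batches))  # 50
```

Each batch could then be handed to the process pool as one task, with one greenlet per URL inside that task.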