Queues in Python

Concurrent programs often use queues to pass data between processes or threads.

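In Python, the queue used between processes (multiprocessing.Queue) and the queue used between threads (queue.Queue) share the same interface, but their implementations and performance differ. The biggest difference is that the inter-process queue serializes (pickles) each item when it is enqueued, while the inter-thread queue does not serialize anything. This makes a noticeable difference in efficiency. The documentation mentions this only in passing, but it is easy to confirm with a small experiment.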

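import queue
import time
from multiprocessing import Process, Queue
from threading import Thread


def th_worker(q):
    # Wait so the main thread has time to modify the dict before we read it.
    time.sleep(5)
    x = q.get()
    print('msg from thread queue: ' + str(x))


def proc_worker(q):
    # Same delay as the thread worker.
    time.sleep(5)
    x = q.get()
    print('msg from proc queue: ' + str(x))


if __name__ == '__main__':
    # The guard is needed when child processes are started with "spawn".
    th_q = queue.Queue()
    proc_q = Queue()

    th = Thread(target=th_worker, args=(th_q,))
    th.start()

    p = Process(target=proc_worker, args=(proc_q,))
    p.start()

    d = {'a': 1, 'b': 2}
    th_q.put(d)
    proc_q.put(d)
    time.sleep(1)  # give the queue's internal thread time to serialize d
    d['a'] = 0
    d['b'] = 5

    # Wait for both workers to print their messages.
    th.join()
    p.join()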

Running the program above prints the following (the order of the keys may vary with the Python version):

msg from thread queue: {'b': 5, 'a': 0}

msg from proc queue: {'b': 2, 'a': 1}

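As the output shows, the inter-thread queue holds a reference to the object rather than serialized data, so the worker sees the later modifications. The inter-process queue holds serialized data, so what the worker receives does not change when the original object changes.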
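The same conclusion can be checked inside a single process, without sleeps, by comparing object identity. The following is a minimal sketch; the helper names `roundtrip` and `compare_queues` are made up for this illustration and are not part of any library. It assumes that an item put on a `multiprocessing.Queue` can be read back in the same process, which works because the data still travels through the pipe.

```python
import multiprocessing
import queue


def roundtrip(q, item):
    """Put one item on the queue and read it straight back (hypothetical helper)."""
    q.put(item)
    return q.get(timeout=5)


def compare_queues():
    d = {'a': 1, 'b': 2}

    # Thread queue: the very same object comes back.
    th_q = queue.Queue()
    from_thread_q = roundtrip(th_q, d)

    # Process queue: the item is pickled and unpickled, so a copy comes back.
    proc_q = multiprocessing.Queue()
    from_proc_q = roundtrip(proc_q, d)
    proc_q.close()
    proc_q.join_thread()

    return {
        'thread_same_object': from_thread_q is d,
        'proc_same_object': from_proc_q is d,
        'proc_equal_value': from_proc_q == d,
    }


if __name__ == '__main__':
    print(compare_queues())
```

If the reasoning in the text holds, the thread queue returns the identical object, while the process queue returns an equal but distinct one.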

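Note the time.sleep(1) on the line before d['a'] = 0. This line is essential for observing the difference between the two queues. Without it, the output typically looks like this: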

msg from thread queue: {'b': 5, 'a': 0}

msg from proc queue: {'b': 5, 'a': 0}

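This is because an item put on the queue is serialized in a thread that belongs to the queue itself, so the serialization does not block the user thread that calls put(). put() only places the object in a buffer inside the current process. If d is modified before that internal thread has pickled it, the modified values are what get sent, which makes the result depend on timing. (The multiprocessing module documentation mentions this internal thread briefly: "When a process first puts an item on the queue a feeder thread is started which transfers objects from a buffer into the pipe.")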
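One way to avoid depending on this timing is to enqueue a snapshot of the object instead of the object itself. The sketch below is only an illustration under that assumption: `put_snapshot` and `demo` are hypothetical names, and `copy.deepcopy` is just one possible way to take the snapshot. The dict is modified immediately after the put, with no sleep, and the consumer should still receive the original values because the copy was made before the modification.

```python
import copy
import multiprocessing


def put_snapshot(q, item):
    """Enqueue a deep copy so later changes to `item` cannot reach the consumer.

    Hypothetical helper for illustration, not a library function.
    """
    q.put(copy.deepcopy(item))


def demo():
    q = multiprocessing.Queue()
    d = {'a': 1, 'b': 2}

    put_snapshot(q, d)
    # No sleep here: modify the original right away.
    d['a'] = 0
    d['b'] = 5

    # Read the item back in the same process to keep the sketch short.
    received = q.get(timeout=5)
    q.close()
    q.join_thread()
    return received


if __name__ == '__main__':
    print(demo())
```

The cost is an extra copy per item on top of the pickling the queue already does, so this is worth it only when the producer keeps modifying objects after enqueueing them.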
