
4.8 Skipping the first part of an iterable

Problem: how to iterate over all items of an iterable except the first few
Solution: use the dropwhile() function from the itertools module

with open('data_file/ch4_02.txt') as f:
    for line in f:
        print(line, end='')

This prints every line of the file, including the comment lines at the top.

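The first few lines of ch4_02.txt are comments, and we want to skip them.
dropwhile() takes two arguments: a function and an iterable. It discards items for as long as the function returns True, then yields every remaining item unchanged.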

from itertools import dropwhile
with open('data_file/ch4_02.txt') as f:
    for line in dropwhile(lambda line: line.startswith('#'), f):
        print(line, end='')

a = 23
# too
b = 'i love'

If you already know how many items to skip, you can use islice() directly:

items = ['a','b','c',1,2,3,4]
from itertools import islice
for x in islice(items,3,None):
    print(x,end=' ')

1 2 3 4
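islice() also works on iterators that do not support slicing, such as generators and file objects. A minimal sketch, where countdown() is a hypothetical generator used only for illustration:

```python
from itertools import islice

def countdown(n):
    """Hypothetical generator: yields n, n-1, ..., 1."""
    while n > 0:
        yield n
        n -= 1

# A generator cannot be sliced with [3:], but islice() can skip items.
# Note that islice() consumes the skipped items from the iterator.
rest = list(islice(countdown(6), 3, None))
print(rest)  # [3, 2, 1]
```

Because the skipped items are consumed, there is no way to go back to them unless they were stored elsewhere first.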

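You can also filter out the leading comment lines with the approach below. The difference is that it removes every comment line, including those in the middle of the file, not only the ones at the start.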

with open('data_file/ch4_02.txt') as f:
    lines = (line for line in f if not line.startswith('#'))
    for line in lines:
        print(line, end='')

a = 23
b = 'i love'
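To compare the two approaches without depending on a file on disk, here is a small sketch that uses an in-memory list of lines. The sample lines are made up for illustration and are not the actual contents of ch4_02.txt.

```python
from itertools import dropwhile

# Hypothetical sample lines, standing in for a file with leading comments
sample_lines = [
    '# header comment\n',
    '# another comment\n',
    'a = 23\n',
    '# too\n',
    "b = 'i love'\n",
]

def is_comment(line):
    return line.startswith('#')

# dropwhile() stops testing after the first non-comment line,
# so the comment in the middle survives
leading_skipped = list(dropwhile(is_comment, sample_lines))

# The list comprehension tests every line, so all comments are removed
all_filtered = [line for line in sample_lines if not is_comment(line)]

print(len(leading_skipped))  # 3
print(len(all_filtered))     # 2
```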

4.9 Iterating over permutations and combinations

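Problem: you want to iterate over all possible permutations or combinations of the items in a collection
Solution: use itertools.permutations()
permutations() takes a collection and produces a sequence of tuples, where each tuple is one possible ordering of all the items in the collection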

items = ['a','b','c']
from itertools import permutations
for p in permutations(items):
    print(p)

('a', 'b', 'c')
('a', 'c', 'b')
('b', 'a', 'c')
('b', 'c', 'a')
('c', 'a', 'b')
('c', 'b', 'a')

You can also pass an optional length argument to get permutations of that length:

for p in permutations(items,2):
    print(p)

('a', 'b')
('a', 'c')
('b', 'a')
('b', 'c')
('c', 'a')
('c', 'b')

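itertools.combinations() produces all combinations of the input items, but note how it differs from permutations().
For combinations(), the order of items does not matter: ('a', 'b') and ('b', 'a') count as the same combination, so only one of them is produced.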

from itertools import combinations
items = ['a','b','c']
for c in combinations(items,3):
    print(c)

('a', 'b', 'c')

for c in combinations(items,2):
    print(c)

('a', 'b')
('a', 'c')
('b', 'c')

There is also itertools.combinations_with_replacement(), which allows the same item to be chosen more than once:

from itertools import combinations_with_replacement
for c in combinations_with_replacement(items,3):
    print(c)

('a', 'a', 'a')
('a', 'a', 'b')
('a', 'a', 'c')
('a', 'b', 'b')
('a', 'b', 'c')
('a', 'c', 'c')
('b', 'b', 'b')
('b', 'b', 'c')
('b', 'c', 'c')
('c', 'c', 'c')
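As a sanity check on the three functions, the number of tuples each one yields can be compared with the usual counting formulas. This is only an illustrative sketch; it assumes Python 3.8 or later, where math.comb() is available.

```python
from itertools import combinations, combinations_with_replacement, permutations
from math import comb, factorial

items = ['a', 'b', 'c']
n, k = len(items), 2

n_perms = len(list(permutations(items, k)))
n_combs = len(list(combinations(items, k)))
n_combs_rep = len(list(combinations_with_replacement(items, k)))

# Ordered selections without repetition: n! / (n - k)!
print(n_perms, factorial(n) // factorial(n - k))  # 6 6
# Unordered selections without repetition: C(n, k)
print(n_combs, comb(n, k))                        # 3 3
# Unordered selections with repetition: C(n + k - 1, k)
print(n_combs_rep, comb(n + k - 1, k))            # 6 6
```

These counts grow quickly with n, so it is usually better to loop over the iterators than to build full lists as done here.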

4.10 Iterating over a sequence with index values

Problem: you want to iterate over a sequence while keeping track of the index of the item being processed
Solution: use enumerate()

my_list = ['a','b','c']
for idx,val in enumerate(my_list):
    print(idx,val)

0 a
1 b
2 c

You can also pass an optional start argument:

for idx, val in enumerate(my_list,1):
    print(idx,val)

1 a
2 b
3 c

This is useful when iterating over a file and you want error messages to include the line number:

def parse_data(filename):
    with open(filename) as f:
        for lineno, line in enumerate(f, 1):
            fields = line.split()
            try:
                count = int(fields[1])
                # ... process count here ...
            except ValueError as e:
                # lineno pinpoints the offending line
                print('line {}: parse error: {}'.format(lineno, e))
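Another place where line numbers from enumerate() come in handy is building an index from words to the lines they appear on. A minimal sketch, with made-up sample lines standing in for lines read from a file:

```python
from collections import defaultdict

# Hypothetical sample text, standing in for lines read from a file
lines = [
    'the quick brown fox',
    'jumps over the lazy dog',
    'the end',
]

word_summary = defaultdict(list)
for lineno, line in enumerate(lines, 1):
    for word in line.lower().split():
        # Record the 1-based line number where the word occurs
        word_summary[word].append(lineno)

print(word_summary['the'])  # [1, 2, 3]
print(word_summary['fox'])  # [1]
```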

Note: when enumerating a sequence of tuples, unpack each tuple inside parentheses:

data = [(1,2),(3,4),(5,6),(7,8)]
for n,(x,y) in enumerate(data):
    print(n,':',(x,y))

0 : (1, 2)
1 : (3, 4)
2 : (5, 6)
3 : (7, 8)

# Wrong
for n,x,y in enumerate(data):
    print(n,x,y)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-49-e9ca5f47e3c4> in <module>()
      1 # Wrong
----> 2 for n,x,y in enumerate(data):
      3     print(n,x,y)

ValueError: not enough values to unpack (expected 3, got 2)

4.11 Iterating over multiple sequences simultaneously

Problem: you want to iterate over several sequences at once, taking one item from each sequence per step
Solution: use the zip() function

a = [1,2,3,4]
b = ['a','b','c','d']
c = ['x','y','z']
for x,y in zip(a,b):
    print(x,':',y)

1 : a
2 : b
3 : c
4 : d

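zip() creates an iterator that yields tuples (x, y). Iteration stops as soon as any one of the input sequences is exhausted, so the result is only as long as the shortest input (the "shortest plank of the barrel" effect).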

for x, y in zip(a,c):
    print(x,':',y)

1 : x
2 : y
3 : z

If you do not want the shortest input to set the length, use itertools.zip_longest():

from itertools import zip_longest
for i in zip_longest(a,c):
    print(i)

(1, 'x')
(2, 'y')
(3, 'z')
(4, None)

Missing values are filled with None by default; you can also specify a fill value:

for i in zip_longest(a,c,fillvalue=0):
    print(i)

(1, 'x')
(2, 'y')
(3, 'z')
(4, 0)

zip() is also handy when you need to pair up related pieces of data:

headers = ['name','shares','price']
values = ['FLC',100,99.09]
s = list(zip(headers,values))
s

[('name', 'FLC'), ('shares', 100), ('price', 99.09)]

for name,val in s:
    print(name,'=',val)

name = FLC
shares = 100
price = 99.09

zip() can also take more than two sequences, for example three:

a = [1,2,3]
b = ['a','b','c']
c = [10,20,30]
for i in zip(a,b,c):
    print(i)

(1, 'a', 10)
(2, 'b', 20)
(3, 'c', 30)

zip() returns an iterator. If you want the paired values stored in a list, use list():

zip(a,b)

<zip at 0x999df0>

list(zip(a,b))

[(1, 'a'), (2, 'b'), (3, 'c')]
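Two related details are worth a quick sketch: paired values can be turned straight into a dictionary with dict(), and the iterator returned by zip() can only be consumed once. The variable names below reuse the earlier example data.

```python
headers = ['name', 'shares', 'price']
values = ['FLC', 100, 99.09]

# Build a dictionary directly from the (header, value) pairs
record = dict(zip(headers, values))
print(record)  # {'name': 'FLC', 'shares': 100, 'price': 99.09}

# zip() returns a one-shot iterator: once consumed, it yields nothing more
pairs = zip(headers, values)
first_pass = list(pairs)
second_pass = list(pairs)
print(len(first_pass), len(second_pass))  # 3 0
```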

4.12 Iterating over items in separate containers

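Problem: you want to perform the same operation on many objects, but the objects live in different containers
Solution: use itertools.chain()
chain() takes one or more iterables as arguments and returns an iterator that yields the items of each one in turn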

from itertools import chain
a = [1,2,3]
b = ['a','b','c']
for x in chain(a,b):
    print(x,end=' ')

1 2 3 a b c

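You could also concatenate the iterables first, but that is less efficient: a + b builds a whole new sequence in memory, and it only works when a and b are of the same type.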

for i in a+b:
    print(i,end=' ')

1 2 3 a b c
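The type restriction is easy to see with a set and a list. The sketch below uses hypothetical container names; it also shows chain.from_iterable(), which accepts a single iterable of iterables instead of separate arguments.

```python
from itertools import chain

active_items = {'a'}          # a set (one element keeps the order predictable)
inactive_items = ['b', 'c']   # a list

# active_items + inactive_items would raise TypeError (set + list),
# but chain() only requires that each argument be iterable
combined = list(chain(active_items, inactive_items))
print(combined)  # ['a', 'b', 'c']

# chain.from_iterable() takes one iterable whose items are iterables
nested = [[1, 2], (3, 4), range(5, 7)]
flat = list(chain.from_iterable(nested))
print(flat)  # [1, 2, 3, 4, 5, 6]
```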

4.13 Creating data processing pipelines

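Problem: how to process data iteratively in the style of a data pipeline
Solution: use generator functions, each one implementing a small, independent stage
The yield from statement delegates the yield operations to a subgenerator: yield from it simply yields every value produced by the generator it.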

import os,fnmatch,gzip,bz2,re
def gen_find(filepat,top):
    '''find all filenames in a directory tree that match a shell wildcard pattern'''
    for path, dirlist, filelist in os.walk(top):
        for name in fnmatch.filter(filelist,filepat):
            yield os.path.join(path,name)

def gen_opener(filenames):
    '''open a sequence of files one at a time producing a file object.
    the file is closed immediately when proceeding to the next iteration
    '''
    for filename in filenames:
        if filename.endswith('.gz'):
            f = gzip.open(filename,'rt')
        elif filename.endswith('.bz2'):
            f = bz2.open(filename,'rt')
        else:
            f = open(filename,'rt')
        yield f
        f.close()

def gen_concatenate(iterators):
    '''chain a sequence of iterator together into a single sequence'''
    for it in iterators:
        yield from it

def gen_grep(pattern,lines):
    '''look for a regex pattern in a sequence of lines'''
    pat = re.compile(pattern)
    for line in lines:
        if pat.search(line):
            yield line
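To show how such stages can be stacked, here is a self-contained sketch. It repeats gen_concatenate() and gen_grep() so that it runs on its own, and it replaces the file-finding and file-opening stages with in-memory io.StringIO objects; the log lines and their format are made up for illustration.

```python
import io
import re

def gen_concatenate(iterators):
    """Same as above: chain several iterators into one sequence."""
    for it in iterators:
        yield from it

def gen_grep(pattern, lines):
    """Same as above: keep only lines matching a regex."""
    pat = re.compile(pattern)
    for line in lines:
        if pat.search(line):
            yield line

# Hypothetical in-memory "log files", standing in for gen_find() + gen_opener()
files = [
    io.StringIO('GET /index.html 200 1024\nGET /python.html 200 2048\n'),
    io.StringIO('GET /about.html 404 -\nGET /Python/doc 200 512\n'),
]

lines = gen_concatenate(files)                          # one stream of lines
pylines = gen_grep(r'(?i)python', lines)                # keep matching lines
sizes = (line.rsplit(None, 1)[1] for line in pylines)   # last column as text
total = sum(int(s) for s in sizes if s != '-')          # consume the pipeline

print(total)  # 2560
```

Nothing is read until sum() starts pulling values: each stage asks the previous one for a single item at a time, so the whole data set never has to sit in memory.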

4.14 Flattening a nested sequence

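Problem: how to flatten a sequence nested several levels deep into a single flat sequence
Solution: write a recursive generator that uses the yield from statement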

from collections.abc import Iterable
def flatten(items, ignore_type=(str, bytes)):
    for x in items:
        # isinstance(x, Iterable) checks whether an item is iterable.
        # isinstance(x, ignore_type) excludes strings and bytes from being
        # treated as iterables, so a string is not split into characters.
        if isinstance(x, Iterable) and not isinstance(x, ignore_type):
            yield from flatten(x, ignore_type)
        else:
            yield x
items = [1,2,[3,4,[5,6],7],8]
for x in flatten(items):
    print(x,end=' ')

1 2 3 4 5 6 7 8

items =  ['Dave','Paula',['Thomans','Levis']]
for x in flatten(items):
    print(x,end=' ')

Dave Paula Thomans Levis

yield from is useful when you want a generator to call other generators as subroutines. Without it, you have to write an extra for loop:

from collections.abc import Iterable
def flatten(items, ignore_type=(str, bytes)):
    for x in items:
        if isinstance(x, Iterable) and not isinstance(x, ignore_type):
            for i in flatten(x, ignore_type):
                yield i
        else:
            yield x
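The list of types to leave alone only matters at nested levels if it is forwarded to the recursive call. The sketch below is a variant written for illustration (the parameter is named ignore_types here) that forwards it, so a caller can also treat tuples as atomic values.

```python
from collections.abc import Iterable

def flatten(items, ignore_types=(str, bytes)):
    for x in items:
        if isinstance(x, Iterable) and not isinstance(x, ignore_types):
            # Pass ignore_types down so nested levels follow the same rule
            yield from flatten(x, ignore_types)
        else:
            yield x

# Treat tuples as atomic values, in addition to strings and bytes
points = [(1, 2), [(3, 4), [(5, 6)]], 'origin']
result = list(flatten(points, ignore_types=(str, bytes, tuple)))
print(result)  # [(1, 2), (3, 4), (5, 6), 'origin']
```

Note that Iterable is imported from collections.abc; importing it from collections no longer works on Python 3.10 and later.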

4.15 Iterating in sorted order over merged sorted iterables

Problem: you have several sorted sequences and want to iterate over a single sorted sequence that merges them
Solution: use heapq.merge()

import heapq
a = [1,4,5,7]
b = [2,6,10,13]
for c in heapq.merge(a,b):
    print(c,end=' ')

1 2 4 5 6 7 10 13
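Two properties of heapq.merge() are worth illustrating: it reads its inputs lazily, so it works with generators, and it does not sort anything itself, so every input must already be sorted. The generator names below are hypothetical.

```python
import heapq

def evens():
    """Hypothetical sorted source: 0, 2, 4, 6."""
    yield from range(0, 8, 2)

def odds():
    """Hypothetical sorted source: 1, 3, 5."""
    yield from range(1, 6, 2)

# merge() pulls items lazily, one at a time, from each input
merged = list(heapq.merge(evens(), odds()))
print(merged)  # [0, 1, 2, 3, 4, 5, 6]

# merge() does not sort its inputs; if one is unsorted, so is the result
broken = list(heapq.merge([3, 1, 2], [0, 4]))
print(broken)  # [0, 3, 1, 2, 4]
```

Because of the lazy behavior, merge() can be applied to very long sequences, such as large sorted log files, without loading them fully into memory.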

4.16 Replacing an infinite while loop with an iterator

Problem: you want to use an iterator in place of a while loop
Solution: a common piece of I/O code looks like this

CHUNKSIZE = 8192
def reader(s):
    while True:
        data = s.recv(CHUNKSIZE)
        if data == b'':
            break
        #process_data(data)

The same loop can be written with iter():

def reader2(s):
    for chunk in iter(lambda: s.recv(CHUNKSIZE), b''):
        pass
        # process_data(chunk)

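A little-known feature of iter() is that it optionally accepts a zero-argument callable and a sentinel (terminating) value. Used this way, it creates an iterator that calls the callable repeatedly until the returned value equals the sentinel.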
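The same pattern can be tried without a real socket. In this sketch, io.BytesIO stands in for the socket and functools.partial supplies the zero-argument callable; the tiny chunk size is chosen only so the example produces several chunks.

```python
import io
from functools import partial

CHUNKSIZE = 4  # small value so the example produces several chunks

# io.BytesIO stands in for a socket or binary file; read() returns b'' at EOF
stream = io.BytesIO(b'abcdefghij')

chunks = []
for chunk in iter(partial(stream.read, CHUNKSIZE), b''):
    chunks.append(chunk)

print(chunks)  # [b'abcd', b'efgh', b'ij']
```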

Python Cookbook study notes: ch4_02
