Copyright notice: this is the blogger's original article and may not be reproduced without the blogger's permission. https://blog.csdn.net/FANGLICHAOLIUJIE/article/details/82315864
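4.8 Skipping the first part of an iterable
- Problem: how to iterate over every item of an iterable except the first few
- Solution: use the dropwhile() function from the itertools module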
with open('data_file/ch4_02.txt') as f:
    for line in f:
        # Print each line, not the file object f.
        # This prints every line of the file, leading comment lines included.
        print(line, end='')
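- The first few lines of ch4_02.txt are comments, and we want to skip them
- dropwhile() takes two arguments, a function and an iterable. It discards items from the start of the iterable for as long as the function returns True, then yields every remaining item unchanged
from itertools import dropwhile
with open('data_file/ch4_02.txt') as f:
    for line in dropwhile(lambda line: line.startswith('#'), f):
        print(line, end='')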
a = 23
# too
b = 'i love'
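The behaviour can be checked without a file on disk. The sketch below assumes a hypothetical list of lines standing in for ch4_02.txt (the two header comments are invented for illustration; the real file's header is not shown above). It shows that dropwhile() stops dropping at the first line that fails the test, so a later comment line survives.

```python
from itertools import dropwhile

# Hypothetical stand-in for the file; only the last three lines come from the output above.
sample_lines = [
    "# header comment\n",
    "# another header comment\n",
    "a = 23\n",
    "# too\n",
    "b = 'i love'\n",
]

# Drop lines only while they start with '#'; everything after the first miss is kept.
body = list(dropwhile(lambda line: line.startswith('#'), sample_lines))
```

The "# too" line is kept because dropwhile() never re-applies the predicate once it has returned False.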
- If you already know how many items to skip, you can use islice() directly
items = ['a', 'b', 'c', 1, 2, 3, 4]
from itertools import islice
for x in islice(items, 3, None):
    print(x, end=' ')
1 2 3 4
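For a list, items[3:] would do the same job; islice() matters for iterators and generators, which cannot be sliced. A minimal sketch, where countdown() is a hypothetical generator introduced only for illustration:

```python
from itertools import islice

def countdown(n):
    """Hypothetical generator used only for this illustration."""
    while n > 0:
        yield n
        n -= 1

# Generators do not support slicing, so this raises TypeError.
try:
    countdown(5)[2:]
    slicing_failed = False
except TypeError:
    slicing_failed = True

# islice(it, 2, None) consumes and discards the first two items, then yields the rest.
rest = list(islice(countdown(5), 2, None))
```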
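- You can also filter out the comment lines with the generator expression below, but note the difference: it removes comment lines anywhere in the file, not only the ones at the start
with open('data_file/ch4_02.txt') as f:
    lines = (line for line in f if not line.startswith('#'))
    for line in lines:
        print(line, end='')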
a = 23
b = 'i love'
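4.9 Iterating over permutations and combinations
- Problem: you want to iterate over all possible permutations or combinations of the items in a collection
- Solution: use itertools.permutations()
- permutations() takes a collection and produces a sequence of tuples, each of which is one possible ordering of all the items in the collection
items = ['a', 'b', 'c']
from itertools import permutations
for p in permutations(items):
    print(p)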
('a', 'b', 'c')
('a', 'c', 'b')
('b', 'a', 'c')
('b', 'c', 'a')
('c', 'a', 'b')
('c', 'b', 'a')
- You can also pass an optional length argument to get only the permutations of that length
for p in permutations(items, 2):
    print(p)
('a', 'b')
('a', 'c')
('b', 'a')
('b', 'c')
('c', 'a')
('c', 'b')
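- itertools.combinations() produces all combinations of the input items; note how it differs from permutations()
- For combinations() the order of the items does not matter: (a, b) and (b, a) are the same combination, so only one of them is produced
from itertools import combinations
items = ['a', 'b', 'c']
for c in combinations(items, 3):
    print(c)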
('a', 'b', 'c')
for c in combinations(items, 2):
    print(c)
('a', 'b')
('a', 'c')
('b', 'c')
- There is also itertools.combinations_with_replacement(), which allows the same item to be chosen more than once
from itertools import combinations_with_replacement
for c in combinations_with_replacement(items, 3):
    print(c)
('a', 'a', 'a')
('a', 'a', 'b')
('a', 'a', 'c')
('a', 'b', 'b')
('a', 'b', 'c')
('a', 'c', 'c')
('b', 'b', 'b')
('b', 'b', 'c')
('b', 'c', 'c')
('c', 'c', 'c')
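The number of tuples each function yields follows the usual counting formulas, which is a quick way to sanity-check the outputs above. A small sketch, assuming Python 3.8 or later for math.perm and math.comb:

```python
from itertools import permutations, combinations, combinations_with_replacement
from math import comb, perm  # added in Python 3.8

items = ['a', 'b', 'c']
n = len(items)

n_perm_all = len(list(permutations(items)))                   # n!
n_perm_2 = len(list(permutations(items, 2)))                  # n! / (n - 2)!
n_comb_2 = len(list(combinations(items, 2)))                  # C(n, 2)
n_cwr_3 = len(list(combinations_with_replacement(items, 3)))  # C(n + 3 - 1, 3)
```

Because these counts grow very quickly with the input size, it is usually better to iterate over the results lazily than to build a list as this sketch does.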
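4.10 Iterating over a sequence with index values
- Problem: you want to iterate over a sequence while keeping track of the index of the item being processed
- Solution: use enumerate()
my_list = ['a', 'b', 'c']
for idx, val in enumerate(my_list):
    print(idx, val)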
0 a
1 b
2 c
- You can also pass an optional start value
for idx, val in enumerate(my_list, 1):
    print(idx, val)
1 a
2 b
3 c
- This is useful when reading a file and you want error messages to report the line number
def parse_data(filename):
    with open(filename) as f:
        for lineno, line in enumerate(f, 1):
            fields = line.split()
            try:
                count = int(fields[1])
                # process count here
            except ValueError as e:
                print('line {}: parse error: {}'.format(lineno, e))
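The same pattern can be tried on in-memory data. The sketch below is a variation on parse_data(), not the original function: find_bad_lines() and the sample lines are hypothetical, and it collects the errors instead of printing them. It also catches IndexError, on the assumption that a line with fewer than two fields should be reported rather than crash the loop.

```python
def find_bad_lines(lines):
    """Return (lineno, message) pairs for lines whose second field is not an int."""
    errors = []
    for lineno, line in enumerate(lines, 1):  # line numbers start at 1
        fields = line.split()
        try:
            int(fields[1])
        except (ValueError, IndexError) as e:
            errors.append((lineno, str(e)))
    return errors

# Hypothetical input: the second line has a non-numeric count, the fourth has no count.
sample = ["ACME 50", "IBM n/a", "HPQ 75", "AA"]
bad = find_bad_lines(sample)
bad_line_numbers = [lineno for lineno, _ in bad]
```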
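- Note: when enumerating a sequence of tuples, wrap the tuple's names in parentheses to unpack it
data = [(1, 2), (3, 4), (5, 6), (7, 8)]
for n, (x, y) in enumerate(data):
    print(n, ':', (x, y))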
0 : (1, 2)
1 : (3, 4)
2 : (5, 6)
3 : (7, 8)
# Wrong: enumerate() yields 2-tuples (index, item), which cannot be unpacked into three names
for n, x, y in enumerate(data):
    print(n, x, y)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-49-e9ca5f47e3c4> in <module>()
1 # Wrong
----> 2 for n,x,y in enumerate(data):
3 print(n,x,y)
ValueError: not enough values to unpack (expected 3, got 2)
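4.11 Iterating over multiple sequences simultaneously
- Problem: you want to iterate over several sequences at once, taking one item from each on every step
- Solution: use the zip() function
a = [1, 2, 3, 4]
b = ['a', 'b', 'c', 'd']
c = ['x', 'y', 'z']
for x, y in zip(a, b):
    print(x, ':', y)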
1 : a
2 : b
3 : c
4 : d
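- zip() creates an iterator that yields tuples (x, y). Iteration stops as soon as any one of the input sequences is exhausted, so the length of the result is that of the shortest input (the "shortest stave of the barrel" effect)
for x, y in zip(a, c):
    print(x, ':', y)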
1 : x
2 : y
3 : z
- If you do not want iteration to stop at the shortest input, use itertools.zip_longest()
from itertools import zip_longest
for i in zip_longest(a, c):
    print(i)
(1, 'x')
(2, 'y')
(3, 'z')
(4, None)
- Missing values are filled with None by default; you can also specify a fill value
for i in zip_longest(a, c, fillvalue=0):
    print(i)
(1, 'x')
(2, 'y')
(3, 'z')
(4, 0)
- zip() is also handy when you need to pair up data, for example column headers with their values
headers = ['name', 'shares', 'price']
values = ['FLC', 100, 99.09]
s = list(zip(headers, values))
s
[('name', 'FLC'), ('shares', 100), ('price', 99.09)]
for name, val in s:
    print(name, '=', val)
name = FLC
shares = 100
price = 99.09
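Since zip() yields (key, value) pairs here, the same pairing can be passed straight to dict() to build a record. A minimal sketch using the headers and values from the example above (the name record is introduced only for illustration):

```python
headers = ['name', 'shares', 'price']
values = ['FLC', 100, 99.09]

# dict() accepts any iterable of (key, value) pairs, so no intermediate list is needed.
record = dict(zip(headers, values))
```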
- zip() also accepts more than two sequences, for example three
a = [1, 2, 3]
b = ['a', 'b', 'c']
c = [10, 20, 30]
for i in zip(a, b, c):
    print(i)
(1, 'a', 10)
(2, 'b', 20)
(3, 'c', 30)
- zip() returns an iterator; if you want the paired values stored in a list, use the list() function
zip(a,b)
<zip at 0x999df0>
list(zip(a,b))
[(1, 'a'), (2, 'b'), (3, 'c')]
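One consequence of zip() returning an iterator is that it can be consumed only once, which is another reason to call list() when the pairs are needed more than once. A short sketch of that behaviour (variable names are illustrative):

```python
a = [1, 2, 3]
b = ['a', 'b', 'c']

pairs = zip(a, b)
first_pass = list(pairs)   # consumes the iterator
second_pass = list(pairs)  # nothing left to yield
```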
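4.12 Iterating over items in separate containers
- Problem: you want to perform the same operation on many objects, but the objects live in different containers
- Solution: use itertools.chain()
- chain() takes one or more iterables as arguments and returns an iterator that yields the items of each in turn
from itertools import chain
a = [1, 2, 3]
b = ['a', 'b', 'c']
for x in chain(a, b):
    print(x, end=' ')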
1 2 3 a b c
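- You could also concatenate the iterables first, but a + b builds a whole new list, which is less efficient and uses more memory
for i in a + b:
    print(i, end=' ')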
1 2 3 a b c
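Besides the memory cost, the + operator requires both operands to be of compatible types, whereas chain() accepts any mix of iterables. A small sketch of the difference, using a list and a tuple chosen only for illustration:

```python
from itertools import chain

nums = [1, 2]
more = (3, 4)  # a tuple, not a list

# list + tuple is not allowed and raises TypeError.
try:
    nums + more
    concat_failed = False
except TypeError:
    concat_failed = True

# chain() only needs each argument to be iterable.
combined = list(chain(nums, more))

# chain.from_iterable() takes a single iterable of iterables instead of separate arguments.
combined_again = list(chain.from_iterable([nums, more]))
```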
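4.13 Creating data processing pipelines
- Problem: how to process data iteratively in the style of a data pipeline
- Solution: use generator functions
- The yield from statement delegates the yield operation to a sub-generator: yield from it simply yields every value produced by the generator it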
import os, fnmatch, gzip, bz2, re

def gen_find(filepat, top):
    '''Find all filenames in a directory tree that match a shell wildcard pattern.'''
    for path, dirlist, filelist in os.walk(top):
        for name in fnmatch.filter(filelist, filepat):
            yield os.path.join(path, name)

def gen_opener(filenames):
    '''Open a sequence of files one at a time, producing a file object.
    The file is closed immediately when proceeding to the next iteration.
    '''
    for filename in filenames:
        if filename.endswith('.gz'):
            f = gzip.open(filename, 'rt')
        elif filename.endswith('.bz2'):
            f = bz2.open(filename, 'rt')
        else:
            f = open(filename, 'rt')
        yield f
        f.close()

def gen_concatenate(iterators):
    '''Chain a sequence of iterators together into a single sequence.'''
    for it in iterators:
        yield from it

def gen_grep(pattern, lines):
    '''Look for a regex pattern in a sequence of lines.'''
    pat = re.compile(pattern)
    for line in lines:
        if pat.search(line):
            yield line
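The functions above only define the stages; the pipeline comes from feeding one generator into the next. The sketch below wires two of the stages together on in-memory data so it runs without any files. The two "log" lists and the search pattern are hypothetical, and gen_concatenate() and gen_grep() are repeated so the example is self-contained.

```python
import re

def gen_concatenate(iterators):
    """Chain a sequence of iterators together into a single sequence."""
    for it in iterators:
        yield from it

def gen_grep(pattern, lines):
    """Yield only the lines that match a regex pattern."""
    pat = re.compile(pattern)
    for line in lines:
        if pat.search(line):
            yield line

# Hypothetical in-memory "files" standing in for what gen_opener() would yield.
log_a = ["GET /index.html 200\n", "GET /python/a.html 200\n"]
log_b = ["GET /ruby/b.html 404\n", "GET /python/c.html 500\n"]

lines = gen_concatenate([log_a, log_b])  # stage 1: one stream of lines
pylines = gen_grep(r'(?i)python', lines)  # stage 2: keep matching lines
matches = list(pylines)                   # nothing runs until the result is consumed
```

With real files, the first stages would be gen_find() and gen_opener(), and the final list() would typically be replaced by a for loop so that only one line is held in memory at a time.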
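4.14 Flattening a nested sequence
- Problem: how to flatten a sequence nested several levels deep into a single flat sequence
- Solution: write a recursive generator that uses the yield from statement
from collections.abc import Iterable  # importing Iterable from collections fails on Python 3.10+
def flatten(items, ignore_type=(str, bytes)):
    for x in items:
        # isinstance(x, Iterable) checks whether the item can be iterated over.
        # isinstance(x, ignore_type) excludes strings and bytes from that check,
        # so that a string is not split into individual characters.
        if isinstance(x, Iterable) and not isinstance(x, ignore_type):
            # Pass ignore_type down so nested levels use the same rule.
            yield from flatten(x, ignore_type)
        else:
            yield x
items = [1, 2, [3, 4, [5, 6], 7], 8]
for x in flatten(items):
    print(x, end=' ')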
1 2 3 4 5 6 7 8
items = ['Dave', 'Paula', ['Thomans', 'Levis']]
for x in flatten(items):
    print(x, end=' ')
Dave Paula Thomans Levis
- yield from is useful when a generator needs to call other generators as subroutines; without it, you have to write an extra for loop
from collections.abc import Iterable
def flatten(items, ignore_type=(str, bytes)):
    for x in items:
        if isinstance(x, Iterable) and not isinstance(x, ignore_type):
            for i in flatten(x, ignore_type):
                yield i
        else:
            yield x
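The ignore_type parameter only has an effect at every level if the recursive call forwards it. The sketch below repeats flatten() so it is self-contained and treats tuples as atomic values in addition to strings and bytes; the sample data is invented for illustration.

```python
from collections.abc import Iterable

def flatten(items, ignore_type=(str, bytes)):
    """Yield the leaves of a nested iterable, leaving ignore_type instances intact."""
    for x in items:
        if isinstance(x, Iterable) and not isinstance(x, ignore_type):
            yield from flatten(x, ignore_type)  # forward the same rule to nested levels
        else:
            yield x

nested = [1, (2, 3), [4, [(5, 6), 'ab']]]

default_result = list(flatten(nested))
# Treat tuples as single values as well.
keep_tuples = list(flatten(nested, ignore_type=(str, bytes, tuple)))
```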
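4.15 Iterating in sorted order over merged sorted iterables
- Problem: you have several sorted sequences and want to iterate over a single sorted sequence that merges them
- Solution: use heapq.merge()
import heapq
a = [1, 4, 5, 7]
b = [2, 6, 10, 13]
for c in heapq.merge(a, b):
    print(c, end=' ')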
1 2 4 5 6 7 10 13
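heapq.merge() works lazily and only compares the current front item of each input, so it relies on every input already being sorted. The sketch below contrasts a correct merge with what happens when that assumption is broken; the unsorted input is invented for illustration.

```python
import heapq

a = [1, 4, 5, 7]
b = [2, 6, 10, 13]
merged = list(heapq.merge(a, b))

# merge() never sorts its inputs; an unsorted input yields an unsorted result.
unsorted_result = list(heapq.merge([3, 1], [2]))
```

Because the result is an iterator, the same call can merge very long sequences or files without reading them fully into memory.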
4.16 Replacing infinite while loops with an iterator
- Problem: you want to use an iterator in place of a while loop
- Solution: a typical I/O routine looks like this
CHUNKSIZE = 8192
def reader(s):
    while True:
        data = s.recv(CHUNKSIZE)
        if data == b'':
            break
        # process_data(data)
- Using iter() instead
def reader2(s):
    for chunk in iter(lambda: s.recv(CHUNKSIZE), b''):
        pass
        # process_data(chunk)
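- A little-known feature of iter() is that it accepts an optional callable and a sentinel (terminating) value as arguments. Used this way, it creates an iterator that calls the callable repeatedly until the return value equals the sentinel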
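The two-argument form can be tried without a socket. In this sketch an io.BytesIO object stands in for the connection, and read() plays the role of recv(); the data and the chunk size of 4 are arbitrary choices for illustration.

```python
import io

stream = io.BytesIO(b"abcdefghij")  # hypothetical stand-in for a socket

# Call stream.read(4) repeatedly until it returns the sentinel b'' (end of data).
chunks = list(iter(lambda: stream.read(4), b''))
```

The sentinel itself is never yielded, so the loop body only ever sees real data.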