Practice example: scraping photos from Baidu Tieba with regular expressions

  The code comes from 小甲鱼 (Xiaojiayu); this is my own review of it, with the image-download step omitted.

  Regular expressions are really handy, but the key is whether you can come up with the most efficient pattern.
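  To make the point about pattern choice concrete, here is a minimal offline sketch. It compares the negated character class `[^"]+` used in the script below with a greedy `.+`. The `sample_html` string and the `example.com` URLs are made up for illustration and are not taken from a real Tieba page.

```python
import re

# Hypothetical sample markup, invented for this illustration.
sample_html = (
    '<img class="BDE_Image" src="https://example.com/a.jpg" width="560">'
    '<img class="BDE_Image" src="https://example.com/b.jpg" width="560">'
)

# Negated character class: the capture cannot run past the closing quote.
tight = r'<img class="BDE_Image" src="([^"]+\.jpg)"'
# Greedy dot: grabs as much as it can, so one match may swallow several tags.
greedy = r'<img class="BDE_Image" src="(.+\.jpg)"'

tight_matches = re.findall(tight, sample_html)
greedy_matches = re.findall(greedy, sample_html)

print(tight_matches)   # one clean URL per image
print(greedy_matches)  # a single match spanning both tags
```

  With `[^"]+` each image yields its own URL, while the greedy version returns one long string that runs from the first `src` to the last `.jpg`.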

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import urllib.request
import re

url = "https://tieba.baidu.com/p/6512141636"

def web(url):
    # Fetch the page and decode it, ignoring bytes that are not valid UTF-8.
    response = urllib.request.urlopen(url)
    html = response.read().decode('UTF-8', 'ignore')
    # Capture the src of every post image; [^"]+ stops at the closing quote.
    test = r'<img class="BDE_Image" src="([^"]+\.jpg)"'
    out = re.findall(test, html)
    print(out)

web(url)
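
The download step was left out above. One possible way to fill it in is sketched here; this is not the original author's code. The names `filename_for` and `download_images` and the `fetch` parameter are hypothetical, introduced only for this example. `fetch` defaults to the standard library's `urllib.request.urlretrieve` and can be swapped out so the loop can be tried without network access.

```python
import os
import urllib.request
from urllib.parse import urlsplit


def filename_for(image_url):
    """Return the last path component of image_url, e.g. 'a.jpg'."""
    return os.path.basename(urlsplit(image_url).path)


def download_images(image_urls, dest_dir, fetch=urllib.request.urlretrieve):
    """Save each URL into dest_dir and return the list of local paths.

    `fetch` is a hypothetical hook called as fetch(url, path); it defaults
    to urlretrieve, which downloads the URL to the given file path.
    """
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for image_url in image_urls:
        path = os.path.join(dest_dir, filename_for(image_url))
        fetch(image_url, path)
        saved.append(path)
    return saved
```

The list that `re.findall` produces inside `web` could be passed as `image_urls`, for example `download_images(out, "tieba_images")`, where the folder name is an arbitrary choice.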


Reposted from www.cnblogs.com/vhhi/p/12363937.html