1. Target site: Meizitu (https://www.mzitu.com/)
Content collected: images
A screenshot of the site is omitted here (go browse it yourself) — it's a bit too risqué and would likely not pass review; the pages are all broadly similar anyway.
2. Scraping approach:
Both the pagination links and the image URLs can be read straight out of the page source, and no anti-scraping measures are involved, so rather than breaking the process down step by step, here is the code.
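The key detail is that the listing page lazy-loads its thumbnails, so the real image URL sits in each `<img>` tag's `data-original` attribute rather than `src`. A minimal parsing sketch against a hand-written markup sample (hypothetical, simplified stand-in for the real page, which may differ):

```python
from bs4 import BeautifulSoup

# Simplified stand-in for the listing page's markup: lazy-loaded thumbnails
# keep the real image URL in data-original, while src holds a placeholder.
sample_html = '''
<ul id="pins">
  <li><img class="lazy" src="placeholder.gif"
           data-original="https://i.example.com/a/001.jpg" alt="demo 1"></li>
  <li><img class="lazy" src="placeholder.gif"
           data-original="https://i.example.com/a/002.jpg" alt="demo 2"></li>
</ul>
'''

soup = BeautifulSoup(sample_html, "html.parser")
# Select every lazy-loaded <img> and pull the real URL from data-original
urls = [img["data-original"] for img in soup.find_all("img", class_="lazy")]
print(urls)
```

Reading `img["src"]` here would only yield placeholder GIFs, which is why the full script below targets `data-original`.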
3. Complete code:
# -*- coding: UTF-8 -*-
'''
@Author :Jason
Scrape images from mzitu.com. Remember to create the images folder first.
'''
import requests
from bs4 import BeautifulSoup


def getMeizituImages():
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36",
        # The site rejects requests without a Referer header
        "Referer": "https://www.mzitu.com/",
    }
    total_page = int(input("Enter the number of pages to scrape: "))
    for page in range(1, total_page + 1):
        url = "https://www.mzitu.com/page/{}/".format(page)
        res = requests.get(url, headers=headers)
        soup = BeautifulSoup(res.text, "lxml")
        # Thumbnails are lazy-loaded: the real image URL is in data-original, not src
        img_tags = soup.find_all(name="img", attrs={"class": "lazy"})
        for tag in img_tags:
            img_url = tag["data-original"]
            filename = img_url.split("/")[-1]
            try:
                response = requests.get(img_url, headers=headers)
                with open("./images/" + filename, "wb") as f:
                    f.write(response.content)
                # print("{} downloaded".format(filename))
            except requests.RequestException:
                print("Failed to save image {}".format(filename))
        print("Page {} saved".format(page))


if __name__ == "__main__":
    getMeizituImages()
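As the docstring notes, the script expects an `images` folder to exist before it writes any files; otherwise `open()` raises `FileNotFoundError`. Instead of creating it by hand, you could add a one-line safeguard at the top of the script (a standard-library sketch):

```python
import os

# Create the download directory if it doesn't exist yet;
# exist_ok=True makes this a no-op when the folder is already there.
os.makedirs("images", exist_ok=True)
```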
4. Results:
Not shown here — they wouldn't pass review. Run the code and see for yourself.