Crawl the Qiushibaike front page and write each title and author to a local file
Create the Qiushibaike spider:
scrapy genspider qiushi https://www.qiushibaike.com/
qiushi.py code:

import scrapy


class QiushiSpider(scrapy.Spider):
    name = 'qiushi'
    # allowed_domains = ['www.web.com']
    start_urls = ['https://www.qiushibaike.com/']

    def parse(self, response):
        li_list = response.xpath('//*[@id="content"]/div/div[2]/div/ul/li')
        ls = []
        for li in li_list:
            title = li.xpath('./div/a/text()')[0].extract()
            author = li.xpath('./div/div/a/span/text()')[0].extract()
            data = {
                "author": author,
                "title": title,
            }
            ls.append(data)
        # parse() must return an iterable of items and/or requests
        return ls
Persistence (feed export) commands:
scrapy crawl qiushi -o qiushi.json
scrapy crawl qiushi -o qiushi.csv
scrapy crawl qiushi -o qiushi.xml
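One practical note for the JSON export: Scrapy's feed exporter escapes non-ASCII characters by default, so Chinese titles and authors come out as \uXXXX sequences. If the feed should stay human-readable, the export encoding can be set in the project's settings.py (a minimal config sketch):

```python
# settings.py -- keep Chinese text readable in the exported feed
FEED_EXPORT_ENCODING = 'utf-8'
```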