Scrapy: persisting scraped data from the command line

Crawl the Qiushibaike home page and write each post's title and author to a local file.

Create the Qiushibaike spider
scrapy genspider qiushi https://www.qiushibaike.com/

Code for qiushi.py

import scrapy


class QiushiSpider(scrapy.Spider):
    name = 'qiushi'
    # allowed_domains = ['www.web.com']
    start_urls = ['https://www.qiushibaike.com/']

    def parse(self, response):
        li_list = response.xpath('//*[@id="content"]/div/div[2]/div/ul/li')
        ls = []
        for li in li_list:
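            # .get() returns None instead of raising IndexError when a node is missing
            title = li.xpath('./div/a/text()').get()
            author = li.xpath('./div/div/a/span/text()').get()

            data = {
                "author": author,
                "title": title
            }
            ls.append(data)
        # The return value must be an iterable of items (dicts here)
        return ls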
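
Since parse only needs to return an iterable, a generator works as well as a list. Below is a minimal sketch of the item-building step written as a generator, in plain Python so it can be tried without Scrapy. The helper name build_items and the sample strings are hypothetical, and the whitespace stripping assumes the extracted text nodes carry surrounding newlines.

```python
def build_items(titles, authors):
    """Yield one dict per post, skipping posts with a missing field."""
    for title, author in zip(titles, authors):
        if title is None or author is None:
            continue  # a node was missing for this post
        yield {"author": author.strip(), "title": title.strip()}


# Made-up sample values standing in for the XPath results.
items = list(build_items(["\nFirst post\n", None], ["\nAlice\n", "Bob"]))
```

Inside the spider, parse could yield each dict in the same way instead of collecting them in ls first.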

Persistence commands (the export format is inferred from the output file's extension)
scrapy crawl qiushi -o qiushi.json
scrapy crawl qiushi -o qiushi.csv
scrapy crawl qiushi -o qiushi.xml
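
The same exports can probably also be configured in the project's settings.py rather than passed with -o each time. The sketch below assumes Scrapy 2.1 or later, where the FEEDS setting is available; the file names simply mirror the commands above. Setting FEED_EXPORT_ENCODING matters for this site because, by default, the JSON export escapes non-ASCII characters; the plain json calls at the end only illustrate that difference and are not Scrapy code.

```python
import json

# settings.py fragment (a sketch, not a complete settings file)
FEED_EXPORT_ENCODING = "utf-8"  # keep Chinese text readable instead of \uXXXX escapes
FEEDS = {
    "qiushi.json": {"format": "json"},
    "qiushi.csv": {"format": "csv"},
}

# Illustration with the standard json module; the sample item is made up.
sample = {"author": "糗友", "title": "示例标题"}
escaped = json.dumps(sample)                       # non-ASCII becomes \uXXXX
readable = json.dumps(sample, ensure_ascii=False)  # characters are kept as-is
```

With these settings in place, running scrapy crawl qiushi without -o should write both files.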

Reposted from www.cnblogs.com/bibicode/p/13384537.html