Scrapy Images Pipeline

2019-04-23 15:11:36

items.py

import scrapy

class MyItem(scrapy.Item):

    # ... other item fields ...
    img_urls = scrapy.Field()
    img_paths = scrapy.Field()
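
Each entry in img_urls is later passed to scrapy.Request, which rejects URLs that have no scheme, so relative src values scraped from a page need to be made absolute before they go into the item. Below is a minimal sketch using only the standard library; the helper name absolutize_img_urls and the sample URLs are hypothetical and not part of the original post. Inside a spider callback, response.urljoin(src) does the same job.

```python
from urllib.parse import urljoin


def absolutize_img_urls(page_url, srcs):
    """Resolve scraped src values against the page URL, skipping blank entries."""
    return [urljoin(page_url, src.strip()) for src in srcs if src and src.strip()]


# Hypothetical page URL and src attributes, for illustration only.
img_urls = absolutize_img_urls(
    "http://www.example.com/gallery/index.html",
    ["/static/a.jpg", "b.png", "http://cdn.example.com/c.gif", "  "],
)
print(img_urls)
```

The resulting list can be assigned directly to item['img_urls'].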

pipelines.py

import scrapy
from scrapy.pipelines.images import ImagesPipeline
from scrapy.exceptions import DropItem

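class ZhihuImagesPipeline(ImagesPipeline):

    def get_media_requests(self, item, info):
        # Schedule one download request per image URL stored on the item.
        for img_url in item['img_urls']:
            yield scrapy.Request(img_url)

    def item_completed(self, results, item, info):
        # Called once all image requests for this item have finished.
        # Keep only the storage paths of the downloads that succeeded.
        img_paths = [x['path'] for ok, x in results if ok]
        if not img_paths:
            raise DropItem("Item contains no images")
        item['img_paths'] = img_paths
        return item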

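Notes

results is a list of (success, info) tuples, one per image request. When success is True, info is a dict with the image's checksum, path (relative to IMAGES_STORE) and url; when it is False, info is a Failure. A typical value: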

 [(True,
  {'checksum': '2b00042f7481c7b056c4b410d28f33cf',
   'path': 'full/0a79c461a4062ac383dc4fade7bc09f1384a3910.jpg',
   'url': 'http://www.example.com/files/product1.pdf'}),
 (False,
  Failure(...))]
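
To make the filtering in item_completed concrete, here is a small sketch that runs without Scrapy installed. The FakeFailure class is a hypothetical stand-in for the Failure object that Scrapy would actually pass; the successful entry reuses the sample values shown above.

```python
class FakeFailure:
    """Hypothetical stand-in for the Failure passed on a failed download."""

    def __init__(self, reason):
        self.reason = reason


results = [
    (True, {
        "checksum": "2b00042f7481c7b056c4b410d28f33cf",
        "path": "full/0a79c461a4062ac383dc4fade7bc09f1384a3910.jpg",
        "url": "http://www.example.com/files/product1.pdf",
    }),
    (False, FakeFailure("download failed")),
]

# Same comprehension as in item_completed: keep paths of successful downloads.
img_paths = [info["path"] for ok, info in results if ok]
# The failed entries can be collected the same way, e.g. for logging.
failures = [info for ok, info in results if not ok]

print(img_paths)
print(len(failures))
```

Only the first tuple contributes a path; if every tuple had failed, img_paths would be empty and the pipeline above would raise DropItem.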

settings.py

ITEM_PIPELINES = {'myProject.pipelines.ZhihuImagesPipeline': 1}  # lower number = higher priority (runs earlier)
IMAGES_STORE = 'D:\\path\\...'
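
Beyond these two required entries, Scrapy's images pipeline reads a few optional settings. The fragment below is a sketch of some commonly used ones; the values are illustrative assumptions, not taken from the original post. Note that the images pipeline needs the Pillow library installed.

```python
# Optional images pipeline settings (illustrative values only).
IMAGES_EXPIRES = 90  # days; images downloaded within this period are not fetched again
IMAGES_THUMBS = {    # also generate thumbnails of these sizes for each image
    "small": (50, 50),
    "big": (270, 270),
}
IMAGES_MIN_HEIGHT = 110  # drop images smaller than this height (pixels)
IMAGES_MIN_WIDTH = 110   # drop images smaller than this width (pixels)
```

All of these can be left out; the pipeline then stores every downloaded image at full size only.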

Reposted from blog.csdn.net/masami269981/article/details/89453276
