Asynchronous MySQL inserts with the Scrapy framework

  • Scrapy crawls very fast, but if we insert into the database synchronously the pipeline keeps blocking, which drags down crawl efficiency. Since Scrapy is built on Twisted, we can use Twisted's asynchronous database module, adbapi.
  • Import it with from twisted.enterprise import adbapi.
  • The pipeline for asynchronous inserts (full code below):
import datetime

from pymysql import cursors
from twisted.enterprise import adbapi


class BookTwistedPipeline(object):

    def __init__(self):
        params = {
            'host': 'localhost',
            'port': 3306,
            'user': 'root',
            'password': 'mysql',
            'charset': 'utf8',
            'database': 'book',
            'cursorclass': cursors.DictCursor
        }
        # Connection pool backed by pymysql; queries run off the reactor thread
        self.dbpool = adbapi.ConnectionPool("pymysql", **params)
        self.sql = """insert into book_detail values(0,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)"""

    def process_item(self, item, spider):
        # Twisted's asynchronous connection pool: runInteraction returns a Deferred
        defer = self.dbpool.runInteraction(self.insert_sql, item)
        defer.addErrback(self.handle_err, item, spider)  # error-handling callback
        return item

    def insert_sql(self, cursor, item):
        cursor.execute(self.sql, (item['b_cate'], item['book_name'], item['b_href'], item['s_href'],
                                  item['s_cate'], item['book_href'], item['book_sku'], item['venderid'],
                                  item['author'], item['prices']))

    def handle_err(self, error, item, spider):
        if error:
            print("INFO:%s %s" % (datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S'), error))
