Example: Scraping Backpack Listings from Juanpi (卷皮网) with a Python Crawler

Category: Other | Posted: 2020-02-19 21:50:51

This example uses requests and BeautifulSoup to scrape the names and prices of backpacks listed on Juanpi.

Link: www.juanpi.com
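The script in this post requests one search URL per result page. As a rough sketch, those URLs could be assembled with the standard library's `urllib.parse.urlencode`, which percent-encodes a non-ASCII keyword. The helper name `build_search_url` is hypothetical, the URL pattern is copied from the script rather than from any site documentation, and UTF-8 encoding of the keyword is an assumption.

```python
from urllib.parse import urlencode

# URL pattern as used by the script in this post (not verified against the site).
BASE = 'http://www.juanpi.com/search'


def build_search_url(keyword, page=1):
    """Return the search URL for `keyword` on result page `page` (1-based)."""
    # urlencode percent-encodes non-ASCII text as UTF-8 by default.
    query = urlencode({'keywords': keyword})
    # Page 1 has no page segment; later pages are addressed as /search/<n>.
    path = BASE if page == 1 else BASE + '/' + str(page)
    return path + '?' + query


print(build_search_url('bag'))
print(build_search_url('bag', page=2))
```

requests will also encode a raw keyword on its own, so this is mainly useful when the URL has to be logged or reused elsewhere.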

Code:

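import requests
from bs4 import BeautifulSoup


# Fetch the search-result page and return its HTML, or "" on failure.
def getHtmlText(url):
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except requests.RequestException:
        return ""


# Extract each product's name and price from the page and print them.
def fillUnivList(html):
    soup = BeautifulSoup(html, "html.parser")
    for div in soup.find_all('div'):
        # Match on the div's own class attribute so that ancestor divs,
        # which merely contain a product card, are skipped.
        if 'list-good buy' not in ' '.join(div.get('class', [])):
            continue
        link = div.select_one('h3 a')
        spans = div.find_all('span')
        if link is None or not spans:
            continue
        if 'price-current' in spans[0].get('class', []):
            # Take the span's text and drop the currency sign, which is
            # less brittle than slicing the tag's markup at fixed offsets.
            price = spans[0].get_text(strip=True).lstrip('¥￥')
            print('Product: ' + link.get_text(strip=True))
            print('Price: ' + price)


# Main routine: crawl the first `depth` result pages for the keyword.
def main():
    goods = '书包'  # search keyword ("school bag")
    depth = 2
    for i in range(1, depth + 1):
        # Page 1 is /search?keywords=...; later pages are /search/<n>?keywords=...
        if i == 1:
            url = 'http://www.juanpi.com/search?keywords=' + goods
        else:
            url = 'http://www.juanpi.com/search/' + str(i) + '?keywords=' + goods
        print('Page ' + str(i) + ': ------------------------------------------------')
        fillUnivList(getHtmlText(url))


if __name__ == '__main__':
    main()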
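The parsing step above prints each product as soon as it finds it. A variant that collects the results into a list of dictionaries is easier to test and to reuse, for example when saving to a file. The following is only a sketch: `parse_goods` and `SAMPLE_HTML` are hypothetical names, and the sample markup is modelled on the class names the script searches for (`list-good buy`, `price-current`), not copied from the live site.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, modelled on the class names the script looks for.
SAMPLE_HTML = """
<div class="wrap">
  <div class="list-good buy">
    <h3><a href="#">Canvas backpack</a></h3>
    <span class="price-current"><em>¥</em>39.00</span>
  </div>
  <div class="list-good buy">
    <h3><a href="#">Leather satchel</a></h3>
    <span class="price-current"><em>¥</em>128.50</span>
  </div>
</div>
"""


def parse_goods(html):
    """Return a list of {'name': ..., 'price': ...} dicts, one per product card."""
    soup = BeautifulSoup(html, 'html.parser')
    goods = []
    # The CSS selector matches only divs that carry both classes themselves.
    for card in soup.select('div.list-good.buy'):
        link = card.select_one('h3 a')
        price = card.select_one('span.price-current')
        if link is None or price is None:
            continue  # skip cards that lack a title or a price
        goods.append({
            'name': link.get_text(strip=True),
            # Drop the leading currency sign, keep the number as text.
            'price': price.get_text(strip=True).lstrip('¥￥'),
        })
    return goods


for item in parse_goods(SAMPLE_HTML):
    print(item['name'], item['price'])
```

Because the function takes an HTML string and returns plain data, it can be exercised against a saved page without touching the network.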

This is example code I typed along with while following the Beijing Institute of Technology web-crawler MOOC. Link: https://www.bilibili.com/video/av9784617?from=search&seid=17441199644632730564


Reposted from www.cnblogs.com/yue1234/p/12333318.html
