Scraping front-page data from xx百科 (Qiushibaike)

Other 2018-07-15 21:51:35

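# Scrape the front page of Qiushibaike (糗事百科)
import requests
from lxml import etree

HEADERS = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36"}


def first(nodes, default=""):
    # xpath() always returns a list; take the first match (stripped) or a default
    return nodes[0].strip() if nodes else default


def load_page(url):
    # Download the page; fail on timeouts and HTTP errors instead of parsing an error page
    response = requests.get(url, headers=HEADERS, timeout=10)
    response.raise_for_status()
    deal_data(response.text)


def deal_data(html):
    # Each post sits in a <div> whose id contains "qiushi_tag_"
    data_list = etree.HTML(html).xpath("//div[contains(@id,'qiushi_tag_')]")

    for data in data_list:
        # As in the original script, the first <i> is read as the like count
        # and the second as the comment count
        counts = data.xpath(".//i/text()")
        res_data = {
            "username": first(data.xpath("./div/a/h2/text()")),
            "content": first(data.xpath(".//div[@class='content']/span/text()")),
            "img": data.xpath(".//div[@class='thumb']//img/@src"),  # list; empty for text-only posts
            "zan": first(counts[0:1]),
            "comment": first(counts[1:2]),
        }
        print(res_data)


def main():
    page_num = input("Enter the page number to scrape: ").strip()
    url = "https://www.qiushibaike.com/8hr/page/%s/" % page_num
    load_page(url)


if __name__ == "__main__":
    main()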
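
The script above scrapes one page and passes whatever the user types straight into the URL. Below is a minimal sketch of how the input could be validated and several pages visited in turn. The names `build_page_url` and `crawl_pages` are hypothetical and not part of the original script, and the one-second pause is only an assumed polite default. The URL pattern is the one used in `main()`.

```python
import time

# URL pattern copied from main() in the script above
BASE_URL = "https://www.qiushibaike.com/8hr/page/%s/"


def build_page_url(raw_page):
    """Validate a page number and return its URL (hypothetical helper)."""
    page = int(str(raw_page).strip())  # raises ValueError for non-numeric input
    if page < 1:
        raise ValueError("page number must be >= 1")
    return BASE_URL % page


def crawl_pages(first_page, last_page, fetch, delay=1.0):
    """Call fetch(url) for every page in the inclusive range, pausing between calls."""
    urls = [build_page_url(n) for n in range(first_page, last_page + 1)]
    for index, url in enumerate(urls):
        if index:  # no pause before the first request
            time.sleep(delay)
        fetch(url)
    return urls


if __name__ == "__main__":
    # print stands in for load_page so the sketch runs without network access
    crawl_pages(1, 3, fetch=print, delay=0)
```

To use it with the scraper, pass `load_page` as the `fetch` argument, for example `crawl_pages(1, 3, fetch=load_page)`.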

Reposted from blog.csdn.net/sdzhr/article/details/80962981
