Scraping a news detail page with Python regular expressions

Other 2018-05-31 12:02:27

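Previously I used XPath to match the content of a page, but everything it matched came back as plain text, and images needed special handling. When collecting news articles, it is sometimes better to keep part of the original markup so the content is easier to post-process. A regular expression that captures the raw HTML inside the content container does exactly that.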
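Before the full script, here is a small offline sketch of the difference described above. The `sample_html` string is a made-up stand-in for a real article page, and the tag-stripping step only approximates what text-only extraction returns; it is not the XPath code I used earlier.

```python
import re

# Hypothetical sample markup standing in for a real article page
sample_html = '''
<div class="content">
<p>First paragraph.</p>
<img src="/images/photo.jpg" alt="photo">
<p>Second paragraph.</p>
</div>
'''

# Capture the raw inner HTML of the container, tags included
fragment = re.search(r'<div class="content">([\s\S]*?)</div>', sample_html).group(1)

# A crude text-only view: drop every tag, then collapse whitespace
text_only = re.sub(r'<[^>]+>', '', fragment)
text_only = ' '.join(text_only.split())

print(fragment.strip())  # still contains the <p> and <img> tags
print(text_only)         # the image is gone
```

The captured fragment keeps the `<img>` tag and paragraph structure, while the text-only version has lost the image entirely.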

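import re

import requests

url = 'https://www.qiushibaike.com/article/119998177'
# url = 'http://www.cnyifeng.net/news/show-469.html'

# Send a browser-like User-Agent with the request
headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36"}

response = requests.get(url, headers=headers)

# Decode the response body as UTF-8
response.encoding = 'utf-8'

html_str = response.text

# Match the HTML inside the target tag, inner tags included.
# [\s\S]*? matches any character (newlines too) non-greedily,
# so the capture ends at the first closing </div>.
pattern = re.compile(r'<div class="content">([\s\S]*?)</div>')

content_str = pattern.findall(html_str)

# findall returns an empty list when nothing matches, so check before indexing
if content_str:
    print(content_str[0])
    # print(str(content_str[0]).strip().replace('\n', ''))
else:
    print('No <div class="content"> block found')

# print(response.content.decode('utf-8'))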
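The same pattern can be tried offline, which also makes its main limitation easy to see. The sketch below is only an illustration: `extract_content`, `extract_image_urls`, and the sample pages are hypothetical names and data, not part of the script above. It assumes the `src` attribute is double-quoted.

```python
import re

CONTENT_RE = re.compile(r'<div class="content">([\s\S]*?)</div>')
IMG_SRC_RE = re.compile(r'<img[^>]*\ssrc="([^"]+)"')


def extract_content(html):
    """Return the inner HTML of the first content div, or None if absent."""
    match = CONTENT_RE.search(html)
    return match.group(1).strip() if match else None


def extract_image_urls(fragment):
    """Return the src value of every <img> tag in an HTML fragment."""
    return IMG_SRC_RE.findall(fragment)


# Hypothetical sample pages, not taken from either site in the script
flat_page = '<div class="content"><p>Hello</p><img class="pic" src="/a.png"></div>'
nested_page = '<div class="content"><div class="box">inner</div><p>tail</p></div>'

flat = extract_content(flat_page)
print(flat)
print(extract_image_urls(flat))
print(extract_content(nested_page))  # cut off at the first </div>
print(extract_content('<p>no container here</p>'))
```

Because the capture is non-greedy, it stops at the first `</div>`, so a content block that contains nested `div` elements is truncated. For pages like that, an HTML parser is likely the more robust choice; the regex works best when the container holds only inline markup such as paragraphs and images.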

Reposted from www.cnblogs.com/zqrios/p/9115869.html
