Python Daily Exercise: Scraping Website Data with Regular Expressions, BeautifulSoup, and XPath - 代码天地

Other 2019-03-01 12:28:04 Views: 0

1. Fetching site information with BeautifulSoup:

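import urllib.request
from bs4 import BeautifulSoup

url = 'http://www.baidu.com'
# Download the home page and parse it (the 'lxml' parser requires the lxml package)
response = urllib.request.urlopen(url)
soup = BeautifulSoup(response, 'lxml')

# Print the text of each child of <div id="u1">
nav = soup.find('div', id='u1')
for child in nav.children:
    print(child.string)

# The href of the <area> inside <map> has no scheme, so prepend 'http:' before fetching it
href = soup.map.area['href']
print(href)
response = urllib.request.urlopen('http:' + href)
print(response.read().decode())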

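Result: the script prints some of the keywords on the Baidu home page (News, Map, Video, and so on) and retrieves the page that the source image links to.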
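The script builds the second URL by prepending 'http:' to the href. A slightly more general way to resolve links found in a page is urllib.parse.urljoin from the standard library, which also copes with site-relative and already-absolute links. The sketch below is only an illustration: the resolve helper and the example hrefs are made up, not taken from the Baidu page.

```python
from urllib.parse import urljoin

BASE_URL = 'http://www.baidu.com'  # the page the links were scraped from


def resolve(href, base=BASE_URL):
    """Turn an href found in the page into an absolute URL (hypothetical helper)."""
    return urljoin(base, href)


# Hypothetical hrefs, for illustration only
examples = ['//www.baidu.com/more/', '/more/', 'http://news.baidu.com']
resolved = [resolve(h) for h in examples]

for href, url in zip(examples, resolved):
    print(href, '->', url)
```

With this helper, the fetch could be written as urllib.request.urlopen(resolve(href)) instead of concatenating strings by hand.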

2. Fetching site information with XPath:
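A minimal sketch of how the same kind of extraction might look with XPath, assuming the lxml package is installed. It is an illustration rather than the post's own code: the inline HTML and its links are made up so that the example runs without network access, and only the element names (div with id "u1", map, area) mirror the BeautifulSoup version.

```python
from lxml import etree

# Made-up HTML standing in for a downloaded page
SAMPLE_HTML = '''
<html><body>
  <div id="u1">
    <a href="http://news.example.com">News</a>
    <a href="http://map.example.com">Map</a>
    <a href="http://video.example.com">Video</a>
  </div>
  <map name="mp"><area href="//www.example.com/more/"></map>
</body></html>
'''

# etree.HTML uses a lenient HTML parser and returns the root element
root = etree.HTML(SAMPLE_HTML)

# Text of every <a> directly inside <div id="u1">
link_texts = root.xpath('//div[@id="u1"]/a/text()')
# href attribute of every <area> directly inside a <map>
area_hrefs = root.xpath('//map/area/@href')

print(link_texts)
print(area_hrefs)
```

For a live page, the string passed to etree.HTML would be the decoded body returned by urllib.request.urlopen, as in section 1.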

Reposted from www.cnblogs.com/xuehaiwuya0000/p/10455549.html
