Python Daily Exercise: Scraping Website Data with Regular Expressions, BeautifulSoup, and XPath

1. Using BeautifulSoup to fetch website information:

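import urllib.request
from bs4 import BeautifulSoup

url = 'http://www.baidu.com'
# Download the home page and parse it with the lxml parser
response = urllib.request.urlopen(url)
soup = BeautifulSoup(response, 'lxml')

# Locate <div id="u1"> and print the text of each of its children
nav = soup.find('div', id='u1')
if nav is not None:
    for child in nav.children:
        print(child.string)

# Read the href of the first <area> inside <map>
area_href = soup.map.area['href']
print(area_href)
# The href is expected to be protocol-relative ('//...'), so prepend the scheme,
# then fetch that page and print its source
response = urllib.request.urlopen('http:' + area_href)
print(response.read().decode())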

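  Result: the script prints some of the keywords from the Baidu home page (News, Maps, Video, and so on), extracts the URL that the page's image map links to, and fetches that page.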
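
Because the live page changes over time, the same extraction logic can be tried offline. The following is only a minimal sketch: SAMPLE_HTML is a hypothetical fragment loosely modelled on the structure the script above relies on (a div with id "u1" and a map containing an area), and the helper names nav_keywords and area_url are invented for illustration. It uses Python's built-in 'html.parser' backend instead of 'lxml' so that only bs4 needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page; the real markup may differ.
SAMPLE_HTML = """
<html><body>
  <div id="u1">
    <a href="http://news.baidu.com">News</a>
    <a href="http://map.baidu.com">Maps</a>
    <a href="http://v.baidu.com">Video</a>
  </div>
  <map name="mp"><area href="//www.baidu.com/s?wd=example"></map>
</body></html>
"""


def nav_keywords(html_text):
    """Return the link texts found inside <div id="u1">."""
    soup = BeautifulSoup(html_text, "html.parser")
    nav = soup.find("div", id="u1")
    if nav is None:
        return []
    return [a.get_text(strip=True) for a in nav.find_all("a")]


def area_url(html_text, scheme="http:"):
    """Return the first <area> href, adding a scheme if it starts with '//'."""
    soup = BeautifulSoup(html_text, "html.parser")
    area = soup.find("area")
    if area is None or not area.get("href"):
        return None
    href = area["href"]
    return scheme + href if href.startswith("//") else href


if __name__ == "__main__":
    print(nav_keywords(SAMPLE_HTML))
    print(area_url(SAMPLE_HTML))
```

Returning None or an empty list when a tag is missing avoids the AttributeError or TypeError that chained access such as soup.map.area['href'] raises if the page layout changes.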

2. Using XPath to fetch website information:
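
A minimal sketch of how the same data might be selected with XPath through lxml. This is an illustration under stated assumptions: SAMPLE_HTML is a hypothetical fragment (the real Baidu markup may differ), and the helper names extract_nav and extract_area_href are invented here. etree.HTML and the xpath method are standard lxml calls.

```python
from lxml import etree

# Hypothetical stand-in for the downloaded page; the real markup may differ.
SAMPLE_HTML = """
<html><body>
  <div id="u1">
    <a href="http://news.baidu.com">News</a>
    <a href="http://map.baidu.com">Maps</a>
    <a href="http://v.baidu.com">Video</a>
  </div>
  <map name="mp"><area href="//www.baidu.com/s?wd=example"></map>
</body></html>
"""


def extract_nav(html_text):
    """Return (label, href) pairs for the links inside <div id="u1">."""
    tree = etree.HTML(html_text)
    pairs = []
    for a in tree.xpath('//div[@id="u1"]/a'):
        label = "".join(a.xpath("./text()")).strip()
        pairs.append((label, a.get("href")))
    return pairs


def extract_area_href(html_text):
    """Return the href of the first <area> under a <map>, or None."""
    tree = etree.HTML(html_text)
    hrefs = tree.xpath("//map/area/@href")
    return str(hrefs[0]) if hrefs else None


if __name__ == "__main__":
    for label, href in extract_nav(SAMPLE_HTML):
        print(label, href)
    print(extract_area_href(SAMPLE_HTML))
```

To run this against the live site, the sample string could be replaced with the bytes returned by urllib.request.urlopen('http://www.baidu.com').read(); whether the same XPath expressions still match depends on the page's current markup.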

Reposted from www.cnblogs.com/xuehaiwuya0000/p/10455549.html