1. Fetching website information with BeautifulSoup:
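import urllib.request
from bs4 import BeautifulSoup
from lxml import etree

html = 'http://www.baidu.com'
s = urllib.request.urlopen(html)
soap = BeautifulSoup(s, 'lxml')
b = soap.div
# Collect every <div id="u1"> and re-parse the result on its own
sa = soap.find_all("div", id='u1')
soap1 = BeautifulSoup(str(sa), 'lxml')
# Print the text of each child of the first <div>
for i in soap1.div:
    print(i.string)
# Read the href of the first <area> inside the first <map>
sb = soap.map.area['href']
print(sb)
# The code assumes the href is protocol-relative (starts with //), so the scheme is prepended
s = urllib.request.urlopen('http:' + sb)
print(s.read().decode())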
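Debugging result: the script retrieves some of the keywords on the Baidu homepage (News, Maps, Video, and so on) and extracts the URL that the source image links to.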
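Because the live page can change at any time, the same extraction can also be tried offline. The following is only a minimal sketch: `SAMPLE_HTML` is a hypothetical stand-in for the downloaded page (the real Baidu markup may differ), the `example.com` links and the helper name `extract_nav_and_area` are made up for illustration, and the built-in `html.parser` backend is used instead of `lxml` so that no extra parser is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page; the real markup may differ.
SAMPLE_HTML = """
<html><body>
<div id="u1">
  <a href="http://example.com/news">新闻</a>
  <a href="http://example.com/map">地图</a>
  <a href="http://example.com/video">视频</a>
</div>
<map name="mp"><area shape="rect" href="//www.baidu.com/s?wd=example"></map>
</body></html>
"""


def extract_nav_and_area(html):
    """Return (link texts inside div#u1, absolute href of the first <area>)."""
    soup = BeautifulSoup(html, "html.parser")

    # Link texts inside <div id="u1">; empty list if the div is missing.
    nav = soup.find("div", id="u1")
    labels = [a.get_text(strip=True) for a in nav.find_all("a")] if nav else []

    # href of the first <area>; None if there is no such tag.
    area = soup.find("area")
    href = area.get("href") if area else None

    # Assumption: a protocol-relative href ("//host/path") needs a scheme.
    if href and href.startswith("//"):
        href = "http:" + href
    return labels, href


labels, href = extract_nav_and_area(SAMPLE_HTML)
print(labels)  # the keywords: News, Maps, Video
print(href)
```

Checking for `None` before indexing avoids the `AttributeError` or `TypeError` that attribute-chain access such as `soap.map.area['href']` raises when a tag is absent from the page.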
2. Fetching website information with XPath: