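Copyright notice: this is an original article by the blogger and may not be reproduced without the blogger's permission. Corrections are welcome. https://blog.csdn.net/bibi1003/article/details/87720621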
Target site: http://beijing.chineseoffice.com.cn/Template/office_complete.html
The page source contains none of the building information. The listings, and the pagination too, are loaded by JavaScript.
Inspecting the network traffic in Chrome's developer tools reveals the request that loads the list.
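To compare against what the Network panel shows, the same request can be built offline and inspected without sending anything. This is only a minimal sketch using the standard library: the helper name `build_list_request` is hypothetical, and the form-encoded `Content-Type` header is an assumption about what the server expects, not something confirmed by the site.

```python
from urllib.parse import urlencode
from urllib.request import Request

LIST_URL = "http://beijing.chineseoffice.com.cn/Building/GetbuildingList"


def build_list_request(page):
    """Build (but do not send) the POST request for one page of the list."""
    # Encode the single form field "page" as bytes for the request body.
    body = urlencode({"page": page}).encode("ascii")
    return Request(
        LIST_URL,
        data=body,
        # Assumption: the endpoint accepts ordinary form-encoded data.
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


req = build_list_request(0)
print(req.get_method(), req.full_url)  # method and endpoint
print(req.data)                        # raw request body
```

If the method, URL, and body match the entry in the Network panel, the request is ready to be sent from a script.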
Here is the code:
import requests

url = "http://beijing.chineseoffice.com.cn/Building/GetbuildingList"
detail_page_link = "http://beijing.chineseoffice.com.cn/Template/office_details.html"

for i in range(10):  # the total page count can be read from the pagination bar
    data = {"page": i}  # sent as the form field page=<i>
    listPage = requests.post(url, data=data)  # fetch one page of the list
    # Beyond the scope of this post: parse the JSON response, take each
    # building's id, and build its detail-page link, which can then be visited.
    for build in listPage.json():
        print(build['id'], build['officeName'])
        build_link = detail_page_link + "?id=" + str(build['id'])
        print(build_link)
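The link-building step can also be pulled out into a small function so that it can be checked without touching the network. The following is a rough sketch, not the site's documented API: `detail_links` is a hypothetical helper, the sample records are made up, and only the field names `id` and `officeName` come from the code above.

```python
from urllib.parse import urlencode

DETAIL_URL = "http://beijing.chineseoffice.com.cn/Template/office_details.html"


def detail_links(buildings):
    """Return (office name, detail-page link) pairs for parsed list records."""
    links = []
    for build in buildings:
        # urlencode converts the id to text and escapes it for use in a URL.
        query = urlencode({"id": build["id"]})
        links.append((build["officeName"], DETAIL_URL + "?" + query))
    return links


# Made-up sample records standing in for listPage.json().
sample = [
    {"id": "123", "officeName": "Example Tower"},
    {"id": 456, "officeName": "Sample Plaza"},
]
for name, link in detail_links(sample):
    print(name, link)
```

In the loop above, `detail_links(listPage.json())` would take the place of the inline string concatenation, assuming the response really is a JSON list of such records.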