(Notes) A simple crawler for scraping bus routes

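This was for Problem B of my school's mathematical modeling selection round. I was stuck without any bus route data, so I searched Baidu and was surprised to find how many bus stops Shenyang has listed online. So I learned to write a Python crawler and scraped them, hahaha. These are my notes.
The URL I crawled is "http://shenyang.8684.cn/line1", which lists bus lines within Shenyang. There is no complicated logic and the site has no anti-scraping measures, so I used requests and lxml's etree.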
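
To make the two XPath queries used in the script a bit more concrete, here is a minimal sketch that runs against a made-up HTML fragment instead of the live site. The fragment, the line names, the hrefs and the helper name `extract_links` are all assumptions for illustration; the real page's markup may well differ.

```python
from lxml import etree

# Hypothetical fragment imitating the index page; not copied from the real site.
SAMPLE_HTML = """
<html><body>
  <div id="con_site_1">
    <a href="/x_aaa">Line 1</a>
    <a href="/x_bbb">Line 2</a>
  </div>
</body></html>
"""

def extract_links(html):
    """Return (text, href) pairs for every <a> directly under #con_site_1."""
    root = etree.HTML(html)
    nodes = root.xpath('//*[@id="con_site_1"]/a')
    # 'text()' and '@href' each return a list, so take the first item.
    return [(str(n.xpath('text()')[0]), str(n.xpath('@href')[0])) for n in nodes]

links = extract_links(SAMPLE_HTML)
for text, href in links:
    print(text, href)
```

Trying a query on a small fragment like this is a quick way to check an XPath expression before pointing it at the real page.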

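import requests
from lxml import etree
from urllib.parse import urljoin

line_info = []
url = "http://shenyang.8684.cn/line1"

# Fetch the index page and select the link to each bus line.
html = requests.get(url).text
selector = etree.HTML(html)
nodes = selector.xpath('//*[@id="con_site_1"]/a')

for n in nodes:
    name = n.xpath('text()')[0]
    print(name)
    # Resolve the href against the index URL (works with or without a leading slash).
    url_next = urljoin(url, n.xpath('@href')[0])
    html_next = requests.get(url_next).text
    selector_next = etree.HTML(html_next)
    # Station names are the link texts under the fifth div of #bus_line
    # (position-based, so this breaks if the page layout changes).
    stations = selector_next.xpath('//*[@id="bus_line"]/div[5]/div/div/a/text()')
    print(stations)
    line_info.append([name, stations])

# Write one route per line as "name -->station -->station ...".
# encoding="utf-8" keeps the Chinese names intact on any platform;
# the with block closes the file, so no explicit close() is needed.
with open("target.txt", "w", encoding="utf-8") as f:
    for name, stations in line_info:
        line = " -->".join([name] + stations)
        f.write(line + '\n\n')
        print(line)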
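
If the data in target.txt needs to be used later, the "name -->station -->station" lines can be read back into Python. The following is only a sketch under my own assumptions: the function names `parse_line` and `load_routes` are made up, the sample text is invented, and it assumes that no line or station name contains " -->" and that line names are unique (a repeated name would overwrite the earlier entry).

```python
def parse_line(line):
    """Split 'name -->stop1 -->stop2' into (name, [stop1, stop2])."""
    parts = line.rstrip("\n").split(" -->")
    return parts[0], parts[1:]

def load_routes(text):
    """Parse the whole file content, skipping the blank separator lines."""
    routes = {}
    for line in text.splitlines():
        if line.strip():
            name, stations = parse_line(line)
            routes[name] = stations
    return routes

# Invented sample in the same layout the script writes.
sample = "Line 1 -->Stop A -->Stop B\n\nLine 2 -->Stop C\n\n"
routes = load_routes(sample)
print(routes)
```

With the real file, the text would come from something like `open("target.txt", encoding="utf-8").read()`.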

Haha, that's it for now. I'll learn more when I need it.


Update a few hours later
It turns out the scraped data is not much help for the modeling problem after all. So annoying.

Reposted from blog.csdn.net/ishandsomedog/article/details/80552655