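Scraping structure:
import requests
from lxml import etree
# url and headers are assumed to be defined elsewhere
res = requests.get(url, headers=headers)
selector = etree.HTML(res.text)
# Grab the outer (container) tags first
url_infos = selector.xpath('XPath expression')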
.xpath('td/div/a/@title')[0] is the correct form: the path is relative to each element in url_infos.
Delete everything that comes before td.
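.xpath('string(.)').strip(): returns all the text inside the current node, including text in nested tags, with leading and trailing whitespace removed.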
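The pattern above can be sketched end to end as follows. This is only a minimal illustration: SAMPLE_HTML, the tr class name "item", and the function name extract_titles are made up for the example, and the inline HTML string stands in for res.text so the snippet runs without a network request.

```python
from lxml import etree

# Hypothetical page content standing in for res.text
SAMPLE_HTML = """
<html><body><table>
  <tr class="item"><td><div><a href="/b/1" title="Book One">The <b>First</b> Book</a></div></td></tr>
  <tr class="item"><td><div><a href="/b/2" title="Book Two">  The Second Book  </a></div></td></tr>
</table></body></html>
"""

def extract_titles(html_text):
    """Return (title attribute, visible text) for each container row."""
    selector = etree.HTML(html_text)
    # Grab the outer (container) tags first; this path is an assumption for the sample page
    url_infos = selector.xpath('//tr[@class="item"]')
    results = []
    for info in url_infos:
        # Relative path: it starts at td, with everything before td removed
        title = info.xpath('td/div/a/@title')[0]
        # string(.) joins the text of the node and all of its nested tags
        link = info.xpath('td/div/a')[0]
        text = link.xpath('string(.)').strip()
        results.append((title, text))
    return results

rows = extract_titles(SAMPLE_HTML)
for title, text in rows:
    print(title, '|', text)
```

Note that a relative path must not start with / or //. Inside the loop, a path beginning with // would search the whole document again instead of only the current container element.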