Import the required packages and libraries
import requests
from lxml import etree
Find the Ximalaya page of the track you want to download
base_url = 'https://www.ximalaya.com/lishi/4164479/15022309'
track_id = base_url.split('/')[-1]
url = 'https://www.ximalaya.com/revision/play/tracks?trackIds=' + str(track_id)
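The URL handling above can be wrapped in a small helper. The following is a minimal sketch rather than part of the original script: the name `build_tracks_url` is hypothetical, and it assumes the track ID is always the last non-empty, numeric path segment of the page URL.

```python
from urllib.parse import urlencode, urlparse

API_ENDPOINT = 'https://www.ximalaya.com/revision/play/tracks'


def build_tracks_url(page_url):
    """Return the tracks API URL for a track page URL (hypothetical helper)."""
    # Use only the path, so a query string or fragment does not leak into the ID.
    path = urlparse(page_url).path
    # Drop empty segments so a trailing slash does not yield an empty ID.
    segments = [s for s in path.split('/') if s]
    if not segments or not segments[-1].isdigit():
        raise ValueError('no numeric track ID found in: ' + page_url)
    return API_ENDPOINT + '?' + urlencode({'trackIds': segments[-1]})
```

With this helper, `url = build_tracks_url(base_url)` would give the same result as the two lines above for the example page, while also tolerating a trailing slash or a query string.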
Because the site has anti-scraping measures, the request for data
must carry request headers (headers)
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36 SE 2.X MetaSr 1.0'
}
Use the get method of requests to fetch that URL (pass the URL together with the request headers)
response = requests.get(url, headers=headers)
The response body returned by requests is in JSON format, so convert it to a dictionary before reading data from it
json_dict = response.json()
Look up the data we need (accessed by key, as key-value pairs)
src_str = json_dict['data']['tracksForAudioPlay'][0]['src']
trackName = json_dict['data']['tracksForAudioPlay'][0]['trackName']
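The direct indexing above raises `KeyError` or `IndexError` if the response has a different shape, for example when the request was blocked. A more defensive lookup might look like the sketch below. The function name `extract_track` and the fallback title `'track'` are hypothetical, and the idea that a track can come back without a `src` value is an assumption, not something the original states.

```python
def extract_track(json_dict):
    """Return (src, track_name) for the first track, or raise ValueError."""
    # .get() with fallbacks avoids a KeyError when a level is missing or None.
    tracks = (json_dict.get('data') or {}).get('tracksForAudioPlay') or []
    if not tracks:
        raise ValueError('response contains no tracks')
    first = tracks[0]
    src = first.get('src')
    if not src:
        # Assumption: some tracks may be returned without a playable URL.
        raise ValueError('track has no src URL')
    return src, first.get('trackName') or 'track'
```

It would be called as `src_str, trackName = extract_track(json_dict)`.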
Import the request module from urllib
from urllib import request
Use the urlretrieve function from urllib.request to download the remote audio file we located to the local disk
request.urlretrieve(src_str, trackName+'.m4a')
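A track title can contain characters such as `/`, `:` or `?` that are not allowed in file names on some systems, in which case the download call above would fail. One possible way to guard against this is sketched below. The helper name `safe_filename` and the fallback name `'track'` are hypothetical, and the character set is the one Windows forbids, chosen here as a conservative assumption.

```python
import re


def safe_filename(name, extension='.m4a'):
    """Turn a track title into a file name that is valid on common systems."""
    # Replace characters that Windows forbids in file names, plus control chars.
    cleaned = re.sub(r'[\\/:*?"<>|\x00-\x1f]', '_', name)
    # Trailing dots and spaces are also problematic on Windows.
    cleaned = cleaned.strip().rstrip('. ')
    return (cleaned or 'track') + extension
```

The download line would then become `request.urlretrieve(src_str, safe_filename(trackName))`.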