Python Web Scraping: Downloading Ximalaya Audio - 代码天地

Other 2018-08-07 12:16:18

Download the audio for the first ten chapters of the Three Kingdoms (三国) album on Ximalaya:

 
    # Import the requests module
    import requests
    # Import the regular-expression module
    import re

    # Send a browser User-Agent to get past basic anti-scraping checks
    header = {
        'User-Agent':
        'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:57.0) Gecko/20100101 Firefox/57.0'}

    # IDs of the first ten chapters, taken from the page source
    sound_ids = (
        64686514, 64689648, 64695831, 64695832, 3218935,
        3822581, 3419626, 3513844, 3593277, 3773655)

    def get_find_url(text):
        # Regex that matches each ID and its audio URL
        reg = '"id":(.*?),"play_path_64":"(.*?)"'
        # List of (ID, audio URL) pairs
        sound_url = re.findall(reg, text)
        # print(sound_url)
        return sound_url

    # Visit each chapter once; s is the zero-based chapter index
    for s, sound_id in enumerate(sound_ids):
        # URL of this track's JSON description
        url = 'http://www.ximalaya.com/tracks/' + str(sound_id) + '.json'
        # Fetch the response
        html = requests.get(url, headers=header)
        # print(html.text)

        # Take out the ID and the audio URL, one pair at a time
        for track_id, url_finall in get_find_url(html.text):
            # print('第', s+1, '节:', url_finall)
            # Download the audio content
            m4a = requests.get(url_finall)
            # The last four characters of the URL, i.e. ".m4a", become the file extension
            m4a_name = url_finall[-4:]
            # Prints "<downloading section N> URL"
            print('<正在下载第', s+1, '节> ', url_finall)
            # Save the audio locally as 第N节.m4a
            with open('第' + str(s+1) + '节' + m4a_name, 'wb') as f:
                f.write(m4a.content)
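
The script above pulls `id` and `play_path_64` out of the response text with a regular expression. Because the `.json` endpoint returns JSON, the same two fields can probably be read with a JSON parser instead, which does not depend on the order of the keys and also decodes escaped slashes (`\/`). The following is only a minimal sketch: it assumes the response is a flat JSON object containing `id` and `play_path_64`, as the regex implies. The sample payload, the example.com URL, and the helper names `extract_track` and `local_name` are made up for illustration, and the live service may no longer respond this way.

```python
import json
import os
from urllib.parse import urlparse

# Hypothetical sample shaped like what the regex in the script expects;
# the real response carries more fields and may differ today.
SAMPLE = ('{"id":64686514,'
          '"play_path_64":"http://audio.example.com/group1/abc.m4a",'
          '"title":"demo"}')


def extract_track(text):
    """Return (track_id, audio_url) from a track JSON document."""
    data = json.loads(text)
    return data["id"], data["play_path_64"]


def local_name(index, audio_url):
    """Build the script's file name: 第N节 plus the extension found in the URL path."""
    ext = os.path.splitext(urlparse(audio_url).path)[1]
    return '第' + str(index) + '节' + ext


track_id, audio_url = extract_track(SAMPLE)
print(track_id, audio_url)
print(local_name(1, audio_url))
```

Taking the extension from the URL path rather than from the last four characters keeps the file name correct if the URL ever carries a query string or a longer extension. In the script, `extract_track(html.text)` would stand in for the call to `get_find_url`.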

Reposted from blog.csdn.net/Botree_chan/article/details/79513444


