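Scraping the bilibili Leaderboard
As everyone knows, Bilibili is a "study app" (hahaha). Today we'll scrape Bilibili's leaderboard. Without further ado, let's get started.
# Analysis: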
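Looking at Figure 1, each video's info sits inside an `li` tag, so we can extract it with XPath. Here I want each video's cover image, play count, overall score, and link. Everything except the cover is available on this page; I later found the cover on a different page, which I'll come back to below.
Figure 1:
Open a video link to go to its playback page, press F12, switch to the Network tab, and let the video play. You will see many XHR requests firing continuously (Figure 2), all of them ending in `.m4s`.
Figure 2:
From this we can infer that the video is delivered as a series of small `.m4s` chunks. I copied one of these links and opened it; the result is shown in Figure 3.
Figure 3: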
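Those repeated `.m4s` requests are most likely HTTP byte-range requests: a DASH player asks for one slice of the file at a time through a `Range` header. The `SegmentBase` fields in the page data analyzed below (e.g. `"Initialization":"0-1005"`, `"indexRange":"1006-1385"`) give the byte ranges of the initialization segment and the segment index. A minimal sketch turning those strings into request headers; the helper names `range_header` and `segment_base_headers` are hypothetical:

```python
def range_header(byte_range: str) -> dict:
    """Turn a 'start-end' string such as '0-1005' into an HTTP Range header."""
    start, end = (int(x) for x in byte_range.split("-"))
    if start > end:
        raise ValueError(f"invalid byte range: {byte_range!r}")
    return {"Range": f"bytes={start}-{end}"}


def segment_base_headers(segment_base: dict) -> dict:
    """Range headers for the init segment and the segment index of one stream."""
    return {
        "init": range_header(segment_base["Initialization"]),
        "index": range_header(segment_base["indexRange"]),
    }


if __name__ == "__main__":
    seg = {"Initialization": "0-1005", "indexRange": "1006-1385"}
    print(segment_base_headers(seg))
    # {'init': {'Range': 'bytes=0-1005'}, 'index': {'Range': 'bytes=1006-1385'}}
```

Passing such a header to `requests.get(url, headers=...)` would fetch only that slice; the scraper in this post instead requests each whole `baseUrl` in one go.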
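That raises another question: even if we can fetch one of these files, how do we get all of these links one by one? I searched for quite a while without finding them, so I changed tack: maybe a complete video link is stored somewhere. I eventually found it hidden in the initial Elements panel: search for `window` there and you'll find what Figure 4 shows.
Figure 4:
Now open the page source and take a look. At first glance it looks like JSON, which we can extract with a regular expression. Let's analyze it: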
```python
dic = {
    "code": 0, "message": "0", "ttl": 1,
    "data": {
        "from": "local", "result": "suee", "message": "", "quality": 80, "format": "flv", "timelength": 146787,
        "accept_format": "hdflv2,flv,flv720,flv480,mp4",
        "accept_description": ["高清 1080P+", "高清 1080P", "高清 720P", "清晰 480P", "流畅 360P"],
        "accept_quality": [112, 80, 64, 32, 16],
        "video_codecid": 7, "seek_param": "start", "seek_type": "offset",
        "dash": {
            "duration": 147, "minBufferTime": 1.5, "min_buffer_time": 1.5,
            # long signed CDN query strings are abbreviated as "?..." and the backup URL lists as [...]
            "video": [
                {"id": 80, "baseUrl": "http://cn-zjhz2-cmcc-v-05.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30080.m4s?...",
                 "base_url": "http://cn-zjhz2-cmcc-v-05.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30080.m4s?...",
                 "backupUrl": [...], "backup_url": [...], "bandwidth": 1288827, "mimeType": "video/mp4", "mime_type": "video/mp4",
                 "codecs": "avc1.640032", "width": 1920, "height": 1080, "frameRate": "16000/544", "frame_rate": "16000/544",
                 "sar": "1:1", "startWithSap": 1, "start_with_sap": 1,
                 "SegmentBase": {"Initialization": "0-1005", "indexRange": "1006-1385"},
                 "segment_base": {"initialization": "0-1005", "index_range": "1006-1385"}, "codecid": 7},
                {"id": 80, "baseUrl": "http://cn-zjhz2-cmcc-v-04.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30077.m4s?...",
                 "base_url": "http://cn-zjhz2-cmcc-v-04.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30077.m4s?...",
                 "backupUrl": [...], "backup_url": [...], "bandwidth": 777178, "mimeType": "video/mp4", "mime_type": "video/mp4",
                 "codecs": "hev1.1.6.L120.90", "width": 1920, "height": 1080, "frameRate": "16000/544", "frame_rate": "16000/544",
                 "sar": "1:1", "startWithSap": 1, "start_with_sap": 1,
                 "SegmentBase": {"Initialization": "0-1178", "indexRange": "1179-1558"},
                 "segment_base": {"initialization": "0-1178", "index_range": "1179-1558"}, "codecid": 12},
                {"id": 64, "baseUrl": "http://cn-zjhz2-cmcc-v-03.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30064.m4s?...",
                 "base_url": "http://cn-zjhz2-cmcc-v-03.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30064.m4s?...",
                 "backupUrl": [...], "backup_url": [...], "bandwidth": 937924, "mimeType": "video/mp4", "mime_type": "video/mp4",
                 "codecs": "avc1.640028", "width": 1280, "height": 720, "frameRate": "16000/544", "frame_rate": "16000/544",
                 "sar": "1:1", "startWithSap": 1, "start_with_sap": 1,
                 "SegmentBase": {"Initialization": "0-1003", "indexRange": "1004-1383"},
                 "segment_base": {"initialization": "0-1003", "index_range": "1004-1383"}, "codecid": 7},
                {"id": 64, "baseUrl": "http://cn-zjhz2-cmcc-v-04.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30066.m4s?...",
                 "base_url": "http://cn-zjhz2-cmcc-v-04.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30066.m4s?...",
                 "backupUrl": [...], "backup_url": [...], "bandwidth": 567464, "mimeType": "video/mp4", "mime_type": "video/mp4",
                 "codecs": "hev1.1.6.L120.90", "width": 1280, "height": 720, "frameRate": "16000/544", "frame_rate": "16000/544",
                 "sar": "1:1", "startWithSap": 1, "start_with_sap": 1,
                 "SegmentBase": {"Initialization": "0-1179", "indexRange": "1180-1559"},
                 "segment_base": {"initialization": "0-1179", "index_range": "1180-1559"}, "codecid": 12},
                {"id": 32, "baseUrl": "http://cn-zjhz2-cmcc-v-04.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30032.m4s?...",
                 "base_url": "http://cn-zjhz2-cmcc-v-04.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30032.m4s?...",
                 "backupUrl": [...], "backup_url": [...], "bandwidth": 557917, "mimeType": "video/mp4", "mime_type": "video/mp4",
                 "codecs": "avc1.64001F", "width": 852, "height": 480, "frameRate": "16000/544", "frame_rate": "16000/544",
                 "sar": "640:639", "startWithSap": 1, "start_with_sap": 1,
                 "SegmentBase": {"Initialization": "0-1007", "indexRange": "1008-1387"},
                 "segment_base": {"initialization": "0-1007", "index_range": "1008-1387"}, "codecid": 7},
                {"id": 32, "baseUrl": "http://cn-zjhz2-cmcc-v-04.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30033.m4s?...",
                 "base_url": "http://cn-zjhz2-cmcc-v-04.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30033.m4s?...",
                 "backupUrl": [...], "backup_url": [...], "bandwidth": 339786, "mimeType": "video/mp4", "mime_type": "video/mp4",
                 "codecs": "hev1.1.6.L120.90", "width": 852, "height": 480, "frameRate": "16000/544", "frame_rate": "16000/544",
                 "sar": "640:639", "startWithSap": 1, "start_with_sap": 1,
                 "SegmentBase": {"Initialization": "0-1182", "indexRange": "1183-1562"},
                 "segment_base": {"initialization": "0-1182", "index_range": "1183-1562"}, "codecid": 12},
                {"id": 16, "baseUrl": "http://cn-zjhz2-cmcc-v-05.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30011.m4s?...",
                 "base_url": "http://cn-zjhz2-cmcc-v-05.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30011.m4s?...",
                 "backupUrl": [...], "backup_url": [...], "bandwidth": 217071, "mimeType": "video/mp4", "mime_type": "video/mp4",
                 "codecs": "hev1.1.6.L120.90", "width": 640, "height": 360, "frameRate": "16000/544", "frame_rate": "16000/544",
                 "sar": "1:1", "startWithSap": 1, "start_with_sap": 1,
                 "SegmentBase": {"Initialization": "0-1179", "indexRange": "1180-1559"},
                 "segment_base": {"initialization": "0-1179", "index_range": "1180-1559"}, "codecid": 12},
                {"id": 16, "baseUrl": "http://cn-zjhz2-cmcc-v-05.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30016.m4s?...",
                 "base_url": "http://cn-zjhz2-cmcc-v-05.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30016.m4s?...",
                 "backupUrl": [...], "backup_url": [...], "bandwidth": 353246, "mimeType": "video/mp4", "mime_type": "video/mp4",
                 "codecs": "avc1.64001E", "width": 640, "height": 360, "frameRate": "16000/544", "frame_rate": "16000/544",
                 "sar": "1:1", "startWithSap": 1, "start_with_sap": 1,
                 "SegmentBase": {"Initialization": "0-1028", "indexRange": "1029-1408"},
                 "segment_base": {"initialization": "0-1028", "index_range": "1029-1408"}, "codecid": 7},
            ],
            "audio": [
                {"id": 30280, "baseUrl": "http://cn-zjhz2-cmcc-v-02.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30280.m4s?...",
                 "base_url": "http://cn-zjhz2-cmcc-v-02.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30280.m4s?...",
                 "backupUrl": [...], "backup_url": [...], "bandwidth": 117388, "mimeType": "audio/mp4", "mime_type": "audio/mp4",
                 "codecs": "mp4a.40.2", "width": 0, "height": 0, "frameRate": "", "frame_rate": "",
                 "sar": "", "startWithSap": 0, "start_with_sap": 0,
                 "SegmentBase": {"Initialization": "0-907", "indexRange": "908-1299"},
                 "segment_base": {"initialization": "0-907", "index_range": "908-1299"}, "codecid": 0},
                {"id": 30216, "baseUrl": "http://cn-zjhz2-cmcc-v-04.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30216.m4s?...",
                 "base_url": "http://cn-zjhz2-cmcc-v-04.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30216.m4s?...",
                 "backupUrl": [...], "backup_url": [...], "bandwidth": 67328, "mimeType": "audio/mp4", "mime_type": "audio/mp4",
                 "codecs": "mp4a.40.2", "width": 0, "height": 0, "frameRate": "", "frame_rate": "",
                 "sar": "", "startWithSap": 0, "start_with_sap": 0,
                 "SegmentBase": {"Initialization": "0-932", "indexRange": "933-1324"},
                 "segment_base": {"initialization": "0-932", "index_range": "933-1324"}, "codecid": 0},
                {"id": 30232, "baseUrl": "http://cn-zjhz2-cmcc-v-01.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30232.m4s?...",
                 "base_url": "http://cn-zjhz2-cmcc-v-01.bilivideo.com/upgcxcode/22/07/254420722/254420722-1-30232.m4s?...",
                 "backupUrl": [...], "backup_url": [...], "bandwidth": 117388, "mimeType": "audio/mp4", "mime_type": "audio/mp4",
                 "codecs": "mp4a.40.2", "width": 0, "height": 0, "frameRate": "", "frame_rate": "",
                 "sar": "", "startWithSap": 0, "start_with_sap": 0,
                 "SegmentBase": {"Initialization": "0-907", "indexRange": "908-1299"},
                 "segment_base": {"initialization": "0-907", "index_range": "908-1299"}, "codecid": 0},
            ],
        },
        "support_formats": [
            {"quality": 112, "format": "hdflv2", "new_description": "1080P 高码率", "display_desc": "1080P", "superscript": "高码率"},
            {"quality": 80, "format": "flv", "new_description": "1080P 高清", "display_desc": "1080P", "superscript": ""},
            {"quality": 64, "format": "flv720", "new_description": "720P 高清", "display_desc": "720P", "superscript": ""},
            {"quality": 32, "format": "flv480", "new_description": "480P 清晰", "display_desc": "480P", "superscript": ""},
            {"quality": 16, "format": "mp4", "new_description": "360P 流畅", "display_desc": "360P", "superscript": ""},
        ],
    },
    "session": "b80375f9a61937c9ce93ee13909c1bca",
}

# Print one layer at a time to see what each part contains
for key, value in dic['data'].items():
    print(key, ':', value)
print('===================================')
for key, value in dic['data']['dash'].items():
    print(key, ':', value)
print('===================================')
for key, value in dic['data']['support_formats'][0].items():
    print(key, ':', value)
```
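The `dic` above is the object the video page assigns to `window.__playinfo__` inside a `<script>` tag. A minimal sketch of pulling it out with a regular expression and `json.loads`, assuming the JSON is followed directly by `</script>` (the scraper later in this post relies on the same pattern); the function name `extract_playinfo` and the sample page string are made up for illustration:

```python
import json
import re

# Non-greedy match from "__playinfo__=" up to the first closing </script> tag.
PLAYINFO_RE = re.compile(r"window\.__playinfo__=(.*?)</script>", re.S)


def extract_playinfo(html: str) -> dict:
    """Return the __playinfo__ object embedded in a video page, parsed as JSON."""
    match = PLAYINFO_RE.search(html)
    if match is None:
        raise ValueError("no __playinfo__ found in page")
    return json.loads(match.group(1))


if __name__ == "__main__":
    page = ('<script>window.__playinfo__={"code":0,"data":{"dash":{"duration":147}}}'
            '</script><script>other()</script>')
    info = extract_playinfo(page)
    print(info["data"]["dash"]["duration"])  # 147
```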
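`dic` is the JSON data we obtained. Peeling it apart layer by layer, I found that the video and the audio are two separate files, so we can download them separately and merge them afterwards. Here is the result of my analysis:
Figure 5:
`accept_description` lists the available quality levels (高清 1080P+, 高清 1080P, 高清 720P, 清晰 480P, 流畅 360P, i.e. HD 1080P+ down to smooth 360P), and `accept_quality` gives the matching quality IDs (112, 80, 64, 32, 16). I don't have a premium membership, so the best quality I can get is 高清 1080P (ID 80). The video file is in the `baseUrl` of each `video` entry, and the audio file in the `baseUrl` of each `audio` entry.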
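Several `video` entries can share one quality `id`: in the sample above, each quality appears once with `codecid` 7 (`avc1` codecs) and once with `codecid` 12 (`hev1` codecs). The scraper below simply takes index `[0]`, which in this sample happens to be the 1080P AVC stream. A minimal sketch of choosing explicitly instead; the function name `pick_streams` and the default preference for `codecid` 7 are my own assumptions:

```python
from typing import Optional, Tuple


def pick_streams(dash: dict, quality: Optional[int] = None,
                 codecid: int = 7) -> Tuple[str, str]:
    """Pick one video and one audio URL from the 'dash' part of __playinfo__.

    quality: wanted quality id (e.g. 80 = 1080P); None means the best available.
    codecid: preferred codec (in the sample data, 7 goes with avc1, 12 with hev1).
    """
    videos = dash["video"]
    if quality is None:
        quality = max(v["id"] for v in videos)
    # Fall back to all entries if the wanted quality is not offered
    candidates = [v for v in videos if v["id"] == quality] or videos
    # Prefer the requested codec, otherwise keep every candidate
    preferred = [v for v in candidates if v.get("codecid") == codecid] or candidates
    video = max(preferred, key=lambda v: v["bandwidth"])
    audio = max(dash["audio"], key=lambda a: a["bandwidth"])
    return video["baseUrl"], audio["baseUrl"]
```

With the sample data, `pick_streams(dic['data']['dash'])` would return the `...30080.m4s` video URL (ID 80, AVC) and the `...30280.m4s` audio URL; passing `codecid=12` would select the HEVC `...30077.m4s` stream instead.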
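On a hunch, I also copied the string underlined in red in Figure 1 and searched for it in the Elements panel of the video page, and it was there (Figure 7). Opening that link showed exactly the cover image, and the same worked for other videos, so we can extract the cover with a regular expression.
Figure 7: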
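The scraper below matches the cover with a regular expression that depends on the exact attribute order of the `<meta>` tag. As an alternative sketch (not the approach used in this post), Python's standard `html.parser` can find the `<meta itemprop="image">` tag regardless of attribute order; the names `CoverFinder` and `find_cover` are hypothetical:

```python
from html.parser import HTMLParser
from typing import Optional


class CoverFinder(HTMLParser):
    """Remember the content of the first <meta itemprop="image"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.cover: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag == "meta" and self.cover is None:
            attributes = dict(attrs)  # attrs is a list of (name, value) pairs
            if attributes.get("itemprop") == "image":
                self.cover = attributes.get("content")


def find_cover(html: str) -> Optional[str]:
    parser = CoverFinder()
    parser.feed(html)
    return parser.cover


if __name__ == "__main__":
    page = ('<head><meta content="http://i2.hdslb.com/bfs/archive/273ed274d5cf2556e162f8d1f7eef3b63bd2f31b.jpg" '
            'itemprop="image" data-vue-meta="true"></head>')
    print(find_cover(page))
    # http://i2.hdslb.com/bfs/archive/273ed274d5cf2556e162f8d1f7eef3b63bd2f31b.jpg
```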
That completes the analysis; now for the code.
Code:
1: Import libraries
```python
import re
import json
import os
from random import randint
from time import sleep

import requests            # third-party: pip install requests
from lxml import etree     # third-party: pip install lxml
```
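2: Create a session so that cookies are shared across requests
```python
# Create a session
print('Creating session')
session = requests.Session()
base_url = 'https://www.bilibili.com/'
base_headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36',
    'cookie': '<your cookie here>',  # paste your own cookie string
    'referer': 'https://www.google.com/',
}
# Visit the home page first so the session stores any cookies the site sets
session.get(url=base_url, headers=base_headers)
sleep(randint(3, 5))  # random pause between requests
```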
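Note that a `cookie` entry in `headers` only applies to that one request. A sketch of turning the pasted cookie string into a dict so it can be loaded into the session's cookie jar once, after which every `session.get` sends it automatically; `cookie_string_to_dict` is a hypothetical helper and assumes the cookie was copied from the browser as a single `name=value; name2=value2` string:

```python
def cookie_string_to_dict(raw: str) -> dict:
    """Split a browser cookie header ('a=1; b=2') into {'a': '1', 'b': '2'}."""
    cookies = {}
    for part in raw.split(";"):
        # partition() splits on the first '=' only, so '=' inside a value survives
        name, sep, value = part.strip().partition("=")
        if sep and name:
            cookies[name] = value
    return cookies


if __name__ == "__main__":
    print(cookie_string_to_dict("a=1; b=2; token=x=y"))
    # {'a': '1', 'b': '2', 'token': 'x=y'}
```

With requests, the result could then be loaded via `session.cookies.update(cookie_string_to_dict(raw_cookie))`, where `raw_cookie` is your pasted cookie string.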
3: Scrape the leaderboard (I found that adding `referer` to the headers matters a lot here; `referer` is the URL of the page you came from)
```python
# Scrape the leaderboard
print('Scraping the leaderboard')
dic = {}
leaderboard_url = 'https://www.bilibili.com/v/popular/rank/all?spm_id_from=333.851.b_7072696d61727950616765546162.3'
leaderboard_headers = {
    'referer': leaderboard_url,
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    # 'br' is left out: requests only decodes Brotli when the brotli package is installed
    'accept-encoding': 'gzip, deflate',
    'accept-language': 'zh-CN,zh;q=0.9,en-US;q=0.8,en;q=0.7',
    'cache-control': 'max-age=0',
}
response = session.get(url=leaderboard_url, headers=leaderboard_headers)
sleep(randint(3, 5))
content = response.content
html = etree.HTML(content)
info_list = html.xpath('//ul[@class="rank-list"]/li')  # one <li> per ranked video
for li in info_list:
    name = li.xpath('div[2]/div[2]/a/text()')[0]  # video title
    href = 'https:' + li.xpath('div[2]/div[2]/a/@href')[0]  # video link
    score = li.xpath('div[2]/div[2]/div[2]/div/text()')[0] + ' overall score'  # overall score
    play_volume = li.xpath('div[2]/div[2]/div[1]/span[1]/text()')[0].strip()  # play count
    dic[name] = [href, score, play_volume]
    # print(name, href, score, play_volume)
# print(dic)
```
Here I use the video title as the dictionary key and put the video link, overall score, and play count into a list that serves as the value.
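Because later steps keep appending to this list (the cover URL lands at index 3, then a list of stream URLs at the end), the code ends up reading fields by position, e.g. `dic[i][3]` and `dic[i][-1][0]`. A sketch of an alternative with named fields, using a dataclass; the class name `RankedVideo`, its field names, and the sample values are made up for illustration:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RankedVideo:
    name: str
    href: str
    score: str
    play_volume: str
    img_url: Optional[str] = None                         # filled in from the video page
    stream_urls: List[str] = field(default_factory=list)  # [video, audio] or [single file]


if __name__ == "__main__":
    videos = {}
    v = RankedVideo(name="title", href="https://www.bilibili.com/video/example",
                    score="100 overall score", play_volume="1.2M")
    videos[v.name] = v
    v.img_url = "http://i2.hdslb.com/bfs/archive/273ed274d5cf2556e162f8d1f7eef3b63bd2f31b.jpg"
    v.stream_urls = ["video.m4s", "audio.m4s"]
    print(videos["title"].stream_urls[0])  # video.m4s
```

`field(default_factory=list)` gives every instance its own list, so appending to one video's streams never affects another's.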
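4: While scraping, the session sometimes failed, so I wrapped the request in `try`: if the session works, the `except` branch never runs; if it fails, I fall back to a plain `requests.get` request. Don't forget to add the cookie in that case.
For each video, I put the video link and the audio link into a list, then append that list to the video's existing list.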
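The same try-the-session-then-fall-back idea can be written as a small reusable helper that also retries each attempt a few times with a growing pause. A minimal sketch; `first_success` and its parameters are hypothetical, and with requests you would pass `exceptions=(requests.RequestException,)` so that unrelated errors are not swallowed:

```python
import time
from typing import Callable, Sequence, Tuple, Type, TypeVar

T = TypeVar("T")


def first_success(fetchers: Sequence[Callable[[], T]], retries: int = 2,
                  delay: float = 1.0,
                  exceptions: Tuple[Type[BaseException], ...] = (Exception,)) -> T:
    """Call each fetcher in order, retrying each up to `retries` times, and
    return the first result that does not raise one of `exceptions`."""
    last_error = None
    for fetch in fetchers:
        for attempt in range(retries):
            try:
                return fetch()
            except exceptions as exc:
                last_error = exc
                time.sleep(delay * (attempt + 1))  # linear back-off between attempts
    raise RuntimeError("all fetchers failed") from last_error


if __name__ == "__main__":
    def flaky():
        raise ConnectionError("session failed")

    print(first_success([flaky, lambda: "fallback ok"], retries=1, delay=0))
    # fallback ok
```

In the scraper, the two fetchers would be `lambda: session.get(url=video_url, headers=video_headers)` and a `requests.get` call whose headers include your cookie.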
```python
# Get the cover URL and the video/audio links for each ranked video
print('Scraping videos')
video_headers = {
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    # 'br' is left out: requests only decodes Brotli when the brotli package is installed
    'accept-encoding': 'gzip, deflate',
    'accept-language': 'zh-CN,zh;q=0.9,en-US;q=0.8,en;q=0.7',
    'cache-control': 'max-age=0',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36',
    'referer': leaderboard_url,
}
num = 0
for i in dic.keys():
    video_url = dic[i][0]
    # Request the video page; fall back to requests.get with the cookie if the session fails
    try:
        response = session.get(url=video_url, headers=video_headers)
    except requests.RequestException:
        video_headers['cookie'] = '<your cookie here>'  # your own cookie string
        response = requests.get(url=video_url, headers=video_headers)
    text = response.text
    # Cover image URL
    img_url = re.search(r'<meta data-vue-meta="true" itemprop="image" content="(.*?)">', text).group(1)
    dic[i].append(img_url)  # append the cover URL to this video's list
    # Playback info JSON embedded in the page
    data = re.search(r'__playinfo__=(.*?)</script><script>', text).group(1)
    data = json.loads(data)
    # print(data)
    try:
        duration = data['data']['dash']['duration']  # in seconds
        minute, second = divmod(int(duration), 60)
        video_url = data['data']['dash']['video'][0]['baseUrl']  # video stream
        audio_url = data['data']['dash']['audio'][0]['baseUrl']  # audio stream
        dic[i].append([video_url, audio_url])
        print(video_url)
        print(audio_url)
        print('Duration: {} min {} s'.format(minute, second))
    except KeyError:
        # A few videos use a different format (a single file under 'durl'),
        # so there is no separate audio to merge. This is rare.
        duration = data['data']['timelength'] // 1000  # milliseconds -> seconds
        minute, second = divmod(int(duration), 60)
        video_url = data['data']['durl'][0]['url']
        dic[i].append([video_url])
        print('Duration: {} min {} s'.format(minute, second))
```
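The two branches above (DASH with separate audio, or a single `durl` file for the rare videos without `dash`) can be folded into one helper that checks for the `dash` key instead of relying on `KeyError`. A minimal sketch, assuming the same `__playinfo__` layout; `get_streams` is a hypothetical name:

```python
from typing import List, Tuple


def get_streams(playinfo: dict) -> Tuple[List[str], int, int]:
    """Return (stream_urls, minutes, seconds) from a parsed __playinfo__ dict.

    stream_urls is [video_url, audio_url] for DASH videos, or [url] for the
    single-file 'durl' format, which needs no audio merge.
    """
    data = playinfo["data"]
    dash = data.get("dash")
    if dash:
        total = int(dash["duration"])            # 'duration' is in seconds
        urls = [dash["video"][0]["baseUrl"], dash["audio"][0]["baseUrl"]]
    else:
        total = int(data["timelength"]) // 1000  # 'timelength' is in milliseconds
        urls = [data["durl"][0]["url"]]
    minutes, seconds = divmod(total, 60)
    return urls, minutes, seconds


if __name__ == "__main__":
    sample = {"data": {"timelength": 146787, "durl": [{"url": "single.flv"}]}}
    print(get_streams(sample))  # (['single.flv'], 2, 26)
```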
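5: Download the video and audio
The media (.m4s) requests in the browser all carry these two headers:
'origin': 'https://www.bilibili.com',
'referer': 'https://www.bilibili.com/',
After I added them to my own headers, the download succeeded.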
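A sketch of a download helper that sends those two headers and writes the file in chunks instead of holding the whole response in memory. It assumes a `requests.Session` (or any object with the same `get(..., stream=True)` / `iter_content()` interface) is passed in; the function name `download`, the timeout, and the chunk size are my own choices:

```python
from typing import Dict, Optional

# The two headers the browser's .m4s requests carry
MEDIA_HEADERS = {
    "origin": "https://www.bilibili.com",
    "referer": "https://www.bilibili.com/",
}


def download(session, url: str, path: str,
             extra_headers: Optional[Dict[str, str]] = None,
             chunk_size: int = 1 << 20) -> int:
    """Stream `url` into the file at `path` and return the number of bytes written.

    `session` is expected to behave like requests.Session: get(..., stream=True)
    returns a response usable as a context manager, with raise_for_status()
    and iter_content().
    """
    headers = {**MEDIA_HEADERS, **(extra_headers or {})}
    written = 0
    with session.get(url, headers=headers, stream=True, timeout=30) as resp:
        resp.raise_for_status()  # fail on 403 etc. instead of saving an error page
        with open(path, "wb") as f:
            for chunk in resp.iter_content(chunk_size=chunk_size):
                f.write(chunk)
                written += len(chunk)
    return written
```

With requests this might be called as `download(requests.Session(), dic[i][-1][0], video_path, {'user-agent': '...'})`.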
```python
    # (still inside the `for i in dic.keys():` loop from step 4)
    # Download the video and audio streams
    print('Downloading')
    headers = {
        'cookie': '<your cookie here>',  # your own cookie string
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36',
        'origin': 'https://www.bilibili.com',
        'referer': 'https://www.bilibili.com/',
    }
    path = os.path.join(r'C:\Users\jyj34\Desktop\bilibili', str(num))
    created = mkdir(path)  # mkdir() is a helper not shown in this section; the code treats 1 as success
    if created == 1:
        video_path = os.path.join(path, '_video.mp4')
        audio_path = os.path.join(path, '_audio.mp4')
        save_path = os.path.join(path, '{}.mp4'.format(num))
        info_path = os.path.join(path, '{}.text'.format(num))
        img_path = os.path.join(path, '{}.png'.format(num))
        num += 1
        print('{}: downloading video'.format(i))
        with open(video_path, 'wb') as f:  # video part
            response = requests.get(dic[i][-1][0], headers=headers)
            print(response.status_code)
            f.write(response.content)
        print('{}: video done'.format(i))
        print('{}: downloading audio'.format(i))
        with open(audio_path, 'wb') as f:  # audio part
            response = requests.get(dic[i][-1][-1], headers=headers)
            f.write(response.content)
        print('{}: audio done'.format(i))
```
6: Download the cover and save the info:
```python
        # (still inside the `if` block from step 5)
        # Download the cover image; dic[i][3] is the cover URL found in step 4
        img_headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36'}
        response = requests.get(url=dic[i][3], headers=img_headers)
        with open(img_path, 'wb') as f:
            f.write(response.content)
        # Save the info: title, overall score, play count
        with open(info_path, 'w', encoding='utf-8') as f:
            info = i + '\n' + dic[i][1] + '\n' + dic[i][2]
            f.write(info)
```
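7: Merge video and audio
In my setup, merging only worked when I ran the editor (PyCharm) as administrator and switched the editor's encoding to 'gbk' instead of 'utf-8'.
```python
        # Merge the separate video and audio files into one mp4, copying the streams without re-encoding
        cmd = r'ffmpeg -i "{}" -i "{}" -acodec copy -vcodec copy "{}"'.format(video_path, audio_path, save_path)
        p = os.popen(cmd)
        p.close()  # close() waits for ffmpeg to exit
```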
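An alternative sketch that runs ffmpeg through `subprocess.run` with an argument list instead of a shell string: the file paths need no quoting, the call blocks until ffmpeg exits, and `check=True` turns a failed merge into an exception. It assumes `ffmpeg` is on the PATH; `build_merge_cmd` and `merge` are hypothetical names, and this sketch does not establish whether the administrator or encoding requirements above still apply:

```python
import shutil
import subprocess
from typing import List


def build_merge_cmd(video_path: str, audio_path: str, save_path: str) -> List[str]:
    """ffmpeg arguments that mux the video and audio files without re-encoding."""
    return [
        "ffmpeg", "-y",             # -y: overwrite save_path if it already exists
        "-i", video_path,
        "-i", audio_path,
        "-c:v", "copy", "-c:a", "copy",
        save_path,
    ]


def merge(video_path: str, audio_path: str, save_path: str) -> None:
    if shutil.which("ffmpeg") is None:
        raise FileNotFoundError("ffmpeg not found on PATH")
    # check=True raises CalledProcessError if ffmpeg exits with a non-zero code
    subprocess.run(build_merge_cmd(video_path, audio_path, save_path), check=True)
```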
全部代码:
import re
from random import randint
import requests
from lxml import etree
from time import sleep
import json
import os

COOKIE = '<your own cookie>'  # placeholder: paste the cookie from your logged-in browser
UA = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36'


def get_link_and_img():
    # Create a session
    print('Creating session')
    session = requests.Session()
    base_url = 'https://www.bilibili.com/'
    base_headers = {
        'user-agent': UA,
        'cookie': COOKIE,
        'referer': 'https://www.google.com/',
    }
    session.get(url=base_url, headers=base_headers)
    sleep(randint(3, 5))
    # Scrape the leaderboard
    print('Scraping leaderboard')
    dic = {}
    leaderboard_url = 'https://www.bilibili.com/v/popular/rank/all?spm_id_from=333.851.b_7072696d61727950616765546162.3'
    leaderboard_headers = {
        'referer': leaderboard_url,
        'user-agent': UA,
        'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
        'accept-encoding': 'gzip, deflate',  # no 'br': requests only decodes Brotli if the brotli package is installed
        'accept-language': 'zh-CN,zh;q=0.9,en-US;q=0.8,en;q=0.7',
        'cache-control': 'max-age=0',
    }
    response = session.get(url=leaderboard_url, headers=leaderboard_headers)
    sleep(randint(3, 5))
    html = etree.HTML(response.content)
    info_list = html.xpath('//ul[@class="rank-list"]/li')
    for li in info_list:
        name = li.xpath('div[2]/div[2]/a/text()')[0]  # video title
        href = 'https:' + li.xpath('div[2]/div[2]/a/@href')[0]  # video link
        score = li.xpath('div[2]/div[2]/div[2]/div/text()')[0] + ' overall score'  # overall score
        play_volume = li.xpath('div[2]/div[2]/div[1]/span[1]/text()')[0].strip()  # play count
        dic[name] = [href, score, play_volume]
    # Scrape each video page
    print('Scraping video pages')
    video_headers = {
        'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
        'accept-encoding': 'gzip, deflate',
        'accept-language': 'zh-CN,zh;q=0.9,en-US;q=0.8,en;q=0.7',
        'cache-control': 'max-age=0',
        'user-agent': UA,
        'referer': leaderboard_url,
    }
    num = 0
    for i in dic.keys():
        video_url = dic[i][0]
        # Fetch the video page: it holds the cover link and the __playinfo__ JSON
        try:
            response = session.get(url=video_url, headers=video_headers)
        except requests.RequestException:
            video_headers['cookie'] = COOKIE
            response = requests.get(url=video_url, headers=video_headers)
        text = response.text
        img_url = re.search(r'<meta data-vue-meta="true" itemprop="image" content="(.*?)">', text).group(1)
        dic[i].append(img_url)  # cover URL -> dic[i][3]
        data = re.search(r'__playinfo__=(.*?)</script><script>', text).group(1)
        data = json.loads(data)
        try:
            # DASH format: video and audio are separate streams
            time = data['data']['dash']['duration']
            minute = int(time) // 60
            second = int(time) % 60
            video_url = data['data']['dash']['video'][0]['baseUrl']
            audio_url = data['data']['dash']['audio'][0]['baseUrl']
            dic[i].append([video_url, audio_url])  # -> dic[i][4]
            # print('Duration: {} min {} s'.format(minute, second))
        except KeyError:
            # A few videos use a different format (durl): one file, no audio/video merge needed
            time = data['data']['timelength'] // 1000
            minute = int(time) // 60
            second = int(time) % 60
            video_url = data['data']['durl'][0]['url']
            dic[i].append([video_url])  # -> dic[i][4]
            # print('Duration: {} min {} s'.format(minute, second))
        # Download video and audio
        print('Downloading video and audio')
        headers = {
            'cookie': COOKIE,
            'user-agent': UA,
            'origin': 'https://www.bilibili.com',
            'referer': 'https://www.bilibili.com/',
        }
        path = os.path.join(r'C:\Users\jyj34\Desktop\bilibili', str(num))  # change to your own folder
        created = mkdir(path)
        if created == 1:
            video_path = os.path.join(path, '_video.mp4')
            audio_path = os.path.join(path, '_audio.mp4')
            save_path = os.path.join(path, '{}.mp4'.format(num))
            info_path = os.path.join(path, '{}.text'.format(num))
            img_path = os.path.join(path, '{}.png'.format(num))
            streams = dic[i][-1]
            if len(streams) == 2:
                print('{}: video download started'.format(i))
                with open(video_path, 'wb') as f:  # video stream
                    response = requests.get(streams[0], headers=headers)
                    print(response.status_code)
                    f.write(response.content)
                print('{}: video download finished'.format(i))
                print('{}: audio download started'.format(i))
                with open(audio_path, 'wb') as f:  # audio stream
                    response = requests.get(streams[1], headers=headers)
                    f.write(response.content)
                print('{}: audio download finished'.format(i))
            else:
                # durl format: the single file already contains audio, so save it directly
                print('{}: download started'.format(i))
                with open(save_path, 'wb') as f:
                    response = requests.get(streams[0], headers=headers)
                    f.write(response.content)
                print('{}: download finished'.format(i))
            # Download the cover
            with open(img_path, 'wb') as f:
                cover_headers = {'User-Agent': UA}
                response = requests.get(url=dic[i][3], headers=cover_headers)
                f.write(response.content)
            # Save the info (title, overall score, play count)
            with open(info_path, 'w', encoding='utf-8') as f:
                f.write(i + '\n' + dic[i][1] + '\n' + dic[i][2])
            # Merge audio and video (only DASH videos have them separately)
            if len(streams) == 2:
                composite(video_path, audio_path, save_path)
            sleep(randint(5, 8))
        else:
            print('{} has already been scraped'.format(i))
        num = num + 1


def mkdir(path):
    # Create the folder if it doesn't exist; return 1 if created, 0 if it already existed
    folder = os.path.exists(path)
    if not folder:
        os.makedirs(path)
        return 1
    else:
        return 0


def composite(video_path, audio_path, save_path):
    # Copy the video and audio streams into one file without re-encoding (needs ffmpeg on PATH)
    cmd = r'ffmpeg -i {} -i {} -acodec copy -vcodec copy {}'.format(video_path, audio_path, save_path)
    with os.popen(cmd) as p:
        p.read()  # block until ffmpeg exits before moving on to the next video


get_link_and_img()
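The video/audio download, the cover download and the audio/video merge could each be pulled out into a function of its own with def. That would make the code tidier and easier to read.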
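As an example of that kind of split, here is the playinfo step from the full code pulled into its own function. This is a sketch, and `parse_playinfo` is a hypothetical name. It assumes the page still embeds `__playinfo__=` followed by JSON and a closing `</script>`. It also assumes the JSON has either a `dash` block (separate video and audio streams) or a `durl` list (a single file), as in the code above.

```python
import json
import re

def parse_playinfo(html_text):
    """Return (duration_seconds, [stream URLs]) from a video page's __playinfo__ JSON.

    DASH pages give two URLs (video, audio); the durl format gives one.
    """
    match = re.search(r'__playinfo__=(.*?)</script>', html_text, re.S)
    if match is None:
        raise ValueError('no __playinfo__ found in page')
    data = json.loads(match.group(1))['data']
    if 'dash' in data:
        dash = data['dash']
        return dash['duration'], [dash['video'][0]['baseUrl'], dash['audio'][0]['baseUrl']]
    # durl: timelength is in milliseconds and the audio is already muxed in
    return data['timelength'] // 1000, [data['durl'][0]['url']]
```

The main loop could then call `duration, streams = parse_playinfo(text)` and `dic[i].append(streams)`, and branch on `len(streams)` to decide whether a merge is needed.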
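Here is the layout of each dictionary entry, keyed by video title: key: [href, score, play_volume, img_url, [video_url, audio_url]]. For the few durl-format videos, the last item is just [video_url].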
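Positional indexes like `dic[i][3]` and `dic[i][-1][0]` are easy to mix up. One option is to give each slot a name that matches what the code actually stores in each list. The sketch below does this with the standard library's `typing.NamedTuple`. The `RankEntry` type, its field names and `needs_merge` are all hypothetical.

```python
from typing import List, NamedTuple

class RankEntry(NamedTuple):
    """One leaderboard video; the title stays the dict key, as in the original."""
    href: str
    score: str
    play_volume: str
    img_url: str
    streams: List[str]  # [video_url, audio_url] for DASH, [video_url] for durl

    @property
    def needs_merge(self):
        # Only DASH entries have a separate audio stream to merge with ffmpeg
        return len(self.streams) == 2
```

Since the original fills each list step by step with `append`, you would build the entry once all five values are known. For example, `dic[name] = RankEntry(href, score, play_volume, img_url, streams)`, then read `entry.img_url` instead of `dic[i][3]`.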
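You may also notice the sleep calls in the code. Why are they there? Because we play fair: pausing a few random seconds between requests keeps the load we put on Bilibili's servers low.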
That's it for this crawler. If anything is unclear, feel free to ask.
Below is my WeChat official account; feel free to follow it.