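Copyright notice: this is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/ucsheep/article/details/89419343
QQ group: 812930741, for a runnable program
The complete code is at the end of this article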
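Collecting all of a user's watermark-free videos depends on solving two problems:
- How to get all of a user's video information (including the playback URLs) from the user ID
- How to obtain a user's ID
1. Getting all of a user's video information from the user ID
First request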
curl \
  -H 'Host: api-a.huoshan.com' \
  -H 'Cookie: xxxxxxxxxxxxxx' \
  -H 'X-SS-REQ-TICKET: xxxxxxxxxxxx' \
  -H 'sdk-version: 1' \
  -H 'X-SS-TC: 0' \
  -H 'User-Agent: xxxxxxxxxxx' \
  -H 'X-Pods: ' \
--compressed 'https://api-a.huoshan.com/hotsoon/user/109311764519/items/?min_time=0&offset=0&count=20&req_from=enter_auto&ad_user_agent=com.ss.android.ugc.live%2F615+%28Linux%3B+U%3B+Android+6.0.1%3B+zh_CN%3B+Redmi+4A%3B+Build%2FMMB29M%3B+Chrome%29&live_sdk_version=615&iid=69925132268&device_id=51947871999&ac=wifi&channel=xiaomi&aid=1112&app_name=live_stream&version_code=615&version_name=6.1.5&device_platform=android&ssmix=a&device_type=Redmi+4A&device_brand=Xiaomi&language=zh&os_api=23&os_version=6.0.1&uuid=866982031402425&openudid=6909280b584153cf&manifest_version_code=615&resolution=720*1280&dpi=320&update_version_code=6152&_rticket=1555726295213&ab_version=822322%2C501250%2C839334%2C800224%2C769881%2C818778%2C832457%2C689929%2C814443%2C841662%2C802089%2C788368%2C692223%2C830471%2C803343%2C845250%2C712301%2C770525%2C788456%2C557631%2C819048%2C846680%2C840757%2C661940%2C374107%2C705072%2C845593%2C840609%2C848718%2C682009%2C691946%2C837343%2C837342%2C762673%2C844518%2C837345%2C508756%2C795733%2C848656%2C841501%2C840425%2C848650%2C821181%2C835596%2C734849%2C819013%2C837709%2C848219%2C457534%2C832768%2C797937%2C665355%2C797621%2C840441&ts=1555726295'
The request URL is
https://api-a.huoshan.com/hotsoon/user/109311764519/items/?min_time=0&offset=0&count=20&req_from=enter_auto&ad_user_agent=com.ss.android.ugc.live%2F615+%28Linux%3B+U%3B+Android+6.0.1%3B+zh_CN%3B+Redmi+4A%3B+Build%2FMMB29M%3B+Chrome%29&live_sdk_version=615&iid=69925132268&device_id=51947871999&ac=wifi&channel=xiaomi&aid=1112&app_name=live_stream&version_code=615&version_name=6.1.5&device_platform=android&ssmix=a&device_type=Redmi+4A&device_brand=Xiaomi&language=zh&os_api=23&os_version=6.0.1&uuid=866982031402425&openudid=6909280b584153cf&manifest_version_code=615&resolution=720*1280&dpi=320&update_version_code=6152&_rticket=1555726295213&ab_version=822322%2C501250%2C839334%2C800224%2C769881%2C818778%2C832457%2C689929%2C814443%2C841662%2C802089%2C788368%2C692223%2C830471%2C803343%2C845250%2C712301%2C770525%2C788456%2C557631%2C819048%2C846680%2C840757%2C661940%2C374107%2C705072%2C845593%2C840609%2C848718%2C682009%2C691946%2C837343%2C837342%2C762673%2C844518%2C837345%2C508756%2C795733%2C848656%2C841501%2C840425%2C848650%2C821181%2C835596%2C734849%2C819013%2C837709%2C848219%2C457534%2C832768%2C797937%2C665355%2C797621%2C840441&ts=1555726295
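Key parameters
Parameter | Example | Meaning |
---|---|---|
url | user/109311764519/items/ | The path contains the user ID, 109311764519 |
min_time | 0 | 0 on the first request |
The response is structured as follows
Field | Meaning |
---|---|
data | Video information, 20 videos per page |
extra | Pagination information; the code below reads has_more and max_time from it |
has_more | Whether there is a next page |
total | Total count |
max_time | The parameter to send when requesting the next page |
Now look at the second request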
https://api-a.huoshan.com/hotsoon/user/109311764519/items/?
max_time=1555157100000&offset=20&count=20
&req_from=feed_loadmore&ad_user_agent=com.ss.android.ugc.live%2F615+%28Linux%3B+U%3B+Android+6.0.1%3B+zh_CN%3B+Redmi+4A%3B+Build%2FMMB29M%3B+Chrome%29&live_sdk_version=615&iid=69925132268&device_id=51947871999&ac=wifi&channel=xiaomi&aid=1112&app_name=live_stream&version_code=615&version_name=6.1.5&device_platform=android&ssmix=a&device_type=Redmi+4A&device_brand=Xiaomi&language=zh&os_api=23&os_version=6.0.1&uuid=866982031402425&openudid=6909280b584153cf&manifest_version_code=615&resolution=720*1280&dpi=320&update_version_code=6152&_rticket=1555726323246&ab_version=822322%2C501250%2C839334%2C800224%2C769881%2C818778%2C832457%2C689929%2C814443%2C841662%2C802089%2C788368%2C692223%2C830471%2C803343%2C845250%2C712301%2C770525%2C788456%2C557631%2C819048%2C846680%2C840757%2C661940%2C374107%2C705072%2C845593%2C840609%2C848718%2C682009%2C691946%2C837343%2C837342%2C762673%2C844518%2C837345%2C508756%2C795733%2C848656%2C841501%2C840425%2C848650%2C821181%2C835596%2C734849%2C819013%2C837709%2C848219%2C457534%2C832768%2C797937%2C665355%2C797621%2C840441&ts=1555726323
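Here is how the key parameters relate to the first request
Parameter | Example | Meaning |
---|---|---|
url | user/109311764519/items/ | Unchanged |
max_time | 1555157100000 | No longer min_time; this value comes from the previous response |
As you have probably noticed, as long as has_more is set, you keep requesting the next page until nothing is left, and that gives you all of the user's videos.
The Python code is as follows: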
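Only a handful of query parameters differ between the first request and the later ones. As a rough sketch, the varying part can be built with the standard library. The helper name `build_items_url` is hypothetical, and it deliberately leaves out the long list of device and version parameters from the captured requests, which the server may well require.

```python
from urllib.parse import urlencode

API_BASE = "https://api-a.huoshan.com/hotsoon/user/"


def build_items_url(user_id, max_time=None):
    """Build only the paging part of the items URL (hypothetical helper).

    The device/version parameters from the captured request are omitted;
    append them if the server rejects the shortened URL.
    """
    if max_time is None:
        # First page: min_time=0 and offset=0, as in the first captured request.
        params = [("min_time", 0), ("offset", 0), ("count", 20),
                  ("req_from", "enter_auto")]
    else:
        # Later pages: max_time is taken from the previous response.
        params = [("max_time", max_time), ("offset", 20), ("count", 20),
                  ("req_from", "feed_loadmore")]
    return API_BASE + str(user_id) + "/items/?" + urlencode(params)


print(build_items_url(109311764519))
print(build_items_url(109311764519, max_time=1555157100000))
```

Keeping the paging parameters separate from the fixed device parameters makes it easier to see what actually changes from page to page.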
def get_user_videos(self, user_id):
    # First page: min_time=0 and offset=0; ts is the current Unix time.
    url = "https://api-a.huoshan.com/hotsoon/user/" + user_id + "/items/?min_time=0&offset=0&count=20&req_from=enter_auto&ad_user_agent=com.ss.android.ugc.live%2F615+%28Linux%3B+U%3B+Android+6.0.1%3B+zh_CN%3B+Redmi+4A%3B+Build%2FMMB29M%3B+Chrome%29&live_sdk_version=615&iid=69925132268&device_id=51947871999&ac=wifi&channel=xiaomi&aid=1112&app_name=live_stream&version_code=615&version_name=6.1.5&device_platform=android&ssmix=a&device_type=Redmi+4A&device_brand=Xiaomi&language=zh&os_api=23&os_version=6.0.1&uuid=866982031402425&openudid=6909280b584153cf&manifest_version_code=615&resolution=720*1280&dpi=320&update_version_code=6152&_rticket=1555726295213&ab_version=822322%2C501250%2C839334%2C800224%2C769881%2C818778%2C832457%2C689929%2C814443%2C841662%2C802089%2C788368%2C692223%2C830471%2C803343%2C845250%2C712301%2C770525%2C788456%2C557631%2C819048%2C846680%2C840757%2C661940%2C374107%2C705072%2C845593%2C840609%2C848718%2C682009%2C691946%2C837343%2C837342%2C762673%2C844518%2C837345%2C508756%2C795733%2C848656%2C841501%2C840425%2C848650%2C821181%2C835596%2C734849%2C819013%2C837709%2C848219%2C457534%2C832768%2C797937%2C665355%2C797621%2C840441&ts=" + str(
        int(time.time()))
    response = requests.get(url, headers=http_headers, timeout=10)
    result = json.loads(response.text)  # parse the body once
    data = result['data']
    extra = result['extra']
    self._join_download_queue(data, user_id)
    video_list = data
    # Keep paging with the max_time returned by the previous response.
    while extra['has_more']:
        try:
            url = "https://api-a.huoshan.com/hotsoon/user/" + user_id + "/items/?max_time=" + str(extra[
                'max_time']) + "&offset=20&count=20&req_from=feed_loadmore&ad_user_agent=com.ss.android.ugc.live%2F615+%28Linux%3B+U%3B+Android+6.0.1%3B+zh_CN%3B+Redmi+4A%3B+Build%2FMMB29M%3B+Chrome%29&live_sdk_version=615&iid=69925132268&device_id=51947871999&ac=wifi&channel=xiaomi&aid=1112&app_name=live_stream&version_code=615&version_name=6.1.5&device_platform=android&ssmix=a&device_type=Redmi+4A&device_brand=Xiaomi&language=zh&os_api=23&os_version=6.0.1&uuid=866982031402425&openudid=6909280b584153cf&manifest_version_code=615&resolution=720*1280&dpi=320&update_version_code=6152&_rticket=1555726323246&ab_version=822322%2C501250%2C839334%2C800224%2C769881%2C818778%2C832457%2C689929%2C814443%2C841662%2C802089%2C788368%2C692223%2C830471%2C803343%2C845250%2C712301%2C770525%2C788456%2C557631%2C819048%2C846680%2C840757%2C661940%2C374107%2C705072%2C845593%2C840609%2C848718%2C682009%2C691946%2C837343%2C837342%2C762673%2C844518%2C837345%2C508756%2C795733%2C848656%2C841501%2C840425%2C848650%2C821181%2C835596%2C734849%2C819013%2C837709%2C848219%2C457534%2C832768%2C797937%2C665355%2C797621%2C840441&ts=" + str(
                int(time.time()))
            response = requests.get(url, headers=http_headers, timeout=10)
            result = json.loads(response.text)
            data = result['data']
            extra = result['extra']
            self._join_download_queue(data, user_id)
            video_list = video_list + data
        except Exception as e:
            # Stop rather than loop forever: extra is unchanged after a failure.
            print("Failed to fetch the next page: %s" % e)
            break
    return len(video_list)
2. The playback URL of each video
First, look at the JSON
{
"data": {
"allow_comment": true,
"allow_dislike": true,
"allow_share": true,
"at_users": [],
"author": {
"allow_be_located": true,
"allow_find_by_contacts": true,
"allow_others_download_video": true,
"allow_others_download_when_sharing_video": true,
"allow_share_show_profile": true,
"allow_show_in_gossip": true,
"allow_show_my_action": true,
"allow_strange_comment": true,
"allow_unfollower_comment": true,
"anchor_level": {
"experience": 403,
"highest_experience_this_level": 460,
"level": 10,
"lowest_experience_this_level": 311,
"profile_dialog_bg": {
"uri": "hotsoon-resource/anchor_level_1.1_3x.png",
"url_list": ["http://p3-hs.bytecdn.cn/obj/hotsoon-resource/anchor_level_1.1_3x.png", "http://p9-hs.bytecdn.cn/obj/hotsoon-resource/anchor_level_1.1_3x.png", "http://p9-hs.bytecdn.cn/obj/hotsoon-resource/anchor_level_1.1_3x.png"]
},
"profile_dialog_bg_back": {
"uri": "hotsoon-resource/anchor_level_1.2_3x.png",
"url_list": ["http://p3-hs.bytecdn.cn/obj/hotsoon-resource/anchor_level_1.2_3x.png", "http://p9-hs.bytecdn.cn/obj/hotsoon-resource/anchor_level_1.2_3x.png", "http://p9-hs.bytecdn.cn/obj/hotsoon-resource/anchor_level_1.2_3x.png"]
},
"small_icon": {
"uri": "hotsoon-resource/anchor_level_small_10_3x.png",
"url_list": ["http://p3-hs.bytecdn.cn/obj/hotsoon-resource/anchor_level_small_10_3x.png", "http://p9-hs.bytecdn.cn/obj/hotsoon-resource/anchor_level_small_10_3x.png", "http://p9-hs.bytecdn.cn/obj/hotsoon-resource/anchor_level_small_10_3x.png"]
},
"stage_level": {
"uri": "hotsoon-resource/anchor_level_1_3x.png",
"url_list": ["http://p3-hs.bytecdn.cn/obj/hotsoon-resource/anchor_level_1_3x.png", "http://p1-hs.bytecdn.cn/obj/hotsoon-resource/anchor_level_1_3x.png", "http://p1-hs.bytecdn.cn/obj/hotsoon-resource/anchor_level_1_3x.png"]
},
"task_decrease_experience": 0,
"task_end_time": 1548691140,
"task_start_experience": 0,
"task_start_time": 1546145769,
"task_target_experience": 0
},
"avatar_jpg": {
"uri": "hotsoon-avatar/b0fd63b4cce193a6322feb8c96ba7cff793e228361b4c2b0e4f9cd4e44fdd479",
"url_list": ["http://p3-hs.bytecdn.cn/img/hotsoon-avatar/b0fd63b4cce193a6322feb8c96ba7cff793e228361b4c2b0e4f9cd4e44fdd479~100x100.jpg", "http://p1-hs.bytecdn.cn/img/hotsoon-avatar/b0fd63b4cce193a6322feb8c96ba7cff793e228361b4c2b0e4f9cd4e44fdd479~100x100.jpg", "http://p1-hs.bytecdn.cn/img/hotsoon-avatar/b0fd63b4cce193a6322feb8c96ba7cff793e228361b4c2b0e4f9cd4e44fdd479~100x100.jpg"]
},
"avatar_large": {
"uri": "hotsoon-avatar/b0fd63b4cce193a6322feb8c96ba7cff793e228361b4c2b0e4f9cd4e44fdd479",
"url_list": ["http://p1-hs.bytecdn.cn/img/hotsoon-avatar/b0fd63b4cce193a6322feb8c96ba7cff793e228361b4c2b0e4f9cd4e44fdd479~1080x1080.webp", "http://p9-hs.bytecdn.cn/img/hotsoon-avatar/b0fd63b4cce193a6322feb8c96ba7cff793e228361b4c2b0e4f9cd4e44fdd479~1080x1080.webp", "http://p9-hs.bytecdn.cn/img/hotsoon-avatar/b0fd63b4cce193a6322feb8c96ba7cff793e228361b4c2b0e4f9cd4e44fdd479~1080x1080.webp"]
},
"avatar_medium": {
"uri": "hotsoon-avatar/b0fd63b4cce193a6322feb8c96ba7cff793e228361b4c2b0e4f9cd4e44fdd479",
"url_list": ["http://p1-hs.bytecdn.cn/img/hotsoon-avatar/b0fd63b4cce193a6322feb8c96ba7cff793e228361b4c2b0e4f9cd4e44fdd479~720x720.webp", "http://p3-hs.bytecdn.cn/img/hotsoon-avatar/b0fd63b4cce193a6322feb8c96ba7cff793e228361b4c2b0e4f9cd4e44fdd479~720x720.webp", "http://p3-hs.bytecdn.cn/img/hotsoon-avatar/b0fd63b4cce193a6322feb8c96ba7cff793e228361b4c2b0e4f9cd4e44fdd479~720x720.webp"]
},
"avatar_thumb": {
"uri": "hotsoon-avatar/b0fd63b4cce193a6322feb8c96ba7cff793e228361b4c2b0e4f9cd4e44fdd479",
"url_list": ["http://p3-hs.bytecdn.cn/img/hotsoon-avatar/b0fd63b4cce193a6322feb8c96ba7cff793e228361b4c2b0e4f9cd4e44fdd479~100x100.webp", "http://p1-hs.bytecdn.cn/img/hotsoon-avatar/b0fd63b4cce193a6322feb8c96ba7cff793e228361b4c2b0e4f9cd4e44fdd479~100x100.webp", "http://p1-hs.bytecdn.cn/img/hotsoon-avatar/b0fd63b4cce193a6322feb8c96ba7cff793e228361b4c2b0e4f9cd4e44fdd479~100x100.webp"]
},
"bg_img_url": "",
"birthday": 0,
"birthday_description": "90后",
"birthday_valid": false,
"block_status": 0,
"city": "唐山",
"comment_restrict": 1,
"constellation": "",
"disable_ichat": 0,
"enable_ichat_img": 1,
"encrypted_id": "MS4wLjABAAAARp_KVU2BZK4BOo5xHhk0u5R-6vT7sQSf0teStWy8yjk",
"exp": 0,
"fan_ticket_count": 27583,
"fold_stranger_chat": false,
"follow_status": 0,
"gender": 1,
"hotsoon_verified": false,
"hotsoon_verified_reason": "",
"ichat_restrict_type": 1,
"id": 109311764519,
"id_str": "109311764519",
"income_share_percent": 0,
"is_follower": false,
"is_following": false,
"level": 1,
"need_profile_guide": false,
"nickname": "唐山赵鹏",
"pay_grade": {
"diamond_icon": {
"uri": "mosaic-legacy/12400003aba3dd42e213",
"url_list": ["http://p1-hs.bytecdn.cn/obj/mosaic-legacy/12400003aba3dd42e213", "http://p3-hs.bytecdn.cn/obj/mosaic-legacy/12400003aba3dd42e213", "http://p3-hs.bytecdn.cn/obj/mosaic-legacy/12400003aba3dd42e213"]
},
"grade_banner": "28级可开启豪华入场",
"grade_describe": "距升级还需消费35钻",
"grade_icon_list": [{
"icon": {
"uri": "mosaic-legacy/3b65000678eac77af1d9",
"url_list": ["http://p3-hs.bytecdn.cn/obj/mosaic-legacy/3b65000678eac77af1d9", "http://p1-hs.bytecdn.cn/obj/mosaic-legacy/3b65000678eac77af1d9", "http://p1-hs.bytecdn.cn/obj/mosaic-legacy/3b65000678eac77af1d9"]
},
"icon_diamond": 100,
"level": 6,
"level_str": "Lv.6"
}, {
"icon": {
"uri": "mosaic-legacy/3b65000678eac77af1d9",
"url_list": ["http://p9-hs.bytecdn.cn/obj/mosaic-legacy/3b65000678eac77af1d9", "http://p1-hs.bytecdn.cn/obj/mosaic-legacy/3b65000678eac77af1d9", "http://p1-hs.bytecdn.cn/obj/mosaic-legacy/3b65000678eac77af1d9"]
},
"icon_diamond": 200,
"level": 7,
"level_str": "Lv.7"
}, {
"icon": {
"uri": "mosaic-legacy/3b620006b1e388185513",
"url_list": ["http://p3-hs.bytecdn.cn/obj/mosaic-legacy/3b620006b1e388185513", "http://p1-hs.bytecdn.cn/obj/mosaic-legacy/3b620006b1e388185513", "http://p1-hs.bytecdn.cn/obj/mosaic-legacy/3b620006b1e388185513"]
},
"icon_diamond": 300,
"level": 8,
"level_str": "Lv.8"
}],
"icon": {
"uri": "mosaic-legacy/30eb0000a101d40eea0c",
"url_list": ["http://p3-hs.bytecdn.cn/obj/mosaic-legacy/30eb0000a101d40eea0c", "http://p1-hs.bytecdn.cn/obj/mosaic-legacy/30eb0000a101d40eea0c", "http://p1-hs.bytecdn.cn/obj/mosaic-legacy/30eb0000a101d40eea0c"]
},
"im_icon": {
"uri": "mosaic-legacy/2ea8000962099e965ff0",
"url_list": ["http://p9-hs.bytecdn.cn/obj/mosaic-legacy/2ea8000962099e965ff0", "http://p3-hs.bytecdn.cn/obj/mosaic-legacy/2ea8000962099e965ff0", "http://p3-hs.bytecdn.cn/obj/mosaic-legacy/2ea8000962099e965ff0"]
},
"im_icon_with_level": {
"uri": "mosaic-legacy/78a1007d7263887d923b",
"url_list": ["http://p3-hs.bytecdn.cn/obj/mosaic-legacy/78a1007d7263887d923b", "http://p1-hs.bytecdn.cn/obj/mosaic-legacy/78a1007d7263887d923b", "http://p1-hs.bytecdn.cn/obj/mosaic-legacy/78a1007d7263887d923b"]
},
"level": 7,
"live_icon": {
"uri": "mosaic-legacy/30ee0007ccef28b99639",
"url_list": ["http://p1-hs.bytecdn.cn/obj/mosaic-legacy/30ee0007ccef28b99639", "http://p3-hs.bytecdn.cn/obj/mosaic-legacy/30ee0007ccef28b99639", "http://p3-hs.bytecdn.cn/obj/mosaic-legacy/30ee0007ccef28b99639"]
},
"name": "树苗",
"new_im_icon_with_level": {
"uri": "mosaic-legacy/78a200737bef2df0fee9",
"url_list": ["http://p1-hs.bytecdn.cn/obj/mosaic-legacy/78a200737bef2df0fee9", "http://p3-hs.bytecdn.cn/obj/mosaic-legacy/78a200737bef2df0fee9", "http://p3-hs.bytecdn.cn/obj/mosaic-legacy/78a200737bef2df0fee9"]
},
"new_live_icon": {
"uri": "mosaic-legacy/78a10056e336cb6eb911",
"url_list": ["http://p1-hs.bytecdn.cn/obj/mosaic-legacy/78a10056e336cb6eb911", "http://p9-hs.bytecdn.cn/obj/mosaic-legacy/78a10056e336cb6eb911", "http://p9-hs.bytecdn.cn/obj/mosaic-legacy/78a10056e336cb6eb911"]
},
"new_nav_live_icon": {
"uri": "hotsoon-resource/new_nva_level_icon_7.png",
"url_list": ["http://p3-hs.bytecdn.cn/obj/hotsoon-resource/new_nva_level_icon_7.png", "http://p1-hs.bytecdn.cn/obj/hotsoon-resource/new_nva_level_icon_7.png", "http://p1-hs.bytecdn.cn/obj/hotsoon-resource/new_nva_level_icon_7.png"]
},
"next_diamond": 500,
"next_icon": {
"uri": "mosaic-legacy/12400003aae89daccd69",
"url_list": ["http://p3-hs.bytecdn.cn/obj/mosaic-legacy/12400003aae89daccd69", "http://p1-hs.bytecdn.cn/obj/mosaic-legacy/12400003aae89daccd69", "http://p1-hs.bytecdn.cn/obj/mosaic-legacy/12400003aae89daccd69"]
},
"next_name": "树叶",
"now_diamond": 265,
"pay_diamond_bak": 0,
"profile_dialog_bg": {
"uri": "hotsoon-resource/user_level_1.1_3x.png",
"url_list": ["http://p1-hs.bytecdn.cn/obj/hotsoon-resource/user_level_1.1_3x.png", "http://p3-hs.bytecdn.cn/obj/hotsoon-resource/user_level_1.1_3x.png", "http://p3-hs.bytecdn.cn/obj/hotsoon-resource/user_level_1.1_3x.png"]
},
"profile_dialog_bg_back": {
"uri": "hotsoon-resource/user_level_1.2_3x.png",
"url_list": ["http://p3-hs.bytecdn.cn/obj/hotsoon-resource/user_level_1.2_3x.png", "http://p1-hs.bytecdn.cn/obj/hotsoon-resource/user_level_1.2_3x.png", "http://p1-hs.bytecdn.cn/obj/hotsoon-resource/user_level_1.2_3x.png"]
},
"screen_chat_type": 2,
"this_grade_max_diamond": 299,
"this_grade_min_diamond": 200,
"total_diamond_count": 275,
"upgrade_need_consume": 35
},
"pay_scores": 80,
"push_comment_status": true,
"push_digg": true,
"push_follow": true,
"push_friend_action": true,
"push_ichat": true,
"push_status": true,
"push_video_post": true,
"push_video_recommend": true,
"short_id": 651258764,
"signature": "",
"type_a1": 1,
"verified": false,
"verified_mobile": true,
"verified_reason": ""
},
"comment_delay": -1,
"create_time": 1555714446,
"description": "",
"disable_watermark": false,
"extra_scheme_url": "sslocal://webview?url=https%3A%2F%2Fhotsoon.snssdk.com%2Ffalcon%2Flive_inapp%2Fpage%2Fpush_hot%2Findex.html%23%2F%3Fitem_id%3D6681742665865907464\u0026hide_nav_bar=1\u0026hide_more=1\u0026disable_bounces=1",
"follow_display": false,
"follow_status_tag": "",
"friend_action_list": null,
"id": 6681742665865907464,
"id_str": "6681742665865907464",
"item_log_extra": "{\"item_type\":\"item\"}",
"location": "",
"media_type": 4,
"prefetch_comment": false,
"prefetch_profile": false,
"share_description": "这个视频居然有 3044 次播放,快来围观\u003e\u003e",
"share_enable": true,
"share_strong_guide": 0,
"share_title": "「唐山赵鹏」的这个视频好6,快来围观!",
"share_url": "http://reflow.huoshan.com/share/item/6681742665865907464/?tag=0\u0026timestamp=1555726295\u0026watermark=2\u0026media_type=4\u0026",
"song": {
"album": "",
"author": "汤潮",
"cover_large": {
"uri": "douyin-web-image/56d11bca0e17c8d56c7f0414797a5b8d",
"url_list": ["http://sf1-hscdn-tos.pstatp.com/img/douyin-web-image/56d11bca0e17c8d56c7f0414797a5b8d~720x720.webp", "http://sf3-hscdn-tos.pstatp.com/img/douyin-web-image/56d11bca0e17c8d56c7f0414797a5b8d~720x720.webp", "http://sf6-hscdn-tos.pstatp.com/img/douyin-web-image/56d11bca0e17c8d56c7f0414797a5b8d~720x720.webp"]
},
"cover_thumb": {
"uri": "douyin-web-image/56d11bca0e17c8d56c7f0414797a5b8d",
"url_list": ["http://sf1-hscdn-tos.pstatp.com/img/douyin-web-image/56d11bca0e17c8d56c7f0414797a5b8d~100x100.webp", "http://sf3-hscdn-tos.pstatp.com/img/douyin-web-image/56d11bca0e17c8d56c7f0414797a5b8d~100x100.webp", "http://sf6-hscdn-tos.pstatp.com/img/douyin-web-image/56d11bca0e17c8d56c7f0414797a5b8d~100x100.webp"]
},
"duration": 228,
"id": 6581309914948307719,
"play_url": {
"uri": "9fe4000380eebcd5f4c7",
"url_list": ["http://p3-hs.bytecdn.cn/obj/9fe4000380eebcd5f4c7", "http://p1-hs.bytecdn.cn/obj/9fe4000380eebcd5f4c7", "http://p6-hs.bytecdn.cn/obj/9fe4000380eebcd5f4c7"]
},
"share_description": "玩视频上火山,快来围观!",
"share_title": "玩视频上火山,快来围观!",
"share_url": "https://reflow.huoshan.com/share/music/6581309914948307719/",
"source_platform": 25,
"status": 1,
"title": "美了美了",
"video_cnt": 60786
},
"stats": {
"comment_count": 17,
"digg_count": 77,
"play_count": 3044,
"share_count": 2
},
"status": 102,
"tips": "",
"tips_url": "https://hotsoon.snssdk.com/hotsoon/in_app/pyramid_selling/?source=money",
"title": "",
"user_bury": 0,
"user_digg": 0,
"video": {
"allow_cache": true,
"cover": {
"avg_color": "#EBCEE1",
"uri": "tplv-hs-large/tos-cn-p-0000/03b3b93d2cb542bc9e8b5c32feb0b03b",
"url_list": ["http://p3-hs.bytecdn.cn/img/tos-cn-p-0000/03b3b93d2cb542bc9e8b5c32feb0b03b~tplv-hs-large.webp", "http://p1-hs.bytecdn.cn/img/tos-cn-p-0000/03b3b93d2cb542bc9e8b5c32feb0b03b~tplv-hs-large.webp", "http://p6-hs.bytecdn.cn/img/tos-cn-p-0000/03b3b93d2cb542bc9e8b5c32feb0b03b~tplv-hs-large.webp"]
},
"cover_animated": null,
"cover_medium": {
"avg_color": "#7A6D53",
"uri": "tplv-hs-medium/tos-cn-p-0000/03b3b93d2cb542bc9e8b5c32feb0b03b",
"url_list": ["http://p3-hs.bytecdn.cn/img/tos-cn-p-0000/03b3b93d2cb542bc9e8b5c32feb0b03b~tplv-hs-medium.webp", "http://p1-hs.bytecdn.cn/img/tos-cn-p-0000/03b3b93d2cb542bc9e8b5c32feb0b03b~tplv-hs-medium.webp", "http://p6-hs.bytecdn.cn/img/tos-cn-p-0000/03b3b93d2cb542bc9e8b5c32feb0b03b~tplv-hs-medium.webp"]
},
"cover_thumb": {
"avg_color": "#3D3D3D",
"uri": "tplv-hs-live:100:100/tos-cn-p-0000/03b3b93d2cb542bc9e8b5c32feb0b03b",
"url_list": ["http://p3-hs.bytecdn.cn/img/tos-cn-p-0000/03b3b93d2cb542bc9e8b5c32feb0b03b~tplv-hs-live:100:100.webp", "http://p1-hs.bytecdn.cn/img/tos-cn-p-0000/03b3b93d2cb542bc9e8b5c32feb0b03b~tplv-hs-live:100:100.webp", "http://p6-hs.bytecdn.cn/img/tos-cn-p-0000/03b3b93d2cb542bc9e8b5c32feb0b03b~tplv-hs-live:100:100.webp"]
},
"download_url": ["https://api.huoshan.com/hotsoon/item/video/_playback/?video_id=v0300cde0000bit52uar6q7snu08rn50\u0026line=0\u0026app_id=1112\u0026vquality=normal\u0026watermark=2\u0026long_video=0\u0026sf=3\u0026ts=1555726295", "https://api.huoshan.com/hotsoon/item/video/_playback/?video_id=v0300cde0000bit52uar6q7snu08rn50\u0026line=1\u0026app_id=1112\u0026vquality=normal\u0026watermark=2\u0026long_video=0\u0026sf=3\u0026ts=1555726295"],
"duration": 22.385,
"gif_uri": "1f29c0004ca82ca37b85b",
"gif_url_list": ["http://p3-hs.bytecdn.cn/img/mosaic-legacy/1f29c0004ca82ca37b85b~noop.image", "http://p1-hs.bytecdn.cn/img/mosaic-legacy/1f29c0004ca82ca37b85b~noop.image", "http://p1-hs.bytecdn.cn/img/mosaic-legacy/1f29c0004ca82ca37b85b~noop.image"],
"h265_uri": "h265/v0300cde0000bit52uar6q7snu08rn50_720p",
"h265_url": ["http://v3-hs.ixigua.com/72cde5258e0e1bce5c28d71169077b6d/5cba8dfd/video/m/2208c1736b80fb04ca4a1bee6a9fe4320551161cf75e00009ce3770cff79/?rc=M3E7eHZ0dXg4bDMzaGYzM0ApQHRAbzg5Njo8MzQzMzY0NDUzNDVvQGg2dSlAZjV1KWZzcHcxeW9mNTRAMm8tazQzXi9xXy0tYS0wc3MtbyNvIzIuNC0yMS0uMi4tLTE2LTojbyM6YS1xIzpgaF4rYmZiZjojLi5e", "http://v6-hs.ixigua.com/dfa36dfcb74ffe4358dfe9eed886a122/5cba8dfd/video/m/2208c1736b80fb04ca4a1bee6a9fe4320551161cf75e00009ce3770cff79/?rc=M3E7eHZ0dXg4bDMzaGYzM0ApQHRAbzg5Njo8MzQzMzY0NDUzNDVvQGg2dSlAZjV1KWZzcHcxeW9mNTRAMm8tazQzXi9xXy0tYS0wc3MtbyNvIzIuNC0yMS0uMi4tLTE2LTojbyM6YS1xIzpgaF4rYmZiZjojLi5e", "https://api.huoshan.com/hotsoon/item/video/_playback/?video_id=v0300cde0000bit52uar6q7snu08rn50\u0026line=0\u0026app_id=1112\u0026vquality=normal\u0026quality=720p\u0026codec=h265\u0026sf=3\u0026origin=0\u0026ts=1555726295"],
"height": 1024,
"preload_size": 375000,
"uri": "v0300cde0000bit52uar6q7snu08rn50",
"url_list": ["https://api.huoshan.com/hotsoon/item/video/_playback/?video_id=v0300cde0000bit52uar6q7snu08rn50\u0026line=0\u0026app_id=1112\u0026vquality=normal\u0026watermark=0\u0026long_video=0\u0026sf=3\u0026ts=1555726295", "https://api.huoshan.com/hotsoon/item/video/_playback/?video_id=v0300cde0000bit52uar6q7snu08rn50\u0026line=1\u0026app_id=1112\u0026vquality=normal\u0026watermark=0\u0026long_video=0\u0026sf=3\u0026ts=1555726295"],
"video_id": "v0300cde0000bit52uar6q7snu08rn50",
"watermark": true,
"width": 576
},
"weibo_share_title": "#玩视频上火山#唐山赵鹏在火山上分享了视频,快来围观!传送门戳我\u003e\u003ehttp://reflow.huoshan.com/share/item/6681742665865907464/?tag=0\u0026timestamp=1555726295\u0026watermark=2\u0026media_type=4\u0026"
},
"rid": "2019042010113501001405907731655",
"tags": [],
"type": 3
}
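Clearly, item['data']['video']['url_list'] holds the playback URLs. Note that these URLs carry watermark=0, while the ones in download_url carry watermark=2.
3. Getting the user ID
When a user's profile page is shared, a short link like the following is generated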
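To make the lookup concrete, here is a small sketch that pulls the playback URLs out of one item and keeps those whose query string has watermark=0. It assumes, as the sample suggests, that watermark=0 marks the watermark-free variant; the helper name `watermark_free_urls` is hypothetical, and the sample dict is a trimmed stand-in built from two URLs in the JSON above.

```python
from urllib.parse import urlparse, parse_qs


def watermark_free_urls(item):
    """Return the entries of url_list whose query has watermark=0."""
    result = []
    for url in item["data"]["video"]["url_list"]:
        query = parse_qs(urlparse(url).query)
        if query.get("watermark") == ["0"]:
            result.append(url)
    return result


# Trimmed stand-in: the first url_list entry and the second download_url
# entry from the sample, placed in one list to exercise the filter.
sample = {"data": {"video": {"url_list": [
    "https://api.huoshan.com/hotsoon/item/video/_playback/?video_id=v0300cde0000bit52uar6q7snu08rn50&line=0&app_id=1112&vquality=normal&watermark=0&long_video=0&sf=3&ts=1555726295",
    "https://api.huoshan.com/hotsoon/item/video/_playback/?video_id=v0300cde0000bit52uar6q7snu08rn50&line=1&app_id=1112&vquality=normal&watermark=2&long_video=0&sf=3&ts=1555726295",
]}}}

for url in watermark_free_urls(sample):
    print(url)
```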
http://reflow.huoshan.com/hotsoon/s/vAzc0get700/
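When the link is actually opened, however, it goes through a 301 redirect (if the trailing / is missing) and a 302 redirect before reaching the real page
For the 302 redirect, the Location header of the response is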
http://reflow.huoshan.com/share/user/23790988726/?timestamp=1555745525&share_ht_uid=-1&did=53955772475&utm_medium=huoshan_android&tt_from=copy_link&iid=69961357053&app=live_stream&utm_source=copy_link&schema_url=sslocal%3A%2F%2Fprofile%3Fid%3D23790988726
which contains the user_id we want, 23790988726
The following code obtains the user_id from the short link
response = requests.get(short_url, headers=http_headers, timeout=10)
user_id = str(response.url).split("/")[5]
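Taking index 5 after splitting on "/" works for the URL shape shown above, but it silently returns something else if the final URL has a different layout. A slightly more defensive sketch matches the /share/user/<id>/ path instead; the helper name `user_id_from_share_url` is hypothetical.

```python
import re
from urllib.parse import urlparse


def user_id_from_share_url(url):
    """Return the numeric id from a /share/user/<id>/ URL, or None."""
    match = re.search(r"/share/user/(\d+)", urlparse(url).path)
    return match.group(1) if match else None


final_url = ("http://reflow.huoshan.com/share/user/23790988726/"
             "?timestamp=1555745525&share_ht_uid=-1")
print(user_id_from_share_url(final_url))  # 23790988726
```

Returning None for an unexpected URL lets the caller skip that link rather than request videos for a bogus ID.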
4. Complete code
Putting it all together, the Python code for multithreaded batch collection of Huoshan short videos is as follows
from six.moves import queue as Queue
import requests
import time
import json
from threading import Thread
import os
import sys
import codecs
http_headers = { 'Accept': '*/*','Connection': 'keep-alive', 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.116 Safari/537.36'}
THREADS = 10
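def make_sure_path(path):
    # Create the directory if it is missing; ignore the race where another
    # worker thread creates it first.
    try:
        if not os.path.exists(path):
            os.mkdir(path)
    except OSError:
        pass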
def download(item, user_id):
    # Videos are stored under ./download/<user_id>/<video_id>.mp4
    local_path = os.getcwd()
    mp4_path = os.path.join(local_path, "download")
    make_sure_path(mp4_path)
    user_path = os.path.join(mp4_path, str(user_id))
    make_sure_path(user_path)
    dl_url = item['data']['video']['url_list'][0]
    dl_vid = item['data']['video']['video_id']
    video_path = os.path.join(user_path, str(dl_vid) + '.mp4')
    if os.path.exists(video_path):
        return  # already downloaded
    print("Downloading %s from %s.\n" % (dl_vid, dl_url))
    try:
        r = requests.get(dl_url, timeout=60)
        r.raise_for_status()  # do not save an error page as a video
        with open(video_path, "wb") as code:
            code.write(r.content)
    except Exception as e:
        print("Failed to download %s: %s" % (dl_vid, e))
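class DownloadWorker(Thread):
    def __init__(self, queue):
        Thread.__init__(self)
        self.queue = queue
    def run(self):
        while True:
            item, user_id = self.queue.get()
            try:
                download(item, user_id)
            except Exception as e:
                print("Download task failed: %s" % e)
            finally:
                # Always mark the task done, otherwise queue.join() hangs.
                self.queue.task_done()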
class CrawlerScheduler(object):
    def __init__(self, user_url):
        self.short_url = user_url
        self.queue = Queue.Queue()
        self.scheduling()
    def scheduling(self):
        # Start the daemon worker threads, then feed them this user's videos.
        for x in range(THREADS):
            worker = DownloadWorker(self.queue)
            worker.daemon = True
            worker.start()
        self.download_user_videos()
    def download_user_videos(self):
        # After the redirects, the final URL contains the user id.
        response = requests.get(self.short_url, headers=http_headers, timeout=10)
        user_id = str(response.url).split("/")[5]
        video_count = self.get_user_videos(user_id)
        self.queue.join()
        print("\nHuoshan user: %s, video count: %s\n\n" % (user_id, str(video_count)))
        print("\nDownload finished: %s\n\n" % user_id)
    def get_user_videos(self, user_id):
        # First page: min_time=0 and offset=0; ts is the current Unix time.
        url = "https://api-a.huoshan.com/hotsoon/user/" + user_id + "/items/?min_time=0&offset=0&count=20&req_from=enter_auto&ad_user_agent=com.ss.android.ugc.live%2F615+%28Linux%3B+U%3B+Android+6.0.1%3B+zh_CN%3B+Redmi+4A%3B+Build%2FMMB29M%3B+Chrome%29&live_sdk_version=615&iid=69925132268&device_id=51947871999&ac=wifi&channel=xiaomi&aid=1112&app_name=live_stream&version_code=615&version_name=6.1.5&device_platform=android&ssmix=a&device_type=Redmi+4A&device_brand=Xiaomi&language=zh&os_api=23&os_version=6.0.1&uuid=866982031402425&openudid=6909280b584153cf&manifest_version_code=615&resolution=720*1280&dpi=320&update_version_code=6152&_rticket=1555726295213&ab_version=822322%2C501250%2C839334%2C800224%2C769881%2C818778%2C832457%2C689929%2C814443%2C841662%2C802089%2C788368%2C692223%2C830471%2C803343%2C845250%2C712301%2C770525%2C788456%2C557631%2C819048%2C846680%2C840757%2C661940%2C374107%2C705072%2C845593%2C840609%2C848718%2C682009%2C691946%2C837343%2C837342%2C762673%2C844518%2C837345%2C508756%2C795733%2C848656%2C841501%2C840425%2C848650%2C821181%2C835596%2C734849%2C819013%2C837709%2C848219%2C457534%2C832768%2C797937%2C665355%2C797621%2C840441&ts=" + str(
            int(time.time()))
        response = requests.get(url, headers=http_headers, timeout=10)
        result = json.loads(response.text)  # parse the body once
        data = result['data']
        extra = result['extra']
        self._join_download_queue(data, user_id)
        video_list = data
        # Keep paging with the max_time returned by the previous response.
        while extra['has_more']:
            try:
                url = "https://api-a.huoshan.com/hotsoon/user/" + user_id + "/items/?max_time=" + str(extra[
                    'max_time']) + "&offset=20&count=20&req_from=feed_loadmore&ad_user_agent=com.ss.android.ugc.live%2F615+%28Linux%3B+U%3B+Android+6.0.1%3B+zh_CN%3B+Redmi+4A%3B+Build%2FMMB29M%3B+Chrome%29&live_sdk_version=615&iid=69925132268&device_id=51947871999&ac=wifi&channel=xiaomi&aid=1112&app_name=live_stream&version_code=615&version_name=6.1.5&device_platform=android&ssmix=a&device_type=Redmi+4A&device_brand=Xiaomi&language=zh&os_api=23&os_version=6.0.1&uuid=866982031402425&openudid=6909280b584153cf&manifest_version_code=615&resolution=720*1280&dpi=320&update_version_code=6152&_rticket=1555726323246&ab_version=822322%2C501250%2C839334%2C800224%2C769881%2C818778%2C832457%2C689929%2C814443%2C841662%2C802089%2C788368%2C692223%2C830471%2C803343%2C845250%2C712301%2C770525%2C788456%2C557631%2C819048%2C846680%2C840757%2C661940%2C374107%2C705072%2C845593%2C840609%2C848718%2C682009%2C691946%2C837343%2C837342%2C762673%2C844518%2C837345%2C508756%2C795733%2C848656%2C841501%2C840425%2C848650%2C821181%2C835596%2C734849%2C819013%2C837709%2C848219%2C457534%2C832768%2C797937%2C665355%2C797621%2C840441&ts=" + str(
                    int(time.time()))
                response = requests.get(url, headers=http_headers, timeout=10)
                result = json.loads(response.text)
                data = result['data']
                extra = result['extra']
                self._join_download_queue(data, user_id)
                video_list = video_list + data
            except Exception as e:
                # Stop rather than loop forever: extra is unchanged after a failure.
                print("Failed to fetch the next page: %s" % e)
                break
        return len(video_list)
    def _join_download_queue(self, items, user_id):
        # Hand every video item to the worker threads.
        for item in items:
            self.queue.put((item, user_id))
def parse_txt(fileName):
    with open(fileName, "rb") as f:
        txt = f.read().rstrip().lstrip()
        txt = codecs.decode(txt, 'utf-8')
        txt = txt.replace("\t", ",").replace(
            "\r", ",").replace("\n", ",").replace(" ", ",")
        txt = txt.split(",")
    numbers = list()
    for raw_site in txt:
        site = raw_site.lstrip().rstrip()
        if site:
            numbers.append(site)
    return numbers
if __name__ == "__main__":
    if os.path.exists("url.txt"):
        content = parse_txt("url.txt")
    else:
        print("url.txt not found")
        sys.exit(1)
    for user_url in content:
        CrawlerScheduler(user_url)
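The script relies on worker threads draining a queue, with queue.join() waiting for every task to be marked done. The sketch below isolates that pattern using only the Python 3 standard library; `run_workers` and `fake_download` are hypothetical names, and the simulated failure is there to show that a job which raises still gets marked done, so join() can return.

```python
import queue
import threading


def run_workers(jobs, handler, threads=4):
    """Run handler(job) for every job on daemon threads; return failures."""
    q = queue.Queue()
    failures = []
    lock = threading.Lock()

    def worker():
        while True:
            job = q.get()
            try:
                handler(job)
            except Exception as exc:
                with lock:
                    failures.append((job, exc))
            finally:
                # Always mark the job done so q.join() cannot hang.
                q.task_done()

    for _ in range(threads):
        threading.Thread(target=worker, daemon=True).start()
    for job in jobs:
        q.put(job)
    q.join()  # returns once every job has been marked done
    return failures


def fake_download(job):
    # Stand-in for a real download; job 3 fails on purpose.
    if job == 3:
        raise ValueError("simulated failure")


failed = run_workers(range(5), fake_download)
print([job for job, _ in failed])  # [3]
```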