Follow the WeChat official account "python趣味爱好者" and reply "淘宝口罩" to get the source code.
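On Taobao we hunt for treasures; on Taobao we hunt for masks
Because of the novel coronavirus epidemic, everyone should wear a mask, and masks have become a daily necessity. Today we'll point a crawler at a store on Taobao and check how much mask stock it has left.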
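First, open Taobao, search for masks (口罩), and click into any store's listing:
Press F12 to open the developer tools and switch to the Network tab:
Look through the returned data to find which request carries the inventory:
The request whose name starts with initItem returns a JSON string:
When we try to parse it, the format is off: the JSON is wrapped as setMdskip(...) (a JSONP callback), so there is an extra setMdskip in front and a pair of parentheses around it. We can remove them with strip():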
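response = response.text.strip()        # body text of the requests.Response, e.g. setMdskip(...)
response = response.strip("setMdskip")  # strip() removes any of these characters from both ends, not the literal prefix
response = response.strip()             # drop whitespace left between the callback name and "("
response = response.strip("()")         # remove the wrapping parentheses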
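The strip() calls above work for this response, but strip() treats its argument as a set of characters to remove from both ends, not as a literal prefix, so it only behaves here because the characters next to the callback name happen not to be in that set. A more targeted sketch, assuming the body has the shape callbackName( ... ) with optional whitespace and an optional trailing semicolon, matches the wrapper with a regular expression and parses only what is inside the parentheses. The name unwrap_jsonp is hypothetical.

```python
import json
import re
from typing import Any

# Matches: <callback name> ( <payload> ) [;] with optional whitespace around the parts.
# Assumption: the callback name is a plain JavaScript identifier such as setMdskip.
_JSONP_RE = re.compile(r"^\s*[A-Za-z_$][\w$]*\s*\((.*)\)\s*;?\s*$", re.DOTALL)


def unwrap_jsonp(text: str) -> Any:
    """Remove a JSONP wrapper like setMdskip({...}) and parse the JSON inside it."""
    match = _JSONP_RE.match(text)
    if match is None:
        # No wrapper found: treat the body as plain JSON.
        return json.loads(text)
    return json.loads(match.group(1))


# strip() removes characters from a set, not a prefix:
print("timestamp".strip("setMdskip"))  # mestam

data = unwrap_jsonp('setMdskip\n({"defaultModel": {"inventoryDO": {"icTotalQuantity": 6166}}});')
print(data["defaultModel"]["inventoryDO"]["icTotalQuantity"])  # 6166
```

With a real request this would be called as unwrap_jsonp(response.text). On Python 3.9 and later, str.removeprefix("setMdskip") is another way to drop the literal prefix.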
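Let's take a closer look at this JSON:
In the inventory data, the stock is marked as shown rather than hidden, and the total quantity is 6166, matching what the page displays. That is exactly the data we want, so let's fetch it:
Right-click the request, choose Copy as cURL, and paste it into: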
https://curl.trillworks.com/
The site returns the equivalent Python code:
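import json  # used later to parse the response
import time  # used later for timestamps and sleeping
import requests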
headers = {
'authority': 'mdskip.taobao.com',
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.116 Safari/537.36',
'sec-fetch-dest': 'script',
'accept': '*/*',
'sec-fetch-site': 'cross-site',
'sec-fetch-mode': 'no-cors',
'referer': 'https://detail.tmall.com/item.htm?id=612119789632&ali_refid=a3_430583_1006:1110588080:N:iK+n44Dtnek1xlEHkCuwGQ==:0e77f5926c5ddb76f2bc2ce63580999c&ali_trackid=1_0e77f5926c5ddb76f2bc2ce63580999c&spm=a230r.1.14.1',
'accept-language': 'zh-CN,zh;q=0.9',
'cookie': 'cna=iEuzFcgzMT4CAdJSNcQUf989; thw=cn; hng=CN%7Czh-CN%7CCNY%7C156; tracknick=%5Cu4E0A%5Cu5584%5Cu82E5%5Cu6C341213899; _cc_=WqG3DMC9EA%3D%3D; tg=0; UM_distinctid=16d5937999a6f-027420b0cd4148-5373e62-144000-16d5937999b288; enc=WJvahiK5KEgRjRkHkr4PsVdcvqocsJndj7xM2uXpQI8WMCp19XpDm4tg2kOfsRsB6cxOXkRwnixmXck2lIqicg%3D%3D; x=e%3D1%26p%3D*%26s%3D0%26c%3D0%26f%3D0%26g%3D0%26t%3D0; ucn=center; t=63d9f6a63545a51298fdf974bc9928a3; cookie2=19514bb0fab9afa6af5793e3509d4293; v=0; _tb_token_=e0d5e3131fee3; _samesite_flag_=true; l=dBagdkj7qSlBFLUsBOCwCARSHL79jIRfgu8NVJK9i_5dK6Y_Eq7OoWkAWFv6cjWcTBTB4dt0anvTgehL8yAm0OpTpe1VivHDBef..; isg=BImJ5oQ0M3YJT8zi9RbaAX1FmLXj1n0IQ_CUTCv-AXCucqiEcyYw2Z2qsNZEKhVA',
}
params = (
('isUseInventoryCenter', 'false'),
('cartEnable', 'true'),
('service3C', 'false'),
('isApparel', 'false'),
('isSecKill', 'false'),
('tmallBuySupport', 'true'),
('isAreaSell', 'false'),
('tryBeforeBuy', 'false'),
('offlineShop', 'false'),
('itemId', '612119789632'),
('showShopProm', 'false'),
('isPurchaseMallPage', 'false'),
('itemGmtModified', '1582189215000'),
('isRegionLevel', 'false'),
('household', 'false'),
('sellerPreview', 'false'),
('queryMemberRight', 'true'),
('addressLevel', '2'),
('isForbidBuyItem', 'false'),
('callback', 'setMdskip'),
('timestamp', '1582194247152'),
('isg', 'dBOibW8nqSlo-1fJBOfN5ARSHL7twIRb4sPy7r2KlICPOe5eR06PWZVEHCYwCnGVH6mBJ3J6m7-0BeYBqHpDBkyca6Fy_kkqndC..'),
('isg2', 'BLi40wKSElVwM32xzZ_qw4hkiWZKIRyr2r_F__IpsfO4DVj3mzA0OugkxQW9XdSD'),
('ref', 'https://s.taobao.com/search?q=%E5%8F%A3%E7%BD%A9&imgfile=&commend=all&ssid=s5-e&search_type=item&sourceId=tb.index&spm=a21bo.2017.201856-taobao-item.1&ie=utf8&initiative_id=tbindexz_20170306'),
)
response = requests.get('https://mdskip.taobao.com/core/initItemDetail.htm', headers=headers, params=params)
#NB. Original query string below. It seems impossible to parse and
#reproduce query strings 100% accurately so the one below is given
#in case the reproduced version is not "correct".
# response = requests.get('https://mdskip.taobao.com/core/initItemDetail.htm?isUseInventoryCenter=false&cartEnable=true&service3C=false&isApparel=false&isSecKill=false&tmallBuySupport=true&isAreaSell=false&tryBeforeBuy=false&offlineShop=false&itemId=612119789632&showShopProm=false&isPurchaseMallPage=false&itemGmtModified=1582189215000&isRegionLevel=false&household=false&sellerPreview=false&queryMemberRight=true&addressLevel=2&isForbidBuyItem=false&callback=setMdskip&timestamp=1582194247152&isg=dBOibW8nqSlo-1fJBOfN5ARSHL7twIRb4sPy7r2KlICPOe5eR06PWZVEHCYwCnGVH6mBJ3J6m7-0BeYBqHpDBkyca6Fy_kkqndC..&isg2=BLi40wKSElVwM32xzZ_qw4hkiWZKIRyr2r_F__IpsfO4DVj3mzA0OugkxQW9XdSD&ref=https%3A%2F%2Fs.taobao.com%2Fsearch%3Fq%3D%25E5%258F%25A3%25E7%25BD%25A9%26imgfile%3D%26commend%3Dall%26ssid%3Ds5-e%26search_type%3Ditem%26sourceId%3Dtb.index%26spm%3Da21bo.2017.201856-taobao-item.1%26ie%3Dutf8%26initiative_id%3Dtbindexz_20170306', headers=headers)
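The NB comment above warns that the converter cannot always reproduce the query string exactly. To preview how (key, value) tuples turn into a query string, the standard library's urllib.parse.urlencode keeps the tuple order and percent-encodes each value; encoding an already-encoded value again is why the ref in the commented URL shows %25E5 (an encoded %E5). The sketch also assumes, based only on its 13-digit size, that timestamp is Unix time in milliseconds and regenerates it per request. It uses only a subset of the parameters and leaves out isg, isg2 and the cookie purely to stay short; whether the server accepts such a request is not something the article tests. build_query is a hypothetical helper.

```python
import time
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://mdskip.taobao.com/core/initItemDetail.htm"


def build_query(item_id: str, now: Optional[float] = None) -> str:
    """Return a request URL for one item (hypothetical helper, subset of the params)."""
    if now is None:
        now = time.time()
    params = [
        ("itemId", item_id),
        ("callback", "setMdskip"),
        # Assumption: 'timestamp' is Unix time in milliseconds (13 digits in the capture).
        ("timestamp", str(round(now * 1000))),
        ("ref", "https://s.taobao.com/search?q=口罩"),
    ]
    # urlencode keeps the tuple order and percent-encodes values (':' -> %3A, '/' -> %2F, ...).
    return BASE_URL + "?" + urlencode(params)


print(build_query("612119789632", now=1582194247.152))
```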
Next, let's extract the stock figure:
json_datas = json.loads(response)
jsondata = json_datas['defaultModel']['inventoryDO']['icTotalQuantity']
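Chained indexing like json_datas['defaultModel']['inventoryDO']['icTotalQuantity'] raises KeyError as soon as any level is missing, for example if the response format changes. A defensive sketch, assuming the same key path as above, walks the path one level at a time and returns a default instead; the second helper searches the whole parsed JSON for a key name and yields every path where it occurs, a programmatic counterpart to searching with Ctrl+F. Both function names are hypothetical, and the sample dict only mimics the path used in the article.

```python
from typing import Any, Iterator, Sequence


def get_in(data: Any, path: Sequence[str], default: Any = None) -> Any:
    """Follow a key path through nested dicts; return default if any level is missing."""
    for key in path:
        if not isinstance(data, dict) or key not in data:
            return default
        data = data[key]
    return data


def find_key_paths(data: Any, target: str, prefix: tuple = ()) -> Iterator[tuple]:
    """Yield every key path (as a tuple) that ends in `target`, searching dicts and lists."""
    if isinstance(data, dict):
        for key, value in data.items():
            path = prefix + (key,)
            if key == target:
                yield path
            yield from find_key_paths(value, target, path)
    elif isinstance(data, list):
        for index, item in enumerate(data):
            yield from find_key_paths(item, target, prefix + (index,))


# Sample shaped like the path used in the article.
sample = {"defaultModel": {"inventoryDO": {"icTotalQuantity": 6166}}}
PATH = ("defaultModel", "inventoryDO", "icTotalQuantity")
print(get_in(sample, PATH))                      # 6166
print(get_in({}, PATH, default="n/a"))           # n/a
print(list(find_key_paths(sample, "icTotalQuantity")))
```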
A handy trick here: pressing Ctrl+F lets you jump straight to the field you want. Next, to round out the program, we get the system time:
localtime = time.asctime( time.localtime(time.time()) )
This makes it easy to see when each mask count was taken:
print(localtime, "剩余口罩数量:",jsondata)
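time.asctime() called with no argument already formats time.localtime(), so the expression above can be shortened. If the log is going to be sorted or parsed later, a numeric timestamp from time.strftime is easier to work with; the format string below is just one reasonable choice, and format_report is a hypothetical helper.

```python
import time
from typing import Optional


def format_report(quantity: int, when: Optional[time.struct_time] = None) -> str:
    """Return one log line: a sortable local timestamp plus the remaining stock."""
    if when is None:
        when = time.localtime()
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", when)
    return f"{stamp} masks remaining: {quantity}"


# asctime() with no argument uses time.localtime(), like the longer expression above.
print(time.asctime())
print(format_report(6166, time.gmtime(0)))  # 1970-01-01 00:00:00 masks remaining: 6166
```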
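Finally, wrap the code above into a function named get_num() and call it in a loop from main():
def main():
    while True:
        get_num()        # fetch the current stock once and print it with a timestamp
        time.sleep(10)   # pause 10 seconds before the next request

if __name__ == '__main__':
    main()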
To reduce the risk of the server banning our IP, we wait 10 seconds between requests.
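The while True loop above runs forever and stops on the first network error. A slightly more forgiving sketch, assuming get_num is the function described above (passed in here as a callable), caps the number of rounds, catches exceptions so one failed request does not end the run, and adds random jitter to the 10-second pause so requests are not perfectly periodic. Whether jitter actually lowers the chance of an IP ban on this site is an assumption, not something the article tested; poll is a hypothetical name.

```python
import random
import time
from typing import Callable, Optional


def poll(
    get_num: Callable[[], None],
    rounds: Optional[int] = None,
    interval: float = 10.0,
    jitter: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call get_num repeatedly (forever if rounds is None); return the number of failed calls."""
    failures = 0
    done = 0
    while rounds is None or done < rounds:
        try:
            get_num()
        except Exception as exc:  # keep polling even if one request fails
            failures += 1
            print("request failed:", exc)
        done += 1
        if rounds is None or done < rounds:
            # Base interval plus a random offset in [0, jitter] seconds.
            sleep(interval + random.uniform(0, jitter))
    return failures


# Demo with a fake sleep so it finishes instantly: the second call fails, polling continues.
results = iter([None, RuntimeError("timeout"), None])


def fake_get_num() -> None:
    item = next(results)
    if isinstance(item, Exception):
        raise item


print(poll(fake_get_num, rounds=3, sleep=lambda seconds: None))  # prints the failure, then 1
```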
I let the program run for about half an hour:
Within that half hour the mask count fell from 6619 to 5986, a drop of 633, so this item is selling quite well.
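The drop can be turned into a rough sales rate. Assuming no restocking happened during the run (the count only reflects stock on hand, so a restock would hide sales), 6619 - 5986 = 633 units over about 30 minutes is roughly 21 units per minute. A small helper for doing this with any two logged readings; sales_rate is a hypothetical name.

```python
from typing import Tuple


def sales_rate(start_qty: int, end_qty: int, minutes: float) -> Tuple[int, float]:
    """Return (units sold, units per minute), assuming stock only decreased through sales."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    sold = start_qty - end_qty
    return sold, sold / minutes


sold, per_minute = sales_rate(6619, 5986, 30)
print(sold, round(per_minute, 1))  # 633 21.1
```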