Tencent Careers: https://careers.tencent.com/

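1. Find the API endpoint

We go to the Tencent Careers site to look for Python-related job postings. After entering python in the search box, the URL becomes: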

https://careers.tencent.com/search.html?keyword=python

If we request the page directly with this URL, we find that it contains no data: only the page skeleton is fetched.

The code is as follows:

import requests

# Search page URL as shown in the browser address bar
url = 'https://careers.tencent.com/search.html?keyword=python'
# Send a browser-like User-Agent so the request looks like a normal visit
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.108 Safari/537.36'
}
response = requests.get(url=url, headers=headers)
content = response.content.decode('utf-8')
# Save the raw HTML so it can be inspected in a browser
with open('job.html', 'w', encoding='utf-8') as fp:
    fp.write(content)

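The code above saves the fetched page content to job.html.

After the program finishes, open job.html in a browser. The result is as follows:

This most likely means the data is loaded through an ajax request, so we need to look for the real endpoint.

Open the developer tools with F12, go to Network --> XHR, and find the following request:

Look up the request URL in the request headers: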

https://careers.tencent.com/tencentcareer/api/post/Query?timestamp=1557450635595&countryId=&cityId=&bgIds=&productId=&categoryId=&parentCategoryId=&attrId=&keyword=python&pageIndex=1&pageSize=10&language=zh-cn&area=cn

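This URL carries many parameters, and we can remove the ones that are not needed. pageIndex carries the page number, so we can pass the page number in directly: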

https://careers.tencent.com/tencentcareer/api/post/Query?keyword=python&pageIndex={}&pageSize=10
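The trimming step can also be done programmatically. The following is a minimal sketch using only the standard library; the helper name trim_query and the KEEP tuple are illustrative assumptions, not part of the original code. It drops every parameter with an empty value and keeps only the three that appear in the shortened URL above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Full request URL copied from the XHR panel
FULL_URL = (
    'https://careers.tencent.com/tencentcareer/api/post/Query'
    '?timestamp=1557450635595&countryId=&cityId=&bgIds=&productId='
    '&categoryId=&parentCategoryId=&attrId=&keyword=python'
    '&pageIndex=1&pageSize=10&language=zh-cn&area=cn'
)

# Parameters kept in the shortened URL (hypothetical constant name)
KEEP = ('keyword', 'pageIndex', 'pageSize')


def trim_query(url, keep=KEEP):
    """Return url with only the query parameters listed in keep."""
    parts = urlsplit(url)
    # parse_qsl skips parameters with blank values by default
    pairs = [(k, v) for k, v in parse_qsl(parts.query) if k in keep]
    return urlunsplit(parts._replace(query=urlencode(pairs)))
```

Calling trim_query(FULL_URL) yields the shortened URL with pageIndex=1; replacing that value with {} gives the template used for paging.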

Send the request again. The ajax response comes back as JSON.
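As a rough sketch of that request, the snippet below fills the page number into the URL template and decodes the JSON body. It uses the standard library's urllib instead of requests only to stay dependency-free; with requests, response.json() does the same decoding. The names build_url and fetch_json are assumptions made for illustration.

```python
import json
from urllib.request import Request, urlopen

# URL template from the text; {} is replaced with the page number
API_URL = ('https://careers.tencent.com/tencentcareer/api/post/Query'
           '?keyword=python&pageIndex={}&pageSize=10')
HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/74.0.3729.108 Safari/537.36'
}


def build_url(page):
    """Fill the page number into the URL template."""
    return API_URL.format(page)


def fetch_json(page, timeout=10):
    """Request one page of results and return the decoded JSON payload."""
    request = Request(build_url(page), headers=HEADERS)
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode('utf-8'))
```

For example, fetch_json(1) would request the first page of results; nothing is sent until the function is called.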

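2. Analysis of the producer-consumer pattern

The whole process consists of requesting the endpoint and then parsing the data.

The producer requests the endpoint; the consumer parses the data.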
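One possible way to wire this division of labor together is sketched below, assuming two queues: page_queue holds page numbers, and data_queue (a name assumed here) carries fetched responses from producers to consumers. The fetch and parse callables are passed in, so this sketch says nothing about how the original author implemented them.

```python
import queue
import threading


def producer(page_queue, data_queue, fetch):
    """Take page numbers until none are left; queue each fetched response."""
    while True:
        try:
            page = page_queue.get_nowait()
        except queue.Empty:
            break
        data_queue.put(fetch(page))


def consumer(data_queue, results, parse):
    """Parse responses until the None sentinel arrives."""
    while True:
        payload = data_queue.get()
        if payload is None:  # sentinel: no more data will arrive
            break
        # list.extend is atomic in CPython, so no extra lock is used here
        results.extend(parse(payload))


def run(pages, fetch, parse, n_producers=3, n_consumers=2):
    """Run producers and consumers to completion and return parsed items."""
    page_queue, data_queue = queue.Queue(), queue.Queue()
    for page in pages:
        page_queue.put(page)
    results = []
    producers = [threading.Thread(target=producer,
                                  args=(page_queue, data_queue, fetch))
                 for _ in range(n_producers)]
    consumers = [threading.Thread(target=consumer,
                                  args=(data_queue, results, parse))
                 for _ in range(n_consumers)]
    for thread in producers + consumers:
        thread.start()
    for thread in producers:
        thread.join()
    # All producers are done, so send one sentinel per consumer
    for _ in consumers:
        data_queue.put(None)
    for thread in consumers:
        thread.join()
    return results
```

The sentinel approach assumes fetch never returns None; items arrive in no particular order because several threads run at once.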

3. Producer

Take a page number from page_queue and build the URL.
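That single step might look like the sketch below. The helper name next_url is hypothetical; page_queue and the URL template come from the text. Returning None on an empty queue is one possible convention for telling the producer loop to stop.

```python
import queue

# URL template from the text; {} is replaced with the page number
API_URL = ('https://careers.tencent.com/tencentcareer/api/post/Query'
           '?keyword=python&pageIndex={}&pageSize=10')


def next_url(page_queue):
    """Pop one page number and return its URL, or None if the queue is empty."""
    try:
        page = page_queue.get_nowait()
    except queue.Empty:
        return None
    return API_URL.format(page)
```

Using get_nowait rather than a blocking get means a producer thread exits promptly once every page has been claimed.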

4. Consumer
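The consumer's parsing step could be sketched as follows. The response layout used here (a Data object containing a Posts list, with RecruitPostName and LocationName fields) is an assumption that should be checked against the actual JSON shown in the XHR panel; the function name parse_posts is also hypothetical.

```python
def parse_posts(payload):
    """Extract a few fields from one decoded JSON response.

    Assumed layout: {"Data": {"Posts": [{"RecruitPostName": ..., ...}]}}
    """
    data = payload.get('Data') or {}
    posts = data.get('Posts') or []
    return [
        {
            'title': post.get('RecruitPostName'),   # assumed field name
            'location': post.get('LocationName'),   # assumed field name
        }
        for post in posts
    ]
```

Using get with fallbacks keeps the consumer from crashing when a page comes back empty or a field is missing.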
