Copyright notice: this is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/qq_37049781/article/details/81872164
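urllib and urllib2 cover the needs of a typical crawler, but they are still not very friendly "for humans". The requests module (built on urllib3) offers everything urllib2 does, and adds HTTP keep-alive and connection pooling, cookie-based sessions, file uploads, automatic detection of the response encoding, and more.
requests documentation (Chinese): http://docs.python-requests.org/zh_CN/latest/index.html
Basic requests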
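import requests
# requests needs a full URL, scheme included; a bare "www.baidu.com" raises MissingSchema
response = requests.get("http://www.baidu.com")
# Form fields for the POST body (placeholder values)
data = {"key": "value"}
response = requests.post("http://www.baidu.com", data=data)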
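Real requests fail for many reasons (bad URL, timeout, HTTP error status), so it helps to wrap the call. The following is a minimal sketch for Python 3; `safe_get` is a hypothetical helper name, not part of requests, and the 5-second timeout is an arbitrary assumption.

```python
import requests


def safe_get(url, timeout=5):
    """Return the response for url, or None if the request fails.

    safe_get is a hypothetical helper, not part of requests.
    """
    try:
        response = requests.get(url, timeout=timeout)
        # Turn 4xx/5xx status codes into an exception as well
        response.raise_for_status()
        return response
    except requests.exceptions.RequestException as exc:
        print("request failed: %s" % exc)
        return None


# No scheme here, so requests raises MissingSchema before any network
# traffic happens, and the helper returns None.
result = safe_get("www.baidu.com")
```

All exceptions raised by requests itself derive from `requests.exceptions.RequestException`, so one `except` clause is enough for a simple crawler.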
Setting headers and passing parameters
import requests
# Define the query parameters
kw = {"wd": "python"}
headers = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36"}
# requests URL-encodes the params dict automatically; for a POST request, pass the form fields with data instead
response = requests.get("http://www.baidu.com/s", params=kw, headers=headers)
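To see what `params` and `data` actually do, you can prepare a request without sending it and inspect the result. This is a small sketch for Python 3 that needs no network access; the `/s` search path on the Baidu URL is an assumption for illustration.

```python
import requests

# Prepare the requests without sending them, to inspect what would go out
kw = {"wd": "python"}
get_req = requests.Request("GET", "http://www.baidu.com/s", params=kw).prepare()
post_req = requests.Request("POST", "http://www.baidu.com/s", data=kw).prepare()

print(get_req.url)    # params end up in the query string
print(post_req.url)   # data does not touch the URL
print(post_req.body)  # data is form-encoded into the request body
```

The same dictionary therefore ends up in the URL for a GET with `params`, and in the body for a POST with `data`.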
Setting a proxy for a request
import requests
# The proxy is chosen by the scheme of the target URL
proxies = {
    "http": "http://12.34.56.79:9527",
    "https": "http://12.34.56.79:9527",
}
response = requests.get("http://www.baidu.com", proxies=proxies)
print(response.text)
- Authenticated (private) proxies
import requests
# If the proxy requires HTTP Basic Auth, put the credentials in the proxy URL (scheme included):
proxy = {"http": "http://account:password@host:port"}
response = requests.get("http://www.baidu.com", proxies=proxy)
print(response.text)
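Credentials that contain characters such as `@` or `:` have to be percent-encoded before they go into the proxy URL, otherwise the URL is parsed incorrectly. A minimal sketch for Python 3, where `build_proxy_url` is a hypothetical helper and the account and password are made-up values:

```python
from urllib.parse import quote


def build_proxy_url(host, port, account=None, password=None, scheme="http"):
    """Build a proxy URL, percent-encoding the credentials if given.

    build_proxy_url is a hypothetical helper, not part of requests.
    """
    if account is None:
        return "%s://%s:%d" % (scheme, host, port)
    # safe="" makes quote() encode "@", ":" and "/" as well
    user = quote(account, safe="")
    secret = quote(password or "", safe="")
    return "%s://%s:%s@%s:%d" % (scheme, user, secret, host, port)


proxy_url = build_proxy_url("12.34.56.79", 9527, "account", "p@ss:word")
# Use the same proxy for both plain HTTP and HTTPS targets
proxies = {"http": proxy_url, "https": proxy_url}
print(proxies)
```

The resulting dictionary can be passed as `proxies=proxies` to `requests.get` in the same way as above.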
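- Web authentication (HTTP Basic Auth)
import requests
auth = ('account', 'passwd')
# Replace 'http://host' with the URL of the protected resource
response = requests.get('http://host', auth=auth)
print(response.text)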
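Passing a `(user, password)` tuple as `auth` makes requests add an `Authorization: Basic ...` header. The sketch below (Python 3, no network access needed) prepares a request and compares the header with a hand-built value; `http://example.com/` is a placeholder URL, an assumption for illustration only.

```python
import base64

import requests

auth = ("account", "passwd")
prepared = requests.Request("GET", "http://example.com/", auth=auth).prepare()

# Basic auth is just "user:password" encoded with base64
expected = "Basic " + base64.b64encode(b"account:passwd").decode("ascii")
print(prepared.headers["Authorization"])
print(prepared.headers["Authorization"] == expected)
```

Because base64 is an encoding and not encryption, Basic Auth should only be used over HTTPS.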
Setting cookies and sessions
- Adding cookies
import requests
url = 'http://httpbin.org/cookies'
cookies = {'cookies_are': 'working'}
r = requests.get(url, cookies=cookies)
# Or use a cookie jar, which can scope each cookie to a domain and path
jar = requests.cookies.RequestsCookieJar()
jar.set('tasty_cookie', 'yum', domain='httpbin.org', path='/cookies')
jar.set('gross_cookie', 'blech', domain='httpbin.org', path='/elsewhere')
r = requests.get(url, cookies=jar)
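The point of the cookie jar is that each cookie is only sent to a matching domain and path. As a rough illustration (Python 3, nothing is sent over the network), the request can be prepared locally and its `Cookie` header inspected:

```python
import requests

url = "http://httpbin.org/cookies"
jar = requests.cookies.RequestsCookieJar()
jar.set("tasty_cookie", "yum", domain="httpbin.org", path="/cookies")
jar.set("gross_cookie", "blech", domain="httpbin.org", path="/elsewhere")

# Prepare the request locally and look at the Cookie header it would carry
prepared = requests.Request("GET", url, cookies=jar).prepare()
cookie_header = prepared.headers.get("Cookie", "")
print(cookie_header)
```

Only the cookie whose path matches `/cookies` should appear in the header; the one scoped to `/elsewhere` is left out.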
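- Getting cookies
import requests
response = requests.get("http://www.baidu.com/")
# "cookie_name" is a placeholder; a KeyError is raised if the server did not set that cookie
name = response.cookies["cookie_name"]
print(name)
# Convert the cookies to a dictionary
cookiejar = response.cookies
cookiedict = requests.utils.dict_from_cookiejar(cookiejar)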
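The same lookups and conversions can be tried on a hand-built jar, without contacting a server. A short sketch for Python 3; the cookie names are made up for the example:

```python
import requests

jar = requests.cookies.RequestsCookieJar()
jar.set("tasty_cookie", "yum", domain="httpbin.org", path="/cookies")

# get() returns a default instead of raising KeyError for a missing cookie
value = jar.get("tasty_cookie")
missing = jar.get("no_such_cookie", default="")

# Convert the jar to a plain dict, and back to a jar again
cookiedict = requests.utils.dict_from_cookiejar(jar)
restored = requests.utils.cookiejar_from_dict(cookiedict)
print(value, repr(missing), cookiedict)
```

Note that the plain dictionary keeps only names and values, so domain and path information is lost in the round trip.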
- Using a session
import requests
# 1. Create a session object; it keeps cookie values across requests
ssion = requests.session()
# 2. Prepare the headers
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36"}
# 3. Log in with the account and password; the session stores the cookies from the response
data = {"email": "account", "password": "password"}
ssion.post("http://www.renren.com/PLogin.do", data=data, headers=headers)
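Once the session holds cookies, every later request made through it carries them, along with any default headers set on the session. The sketch below (Python 3, no network access) simulates this by setting a cookie by hand; the `sessionid` cookie and the `my-crawler/0.1` User-Agent are hypothetical values standing in for what a real login would provide.

```python
import requests

session = requests.Session()
# Default headers sent with every request on this session (hypothetical UA)
session.headers.update({"User-Agent": "my-crawler/0.1"})
# Hypothetical cookie, standing in for one stored after a successful login
session.cookies.set("sessionid", "abc123")

# Prepare a follow-up request through the session, without sending it
request = requests.Request("GET", "http://www.renren.com/")
prepared = session.prepare_request(request)
print(prepared.headers["User-Agent"])
print(prepared.headers["Cookie"])
```

In a real crawler you would simply call `session.get(...)` after the login POST; preparing the request here only makes the merged headers visible.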
Skipping SSL certificate verification
r = requests.get("https://www.12306.cn/mormhweb/", verify = False)
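With `verify=False`, urllib3 emits an `InsecureRequestWarning` for each request. If you have decided to accept that risk, one possible setup is to silence just that warning and turn verification off once on a session. This is only a sketch for Python 3; the CA bundle path in the comment is a hypothetical example.

```python
import requests
import urllib3

# Silence only the warning that verify=False triggers
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

session = requests.Session()
# Every request made through this session now skips certificate verification
session.verify = False

# Safer alternative: verify against a specific CA bundle instead
# (hypothetical path, replace with a real file)
# session.verify = "/path/to/ca-bundle.pem"
print(session.verify)
```

Skipping verification removes protection against man-in-the-middle attacks, so it is best limited to testing or to sites whose certificate problems are known and understood.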