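Requests is an HTTP library written in Python, built on top of urllib3 and released under the Apache 2.0 open-source license. It is far more convenient than the standard library's urllib and is commonly used when writing crawlers and when testing server responses.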
Installing Requests
Install with pip
$ pip install requests
Install from source
git clone https://github.com/psf/requests.git
cd requests
python -m pip install .
Request functions
Function | Purpose |
---|---|
requests.get() | Sends a GET request |
requests.post() | Sends a POST request |
requests.put() | Sends a PUT request |
requests.delete() | Sends a DELETE request |
requests.head() | Sends a HEAD request |
requests.options() | Sends an OPTIONS request |
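Response object
Attribute or method | Meaning |
---|---|
response.url | The URL of the request |
response.status_code | The HTTP status code of the response |
response.encoding | The encoding used to decode response.text |
response.text | The response body as text |
response.raw | The raw response body; read it with response.raw.read() (the request must set stream=True) |
response.content | The response body as bytes |
response.headers | The response headers, stored in a dict-like object whose keys are case-insensitive; response.headers.get(key) returns None if the key does not exist |
response.json() | Parses a JSON response body into Python objects (dict, list, and so on) |
response.raise_for_status() | Raises an exception if the request failed (status code 4xx or 5xx) |
response.reason | The textual reason for the status code, e.g. response.reason = "OK" for 200 |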
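Two rows of the table are easy to misread, so here is a small offline sketch of them. It builds a `CaseInsensitiveDict` (the class Requests uses for `response.headers`) and an empty `requests.Response` by hand; in real code the response would come from a call such as `requests.get()`. The helper name `raises_http_error` is made up for this illustration.

```python
import requests
from requests.structures import CaseInsensitiveDict

# response.headers is a CaseInsensitiveDict: lookups ignore key case.
headers = CaseInsensitiveDict({"Content-Type": "application/json"})
content_type = headers["content-type"]
missing = headers.get("X-Not-There")  # .get() returns None for a missing key


def raises_http_error(status_code):
    """Return True if raise_for_status() raises for this status code."""
    response = requests.Response()  # hand-built stand-in, no network involved
    response.status_code = status_code
    try:
        response.raise_for_status()
    except requests.HTTPError:
        return True
    return False


print(content_type, missing)
for code in (200, 201, 301, 404, 500):
    print(code, raises_http_error(code))
```

Only the 4xx and 5xx codes raise; other codes such as 201 or 301 pass through silently, so `raise_for_status()` is not a check for "exactly 200".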
GET requests
import requests
import json
params_dict = {'question': 'Python Requests'}  # query parameters
response = requests.get('http://gank.io/api/data/Android/1/1', params=params_dict)
status_code = response.status_code  # status code
url = response.url  # request URL
encoding = response.encoding  # encoding detected from the response headers
headers_dict = response.headers  # response headers (dict-like)
print("url = ", url)
print("status_code = ", status_code)
print("encoding = ", encoding)
print("headers:")
for key, value in headers_dict.items():
    print(key, " = ", value)
Output:
url = http://gank.io/api/data/Android/1/1?question=Python+Requests
status_code = 200
encoding = None
headers:
Server = Tengine
Content-Type = application/json
Content-Length = 426
Connection = keep-alive
Date = Sat, 30 Dec 2017 08:29:16 GMT
Via = cache13.l2nu20-2[191,200-0,M], cache37.l2nu20-2[192,0], cache2.cn370[258,200-0,M], cache8.cn370[259,0]
X-Cache = MISS TCP_MISS dirn:-2:-2 mlen:-1
X-Swift-SaveTime = Sat, 30 Dec 2017 08:29:16 GMT
X-Swift-CacheTime = 0
Timing-Allow-Origin = *
EagleId = 3b6c8ad015146225559598476e
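To see how the `params` dict ends up in the query string without touching the network, the request can be prepared but not sent. This is a minimal sketch using `requests.Request` and its `prepare()` method; nothing here contacts gank.io.

```python
from requests import Request

# Build the request without sending it, then inspect the final URL.
request = Request(
    "GET",
    "http://gank.io/api/data/Android/1/1",
    params={"question": "Python Requests"},
)
prepared = request.prepare()
print(prepared.url)
```

The printed URL should match the `url` line in the output above: the space in the value is encoded as `+`.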
Text response content
response = requests.get("http://gank.io/api/data/Android/1/1")
text = response.text  # response body as text
print("text = ",text)
Output:
text = {
"error": false,
"results": [
{
"_id": "5a3a4654421aa90fe72536cc",
"createdAt": "2017-12-20T19:15:32.928Z",
"desc": "Git \u4f7f\u7528\u4e4b\u91cd\u5199\u5386\u53f2\u8bb0\u5f55",
"publishedAt": "2017-12-27T12:13:22.418Z",
"source": "web",
"type": "Android",
"url": "http://www.jianshu.com/p/8f46e13a8ada",
"used": true,
"who": "ZhangTitanjum"
}
]
}
Binary response content
response = requests.get("http://gank.io/api/data/Android/1/1")
content = response.content  # response body as bytes
print("content = ",content)
Output:
content = b'{\n "error": false, \n "results": [\n {\n "_id": "5a3a4654421aa90fe72536cc", \n "createdAt": "2017-12-20T19:15:32.928Z", \n "desc": "Git \\u4f7f\\u7528\\u4e4b\\u91cd\\u5199\\u5386\\u53f2\\u8bb0\\u5f55", \n "publishedAt": "2017-12-27T12:13:22.418Z", \n "source": "web", \n "type": "Android", \n "url": "http://www.jianshu.com/p/8f46e13a8ada", \n "used": true, \n "who": "ZhangTitanjum"\n }\n ]\n}\n'
JSON response content
response = requests.get("http://gank.io/api/data/Android/1/1")
json_data = response.json()  # JSON body parsed into Python objects
print("json = ", json_data)
Output:
json = {'error': False, 'results': [{'_id': '5a3a4654421aa90fe72536cc', 'createdAt': '2017-12-20T19:15:32.928Z', 'desc': 'Git 使用之重写历史记录', 'publishedAt': '2017-12-27T12:13:22.418Z', 'source': 'web', 'type': 'Android', 'url': 'http://www.jianshu.com/p/8f46e13a8ada', 'used': True, 'who': 'ZhangTitanjum'}]}
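The three outputs above differ only in how far the body has been decoded. The sketch below reproduces that chain with the standard library on a shortened, hand-written copy of the body; it is only an approximation of what `response.text` and `response.json()` do, since Requests also picks the encoding for you.

```python
import json

# Shortened stand-in for response.content (bytes as received).
content = b'{"error": false, "desc": "Git \\u4f7f\\u7528"}'

# Roughly response.text: the bytes decoded to str. The \uXXXX escapes are
# still literal characters at this point.
text = content.decode("utf-8")

# Roughly response.json(): the text parsed as JSON. Escapes become real
# characters and false becomes Python's False.
data = json.loads(text)

print(content)
print(text)
print(data)
```

This is why `desc` shows `\u4f7f...` in the text and bytes outputs but readable Chinese characters in the JSON output.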
Raw response content
response = requests.get("http://gank.io/api/data/Android/1/1",stream=True)
raw = response.raw  # raw response stream
print("raw = ",raw)
print("raw read byte = ",raw.read(10))
Output:
raw = <urllib3.response.HTTPResponse object at 0x024EF4B0>
raw read byte = b'{\n "error'
To read the raw response from the server, the request must set stream=True.
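With `stream=True` the body is usually consumed in chunks rather than with a single `raw.read()`. The sketch below illustrates `Response.iter_content()` offline: it assumes a hand-built `Response` whose `raw` attribute is an in-memory buffer, which stands in for the object a real `requests.get(url, stream=True)` would return.

```python
import io

import requests

body = b'{"error": false}'

# Stand-in for `requests.get(url, stream=True)`: a Response whose raw
# stream is an in-memory buffer instead of a network socket.
response = requests.Response()
response.status_code = 200
response.raw = io.BytesIO(body)

# Read the body five bytes at a time.
chunks = list(response.iter_content(chunk_size=5))
print(chunks)
```

Against a real server the same loop lets a large download be written to disk piece by piece instead of being held in memory.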
Adding HTTP headers
github_url = 'https://developer.github.com/v3/some/endpoint'
headers = {"user-agent":"my-app-v0.1"}
resp = requests.get(github_url,headers=headers)
print(resp.text)
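Headers passed per request are merged with the defaults that Requests would send anyway. The following sketch shows that merge without sending anything, using `Session.prepare_request()`; the use of a `Session` here is an addition for illustration and is not part of the example above.

```python
import requests

session = requests.Session()
default_agent = session.headers["User-Agent"]  # library default

request = requests.Request(
    "GET",
    "https://developer.github.com/v3/some/endpoint",
    headers={"user-agent": "my-app-v0.1"},
)
prepared = session.prepare_request(request)  # merged, but not sent

print("default:", default_agent)
print("sent:   ", prepared.headers["User-Agent"])
print("others: ", sorted(prepared.headers.keys()))
```

The custom value replaces the default `User-Agent` even though the key was written in lowercase, and the remaining default headers are kept.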
POST form requests
To send form data in a POST request, pass a dict to the data parameter.
user_url = 'http://httpbin.org/post'
user_info = {'name':'mike','age':21}
resp = requests.post(user_url,data=user_info)
text = resp.text
print(text)
Output:
{
"args": {},
"data": "",
"files": {},
"form": {
"age": "21",
"name": "mike"
},
"headers": {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
"Connection": "close",
"Content-Length": "16",
"Content-Type": "application/x-www-form-urlencoded",
"Host": "httpbin.org",
"User-Agent": "python-requests/2.18.4"
},
"json": null,
"origin": "43.248.244.132",
"url": "http://httpbin.org/post"
}
POSTing a JSON string
print("---post json----")
user_url = 'http://httpbin.org/post'
user_info = {'name':'mike','age':21}
json_str = json.dumps(user_info)
# resp = requests.post(user_url,data=json_str)  # also works, but Content-Type is not set to application/json automatically
resp = requests.post(user_url,json=user_info)
text = resp.text
print(text)
Output:
---post json----
{
"args": {},
"data": "{\"name\": \"mike\", \"age\": 21}",
"files": {},
"form": {},
"headers": {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
"Connection": "close",
"Content-Length": "27",
"Content-Type": "application/json",
"Host": "httpbin.org",
"User-Agent": "python-requests/2.18.4"
},
"json": {
"age": 21,
"name": "mike"
},
"origin": "58.246.141.153",
"url": "http://httpbin.org/post"
}
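The difference between `data=` and `json=` is visible before anything is sent. As a minimal sketch, the same dict is prepared both ways and the resulting headers and body are printed; `body_text` is a helper invented here because the body may be `str` or `bytes` depending on the Requests version.

```python
from requests import Request

url = "http://httpbin.org/post"
user_info = {"name": "mike", "age": 21}

form = Request("POST", url, data=user_info).prepare()     # form-encoded
as_json = Request("POST", url, json=user_info).prepare()  # JSON-encoded


def body_text(prepared):
    """Return the prepared body as str, whatever type it is stored as."""
    body = prepared.body
    return body.decode("utf-8") if isinstance(body, bytes) else body


for label, prepared in (("data=", form), ("json=", as_json)):
    print(label, prepared.headers["Content-Type"],
          prepared.headers["Content-Length"], body_text(prepared))
```

The `Content-Type` and `Content-Length` values correspond to the headers httpbin echoed back in the two outputs above.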
POSTing files (file upload)
Requests has a single interface for sending files: the files parameter of requests.post. The call looks like this:
url ='http://httpbin.org/post'
data = None
files = {}
resp = requests.post(url,data,files=files)
The files parameter accepts data in several forms. The two most basic are:
- a dict (the format recommended by the official documentation)
- a list of tuples
(1) files as a dict
print('-----post file upload-------')
url = 'http://httpbin.org/post'
data = None
with open('D:/aa.jpg', 'rb') as f:  # the file is closed once the upload finishes
    files_dict = {'field': ('filename', f, 'image/jpeg', {'refer': 'www.baidu.com'})}
    resp = requests.post(url, data, files=files_dict)
print('text = ',resp.text)
Each key of files_dict is the form field name used in the POST request (here, field), and the corresponding value describes the file to send.
The file information is a tuple of the form (filename, fileobject, content_type, headers).
Output:
-----post file upload-------
text = {
"args": {},
"data": "",
"files": {
"field": "data:image/jpeg;base64,/9j/4AAQSf8A8kLr/wCN0f8ADQnwx/6Gf/yQuv8A43RRQB4B+1f8QvDHjv8A4Rb/AIRTU/t/2L7V5/8Ao8sWzf5O376rnOxumelFFFAH/9k="
},
"form": {},
"headers": {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
"Connection": "close",
"Content-Length": "163038",
"Content-Type": "multipart/form-data; boundary=8758341a8d8844869074054764661556",
"Host": "httpbin.org",
"User-Agent": "python-requests/2.18.4"
},
"json": null,
"origin": "223.167.118.81",
"url": "http://httpbin.org/post"
}
(2) files as a list of tuples
print('-----post tuple file upload-------')
url = 'http://httpbin.org/post'
data = None
with open('D:/aa.jpg', 'rb') as f:
    # each entry is a (field name, file info) tuple
    files_list = [('field', ('filename', f, 'image/jpeg', {'refer': 'www.baidu.com'}))]
    resp = requests.post(url, data, files=files_list)
print('text = ',resp.text)
Uploading a file together with data fields
data = {"k1" : "v1"}
files = {
"field1" : open("1.png", "rb")
}
r = requests.post("http://httpbin.org/post", data, files=files)
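When `data` and `files` are passed together, both end up as parts of one multipart body. The sketch below prepares such a request offline so the parts can be inspected; the `io.BytesIO` buffer and its contents are placeholders standing in for `open("1.png", "rb")`.

```python
import io

from requests import Request

# Placeholder bytes standing in for the contents of 1.png.
fake_png = io.BytesIO(b"not really a PNG")

prepared = Request(
    "POST",
    "http://httpbin.org/post",
    data={"k1": "v1"},
    files={"field1": ("1.png", fake_png, "image/png")},
).prepare()

print(prepared.headers["Content-Type"])
print(prepared.body.decode("utf-8"))
```

The body contains one part named `k1` holding `v1` and one part named `field1` carrying the file name, its content type, and the file bytes, separated by the boundary announced in the `Content-Type` header.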
Cookie
cookies = response.cookies # cookies
print(type(cookies))
Output:
<class 'requests.cookies.RequestsCookieJar'>
response.cookies returns a RequestsCookieJar object, which behaves much like a dict.
res = requests.get('http://www.baidu.com')
cookies = res.cookies
print(type(cookies))
print('keys = ',cookies.keys())
print('values = ',cookies.values())
print('cookies["BDORZ"] = ',cookies['BDORZ'])
Output:
<class 'requests.cookies.RequestsCookieJar'>
keys = ['BDORZ']
values = ['27315']
cookies["BDORZ"] = 27315
Sending cookies to the server with the cookies parameter:
print('------cookies-----')
cookie_url = 'http://httpbin.org/cookies'
cookies_dict = dict(cookies_key = 'Python')
r = requests.get(cookie_url,cookies = cookies_dict)
text = r.text
print(text)
Output:
------cookies-----
{
"cookies": {
"cookies_key": "Python"
}
}
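A plain dict is enough for simple cases, but a `RequestsCookieJar` can also be built by hand when cookies need a domain or path. This is a small offline sketch; the second cookie and the `example.org` domain are invented purely to show the filtering.

```python
from requests.cookies import RequestsCookieJar

jar = RequestsCookieJar()
jar.set("cookies_key", "Python", domain="httpbin.org", path="/cookies")
jar.set("other_key", "elsewhere", domain="example.org", path="/")  # hypothetical

print(jar["cookies_key"])                   # dict-style lookup
print(sorted(jar.keys()))                   # all cookie names
print(jar.get_dict(domain="httpbin.org"))   # only cookies for one domain

# The jar can be passed wherever a cookies dict is accepted, e.g.
# requests.get("http://httpbin.org/cookies", cookies=jar)
```

When the jar is sent with a request, only cookies whose domain and path match the URL are included, which a plain dict cannot express.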
Timeouts
The timeout parameter tells Requests to stop waiting for a response after the given amount of time, in seconds.
print('------timeout------')
response_git = requests.get('http://github.com',timeout=0.1)
Output:
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='github.com', port=80): Max retries exceeded with url: / (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x024E4330>, 'Connection to github.com timed out. (connect timeout=0.1)'))
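Note: timeout is not a time limit on the entire response download. A single timeout value applies both to establishing the connection and to waiting for data from the server: an exception is raised if the server has not responded within timeout seconds (more precisely, if no bytes have been received on the underlying socket for timeout seconds).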
Requests exceptions
from requests import exceptions

print('------timeout------')
def timeout_request():
    try:
        response_git = requests.get('http://github.com', timeout=0.1)
        response_git.raise_for_status()
    except exceptions.Timeout as e:
        print('timeout')
    except exceptions.HTTPError as e:
        print('httperror')
    else:
        print("status_code = ", response_git.status_code)
        print("text = ", response_git.text)
timeout_request()
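Common Requests exceptions:
- ConnectionError: the connection could not be established because of a network problem.
- HTTPError: raised by Response.raise_for_status() when the response status code is 4xx or 5xx.
- Timeout: the request timed out.
- TooManyRedirects: the request exceeded the configured maximum number of redirects.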
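All of the exceptions listed above inherit from `requests.exceptions.RequestException`, so one `except` clause can catch them and a helper can sort out which kind occurred. The sketch below is one possible arrangement, not a prescribed pattern: the function names, the labels, and the `(3.05, 10)` connect/read timeout pair are assumptions chosen for illustration.

```python
import requests
from requests import exceptions


def describe_error(exc):
    """Map a Requests exception to a short label (labels are illustrative)."""
    # Timeout is checked before ConnectionError because ConnectTimeout
    # inherits from both.
    if isinstance(exc, exceptions.Timeout):
        return "timeout"
    if isinstance(exc, exceptions.ConnectionError):
        return "connection error"
    if isinstance(exc, exceptions.HTTPError):
        return "http error"
    if isinstance(exc, exceptions.TooManyRedirects):
        return "too many redirects"
    return "other request error"


def fetch(url, timeout=(3.05, 10)):
    """Return the body text, or None after printing what went wrong."""
    try:
        # timeout is a (connect, read) pair in seconds; values are examples
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()
    except exceptions.RequestException as exc:
        print(describe_error(exc))
        return None
    return response.text
```

Calling `fetch('http://github.com', timeout=0.1)` would most likely print `timeout`, mirroring the example above, while a 404 response would print `http error`.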