Basic usage of the requests library in Python

Based on requests version 2.19.1, following the official documentation at http://docs.python-requests.org/en/master/

First, install it:

pip install requests

Then import it:

import requests

1. Sending requests

requests.<http method>(<request arguments>)

For example:

response = requests.get('https://api.github.com/events', params={'key': 'value'})

response = requests.post('http://httpbin.org/post', data={'key':'value'})
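You can see what these two calls would send without contacting a server by building the same requests with requests.Request(...).prepare() and inspecting the URL and body. This is a minimal sketch. The extra `multi` and `skip` keys are hypothetical and only show how list values and None values in params are handled.

```python
import requests

# Build (but do not send) the GET request: params is encoded into the query string.
get_req = requests.Request(
    "GET",
    "https://api.github.com/events",
    params={"key": "value", "multi": ["a", "b"], "skip": None},  # "multi"/"skip" are illustrative
).prepare()
print(get_req.url)  # https://api.github.com/events?key=value&multi=a&multi=b

# Build (but do not send) the POST request: a dict passed as data becomes a form-encoded body.
post_req = requests.Request("POST", "http://httpbin.org/post", data={"key": "value"}).prepare()
print(post_req.body)                     # key=value
print(post_req.headers["Content-Type"])  # application/x-www-form-urlencoded
```

List values expand into repeated keys, and keys whose value is None are dropped from the query string.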

Sending your own cookies with a request:

>>> url = 'http://httpbin.org/cookies'
>>> cookies = dict(cookies_are='working')
>>> r = requests.get(url, cookies=cookies)
>>> r.text
'{"cookies": {"cookies_are": "working"}}'
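A minimal offline sketch of the same example: it prepares the request without sending it, then reads the Cookie header that requests builds from the cookies dict.

```python
import requests

# Build (but do not send) the request from the example above.
url = "http://httpbin.org/cookies"
cookies = dict(cookies_are="working")
prepared = requests.Request("GET", url, cookies=cookies).prepare()

# The dict has been turned into a regular Cookie request header.
print(prepared.headers["Cookie"])
```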

Request parameters:

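method: the HTTP method, e.g. 'GET' or 'POST'

url: the URL to request

params: query-string parameters; a dict, a list of tuples, or bytes

data: the request body; a dict, a list of tuples, bytes, or a file-like object

json: the request body as a JSON-serializable Python object, which requests serializes to JSON for you

headers: a dict of HTTP request headers

cookies: a dict or a CookieJar object

files: a dict mapping field names to file objects (or to (filename, fileobj) tuples) for a multipart upload

auth: authentication; a tuple such as auth=("user", "pass") enables HTTP Basic authentication

timeout: timeout in seconds; a float, or a (connect timeout, read timeout) tuple

allow_redirects: whether to follow redirects; a bool

proxies: a dict mapping a protocol (or protocol and host) to a proxy URL

verify: whether to verify the server's TLS certificate; a bool, or a path to a CA bundle

stream: a bool; with False (the default) the response body is downloaded immediately, with True the download is deferred until you access the content

cert: a client certificate; either a string path to a .pem file or a ('cert', 'key') tuple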
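Several of these parameters can be combined in one request. In this sketch the URL, header value and credentials are placeholders. It prepares the request offline and shows where params, headers and auth end up. It also shows, as comments, that timeout, allow_redirects, proxies, verify, stream and cert are supplied when the request is sent, not stored on the prepared request.

```python
import requests

# Build (but do not send) a request using several of the parameters above.
# The URL, header value and credentials are placeholders for illustration.
req = requests.Request(
    method="GET",
    url="http://httpbin.org/get",
    params={"q": "python"},
    headers={"User-Agent": "my-app/0.0.1"},
    auth=("user", "pass"),  # a 2-tuple becomes HTTP Basic authentication
)
prepared = req.prepare()

print(prepared.url)                       # query string appended from params
print(prepared.headers["User-Agent"])     # header passed through as given
print(prepared.headers["Authorization"])  # "Basic " + base64 of "user:pass"

# timeout, allow_redirects, proxies, verify, stream and cert are send-time options, e.g.:
# with requests.Session() as s:
#     r = s.send(prepared, timeout=(3.05, 27), allow_redirects=True, verify=True)
```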

2. Reading the response

response.<attribute or method>

For example:

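response.text  # body decoded to str using response.encoding
response.encoding  # encoding used for .text; override it with e.g. response.encoding = 'utf-8'
response.content  # body as raw bytes
response.json()  # body parsed as JSON (raises an error if the body is not valid JSON)
response.raw  # the underlying raw response object; set stream=True in the request to use it
response.raw.read()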
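To run without a network, this sketch assembles a Response by hand. In real code requests creates it for you. An in-memory buffer stands in for the network stream, and the JSON body is made up. The sketch shows how .content, .text and .json() relate, and what happens when .encoding is wrong.

```python
import io

import requests

# Hand-built stand-in for a real response (assumption: offline demo only).
response = requests.Response()
response.status_code = 200
response.headers["Content-Type"] = "application/json; charset=utf-8"
response.raw = io.BytesIO('{"name": "café"}'.encode("utf-8"))
response.encoding = "utf-8"

raw_bytes = response.content  # bytes exactly as received (and cached)
decoded = response.text       # bytes decoded with response.encoding
data = response.json()        # body parsed into a dict

# Setting the wrong encoding changes how .text decodes the same bytes.
response.encoding = "iso-8859-1"
mis_decoded = response.text

print(raw_bytes, decoded, data, mis_decoded, sep="\n")
```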

In practice response.raw.read() is rarely used. To save streamed response data to a file, the usual pattern is to iterate over iter_content:

r = requests.get(url, stream=True)  # stream=True: the body is fetched only as you iterate
with open(filename, 'wb') as fd:
    for chunk in r.iter_content(chunk_size=128):
        fd.write(chunk)
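Here is a runnable offline version of that loop. It assumes a hand-built Response whose raw stream is an in-memory buffer with a made-up 1000-byte payload, and it writes to a hypothetical file in a temporary directory. Recording the chunk sizes shows that iter_content yields at most chunk_size bytes at a time.

```python
import io
import os
import tempfile

import requests

# Offline stand-in for r = requests.get(url, stream=True).
payload = bytes(range(256)) * 3 + bytes(232)  # 1000 bytes of made-up data
r = requests.Response()
r.status_code = 200
r.raw = io.BytesIO(payload)

filename = os.path.join(tempfile.mkdtemp(), "download.bin")  # hypothetical output path
chunk_sizes = []
with open(filename, "wb") as fd:
    for chunk in r.iter_content(chunk_size=128):  # bytes, at most 128 per chunk
        chunk_sizes.append(len(chunk))
        fd.write(chunk)

print(chunk_sizes)  # [128, 128, 128, 128, 128, 128, 128, 104]
```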

Status code

response.status_code  # the HTTP status code of the response
# requests also provides named status codes (requests.codes) to compare against
>>> r.status_code == requests.codes.ok
True
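requests.codes maps readable names to numeric codes, and both attribute and item access work. Comparing against codes.ok only matches exactly 200. The helper below, is_success, is hypothetical and shows a broader 2xx check.

```python
import requests

print(requests.codes.ok)                 # 200
print(requests.codes["not_found"])       # 404
print(requests.codes.moved_permanently)  # 301


def is_success(status_code):
    """Hypothetical helper: True for any 2xx status, not only 200."""
    return 200 <= status_code < 300


print(is_success(requests.codes.no_content))  # True (204)
```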

Response headers

response.headers
# header names can be looked up in any letter case
>>> r.headers['Content-Type']
'application/json'
>>> r.headers.get('content-type')
'application/json'
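response.headers is a requests.structures.CaseInsensitiveDict. This sketch builds one directly with made-up header values to show the lookup rules.

```python
from requests.structures import CaseInsensitiveDict

headers = CaseInsensitiveDict({"Content-Type": "application/json", "X-Rate-Limit": "60"})

print(headers["content-type"])     # lookup ignores case
print(headers.get("CONTENT-TYPE"))  # .get() ignores case as well
print("x-rate-limit" in headers)    # so does the membership test
print(list(headers))                # iteration keeps the original spelling
```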

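response.cookies  # cookies set by the response (a RequestsCookieJar)
response.request._cookies  # cookies sent with the request (a private attribute, may change between versions)
# convert a CookieJar to a plain dict with requests.utils.dict_from_cookiejar
cookie_dict = requests.utils.dict_from_cookiejar(response.request._cookies)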
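This offline sketch converts between cookie jars and dicts. The jar stands in for response.cookies and is filled by hand with made-up names, values and domain.

```python
from requests.cookies import RequestsCookieJar
from requests.utils import cookiejar_from_dict, dict_from_cookiejar

jar = RequestsCookieJar()
jar.set("session_id", "abc123", domain="example.com", path="/")
jar.set("theme", "dark", domain="example.com", path="/")

as_dict = dict_from_cookiejar(jar)  # CookieJar -> dict
print(as_dict)
print(jar.get_dict())  # the jar's own equivalent

back_to_jar = cookiejar_from_dict(as_dict)  # dict -> RequestsCookieJar
print(back_to_jar.get("theme"))
```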

3. Custom headers

Pass a dict to the headers parameter.

For example:

response = requests.get('https://api.github.com/events', params={'key': 'value'}, headers={'user-agent': 'my-app/0.0.1'})
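requests.get() creates a Session internally, so your headers are merged with the session's default headers. This sketch prepares the request through a Session without sending it and shows the merged result: your user-agent replaces the default python-requests one, and the other defaults remain.

```python
import requests

with requests.Session() as s:
    prepared = s.prepare_request(
        requests.Request(
            "GET",
            "https://api.github.com/events",
            params={"key": "value"},
            headers={"user-agent": "my-app/0.0.1"},
        )
    )

print(prepared.headers["User-Agent"])  # my-app/0.0.1 (header names are case-insensitive)
print(prepared.headers["Accept"])      # */*, a session default that is still present
```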

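However, headers you set yourself take lower precedence than more specific sources of information. For example:

Authorization headers are removed if you are redirected to a different host;

Proxy-Authorization headers are overridden by proxy credentials given in the proxy URL, and so on.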

4. More complex POST requests

# 1. Pass a dict: it is form-encoded automatically
payload = {'key1': 'value1', 'key2': 'value2'}
r = requests.post("http://httpbin.org/post", data=payload)
print(r.text)
{
  ...
  "form": {
    "key2": "value2",
    "key1": "value1"
  },
  ...
}

# 2. You can also pass a sequence of (key, value) tuples, which is handy when one key has several values
payload = (('key1', 'value1'), ('key1', 'value2'))
r = requests.post('http://httpbin.org/post', data=payload)
print(r.text)
{
  ...
  "form": {
    "key1": [
      "value1",
      "value2"
    ]
  },
  ...
}
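This offline sketch prepares the same form data in three shapes and compares the request bodies. A dict whose value is a list produces the same body as the sequence of tuples.

```python
import requests

url = "http://httpbin.org/post"  # not contacted; the requests are only prepared

as_dict = requests.Request("POST", url, data={"key1": "value1", "key2": "value2"}).prepare()
as_pairs = requests.Request("POST", url, data=(("key1", "value1"), ("key1", "value2"))).prepare()
as_list_value = requests.Request("POST", url, data={"key1": ["value1", "value2"]}).prepare()

print(as_dict.body)        # key1=value1&key2=value2
print(as_pairs.body)       # key1=value1&key1=value2
print(as_list_value.body)  # key1=value1&key1=value2
```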

POST requests with a JSON body

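import json

# serialize the payload to JSON yourself
payload = {'some': 'data'}
r = requests.post(url, data=json.dumps(payload))
# or pass it via json= and let requests serialize it (this also sets Content-Type: application/json)
r = requests.post(url, json=payload)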
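The two forms produce the same JSON text but different headers. This sketch prepares both requests offline against a placeholder URL and compares them. Only json= adds a Content-Type header.

```python
import json

import requests

url = "http://httpbin.org/post"  # placeholder; nothing is sent
payload = {"some": "data"}

manual = requests.Request("POST", url, data=json.dumps(payload)).prepare()
automatic = requests.Request("POST", url, json=payload).prepare()

print(json.loads(manual.body) == json.loads(automatic.body))  # True: same JSON content
print(manual.headers.get("Content-Type"))     # None: a plain string body gets no Content-Type
print(automatic.headers.get("Content-Type"))  # application/json
```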

POST requests that upload files

# open the file in binary mode: requests may set Content-Length from the byte count, and text mode can cause errors
with open('report.xls', 'rb') as f:
    r = requests.post(url, files={'file': f})
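The sketch below prepares a multipart upload offline. An in-memory buffer with made-up bytes stands in for open('report.xls', 'rb'). It uses the (filename, fileobj, content_type) tuple form, which lets you set the part's filename and MIME type explicitly.

```python
import io

import requests

fake_file = io.BytesIO(b"fake spreadsheet bytes")  # made-up file content
prepared = requests.Request(
    "POST",
    "http://httpbin.org/post",  # placeholder; nothing is sent
    files={"file": ("report.xls", fake_file, "application/vnd.ms-excel")},
).prepare()

print(prepared.headers["Content-Type"])           # multipart/form-data; boundary=...
print(b'filename="report.xls"' in prepared.body)  # True
print(b"fake spreadsheet bytes" in prepared.body)  # True
```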

5. Redirects, history, and timeouts

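response.history is a list of the Response objects for every redirect followed to complete the request, ordered from the oldest to the most recent: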

# an http:// request redirected to https://
>>> r = requests.get('http://github.com')
>>> r.url
'https://github.com/'
>>> r.status_code
200
>>> r.history
[<Response [301]>]

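To control redirect handling, pass allow_redirects=True or False in the request. By default requests follows redirects for every method except HEAD.

To set a timeout, pass timeout=seconds in the request. If no bytes are received from the server for that many seconds, an exception is raised. Note that this is not a limit on the total download time. If no timeout is set, requests waits indefinitely.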
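Redirects are easiest to observe against a server you control. This sketch starts a tiny local server with the standard library's http.server. The handler is hypothetical: /old answers 301 to /new, and /new answers 200. The sketch then requests /old with and without allow_redirects, using a timeout throughout. trust_env is set to False only so that proxy environment variables cannot reroute the local traffic.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class RedirectHandler(BaseHTTPRequestHandler):
    """Hypothetical test server: /old -> 301 to /new, anything else -> 200."""

    def do_GET(self):
        if self.path == "/old":
            self.send_response(301)
            self.send_header("Location", "/new")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"arrived"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):  # keep the console quiet
        pass


server = HTTPServer(("127.0.0.1", 0), RedirectHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"

with requests.Session() as s:
    s.trust_env = False  # ignore proxy environment variables for this local demo
    followed = s.get(base + "/old", timeout=5)
    not_followed = s.get(base + "/old", timeout=5, allow_redirects=False)

server.shutdown()
server.server_close()

print(followed.status_code, followed.url, followed.history)      # 200, .../new, [<Response [301]>]
print(not_followed.status_code, not_followed.headers["Location"])  # 301 /new
```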

6. Exceptions

Raising an error for a bad response

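response.raise_for_status()  # returns None if the response is not an error (status code below 400)

response.raise_for_status() raises requests.exceptions.HTTPError when the status code is 4xx or 5xx. Other non-200 codes, such as 204, do not raise.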

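Network problems, such as a DNS failure or a refused connection, raise ConnectionError.

If a request times out, a Timeout exception is raised.

If a request exceeds the configured maximum number of redirects, a TooManyRedirects exception is raised.

All exceptions that requests raises explicitly inherit from requests.exceptions.RequestException.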
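This offline sketch runs raise_for_status() on hand-built Responses with a placeholder URL. It shows that only 4xx and 5xx codes raise. It also checks that the exceptions listed above share RequestException as a base class, so a single except clause can catch them all. The check helper is hypothetical.

```python
import requests


def check(status_code):
    """Hypothetical helper: run raise_for_status() on a hand-built Response."""
    r = requests.Response()
    r.status_code = status_code
    r.url = "http://example.invalid/resource"  # placeholder URL used in the error message
    try:
        r.raise_for_status()
    except requests.exceptions.HTTPError as exc:
        return "HTTPError: " + str(exc)
    return "ok"


for code in (200, 204, 301, 404, 503):
    print(code, check(code))

for exc_type in (
    requests.exceptions.HTTPError,
    requests.exceptions.ConnectionError,
    requests.exceptions.Timeout,
    requests.exceptions.TooManyRedirects,
):
    print(exc_type.__name__, issubclass(exc_type, requests.exceptions.RequestException))
```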
