Getting the response content
The response object has the following attributes:
text — the response body decoded to a string
status_code — the HTTP status code of the response
encoding — the encoding Requests uses to decode content into text
content — the response body as raw bytes; when printed, control characters show up as escape sequences such as \n (newline), \t (tab) and \r (carriage return)
r.json() — if the response body is a JSON string, parses it with the JSON decoder built into Requests
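The relationship between content and text is simply bytes versus decoded string; a minimal sketch (no request needed, the byte string below is made up for illustration):

```python
# r.content would hold the raw bytes of the response body;
# r.text is those bytes decoded using r.encoding.
raw = '{"key": "value"}\n'.encode('utf-8')   # stands in for r.content
text = raw.decode('utf-8')                   # stands in for r.text
print(repr(raw))   # bytes: escape sequences like \n are shown literally
print(text)        # str: \n is rendered as an actual newline
```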
Passing request parameters
import requests

payload = {'key1' : 'value1', 'key2' : 'value2'}  # query-string parameters (avoid shadowing the built-in name dict)
link = 'http://httpbin.org/get'
headers = {'User-Agent' : 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36', 'Content-Type': 'text/html'}
r = requests.get(link, headers=headers, params=payload)
print(r.content)
print(r.status_code)
print(r.json())
Program output:
b'{\n "args": {\n "key1": "value1", \n "key2": "value2"\n }, \n "headers": {\n "Accept": "*/*", \n "Accept-Encoding": "gzip, deflate", \n "Content-Type": "text/html", \n "Host": "httpbin.org", \n "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36"\n }, \n "origin": "223.72.90.250, 223.72.90.250", \n "url": "https://httpbin.org/get?key1=value1&key2=value2"\n}\n'
200
{'args': {'key1': 'value1', 'key2': 'value2'}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Content-Type': 'text/html', 'Host': 'httpbin.org', 'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36'}, 'origin': '223.72.90.250, 223.72.90.250', 'url': 'https://httpbin.org/get?key1=value1&key2=value2'}
As you can see, the request parameters key1=value1&key2=value2 were passed correctly through the params argument. If you want to pretty-print compact JSON, you can also use an online formatter such as http://www.bejson.com/ .
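Instead of an online formatter, the standard-library json module can pretty-print compact JSON locally; a small sketch:

```python
import json

# A compact JSON string, like the one returned by httpbin
compact = '{"args": {"key1": "value1", "key2": "value2"}}'
data = json.loads(compact)                              # parse into a dict
print(json.dumps(data, indent=4, ensure_ascii=False))   # re-serialize with indentation
```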
Customizing request headers
In the example above we passed a User-Agent by specifying the headers argument; more header fields can be passed the same way, for example:
import requests
link = 'http://httpbin.org/get'
headers = {'Host' : 'www.santostang.com', 'User-Agent' : 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36', 'Content-Type': 'text/html'}
r = requests.get(link, headers=headers)
print(r.status_code)
You can pass even more header fields; anything listed under Request Headers in your browser's developer tools can be added.
Sending a POST request
import requests

payload = {'key1' : 'value1', 'key2' : 'value2'}  # form data to send in the request body
headers = {'Host' : 'www.santostang.com', 'User-Agent' : 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36', 'Content-Type': 'text/html'}
r = requests.post('http://httpbin.org/post', headers=headers, data=payload)
print(r.text)
Output:
{
  "args": {},
  "data": "key1=value1&key2=value2",
  "files": {},
  "form": {},
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Content-Length": "23",
    "Content-Type": "text/html",
    "Host": "www.santostang.com",
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36"
  },
  "json": null,
  "origin": "223.72.90.250, 223.72.90.250",
  "url": "https://www.santostang.com/post"
}
A POST request passes its parameters through the data argument, which Requests form-encodes into the request body. Note that because this example overrides Content-Type to text/html, httpbin reports the encoded body under "data" rather than "form"; with the default form content type it would appear under "form".
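The body string key1=value1&key2=value2 seen in the output is ordinary URL form encoding, which Requests applies to the data dict; the same encoding can be reproduced with the standard library (a sketch for illustration):

```python
from urllib.parse import urlencode

payload = {'key1': 'value1', 'key2': 'value2'}
body = urlencode(payload)   # what Requests sends as the request body
print(body)  # key1=value1&key2=value2
```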
Setting a timeout
import requests
r = requests.post('http://httpbin.org/post', timeout=0.001)
print(r.text)
Output:
Because the timeout value of 0.001 seconds is far too small, the request cannot complete in time and the program fails with a timeout error (the traceback ends with socket.timeout: timed out).
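In practice you usually catch the exception rather than let the program crash; a sketch, assuming any request failure (timeout, DNS error, refused connection) should be treated as "no response":

```python
import requests

def fetch(url, timeout=0.001):
    """Return the response text, or None if the request fails or times out."""
    try:
        r = requests.get(url, timeout=timeout)
        return r.text
    except requests.exceptions.RequestException:  # Timeout is a subclass of this
        return None

# With such a tiny timeout the request almost certainly fails and None is returned
print(fetch('http://httpbin.org/get'))
```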
Scraping Douban's Top 250 movies
import requests
from bs4 import BeautifulSoup

def getMovies():
    headers = {'Host' : 'movie.douban.com', 'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36'}
    movies = []
    for i in range(0, 10):  # 10 pages of 25 movies each: start=0, 25, ..., 225
        # the list pages are fetched with GET (POST would be rejected)
        r = requests.get('https://movie.douban.com/top250?start=' + str(i * 25), headers=headers)
        soup = BeautifulSoup(r.text, 'lxml')
        div_list = soup.find_all('div', class_='hd')  # each title lives in a div.hd
        for div in div_list:
            title = div.a.span.text
            movies.append(title)
    return movies

movies = getMovies()
for i, movie in enumerate(movies):
    print(str(i+1) + "==" + movie)
Running the program prints the titles of Douban's Top 250 movies.
The BeautifulSoup documentation is available at https://www.crummy.com/software/BeautifulSoup/bs4/doc.zh/
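The find_all pattern used above can be tried without any network access; a sketch on a hand-written HTML fragment (the markup mimics Douban's div.hd > a > span structure, and the movie titles are made up for illustration), using the stdlib html.parser instead of lxml:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: each title sits inside div.hd > a > span
html = '''
<div class="hd"><a href="#"><span>Movie A</span></a></div>
<div class="hd"><a href="#"><span>Movie B</span></a></div>
'''
soup = BeautifulSoup(html, 'html.parser')
titles = [div.a.span.text for div in soup.find_all('div', class_='hd')]
print(titles)  # ['Movie A', 'Movie B']
```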