First, here is what the official documentation says:
Compliance
Requests is intended to be compliant with all relevant specifications and RFCs where that compliance will not cause difficulties for users. This attention to the specification can lead to some behaviour that may seem unusual to those not familiar with the relevant specification.
Encodings
When you receive a response, Requests makes a guess at the encoding to use for decoding the response when you access the Response.text attribute. Requests will first check for an encoding in the HTTP header, and if none is present, will use chardet to attempt to guess the encoding.

The only time Requests will not do this is if no explicit charset is present in the HTTP headers and the Content-Type header contains text. In this situation, RFC 2616 specifies that the default charset must be ISO-8859-1. Requests follows the specification in this case. If you require a different encoding, you can manually set the Response.encoding property, or use the raw Response.content.
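The header rule quoted above can be sketched with the standard library. This is only an approximation of the documented behaviour and not the Requests source. The helper name `encoding_from_content_type` is hypothetical. A return value of `None` stands for the case where Requests would go on to guess the encoding with chardet.

```python
from email.message import Message


def encoding_from_content_type(content_type):
    """Hypothetical helper approximating the documented rule (not Requests' own code)."""
    if not content_type:
        return None  # no Content-Type header at all: Requests would guess instead

    msg = Message()
    msg["Content-Type"] = content_type

    # An explicit charset parameter wins; get_content_charset() lower-cases it.
    charset = msg.get_content_charset()
    if charset:
        return charset

    # text/* without an explicit charset: the RFC 2616 default.
    if msg.get_content_maintype() == "text":
        return "ISO-8859-1"

    return None  # any other media type: left to the guessing step


print(encoding_from_content_type("text/html"))                 # ISO-8859-1
print(encoding_from_content_type("text/html; charset=UTF-8"))  # utf-8
print(encoding_from_content_type("application/octet-stream"))  # None
```

The second call shows why a server that sends `charset=utf-8` never hits the Latin-1 default.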
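In other words, if Requests finds no charset in the HTTP headers and the Content-Type is text, it falls back to the default ISO-8859-1, better known as Latin-1. Most web pages, however, are actually encoded in UTF-8. What does this lead to? Any non-ASCII text in response.text comes out garbled, including Chinese. You can check which encoding Requests picked: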
import requests

url = 'http://weather.sina.com.cn/xiamen'
response = requests.get(url)
print(response.encoding)  # the author got: ISO-8859-1
# This confirms that Requests is indeed decoding the page as Latin-1
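The sketch below shows what that Latin-1 guess does to UTF-8 text without touching the network. It encodes a Chinese string as UTF-8, which stands in for the page body, and then decodes it as ISO-8859-1. The sample string is an assumption chosen for illustration. ISO-8859-1 maps every byte to some character, so the decode never raises an error. Instead, it silently produces mojibake.

```python
# Bytes as a UTF-8 server would send them (sample text chosen for illustration).
body = "厦门天气".encode("utf-8")

# What you get when those bytes are decoded as ISO-8859-1 (Latin-1).
garbled = body.decode("iso-8859-1")

print(len(body), len(garbled))  # 12 12: one Latin-1 character per byte
print(garbled == "厦门天气")      # False
```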
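Of course, if we know the page is encoded in UTF-8, we can set response.encoding = 'utf-8' before accessing response.text (a property, not a method). That way there is no need to re-encode the garbled text with .encode('latin-1') to recover the original bytes and then decode them again.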
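The two remedies can be compared offline. The helper `decode_like_text` is hypothetical. It roughly mimics how Response.text turns the body bytes into a string with a given encoding. The sample body is made up for illustration.

```python
def decode_like_text(content, encoding):
    """Hypothetical stand-in for Response.text: decode bytes with the given encoding."""
    return str(content, encoding, errors="replace")


body = "厦门天气".encode("utf-8")  # sample UTF-8 page body

# Remedy 1: set the right encoding before reading the text.
fixed_up_front = decode_like_text(body, "utf-8")

# Remedy 2: take the Latin-1 text, turn it back into the original bytes,
# then decode those bytes as UTF-8.
latin1_text = decode_like_text(body, "iso-8859-1")
repaired = latin1_text.encode("iso-8859-1").decode("utf-8")

print(fixed_up_front == repaired == "厦门天气")  # True
```

Remedy 2 only works because Latin-1 decoding is lossless: every byte becomes exactly one character and converts back unchanged. Setting the encoding first skips that extra round trip.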
The raw body is available in response.content as bytes, and you can process it however you like.
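One way to handle response.content yourself is to try a short list of candidate encodings in order. The helper `decode_body` is an assumption for illustration, not something Requests provides. Its default candidates are also assumptions: "utf-8" first, then "gbk", which is common on Chinese sites.

```python
def decode_body(content, candidates=("utf-8", "gbk")):
    """Hypothetical helper: return (text, encoding) for the first candidate that decodes cleanly."""
    for enc in candidates:
        try:
            return content.decode(enc), enc
        except UnicodeDecodeError:
            continue  # the bytes are not valid in this encoding; try the next one
    # Nothing fit: fall back to the first candidate and mark bad bytes with U+FFFD.
    return content.decode(candidates[0], errors="replace"), candidates[0]


print(decode_body("厦门天气".encode("utf-8")))             # ('厦门天气', 'utf-8')
print(decode_body(b"caf\xe9", ("utf-8", "iso-8859-1")))  # ('café', 'iso-8859-1')
```

A successful strict decode only shows that the bytes are valid in that encoding, not that it is the right one. This is why the order of the candidates matters.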