The official documentation lists three exceptions in urllib.error:

1.URLError

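1.1 Causes

The local machine has no network connection
The server does not exist
The server cannot be reached
Note: HTTPError is a subclass of URLError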
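The subclass relationship noted above can be checked directly, without any network access. A minimal sketch using only the standard library:

```python
from urllib.error import ContentTooShortError, HTTPError, URLError

# HTTPError and ContentTooShortError both derive from URLError,
# and URLError itself derives from OSError.
print(issubclass(HTTPError, URLError))
print(issubclass(ContentTooShortError, URLError))
print(issubclass(URLError, OSError))
```

All three checks print True, which is why a single except URLError clause also catches the other two exceptions.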

1.2 Examples

1.2.1 Timeout error:

from urllib import request, error

url = "http://www.google.com"
string = request.urlopen(url, timeout = 1.5).read().decode('utf8')
print(string)

urllib.error.URLError: <urlopen error timed out>

1.2.2 Connection failure

Remove the timeout parameter:

from urllib import request, error

url = "http://www.google.com"
string = request.urlopen(url).read().decode('utf8')
print(string)

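urllib.error.URLError: <urlopen error [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond>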

1.2.3 Site does not exist

from urllib import request, error

url = "http://www.blogabcdefgs.net/"
string = request.urlopen(url).read().decode('utf8')
print(string)

urllib.error.URLError: <urlopen error [Errno 11001] getaddrinfo failed>
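The three failures above differ only in what URLError.reason holds. The sketch below tells them apart by the type of that attribute; classify_url_error and its category names are hypothetical, made up for illustration, and the sample exceptions are built by hand so no network is needed.

```python
import socket
from urllib.error import URLError


def classify_url_error(exc: URLError) -> str:
    """Hypothetical helper: map URLError.reason to a rough category."""
    reason = exc.reason
    # Check the most specific types first; all of them are OSError subclasses.
    if isinstance(reason, socket.timeout):
        return "timeout"
    if isinstance(reason, socket.gaierror):
        return "dns"          # host name could not be resolved
    if isinstance(reason, OSError):
        return "connection"   # any other socket-level failure
    return "other"            # reason is a plain string or something else


# Hand-built samples that mirror the tracebacks shown in this section.
print(classify_url_error(URLError(socket.timeout("timed out"))))
print(classify_url_error(URLError(socket.gaierror(11001, "getaddrinfo failed"))))
```

In real code the exception would come from an except error.URLError as e clause around request.urlopen.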

2.HTTPError

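2.1 Error codes

Check the error code: HTTPError carries the HTTP status code returned by the server in its code attribute.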
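One possible way to turn a numeric code into a readable phrase is the standard http.HTTPStatus enum. In the sketch below, describe_http_error is a hypothetical helper name, and the HTTPError is constructed by hand (with an empty header object and an empty body) purely so the example runs offline.

```python
import io
from email.message import Message
from http import HTTPStatus
from urllib.error import HTTPError


def describe_http_error(exc: HTTPError) -> str:
    """Hypothetical helper: return '<code> <standard phrase>' for an HTTPError."""
    try:
        phrase = HTTPStatus(exc.code).phrase
    except ValueError:
        # The server sent a code that HTTPStatus does not define.
        phrase = "Unknown status"
    return f"{exc.code} {phrase}"


# Hand-built sample; a real one would be caught from request.urlopen.
sample = HTTPError("http://httpbin.org", 405, "METHOD NOT ALLOWED",
                   Message(), io.BytesIO())
print(describe_http_error(sample))  # 405 Method Not Allowed
```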

2.2 Examples

2.2.1 Method error

Send a POST request to a site that does not allow POST:

from urllib import request, parse

url = 'http://httpbin.org'
# Form fields to send; named form_data to avoid shadowing the built-in dict
form_data = {
    'name': 'abc'
}
data = bytes(parse.urlencode(form_data), encoding='utf8')
req = request.Request(url=url, data=data, method='POST')
string = request.urlopen(req).read().decode('utf8')
print(string)

Raises:

urllib.error.HTTPError: HTTP Error 405: METHOD NOT ALLOWED

3.ContentTooShortError

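Official description:

ContentTooShortError(msg, content)
Raised when the urlretrieve function detects that the amount of downloaded data is less than the expected amount (given by the Content-Length header). The content attribute stores the downloaded (and supposedly truncated) data.
(To be expanded when I run into it.)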
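Since this exception is hard to trigger on demand, the sketch below only imitates the length check described above. check_length is a hypothetical helper, not part of urllib; it raises ContentTooShortError by hand to show how the content attribute can be read back.

```python
from urllib.error import ContentTooShortError, URLError


def check_length(data: bytes, expected: int) -> bytes:
    """Hypothetical helper: raise if fewer bytes arrived than expected."""
    if len(data) < expected:
        raise ContentTooShortError(
            f"retrieval incomplete: got only {len(data)} out of {expected} bytes",
            data,  # stored on the exception as .content
        )
    return data


try:
    check_length(b"abc", 10)
except ContentTooShortError as e:
    # ContentTooShortError is a URLError, so .reason holds the message.
    print(e.reason)
    print(e.content)
```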

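4.Exception handling

Catch HTTPError first, then URLError. HTTPError is a subclass of URLError, so an except URLError clause placed first would also catch every HTTPError and the more specific handler would never run.
URLError has a reason attribute; HTTPError has code, reason, and headers attributes.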

from urllib import request, error

url = "https://www.baidu.com/"
try:
    request.urlopen(url)
except error.HTTPError as e:
    print('HTTPError code: ',e.code)
    print('HTTPError reason', e.reason)
except error.URLError as e:
    print('URLError reason: ',e.reason)
else:
    print("OK!")
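The same pattern can be wrapped in a function so that the ordering of the except clauses is easy to try out offline. This is only a sketch: fetch_status and its opener parameter are hypothetical, and the two fake openers simply raise hand-built exceptions in place of a real network call.

```python
import io
from email.message import Message
from urllib import error, request


def fetch_status(url, opener=request.urlopen):
    """Hypothetical helper: return a short status string instead of raising."""
    try:
        opener(url)
    except error.HTTPError as e:   # subclass first, or this branch never runs
        return f"HTTPError {e.code}: {e.reason}"
    except error.URLError as e:
        return f"URLError: {e.reason}"
    else:
        return "OK"


def fake_404(url):
    # Stand-in for urlopen that simulates a server answering 404.
    raise error.HTTPError(url, 404, "Not Found", Message(), io.BytesIO())


def fake_unreachable(url):
    # Stand-in for urlopen that simulates a connection failure.
    raise error.URLError("no route to host")


print(fetch_status("http://example.invalid/", fake_404))
print(fetch_status("http://example.invalid/", fake_unreachable))
```

Calling fetch_status(url) without the second argument uses request.urlopen and performs a real request.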
