urllib.error
A brief introduction to urllib.error
"""
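- urllib.error
    - URLError is typically raised when:
        - there is no network connection
        - the connection to the server fails
        - the specified server cannot be found (for example, the host name does not resolve)
    - URLError is a subclass of OSError
    - See Example 1
    - HTTPError is a subclass of URLError
    - See Example 2
    - Difference between the two:
        - HTTPError corresponds to an error status code in the HTTP response: a status code of 400 or above raises HTTPError
        - URLError generally corresponds to a network-level problem, including a problem with the URL itself
    - Inheritance chain: OSError -> URLError -> HTTPError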
"""
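The inheritance chain in the notes above can be checked directly, and it has a practical consequence: because HTTPError is a subclass of URLError, an `except error.URLError` clause placed first would also swallow HTTP errors. The following is only a minimal sketch. The helper name `describe` and the host `example.invalid` are made up for illustration, and the exceptions are built by hand so that nothing touches the network.

```python
import io
from urllib import error

# The chain from the notes: OSError -> URLError -> HTTPError
print(issubclass(error.URLError, OSError))
print(issubclass(error.HTTPError, error.URLError))


def describe(exc):
    """Hypothetical helper: re-raise exc and report which clause caught it."""
    try:
        raise exc
    except error.HTTPError as e:
        # Most specific class first, otherwise the URLError clause would win
        return "HTTPError {0}".format(e.code)
    except error.URLError as e:
        return "URLError {0}".format(e.reason)


# Hand-built exceptions; example.invalid is a placeholder host
http_exc = error.HTTPError('http://example.invalid/x', 404, 'Not Found', {}, io.BytesIO())
url_exc = error.URLError('no route to host')

print(describe(http_exc))
print(describe(url_exc))
```

If the two `except` clauses were swapped, both calls would be reported as URLError, which is why the examples that catch both classes list HTTPError first.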
Example 1
from urllib import request, error

if __name__ == '__main__':
    # A host name that is expected to fail, which triggers URLError
    url = 'http://www.bilililili.com/'
    try:
        req = request.Request(url)
        rsp = request.urlopen(req)
        html = rsp.read().decode()
        print(html)
    except error.URLError as e:
        # e.reason holds the underlying cause of the failure
        print("URLError: {0}".format(e.reason))
        print("URLError: {0}".format(e))
    except Exception as e:
        print(e)
Example 2
from urllib import request, error

if __name__ == '__main__':
    # The host exists, but this path is expected to return an error status
    url = 'http://www.gov.cn/ada'
    try:
        req = request.Request(url)
        rsp = request.urlopen(req)
        html = rsp.read().decode()
        print(html)
    except error.HTTPError as e:
        # e.reason is the status text; str(e) also includes the status code
        print("HTTPError: {0}".format(e.reason))
        print("HTTPError: {0}".format(e))
    except Exception as e:
        print(e)
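Besides `reason`, an HTTPError also exposes the numeric status through `code`, which makes it possible to react differently to client-side (4xx) and server-side (5xx) failures. A minimal sketch follows. The function name `classify_http_error` and its give-up/retry policy are assumptions for illustration and not part of the original notes, and the exceptions are built by hand instead of coming from a real request.

```python
import io
from urllib import error


def classify_http_error(e):
    """Hypothetical policy: map an HTTPError to a coarse action."""
    if 400 <= e.code < 500:
        # The request itself is at fault (bad path, forbidden, ...),
        # so repeating it unchanged usually does not help
        return 'give up'
    if 500 <= e.code < 600:
        # The server failed; a later retry may succeed
        return 'retry later'
    return 'unexpected'


# Hand-built exceptions; example.invalid is a placeholder host
not_found = error.HTTPError(
    'http://example.invalid/ada', 404, 'Not Found', {}, io.BytesIO())
unavailable = error.HTTPError(
    'http://example.invalid/ada', 503, 'Service Unavailable', {}, io.BytesIO())

for exc in (not_found, unavailable):
    print(exc.code, exc.reason, classify_http_error(exc))
```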
User-Agent
A brief introduction to User-Agent
"""
- User-Agent
    - User-Agent, abbreviated UA, is one of the request headers; the server uses it to identify the visitor
    - Common UA values are listed below. Copy and paste one directly, or capture one from your own browser
    - To capture it: press F12, open the Network tab, refresh the page, pick an entry in the Name column, then read User-Agent under Request Headers
    - Mobile
        - Safari iOS 4.33 – iPhone User-Agent: (Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_3_3 like Mac OS X; en-us) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8J2 Safari/6533.18.5)
        - Safari iOS 4.33 – iPod Touch User-Agent: (Mozilla/5.0 (iPod; U; CPU iPhone OS 4_3_3 like Mac OS X; en-us) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8J2 Safari/6533.18.5)
        - Android N1 User-Agent: (Mozilla/5.0 (Linux; U; Android 2.3.7; en-us; Nexus One Build/FRF91) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1)
    - Desktop
        - Safari 5.1 – Mac User-Agent: (Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_8; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50)
        - Safari 5.1 – Windows User-Agent: (Mozilla/5.0 (Windows; U; Windows NT 6.1; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50)
    - See Example 3
Use only the string inside the outer parentheses as the header value!
"""
Example 3
from urllib import request, error

if __name__ == '__main__':
    url = 'http://www.baidu.com'
    try:
        # Send a desktop Chrome UA instead of the default Python-urllib one
        headers = {}
        headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36'
        req = request.Request(url, headers=headers)
        rsp = request.urlopen(req)
        html = rsp.read().decode()
        print(html)
    except error.HTTPError as e:
        # HTTPError must come before URLError because it is the subclass
        print(e)
    except error.URLError as e:
        print(e)
    except Exception as e:
        print(e)
Ways to set the UA
"""
- The UA can be set in two ways:
    - the headers argument of Request
    - the add_header method of Request
"""
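The two ways listed above can be compared side by side. This is a small sketch that only builds the Request objects and does not send anything; the variable names `UA`, `req_a` and `req_b` are chosen here for illustration.

```python
from urllib import request

UA = ('Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
      '(KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36')
url = 'http://www.baidu.com'

# Way 1: pass a headers dict when building the Request
req_a = request.Request(url, headers={'User-Agent': UA})

# Way 2: build the Request first, then call add_header
req_b = request.Request(url)
req_b.add_header('User-Agent', UA)

# Request stores header names capitalized, so the key is 'User-agent'
print(req_a.get_header('User-agent') == req_b.get_header('User-agent'))
```

Both requests end up carrying the same header. The dict form is convenient when several headers are known up front, while add_header suits headers that are decided later.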
Using proxies with ProxyHandler
A brief introduction to ProxyHandler
"""
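- ProxyHandler (proxy servers)
    - Using proxy IPs is a common technique in web crawlers
    - Where to find proxy server addresses:
        - www.xicidaili.com
        - www.goubanjia.com
    - A proxy hides the real visitor. A single proxy cannot hit the same site too frequently either, so a large pool of proxies is needed
    - Basic steps:
        - set the proxy address
        - create a ProxyHandler
        - create an opener
        - install the opener
    - See Example 4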
"""
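Example 4
from urllib import request, error

if __name__ == '__main__':
    url = 'http://www.baidu.com'
    # Step 1: set the proxy address (this one may no longer be reachable)
    proxy = {'http': '221.126.249.102:8080'}
    # Step 2: create the ProxyHandler
    proxy_handler = request.ProxyHandler(proxy)
    # Step 3: create the opener
    opener = request.build_opener(proxy_handler)
    # Step 4: install the opener so that request.urlopen uses the proxy
    request.install_opener(opener)
    try:
        rsp = request.urlopen(url)
        html = rsp.read().decode()
        print(html)
    except error.URLError as e:
        print(e)
    except Exception as e:
        print(e)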
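Since the notes call for many proxies, one possible variation is to pick a proxy per request and call the opener directly instead of installing it globally. The sketch below is an assumption about how that could look, not the method used in the example above. `PROXY_POOL` and `build_proxy_opener` are hypothetical names, and the addresses come from the documentation-only range 192.0.2.0/24, so they must be replaced with working proxies.

```python
import random
from urllib import request

# Placeholder addresses (documentation-only range), not real proxies
PROXY_POOL = ['192.0.2.10:8080', '192.0.2.11:8080', '192.0.2.12:8080']


def build_proxy_opener(pool=PROXY_POOL):
    """Hypothetical helper: return (opener, address) for a random proxy."""
    address = random.choice(pool)
    handler = request.ProxyHandler({'http': address})
    return request.build_opener(handler), address


opener, address = build_proxy_opener()
print("chosen proxy:", address)
# opener.open(url) would send the request through that proxy only;
# request.urlopen() is unaffected because install_opener is not called
```

Calling `build_proxy_opener()` again before each request spreads the traffic over the pool, at the cost of building a new opener every time.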