The urllib library
urllib.request: request module
urllib.parse: URL parsing module
urllib.error: exception handling module
1. urllib.request: request module
This example fetches a Youdao dictionary page, reads the whole response body, and prints it decoded as UTF-8.
In general, a program decodes the returned bytes object to a string once it determines or guesses the appropriate encoding.
    import urllib.request  # import the request module

    with urllib.request.urlopen("http://www.youdao.com/w/bytes/#keyfrom=dict2.top") as fd:
        print(fd.read().decode("utf-8"))  # decode the bytes using this encoding
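
The snippet above hard-codes UTF-8. As a minimal sketch of the "determine or guess the encoding" step, the helper below reads the charset from the Content-Type header and falls back to UTF-8 when none is declared. The name `decode_body`, the fallback choice, and the sample header are assumptions made for illustration; they are not part of the original example.

```python
from email.message import Message


def decode_body(raw: bytes, headers: Message, fallback: str = "utf-8") -> str:
    """Decode a response body using the charset declared in its headers.

    `decode_body` is a hypothetical helper; `fallback` is an assumed default
    used when the Content-Type header declares no charset.
    """
    charset = headers.get_content_charset() or fallback
    # errors="replace" avoids an exception if the declared charset is wrong.
    return raw.decode(charset, errors="replace")


# Simulated response headers, so the sketch runs without network access.
headers = Message()
headers["Content-Type"] = "text/html; charset=gbk"

text = decode_body("有道".encode("gbk"), headers)
print(text)
```

With a real response, the same helper could be called as `decode_body(fd.read(), fd.headers)`, since the `headers` attribute of the object returned by `urlopen` for HTTP URLs is an `email.message.Message` subclass.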
2. urllib.parse: URL parsing module
    def urlparse(url, scheme='', allow_fragments=True):
        """Parse a URL into 6 components:
        <scheme>://<netloc>/<path>;<params>?<query>#<fragment>
        Return a 6-tuple: (scheme, netloc, path, params, query, fragment).
        Note that we don't break the components up in smaller bits
        (e.g. netloc is a single string) and we don't expand % escapes."""
        url, scheme, _coerce_result = _coerce_args(url, scheme)
        splitresult = urlsplit(url, scheme, allow_fragments)
        scheme, netloc, url, query, fragment = splitresult
        if scheme in uses_params and ';' in url:
            url, params = _splitparams(url)
        else:
            params = ''
        result = ParseResult(scheme, netloc, url, params, query, fragment)
        return _coerce_result(result)
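    import urllib.parse  # import the URL parsing module

    url = "http://www.youdao.com/w/bytes/#keyfrom=dict2.top"
    result = urllib.parse.urlparse(url=url)  # identify and split the components
    print(result)

Components of the result:
scheme: the protocol
netloc: the network location (domain name)
path: the path
params: parameters of the last path segment
query: the query string, typically found in the URL of a GET request
fragment: the anchor, used to jump directly to a specific position within the page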
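
To make the six components concrete, here is a small sketch that parses a URL in which every component is non-empty. The URL is a hypothetical one on example.com, chosen only for illustration; `parse_qs` is used as one possible way to break the query string into key/value pairs, which `urlparse` itself does not do.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical URL chosen so that all six components are non-empty.
url = "http://www.example.com/path;type=a?q=python&page=2#section"
result = urlparse(url)

print(result.scheme)    # http
print(result.netloc)    # www.example.com
print(result.path)      # /path
print(result.params)    # type=a
print(result.query)     # q=python&page=2
print(result.fragment)  # section

# urlparse leaves the query as one string; parse_qs splits it into a dict
# that maps each key to a list of values.
query_args = parse_qs(result.query)
print(query_args)       # {'q': ['python'], 'page': ['2']}
```

The result is a named tuple, so the components can be read either by attribute name, as above, or by position (for example `result[0]` for the scheme).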
3. urllib.error: exception handling module
    import urllib.request  # import the request module
    import urllib.error  # import the error module

    def check_urlerror():
        try:
            with urllib.request.urlopen("http://www.youdaoxxx.com/w/bytes/#keyfrom=dict2.top") as fd:
                print(fd.read().decode("utf-8"))
        except urllib.error.URLError as err:
            print(err.reason)

    check_urlerror()
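
The example above catches `URLError`, which also covers its subclass `HTTPError` (raised when the server answers with an error status code). As a rough sketch, the two cases could be told apart as shown below. The helper name `describe_error` and the message formats are assumptions for illustration, and the errors are constructed by hand so that the sketch runs without network access.

```python
import io
import urllib.error
from email.message import Message


def describe_error(err: urllib.error.URLError) -> str:
    """Return a short description of a urllib error (hypothetical helper)."""
    # HTTPError is a subclass of URLError, so it must be checked first.
    if isinstance(err, urllib.error.HTTPError):
        return f"HTTP {err.code}: {err.reason}"
    return f"URL error: {err.reason}"


# Hand-built errors standing in for what urlopen would raise.
http_err = urllib.error.HTTPError(
    "http://www.example.com/missing", 404, "Not Found", Message(), io.BytesIO()
)
url_err = urllib.error.URLError("name resolution failed")

print(describe_error(http_err))  # HTTP 404: Not Found
print(describe_error(url_err))   # URL error: name resolution failed
```

In real code the helper would be called inside an `except urllib.error.URLError as err:` block like the one in `check_urlerror`.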