User Agent and Proxy IP

Original article: https://blog.csdn.net/c406495762/article/details/60137956 (the original covers this in more detail)

1. Both a User Agent and a proxy IP are used to keep the program from being identified by the target site as a crawler.

2. The User Agent is sent in the request headers. Some common User Agent strings (a sketch for picking one at random per request follows the list):

1. Android

  • Mozilla/5.0 (Linux; Android 4.1.1; Nexus 7 Build/JRO03D) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.166 Safari/535.19
  • Mozilla/5.0 (Linux; U; Android 4.0.4; en-gb; GT-I9300 Build/IMM76D) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30
  • Mozilla/5.0 (Linux; U; Android 2.2; en-gb; GT-P1000 Build/FROYO) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1

2. Firefox

  • Mozilla/5.0 (Windows NT 6.2; WOW64; rv:21.0) Gecko/20100101 Firefox/21.0
  • Mozilla/5.0 (Android; Mobile; rv:14.0) Gecko/14.0 Firefox/14.0

3. Google Chrome

  • Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.94 Safari/537.36
  • Mozilla/5.0 (Linux; Android 4.0.4; Galaxy Nexus Build/IMM76B) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.133 Mobile Safari/535.19

4. iOS

  • Mozilla/5.0 (iPad; CPU OS 5_0 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9A334 Safari/7534.48.3
  • Mozilla/5.0 (iPod; U; CPU like Mac OS X; en) AppleWebKit/420.1 (KHTML, like Gecko) Version/3.0 Mobile/3A101a Safari/419.3
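These strings can be kept in a small pool and one chosen at random for each request, so successive requests do not all present the identical User Agent. Below is a minimal sketch along those lines; the pool contents and the helper name openWithRandomAgent are illustrative, not part of the original article.

import random
from urllib import request

# A small pool drawn from the User Agent strings listed above
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 6.2; WOW64; rv:21.0) Gecko/20100101 Firefox/21.0',
    'Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.94 Safari/537.36',
    'Mozilla/5.0 (iPad; CPU OS 5_0 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9A334 Safari/7534.48.3',
]

def openWithRandomAgent(url):
    # Attach a randomly chosen User Agent to this request
    head = {'User-Agent': random.choice(USER_AGENTS)}
    req = request.Request(url, headers=head)
    return request.urlopen(req).read().decode('utf-8')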

3. Taking CSDN as an example: without setting a User Agent the request fails, and after setting one it goes through normally.

from urllib import request


def urlWithOutSettingAgent():
    # Request CSDN without a User-Agent header; the site rejects the bare request
    url = 'http://www.csdn.net/'
    req = request.Request(url)
    response = request.urlopen(req)
    html = response.read().decode('utf-8')
    print(html)


def urlWithSettingAgent():
    # The same request with a browser User-Agent supplied via the headers dict
    url = 'http://www.csdn.net/'
    head = {}
    head['User-Agent'] = 'Mozilla/5.0 (Linux; Android 4.1.1; Nexus 7 Build/JRO03D) ' \
                         'AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.166 ' \
                         'Safari/535.19'
    req = request.Request(url, headers=head)
    response = request.urlopen(req)
    html = response.read().decode('utf-8')
    print(html)

There is another way to add the User Agent:

def urlWithSettingAgent2():
    # Alternative: attach the User-Agent with Request.add_header()
    url = 'http://www.csdn.net/'
    req = request.Request(url)
    req.add_header('User-Agent', 'Mozilla/5.0 (Linux; Android 4.1.1; Nexus 7 Build/JRO03D) '
                                 'AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.166 '
                                 'Safari/535.19')
    response = request.urlopen(req)
    html = response.read().decode('utf-8')
    print(html)

4. Proxy IP

To see the IP a website observes when you visit it: http://myip.kkcha.com/

Find a proxy on http://www.xicidaili.com/wt/ that can be pinged; the page printed by the code below then shows that the proxy's IP is the one in use.

def proxyIp():
    # Route the request through an HTTP proxy so the site sees the proxy's IP
    url = 'http://myip.kkcha.com/'
    proxy = {'http': '222.221.11.119:3128'}
    proxy_support = request.ProxyHandler(proxy)
    opener = request.build_opener(proxy_support)
    opener.addheaders = [('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) '
                                        'AppleWebKit/537.36 (KHTML, like Gecko) '
                                        'Chrome/56.0.2924.87 Safari/537.36')]
    request.install_opener(opener)  # make this opener the default for urlopen()
    response = request.urlopen(url)
    html = response.read().decode('utf-8')
    print(html)
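Free proxies like the one above drop offline often, so in practice the proxy address is best treated as a placeholder and the call guarded with a timeout and error handling. A minimal sketch, using opener.open() directly instead of installing the opener globally (proxyIpSafe and its argument are illustrative names, not from the original article):

from urllib import request, error

def proxyIpSafe(proxy_addr):
    # proxy_addr is a placeholder such as '222.221.11.119:3128'
    url = 'http://myip.kkcha.com/'
    opener = request.build_opener(request.ProxyHandler({'http': proxy_addr}))
    opener.addheaders = [('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) '
                                        'AppleWebKit/537.36 (KHTML, like Gecko) '
                                        'Chrome/56.0.2924.87 Safari/537.36')]
    try:
        response = opener.open(url, timeout=10)
        print(response.read().decode('utf-8'))
    except (error.URLError, OSError) as e:
        # Dead or slow proxies surface here instead of hanging the program
        print('proxy failed:', e)

Using the opener directly keeps the proxy setting local to this function, whereas install_opener() changes the default for every later urlopen() call.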

Reposted from blog.csdn.net/csdn86868686888/article/details/82108697