Copyright notice: when reposting, please credit the source: https://blog.csdn.net/qq799028706/article/details/89294584
Introduction
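The contest site is http://2050.acmclub.cn/
It runs on HDU's (Hangzhou Dianzi University) OJ system, which has no way to show the ranking for just your own school. After a quick look at the pages, I found the ranklist is easy to scrape.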
Approach
First, simulate a login; otherwise the ranklist is not visible.
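If you would rather not install requests, the login step can also be sketched with the standard library alone. This is a swapped-in alternative, not the approach used in the script below: `urllib.request` with an `HTTPCookieProcessor` keeps the session cookie the way `requests.Session` does. The form fields and login URL are copied from the script; the helper names `make_opener` and `build_login_request` are hypothetical.

```python
import http.cookiejar
import urllib.parse
import urllib.request

BASE_URL = "http://2050.acmclub.cn"


def make_opener():
    """Return an opener that stores and resends cookies, plus its cookie jar."""
    # HTTPCookieProcessor plays the role of requests.Session here: cookies set
    # by the login response are sent again on every later request.
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(username, pwd, cid=3):
    """Build (but do not send) the POST request that submits the login form."""
    query = urllib.parse.urlencode({"action": "login", "cid": cid, "notice": 0})
    form = urllib.parse.urlencode(
        {"username": username, "userpass": pwd, "login": "Sign In"}
    ).encode("ascii")
    return urllib.request.Request(
        f"{BASE_URL}/userloginex.php?{query}", data=form, method="POST"
    )


# Usage (needs network access and a real account):
# opener, jar = make_opener()
# opener.open(build_login_request("xxx", "xxx"))
# html = opener.open(BASE_URL + "/contests/contest_ranklist.php?cid=3&page=1").read().decode("gb2312")
```

Building the request separately from sending it makes the form encoding easy to inspect before touching the real site.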
After logging in, request the ranklist URL directly:
"http://2050.acmclub.cn/contests/contest_ranklist.php?cid=3&page=1"
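Only the `page` query parameter changes from one request to the next. A minimal sketch of building that URL with `urllib.parse.urlencode`, where `cid=3` is this contest's ID as it appears in the URL above and `ranklist_url` is a hypothetical helper name:

```python
from urllib.parse import urlencode

BASE_URL = "http://2050.acmclub.cn"


def ranklist_url(cid: int, page: int) -> str:
    """Return the ranklist URL for one page of the given contest."""
    # urlencode keeps dict order, so the query reads cid=...&page=...
    return f"{BASE_URL}/contests/contest_ranklist.php?{urlencode({'cid': cid, 'page': page})}"
```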
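The logic is simple: step through page=1, 2, 3, 4…
Contestants who scored generally appear within the first 100 pages, so fetching pages 1 through 100 is enough.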
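The fixed 100-page loop can optionally stop early once a page has no data rows. The sketch below assumes that pages past the end of the ranklist return an empty table; that is an assumption, not something verified against the site. If the site behaves differently, the `max_pages` cap still bounds the loop. `iter_pages` and `fetch_rows` are hypothetical names, and `fetch_rows` stands in for "download page N and return its ranking rows".

```python
from typing import Callable, Iterator, List, TypeVar

Row = TypeVar("Row")


def iter_pages(fetch_rows: Callable[[int], List[Row]], max_pages: int = 100) -> Iterator[Row]:
    """Yield rows from pages 1..max_pages, stopping at the first empty page."""
    for page in range(1, max_pages + 1):  # range end is exclusive, hence + 1
        rows = fetch_rows(page)
        if not rows:
            # Assumption: an empty page means the ranklist has ended
            break
        yield from rows
```

Passing the fetch function in as a parameter keeps the paging logic separate from the HTTP and parsing code, so it can be checked without network access.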
Code
```python
from pyquery import PyQuery as pq
import requests

BASE_URL = "http://2050.acmclub.cn"

# One session for every request, so the login cookie is reused
s = requests.Session()
s.headers.update({
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/70.0.3538.77 Safari/537.36"
})


def login(username, pwd, college):
    data = {
        "username": username,
        "userpass": pwd,
        "login": "Sign In"
    }
    # Log in first; without it the ranklist is not visible
    s.post(BASE_URL + "/userloginex.php?action=login&cid=3&notice=0", data=data)
    # Pages 1 through 100 (range excludes its end value)
    for i in range(1, 101):
        url = BASE_URL + "/contests/contest_ranklist.php?cid=3&page=" + str(i)
        res = s.get(url)
        res.encoding = 'gb2312'  # the OJ serves its pages as GB2312
        doc = pq(res.text)
        for tr in doc('tr').items():
            tds = tr('td')
            # Skip rows that are not complete ranking rows
            if len(tds) < 3:
                continue
            rank = tds[0].text
            # The second cell holds "team name<br/>school"
            team_html = pq(tds[1]).html() or ""
            temp_list = team_html.replace("\n", "").split("<br/>")
            # Skip the header row and cells without a school line
            if temp_list[0] == 'Team' or len(temp_list) < 2:
                continue
            name = temp_list[0]
            school = temp_list[1]
            if school == college:
                ac = tds[2].text
                print(f'{rank} {name} {school} {ac}')


def main():
    # Pass three arguments: username, password, school name
    login("xxx", "xxx", "xxxx")


if __name__ == '__main__':
    main()
```
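Splitting on the literal string `<br/>` depends on how pyquery serializes the team cell. To test the row filter offline, without logging in, the sketch below does the same filtering with the standard library's `html.parser`. It assumes each ranking row is laid out as rank | team`<br/>`school | solved, which is what the script's indexing implies. `RankRowParser` and `school_rows` are hypothetical names, and the sample HTML in the test is invented for illustration.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class RankRowParser(HTMLParser):
    """Collect every <tr> as a list of cells; each cell is a list of text lines split at <br>."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[List[str]]] = []
        self._row: Optional[List[List[str]]] = None
        self._cell: Optional[List[str]] = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = [""]
        elif tag == "br" and self._cell is not None:
            self._cell.append("")  # <br> starts a new text line inside the cell

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None and self._row is not None:
            self._row.append([line.strip() for line in self._cell])
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Text outside a <td> is ignored
        if self._cell is not None:
            self._cell[-1] += data


def school_rows(html: str, college: str) -> List[Tuple[str, str, str, str]]:
    """Return (rank, team, school, solved) for every row whose school equals college."""
    parser = RankRowParser()
    parser.feed(html)
    parser.close()
    result = []
    for cells in parser.rows:
        # Skip the header row and rows without a "team<br/>school" cell
        if len(cells) < 3 or len(cells[1]) < 2 or cells[1][0] == "Team":
            continue
        rank, name, school, solved = cells[0][0], cells[1][0], cells[1][1], cells[2][0]
        if school == college:
            result.append((rank, name, school, solved))
    return result
```

Because the parser treats `<br>` as a line break, whether it is written as `<br>` or `<br/>`, it does not depend on any particular HTML serializer.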