Python Data Mining Study Notes (7): Automatically Simulating HTTP Requests

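A client communicates with a server by sending HTTP requests. HTTP defines several request methods; this article covers two of them, POST and GET. A POST request is typically used when submitting information from a web page, such as logging in, while a GET request is typically used when retrieving information, such as running a search.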
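As a small offline illustration of the two methods, the sketch below builds two urllib.request.Request objects without sending them. It relies on urllib choosing GET when no data is supplied and POST when a data body is supplied. The URL http://example.com/ and the field values are placeholders for illustration, not part of the original notes.

```python
import urllib.parse
import urllib.request

# Placeholder URL for illustration only; nothing is sent over the network.
url = "http://example.com/"

# No data argument: urllib will issue a GET request.
get_req = urllib.request.Request(url)

# A bytes body passed as data: urllib will issue a POST request.
body = urllib.parse.urlencode({"name": "demo", "pass": "secret"}).encode("utf-8")
post_req = urllib.request.Request(url, data=body)

print(get_req.get_method())
print(post_req.get_method())
```

The same rule explains the listings in this article: the POST example passes a data argument to Request, while the GET examples put everything in the URL.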

I. POST requests:

First, find a web page that has a login form. This article uses http://www.iqianyue.com/mypost:

Look at the source code of this page:

<html>
<head><title>Post Test Page</title></head>
<body>
    <form action="" method="post">
    name:<input name="name" type="text" /><br>
    passwd:<input name="pass" type="text" /><br>
    <input name="" type="submit" value="submit" /><br />
</body>
</html>

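Note that each text-box input tag carries a name="XXX" attribute (here, name and pass); these names are the keys of the submitted form fields. After filling in both boxes and clicking the "submit" button, the page displays the submitted information: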
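When a form has many fields, the names can also be collected programmatically instead of by eye. The following is a minimal sketch using the standard-library html.parser module on a copy of the page source shown above; the class name InputNameCollector is made up for this illustration.

```python
from html.parser import HTMLParser

# Copy of the form markup quoted above (the page itself is not fetched here).
page_source = """
<form action="" method="post">
name:<input name="name" type="text" /><br>
passwd:<input name="pass" type="text" /><br>
<input name="" type="submit" value="submit" /><br />
"""


class InputNameCollector(HTMLParser):
    """Hypothetical helper: records the non-empty name of every <input> tag."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <input ... /> also reach this method,
        # because the default handle_startendtag delegates to it.
        if tag == "input":
            name = dict(attrs).get("name")
            if name:  # skip the submit button, whose name is empty
                self.names.append(name)


collector = InputNameCollector()
collector.feed(page_source)
print(collector.names)
```

The collected names are the keys to use in the dictionary of POST fields.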

This program simulates that submission in code:

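import urllib.request
import urllib.parse

# If the POST URL is not easy to find, use packet capture to locate it
url = "http://www.iqianyue.com/mypost/"
# Set the POST fields; the keys match the name attributes of the form inputs
mydata = urllib.parse.urlencode({
    "name": "窗前明月光",
    "pass": "12332sas"
}).encode("utf-8")
# Passing a data body makes urllib send a POST request
req = urllib.request.Request(url, mydata)
# To masquerade as a browser, use req.add_header
data = urllib.request.urlopen(req).read()
# Save the response as a local HTML file
with open("F:/3.html", "wb") as fh:
    fh.write(data)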

Result: the saved page can be found at the path given in the code.
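The listing mentions req.add_header for masquerading as a browser but does not use it. A minimal sketch of that step follows; the User-Agent string is only an example value chosen for this illustration, and the notes do not say whether this particular site checks the header. The request is built but not sent.

```python
import urllib.parse
import urllib.request

url = "http://www.iqianyue.com/mypost/"
mydata = urllib.parse.urlencode({"name": "demo", "pass": "12332sas"}).encode("utf-8")
req = urllib.request.Request(url, mydata)

# Example User-Agent value (an assumption for illustration).
user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
req.add_header("User-Agent", user_agent)

# urllib stores header names in capitalized form, so the key is "User-agent".
print(req.get_header("User-agent"))

# To actually send the request, uncomment the next line:
# data = urllib.request.urlopen(req).read()
```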

II. GET requests:
First, enter a search term on the Baidu search page and look at the resulting URL:

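Note that the Baidu base URL is followed by "/s?" and then by "xxx=xxx" pairs separated by "&". The search keyword in particular is passed as "wd=Python". This example simulates that search.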
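The URL structure described here can be assembled and taken apart with urllib.parse. The sketch below is offline; the second parameter, page, is a made-up placeholder to show the "&" separator and is not claimed to be a real Baidu parameter.

```python
import urllib.parse

base = "http://www.baidu.com/s"
# "wd" comes from the observed URL; "page" is a hypothetical extra parameter.
params = {"wd": "Python", "page": "2"}

# urlencode joins key=value pairs with "&".
url = base + "?" + urllib.parse.urlencode(params)
print(url)

# Going the other way: split the URL and parse the query string.
parsed = urllib.parse.urlparse(url)
query = urllib.parse.parse_qs(parsed.query)
print(parsed.path, query)
```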

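import urllib.request

keywd = "Python"
url = "http://www.baidu.com/s?wd=" + keywd  # note: HTTP, not HTTPS
# No data body is passed, so urllib sends a GET request
req = urllib.request.Request(url)
data = urllib.request.urlopen(req).read()
# Save the search results page as a local HTML file
with open("F:/222.html", "wb") as fh:
    fh.write(data)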

Result: the search results page is saved locally in the corresponding directory:


Note: if the search term is in Chinese, it must be URL-encoded first:

# When the search term is in Chinese
import urllib.parse
import urllib.request

keywd = "床前明月光"
keywd = urllib.parse.quote(keywd)  # percent-encode the Chinese characters
url = "http://www.baidu.com/s?wd=" + keywd
req = urllib.request.Request(url)
data = urllib.request.urlopen(req).read()
with open("F:/222.html", "wb") as fh:
    fh.write(data)
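To see what the encoding step actually produces, the offline sketch below percent-encodes the same keyword and decodes it again. By default quote encodes the text as UTF-8 and writes each byte as "%XX", so each of these five characters (three UTF-8 bytes apiece) becomes nine ASCII characters. The comparison with urlencode is added for illustration.

```python
import urllib.parse

keywd = "床前明月光"

# Percent-encode: UTF-8 bytes written as %XX, giving a pure-ASCII string.
encoded = urllib.parse.quote(keywd)
print(encoded)
print(len(encoded))

# unquote reverses the encoding.
decoded = urllib.parse.unquote(encoded)
print(decoded)

# urlencode applies the same kind of encoding to values while building "key=value".
query = urllib.parse.urlencode({"wd": keywd})
print(query)
```

Building the query with urlencode is an alternative to calling quote by hand, since it encodes every value in the dictionary.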


Thanks to teacher Wei Wei (韦玮) for the guidance.
