1. Create the Scrapy project
scrapy startproject renren2Spider
2. Enter the project directory and create the Spider with the genspider command
scrapy genspider renren2 "renren.com"
3. Write the Spider that extracts the item data (in the spiders folder: renren2.py)
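# -*- coding: utf-8 -*-
# Sending a POST request with Scrapy: logging in to Renren, the standard simulated-login steps
import scrapy


class Renren2Spider(scrapy.Spider):
    name = 'renren2'
    allowed_domains = ['renren.com']
    start_urls = ['http://www.renren.com/PLogin.do']
    # Prompt for the credentials once, when the class is loaded
    username = input("Enter your account: ")
    password = input("Enter your password: ")

    def parse(self, response):
        # Fill in the login form found in the response and submit it;
        # the given callback handles the page returned after login.
        # FormRequest.from_response is a Scrapy helper for submitting POST forms.
        yield scrapy.FormRequest.from_response(
            response,
            formdata={"email": self.username, "password": self.password},
            callback=self.parse_newpage,
        )

    # Handle the response content
    def parse_newpage(self, response):
        with open('renren2.html', 'w', encoding='utf-8') as f:
            # response.text is the body decoded with the encoding Scrapy detected
            f.write(response.text)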
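To make the form submission in parse less of a black box, here is a minimal standard-library sketch of the kind of body a login form POST carries. It is not Scrapy's own code, only an illustration: the helper name build_login_body and the hidden field "domain" are hypothetical, and the field names "email" and "password" are taken from the formdata used in the Spider. It assumes the form is sent as application/x-www-form-urlencoded and that values passed by the caller override values pre-filled in the page's form.

```python
from urllib.parse import urlencode


def build_login_body(email, password, hidden_fields=None):
    """Return the URL-encoded body of a login form POST (illustrative helper)."""
    # Start from the values already present in the page's <form> (hypothetical here)
    fields = dict(hidden_fields or {})
    # Values supplied by the caller take precedence over pre-filled ones
    fields.update({"email": email, "password": password})
    return urlencode(fields)


# "domain" stands in for whatever hidden inputs the real login page contains
body = build_login_body("user@example.com", "s3cret", {"domain": "renren.com"})
print(body)
```

The point of the sketch is that only the two credential fields need to be supplied by hand; any other inputs the login page defines travel along with them in the same encoded body.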
4. Configure the settings file (settings.py)
# Renren works without changing this; some sites may require setting the following parameter to False
# Obey robots.txt rules
ROBOTSTXT_OBEY = False
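For context on what this setting turns off, the sketch below uses Python's standard urllib.robotparser to show what a robots.txt check looks like. The rules and URLs are invented for illustration and are not Renren's real robots.txt; Scrapy performs its own check internally when ROBOTSTXT_OBEY is True, so this only approximates the idea.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for this illustration
sample_rules = [
    "User-agent: *",
    "Disallow: /private/",
]

parser = RobotFileParser()
parser.parse(sample_rules)

# A crawler that obeys robots.txt skips URLs for which can_fetch is False
blocked = parser.can_fetch("*", "http://example.com/private/page")
allowed = parser.can_fetch("*", "http://example.com/public")
print(blocked, allowed)
```

With ROBOTSTXT_OBEY = False, requests to paths a site disallows are sent anyway, so it is worth checking a site's rules and terms before changing the setting.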
5. With the setup above complete, start the crawl by running the project's crawl command, which launches the Spider:
scrapy crawl renren2
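Once the crawl finishes, the Spider should have written renren2.html to the directory the command was run from. A quick way to see whether the login worked is to look for text that only appears on a logged-in page. The helper below is a rough sketch under that assumption: the name looks_logged_in is hypothetical, and the marker string has to be chosen by hand, for example the account name.

```python
from pathlib import Path


def looks_logged_in(html_path, marker):
    """Return True if the saved page exists and contains the marker text."""
    path = Path(html_path)
    if not path.is_file():
        # The Spider never reached parse_newpage, or ran in another directory
        return False
    return marker in path.read_text(encoding="utf-8")


if __name__ == "__main__":
    # Replace the marker with text expected only after a successful login
    print(looks_logged_in("renren2.html", "user@example.com"))
```

A False result does not prove the login failed, only that the marker was not found, so opening the file in a browser remains the more reliable check.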