- Create a Scrapy project: run the following commands in a terminal, then open the zhilian project generated on the Desktop in PyCharm:
cd Desktop
scrapy startproject zhilian
cd zhilian
scrapy genspider Zhilian sou.zhilian.com
- Add the following code to middlewares.py:
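from scrapy.http.response.html import HtmlResponse


class PhantomjsMiddleware(object):
    """Fetch Zhilian pages with the spider's browser instead of Scrapy's downloader."""

    def process_request(self, request, spider):
        if spider.name == 'Zhilian':
            spider.driver.get(request.url)
            # implicitly_wait() only affects find_element* lookups, not page_source
            spider.driver.implicitly_wait(10)
            response = HtmlResponse(url=spider.driver.current_url,
                                    request=request,
                                    body=spider.driver.page_source,
                                    encoding='utf-8')
            # Returning a Response makes Scrapy skip its own download of this request
            return response
        return None  # other spiders' requests are downloaded by Scrapy as usual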
- Add the following code to settings.py:
DOWNLOADER_MIDDLEWARES = {
    # 'zhilian.middlewares.ZhilianDownloaderMiddleware': 543,
    # Lower numbers run first; 1 puts this ahead of Scrapy's built-in middlewares
    'zhilian.middlewares.PhantomjsMiddleware': 1,
}
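As I understand Scrapy's documented behaviour, `DOWNLOADER_MIDDLEWARES` is merged with the built-in `DOWNLOADER_MIDDLEWARES_BASE`, entries set to `None` are disabled, and `process_request()` is called from the lowest value upward; the first middleware that returns a Response stops that chain and the real download. The pure-Python simulation below illustrates that rule only (it is not Scrapy's code), and the three built-in entries are a subset of the real list.

```python
# Subset of Scrapy's built-in downloader middlewares with their default order values.
BUILTIN = {
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": 500,
    "scrapy.downloadermiddlewares.retry.RetryMiddleware": 550,
    "scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware": 750,
}

# The project setting from settings.py.
DOWNLOADER_MIDDLEWARES = {
    "zhilian.middlewares.PhantomjsMiddleware": 1,
}


def request_order(builtin, custom):
    """Middleware paths in the order their process_request() is called."""
    merged = {**builtin, **custom}  # project settings override built-in values
    enabled = {path: order for path, order in merged.items() if order is not None}
    return sorted(enabled, key=enabled.get)  # lower value = called earlier


def run_process_request(order, handlers):
    """Call each middleware in turn; the first non-None return value wins."""
    called = []
    for path in order:
        called.append(path)
        result = handlers.get(path, lambda: None)()
        if result is not None:
            # Remaining process_request() calls and the download are skipped.
            return result, called
    return "downloaded by Scrapy", called


order = request_order(BUILTIN, DOWNLOADER_MIDDLEWARES)
result, called = run_process_request(
    order, {"zhilian.middlewares.PhantomjsMiddleware": lambda: "HtmlResponse from PhantomJS"}
)
print(result)  # HtmlResponse from PhantomJS
print(called)  # ['zhilian.middlewares.PhantomjsMiddleware']
```

Because the browser fetches the page itself, headers or proxies set by Scrapy's own middlewares do not reach it, whatever the order value.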
- Add the following code to the spider file generated by genspider (spiders/Zhilian.py):
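from selenium import webdriver

    def __init__(self, *args, **kwargs):  # add this method to the ZhilianSpider class
        super().__init__(*args, **kwargs)  # keep scrapy.Spider's own setup
        self.driver = webdriver.PhantomJS()  # webdriver.PhantomJS only exists in Selenium 3.x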