Having previously found that many foreign crawlers were fetching the site's pages, I edited robots.txt directly to reduce the load:
# xiaowu
User-agent: Baiduspider
Allow: /
Disallow: /admin/
Disallow: /*.php$

User-agent: Sosospider
Allow: /
Disallow: /admin/
Disallow: /*.php$

User-agent: sogou spider
Allow: /
Disallow: /admin/
Disallow: /*.php$

User-agent: Googlebot
Allow: /
Disallow: /admin/
Disallow: /*.php$

User-agent: Bingbot
Allow: /
Disallow: /admin/
Disallow: /*.php$

User-agent: MSNBot
Allow: /
Disallow: /admin/
Disallow: /*.php$

User-agent: googlebot-mobile
Allow: /
Disallow: /admin/
Disallow: /*.php$

User-agent: 360Spider
Allow: /
Disallow: /admin/
Disallow: /*.php$

User-agent: HaosouSpider
Allow: /
Disallow: /admin/
Disallow: /*.php$

User-agent: *
Disallow: /
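
Since every whitelisted crawler gets the same three rules, the file can also be generated from a list instead of maintained by hand. The following is only a minimal sketch: the names `ALLOWED_BOTS`, `SHARED_RULES`, and `build_robots_txt` are hypothetical and not part of the original setup. It puts a blank line between groups for readability; the rules themselves are unchanged.

```python
# Hypothetical helper: regenerate the whitelist-style robots.txt from a list.
# The bot names and rules are copied from the file above; everything else
# (function and variable names) is an assumption made for illustration.

ALLOWED_BOTS = [
    "Baiduspider",
    "Sosospider",
    "sogou spider",
    "Googlebot",
    "Bingbot",
    "MSNBot",
    "googlebot-mobile",
    "360Spider",
    "HaosouSpider",
]

# Rules shared by every whitelisted crawler.
SHARED_RULES = [
    "Allow: /",
    "Disallow: /admin/",
    "Disallow: /*.php$",
]


def build_robots_txt(bots=ALLOWED_BOTS, rules=SHARED_RULES):
    """Return robots.txt text: one group per allowed bot, then a catch-all block."""
    lines = ["# xiaowu"]
    for bot in bots:
        lines.append(f"User-agent: {bot}")
        lines.extend(rules)
        lines.append("")  # blank line separates groups
    # Every crawler not listed above is denied the whole site.
    lines.append("User-agent: *")
    lines.append("Disallow: /")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_robots_txt(), end="")
```

Running the script prints the file to standard output, so it can be redirected into robots.txt at the web root. Keep in mind that robots.txt is advisory: well-behaved crawlers honor it, but an abusive one can ignore it, so server-side blocking may still be needed.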