Python Web Scraping, Part 4

Other 2021-04-06 09:22:41

Scraping the complete novel 斗破苍穹 (Battle Through the Heavens)

import re
import time

import requests


class Spider():
    # Browser-like User-Agent so the site serves the normal chapter page.
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                      'AppleWebKit/537.36 (KHTML, like Gecko) '
                      'Chrome/79.0.3945.88 Safari/537.36'
    }

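    def __analyse(self, url):
        # Fetch one chapter page and append its paragraphs to the output file.
        html = requests.get(url, headers=Spider.headers, timeout=10)
        if html.status_code != 200:
            return  # skip pages that fail to load

        # Each paragraph of the chapter sits inside a <p>...</p> tag;
        # re.S lets '.' match newlines inside a paragraph.
        contents = re.findall('<p>(.*?)</p>', html.content.decode('utf-8'), re.S)

        # Open with an explicit encoding so Chinese text is written correctly
        # regardless of the platform's default encoding.
        with open('C:/Users/baishuai/Desktop/斗破苍穹.txt', 'a+', encoding='utf-8') as f:
            for content in contents:
                print(content)
                f.write(content + '\n')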


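    def __urll(self):
        # Chapter pages are numbered 1562 through 1665 (range() excludes 1666).
        urls = ['http://www.doupoxs.com/doupocangqiong/{}.html'.format(i) for i in range(1562, 1666)]
        for url in urls:
            self.__analyse(url)
            time.sleep(1)  # pause between requests to go easy on the server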


    def go(self):
        # Public entry point: crawl every chapter in order.
        self.__urll()


spider = Spider()
spider.go()
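
The two core steps of the spider, building the chapter URLs and pulling text out of `<p>` tags, can be tried without touching the network. The following is only a minimal sketch: the helper names `extract_paragraphs` and `chapter_urls` and the `SAMPLE_HTML` string are made up for illustration, and the real chapter pages may be structured differently.

```python
import re

# Hypothetical sample markup standing in for a downloaded chapter page.
SAMPLE_HTML = """
<div class="content">
<p>First paragraph.</p>
<p>Second paragraph,
split across lines.</p>
</div>
"""


def extract_paragraphs(html):
    """Return the text between each <p> and </p>, using the spider's regex."""
    # re.S lets '.' match newlines, so multi-line paragraphs stay whole.
    return re.findall('<p>(.*?)</p>', html, re.S)


def chapter_urls(start=1562, stop=1666):
    """Build the chapter URLs; `stop` is exclusive, as with range()."""
    template = 'http://www.doupoxs.com/doupocangqiong/{}.html'
    return [template.format(i) for i in range(start, stop)]


paragraphs = extract_paragraphs(SAMPLE_HTML)
urls = chapter_urls()
print(paragraphs)
print(len(urls), urls[0], urls[-1])
```

Note that the regex only matches bare `<p>` tags; a paragraph written as `<p class="...">` would be missed, which is one reason an HTML parser is often preferred for less regular pages.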

Reprinted from blog.csdn.net/weixin_45955630/article/details/103759456
