Scraping a Novel with Python (Python Source Included) - 代码天地

Other 2021-03-03 01:07:33

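import re  # regular expressions: extract the needed pieces from the page text
from urllib.parse import urljoin  # resolve chapter links against the index URL

import requests  # HTTP client: fetch the page text


# gettext(url): take a page URL and return that page's text
def gettext(url):
    r = requests.get(url, timeout=30)
    r.raise_for_status()  # fail early on HTTP errors such as 404
    # Decode with the encoding detected from the response body.
    r.encoding = r.apparent_encoding
    return r.text


# geturl(url): take the table-of-contents URL and return the list of chapter URLs
def geturl(url):
    text = gettext(url)
    chapter_info_list = re.findall(r'<li><a href="(.*?)">', text)
    # The first match is dropped (presumably it is not a chapter link).
    # urljoin leaves absolute links unchanged and resolves relative ones.
    return [urljoin(url, link) for link in chapter_info_list[1:]]


# getline(url): take a chapter URL and return its title and paragraphs as a list
def getline(url):
    text = gettext(url)
    # print(text, file=open("序章.txt", 'a', encoding='utf-8'))  # debug dump
    title = re.findall(r'<h1>(.*?)</h1>', text)
    line = re.findall(r'<span class="calibre[2-9]">(.*?)</span>', text)
    return title + line


# my_print(line, my_name): write each item of the list to the open text file
def my_print(line, my_name):
    for i in line:
        print(i + '\n', file=my_name)  # each item is followed by a blank line


# Main function
def main():
    url = 'http://www.yuedu88.com/longzu1/'
    url_list = geturl(url)
    # Mode 'x' refuses to overwrite an existing 龙族.txt;
    # the with block closes the file when the loop finishes.
    with open("龙族.txt", 'x', encoding='utf-8') as my_file:
        for i in url_list:
            line = getline(i)
            my_print(line, my_file)


if __name__ == "__main__":
    main()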
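
The two extraction steps in the script can be tried offline before running it against the site. The sketch below applies the same regular expressions to small HTML fragments. The fragments, `SAMPLE_INDEX`, `SAMPLE_CHAPTER`, and the helper names `extract_links` and `extract_lines` are made up for illustration; the real markup on the site may differ.

```python
import re

# Hypothetical table-of-contents fragment; the real page may look different.
SAMPLE_INDEX = (
    '<ul>'
    '<li><a href="/longzu1/">Home</a></li>'
    '<li><a href="/longzu1/1.html">Chapter 1</a></li>'
    '<li><a href="/longzu1/2.html">Chapter 2</a></li>'
    '</ul>'
)

# Hypothetical chapter fragment; the real page may look different.
SAMPLE_CHAPTER = (
    '<h1>Prologue</h1>'
    '<span class="calibre3">First paragraph.</span>'
    '<span class="calibre1">Skipped: the class digit is outside 2-9.</span>'
    '<span class="calibre5">Second paragraph.</span>'
)


def extract_links(text):
    """Collect every <li><a href="..."> target, then drop the first one."""
    links = re.findall(r'<li><a href="(.*?)">', text)
    return links[1:]


def extract_lines(text):
    """Return the <h1> title followed by the calibre2..calibre9 spans."""
    title = re.findall(r'<h1>(.*?)</h1>', text)
    body = re.findall(r'<span class="calibre[2-9]">(.*?)</span>', text)
    return title + body


print(extract_links(SAMPLE_INDEX))
print(extract_lines(SAMPLE_CHAPTER))
```

Because `.` does not match a newline by default, a title or paragraph that spans several lines in the HTML would be missed by these patterns; passing `re.S` to `re.findall` is one way to handle that case.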

2021-02-23 12:39:57


Reposted from blog.csdn.net/Infinity_07/article/details/113982240
