Scraping website content with jsoup: 《庆余年》 (Joy of Life) - 代码天地

Category: Other | 2021-03-25 15:56:15

I used jsoup to scrape the chapters of 《庆余年》 from a web novel site and saved them in JSON format to a text file.
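
The saved file holds a JSON array with one {"name", "content"} object per chapter. To make that shape concrete without depending on a JSON library, here is a minimal stdlib-only sketch; the class names ChapterJson and Chapter are hypothetical and this is not the author's code. It also shows that a JSON serializer escapes double quotes on its own, so deleting them from the chapter text is not needed to get valid JSON.

```java
import java.util.List;

// Hypothetical, stdlib-only sketch (not the author's code): serializes chapters as
// [{"name":...,"content":...}] and escapes special characters instead of deleting quotes.
public class ChapterJson {

    // One scraped chapter: its title and body text.
    public static final class Chapter {
        final String name;
        final String content;

        public Chapter(String name, String content) {
            this.name = name;
            this.content = content;
        }
    }

    // Returns s as a JSON string literal, escaping quotes, backslashes and control characters.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c)); // remaining control characters
                    } else {
                        sb.append(c); // everything else, including Chinese text, is copied as-is
                    }
            }
        }
        return sb.append('"').toString();
    }

    // Builds the JSON array written to the output file: one object per chapter.
    public static String toJson(List<Chapter> chapters) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < chapters.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            Chapter ch = chapters.get(i);
            sb.append("{\"name\":").append(quote(ch.name))
              .append(",\"content\":").append(quote(ch.content))
              .append('}');
        }
        return sb.append(']').toString();
    }
}
```

For example, a chapter whose text is He said "hi" followed by a newline is written as {"name":"Chapter 1","content":"He said \"hi\"\n"}, keeping the quotes in the text.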

Here is the code:

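```java
import java.io.IOException;

import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Assumption: the original does not show its JSON imports; these are fastjson's classes.
import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONObject;

// JsoupUtils and FileUtils are the author's own helper classes (see the article linked below).
public class TestJsoup3 {

	static String path = "http://www.xbiquge.la";

	public static void main(String[] args) throws IOException {
		TestJsoup3 tj = new TestJsoup3();
		tj.test();
	}

	public void test() throws IOException {
		String url = "/2/1690/";// table-of-contents page of the novel
		Document document = JsoupUtils.getRoot(path + url);
		Element list = document.selectFirst("#list");// element holding all chapter links
		if (list == null) {
			throw new IOException("#list not found on " + path + url);
		}
		JSONArray arr = this.analysisList(list);// parse all chapters
		String dpath = "D:\\study\\jsoup\\qynjson.js";
		FileUtils.writeLine(dpath, arr.toJSONString());
	}

	private JSONArray analysisList(Element list) throws IOException {
		Elements links = list.select("a[href]");
		JSONArray arr = new JSONArray();
		for (Element link : links) {
			try {
				Thread.sleep(10000);// pause 10 s between requests to go easy on the site
			} catch (InterruptedException e) {
				Thread.currentThread().interrupt();// keep the interrupt status and stop crawling
				break;
			}
			String url = link.attr("href");// chapter URL; a sub-path, so path must be prepended
			String name = link.text();// chapter title
			arr.add(this.analysisChapter(name, path + url));
		}
		return arr;
	}

	private JSONObject analysisChapter(String name, String url) throws IOException {
		Document document = JsoupUtils.getRoot(url);
		Element content = document.selectFirst("#content");// chapter body
		String text = content == null ? "" : content.text().trim();
		text = text.replace("\"", "");// drop double quotes (a JSON serializer would escape them anyway)
		JSONObject json = new JSONObject();
		json.put("name", name);
		json.put("content", text);
		System.out.println(name);// progress output
		return json;
	}
}
```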
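
The code builds each chapter URL by concatenating path and the href, which only works while every href is a root-relative path such as /2/1690/1.html (a made-up file name). A sketch using java.net.URI.resolve handles root-relative, relative and absolute hrefs alike; the class name ChapterUrl is hypothetical. If the Document carries its base URI (Jsoup.connect(...).get() records it; whether JsoupUtils.getRoot does is not shown here), jsoup's link.absUrl("href") gives the same result without extra code.

```java
import java.net.URI;

// Hypothetical helper: resolve a chapter link against the page it was found on,
// instead of string-concatenating the site root and the href.
public class ChapterUrl {

    // Resolves href relative to pageUrl following RFC 3986 rules:
    // a root-relative href keeps only the scheme and host of the page, a relative href
    // is resolved against the page's directory, and an absolute href is returned unchanged.
    public static String resolve(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href.trim()).toString();
    }
}
```

With the table-of-contents page as the base, ChapterUrl.resolve("http://www.xbiquge.la/2/1690/", "1.html") and ChapterUrl.resolve("http://www.xbiquge.la/2/1690/", "/2/1690/1.html") both return "http://www.xbiquge.la/2/1690/1.html".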

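For the jar dependencies and helper classes (such as JsoupUtils and FileUtils) that this code uses but does not include, see my other article: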

Scraping website content with jsoup: 《本草纲目》
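
FileUtils.writeLine comes from that article and its code is not reproduced here. If you only need to save the JSON string, a stand-in built on java.nio.file.Files could look like the sketch below; the class name FileUtilsSketch and its overwrite-and-UTF-8 behavior are assumptions, not the author's implementation. Writing UTF-8 explicitly keeps the Chinese chapter text independent of the platform default charset, which on Chinese-locale Windows was typically GBK before Java 18.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical stand-in for the author's FileUtils.writeLine(String, String). The real
// helper lives in the linked article; whether it appends or overwrites is not stated.
public class FileUtilsSketch {

    // Writes text followed by a newline to dpath as UTF-8, creating missing parent
    // directories and overwriting any existing file.
    public static void writeLine(String dpath, String text) throws IOException {
        Path file = Paths.get(dpath);
        Path parent = file.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        Files.write(file, (text + "\n").getBytes(StandardCharsets.UTF_8));
    }
}
```

In the scraper this would be called as FileUtilsSketch.writeLine(dpath, arr.toJSONString()).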

Reposted from blog.csdn.net/u013276512/article/details/112647930
