Java HTML Parsing

1 The difference between Node and Element

Reference: https://blog.csdn.net/kkkkkxiaofei/article/details/52608394

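DOM, short for Document Object Model, is a programming interface for HTML and XML; it represents a document as a tree structure.
Node and Element come up constantly when parsing with the DOM. The difference between the two is described as follows: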

A Node is an interface from which a number of DOM types inherit, and allows these various types to be treated (or tested) similarly.

The following interfaces all inherit from Node its methods and properties: Document, Element, CharacterData (which Text, Comment, and CDATASection inherit), ProcessingInstruction, DocumentFragment, DocumentType, Notation, Entity, EntityReference.

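In other words, Node is the base type; Element, Text, and Comment in the DOM all inherit from it. Node models the structure of the DOM tree (every item in the tree is a Node), while Element is one specific kind of Node.
For example: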


<body>
we can put text here 1...
<h1>China</h1>
we can put text here 2...
<!-- My comment ... -->
we can put text here 3...
<p>China is a popular country with...</p>
we can put text here 4...
<div>
<button>See details</button>
</div>
we can put text here 5 ...
</body>

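body has 3 direct child elements (h1, p, and div); button is nested inside div, so it is not counted here.
body has 9 direct child nodes: direct child elements (3) + COMMENT_NODE (1) + TEXT_NODE (5) = 9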
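
The count can be checked with a short program. The following is a minimal sketch, assuming the body fragment is treated as well-formed XML so that the JDK's built-in DOM parser (javax.xml.parsers) can read it. The class name DomChildCount and the method countChildren are illustrative names, not from the original article, and a real HTML parser may build a slightly different tree for malformed markup.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomChildCount {
    // The body fragment from the example above, kept line for line.
    static final String SAMPLE =
          "<body>\n"
        + "we can put text here 1...\n"
        + "<h1>China</h1>\n"
        + "we can put text here 2...\n"
        + "<!-- My comment ... -->\n"
        + "we can put text here 3...\n"
        + "<p>China is a popular country with...</p>\n"
        + "we can put text here 4...\n"
        + "<div>\n"
        + "<button>See details</button>\n"
        + "</div>\n"
        + "we can put text here 5 ...\n"
        + "</body>";

    // Returns {elements, comments, texts} for the direct children of the root.
    static int[] countChildren(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList children = doc.getDocumentElement().getChildNodes();
            int[] counts = new int[3];
            for (int i = 0; i < children.getLength(); i++) {
                switch (children.item(i).getNodeType()) {
                    case Node.ELEMENT_NODE: counts[0]++; break;
                    case Node.COMMENT_NODE: counts[1]++; break;
                    case Node.TEXT_NODE:    counts[2]++; break;
                    default: break; // other node types do not occur in this sample
                }
            }
            return counts;
        } catch (Exception e) {
            // Parsing problems are rethrown unchecked to keep the sketch short.
            throw new IllegalStateException("could not parse sample", e);
        }
    }

    public static void main(String[] args) {
        int[] c = countChildren(SAMPLE);
        System.out.println("elements=" + c[0] + " comments=" + c[1]
            + " texts=" + c[2] + " total=" + (c[0] + c[1] + c[2]));
    }
}
```

Only the direct children of body are visited, which is why button does not appear in the element count.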

2 Jsoup, a Java HTML parsing tool

In Jsoup, Document extends Element, which in turn extends Node. TextNode also inherits from Node.
For example:

<li class="title includes">费用包含:</li>

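The whole fragment, from the opening <li> tag through the closing </li>, is one Element.
That Element contains a single TextNode: 费用包含: ("Fees include:").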
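
The same relationship can be inspected in code. This is a minimal sketch, assuming the jsoup library is on the classpath; the class name JsoupTextNodeDemo and the helper parseLi are illustrative names, not part of Jsoup or of the original article.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public class JsoupTextNodeDemo {
    static final String HTML = "<li class=\"title includes\">费用包含:</li>";

    // Parses the fragment and returns the first <li> Element found.
    static Element parseLi(String html) {
        Document doc = Jsoup.parseBodyFragment(html);
        return doc.selectFirst("li");
    }

    public static void main(String[] args) {
        Element li = parseLi(HTML);

        // The class attribute holds two class names, split on whitespace.
        System.out.println("classes: " + li.classNames());

        // childNodes() includes text; children() includes only Elements.
        System.out.println("child nodes: " + li.childNodeSize());
        System.out.println("child elements: " + li.children().size());

        // The single child Node is the TextNode holding the visible text.
        Node only = li.childNode(0);
        if (only instanceof TextNode) {
            System.out.println("text: " + ((TextNode) only).text());
        }
    }
}
```

The split between childNodes() and children() mirrors the Node versus Element distinction from section 1: text is a Node but not an Element.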

Reposted from blog.csdn.net/secure2/article/details/81097810