版权声明:本文为博主原创文章,未经博主允许不得转载。 https://blog.csdn.net/chengqiuming/article/details/86413480
一 子节点
1 find方法
1.1 代码
html = '''
<div id="container">
<ul class="list">
<li class="item-0">first item</li>
<li class="item-1"><a href="link2.html">second item</a></li>
<li class="item-0 active"><a href="link3.html"><span class="bold">third item</span></a></li>
<li class="item-1 active"><a href="link4.html">fourth item</a></li>
<li class="item-0"><a href="link5.html">fifth item</a></li>
</ul>
</div>
'''
from pyquery import PyQuery as pq
doc = pq(html)
items = doc('.list')
print(type(items))
print(items)
# find()方法,此时传入的参数是CSS选择器
lis = items.find('li')
print(type(lis))
print(lis)
1.2 结果
E:\WebSpider\venv\Scripts\python.exe E:/WebSpider/4_3.py
<class 'pyquery.pyquery.PyQuery'>
<ul class="list">
<li class="item-0">first item</li>
<li class="item-1"><a href="link2.html">second item</a></li>
<li class="item-0 active"><a href="link3.html"><span class="bold">third item</span></a></li>
<li class="item-1 active"><a href="link4.html">fourth item</a></li>
<li class="item-0"><a href="link5.html">fifth item</a></li>
</ul>
<class 'pyquery.pyquery.PyQuery'>
<li class="item-0">first item</li>
<li class="item-1"><a href="link2.html">second item</a></li>
<li class="item-0 active"><a href="link3.html"><span class="bold">third item</span></a></li>
<li class="item-1 active"><a href="link4.html">fourth item</a></li>
<li class="item-0"><a href="link5.html">fifth item</a></li>
1.3 说明
首先,我们选取class为list的节点,然后调用了find()方法,传入CSS选择器,选取其内部的li节点,最后打印输出。可以发现,find()方法会将符合条件的所有节点选择出来,结果的类型是PyQuery类型。
其实find()的查找范围是节点的所有子孙节点,而如果我们只想查找子节点,那么可以用children()方法
2 children方法
2.1 代码
html = '''
<div id="container">
<ul class="list">
<li class="item-0">first item</li>
<li class="item-1"><a href="link2.html">second item</a></li>
<li class="item-0 active"><a href="link3.html"><span class="bold">third item</span></a></li>
<li class="item-1 active"><a href="link4.html">fourth item</a></li>
<li class="item-0"><a href="link5.html">fifth item</a></li>
</ul>
</div>
'''
from pyquery import PyQuery as pq
doc = pq(html)
items = doc('.list')
# 只想查找子节点,那么可以用children()方法
lis = items.children()
print(type(lis))
print(lis)
2.2 结果
E:\WebSpider\venv\Scripts\python.exe E:/WebSpider/4_3.py
<class 'pyquery.pyquery.PyQuery'>
<li class="item-0">first item</li>
<li class="item-1"><a href="link2.html">second item</a></li>
<li class="item-0 active"><a href="link3.html"><span class="bold">third item</span></a></li>
<li class="item-1 active"><a href="link4.html">fourth item</a></li>
<li class="item-0"><a href="link5.html">fifth item</a></li>
3 筛选器
3.1 代码
html = '''
<div id="container">
<ul class="list">
<li class="item-0">first item</li>
<li class="item-1"><a href="link2.html">second item</a></li>
<li class="item-0 active"><a href="link3.html"><span class="bold">third item</span></a></li>
<li class="item-1 active"><a href="link4.html">fourth item</a></li>
<li class="item-0"><a href="link5.html">fifth item</a></li>
</ul>
</div>
'''
from pyquery import PyQuery as pq
doc = pq(html)
items = doc('.list')
# 可以向children()方法传入CSS选择器.active:
lis = items.children('.active')
print(lis)
3.2 结果
E:\WebSpider\venv\Scripts\python.exe E:/WebSpider/4_3.py
<li class="item-0 active"><a href="link3.html"><span class="bold">third item</span></a></li>
<li class="item-1 active"><a href="link4.html">fourth item</a></li>
3.3 说明
输出结果已经做了筛选,留下了class为active的节点。
二 父节点
1 parent反复
1.1 代码
html = '''
<div class="wrap">
<div id="container">
<ul class="list">
<li class="item-0">first item</li>
<li class="item-1"><a href="link2.html">second item</a></li>
<li class="item-0 active"><a href="link3.html"><span class="bold">third item</span></a></li>
<li class="item-1 active"><a href="link4.html">fourth item</a></li>
<li class="item-0"><a href="link5.html">fifth item</a></li>
</ul>
</div>
</div>
'''
from pyquery import PyQuery as pq
doc = pq(html)
items = doc('.list')
# 用parent()方法来获取某个节点的父节点
container = items.parent()
print(type(container))
print(container)
1.2 结果
E:\WebSpider\venv\Scripts\python.exe E:/WebSpider/4_3.py
<class 'pyquery.pyquery.PyQuery'>
<div id="container">
<ul class="list">
<li class="item-0">first item</li>
<li class="item-1"><a href="link2.html">second item</a></li>
<li class="item-0 active"><a href="link3.html"><span class="bold">third item</span></a></li>
<li class="item-1 active"><a href="link4.html">fourth item</a></li>
<li class="item-0"><a href="link5.html">fifth item</a></li>
</ul>
</div>
1.3 说明
这里我们首先用.list选取class为list的节点,然后调用parent()方法得到其父节点,其类型依然是PyQuery类型。
这里的父节点是该节点的直接父节点,也就是说,它不会再去查找父节点的父节点,即祖先节点。
但是如果想获取某个祖先节点,该怎么办呢?这时可以用parents()方法
2 parents方法
2.1 代码
html = '''
<div class="wrap">
<div id="container">
<ul class="list">
<li class="item-0">first item</li>
<li class="item-1"><a href="link2.html">second item</a></li>
<li class="item-0 active"><a href="link3.html"><span class="bold">third item</span></a></li>
<li class="item-1 active"><a href="link4.html">fourth item</a></li>
<li class="item-0"><a href="link5.html">fifth item</a></li>
</ul>
</div>
</div>
'''
from pyquery import PyQuery as pq
doc = pq(html)
items = doc('.list')
parents = items.parents()
print(type(parents))
print(parents)
2.2 结果
E:\WebSpider\venv\Scripts\python.exe E:/WebSpider/4_3.py
<class 'pyquery.pyquery.PyQuery'>
<div class="wrap">
<div id="container">
<ul class="list">
<li class="item-0">first item</li>
<li class="item-1"><a href="link2.html">second item</a></li>
<li class="item-0 active"><a href="link3.html"><span class="bold">third item</span></a></li>
<li class="item-1 active"><a href="link4.html">fourth item</a></li>
<li class="item-0"><a href="link5.html">fifth item</a></li>
</ul>
</div>
</div><div id="container">
<ul class="list">
<li class="item-0">first item</li>
<li class="item-1"><a href="link2.html">second item</a></li>
<li class="item-0 active"><a href="link3.html"><span class="bold">third item</span></a></li>
<li class="item-1 active"><a href="link4.html">fourth item</a></li>
<li class="item-0"><a href="link5.html">fifth item</a></li>
</ul>
</div>
2.3 说明
可以看到,输出结果有两个:一个是class为wrap的节点,一个是id为container的节点。也就是说,parents()方法会返回所有的祖先节点。
3 筛选器
3.1 代码
html = '''
<div class="wrap">
<div id="container">
<ul class="list">
<li class="item-0">first item</li>
<li class="item-1"><a href="link2.html">second item</a></li>
<li class="item-0 active"><a href="link3.html"><span class="bold">third item</span></a></li>
<li class="item-1 active"><a href="link4.html">fourth item</a></li>
<li class="item-0"><a href="link5.html">fifth item</a></li>
</ul>
</div>
</div>
'''
from pyquery import PyQuery as pq
doc = pq(html)
items = doc('.list')
# parents()方法传入CSS选择器,这样就会返回祖先节点中符合CSS选择器的节点
parent = items.parents('.wrap')
print(parent)
3.2 结果
E:\WebSpider\venv\Scripts\python.exe E:/WebSpider/4_3.py
<div class="wrap">
<div id="container">
<ul class="list">
<li class="item-0">first item</li>
<li class="item-1"><a href="link2.html">second item</a></li>
<li class="item-0 active"><a href="link3.html"><span class="bold">third item</span></a></li>
<li class="item-1 active"><a href="link4.html">fourth item</a></li>
<li class="item-0"><a href="link5.html">fifth item</a></li>
</ul>
</div>
</div>
三 兄弟节点
1 siblings
1.1 代码
html = '''
<div class="wrap">
<div id="container">
<ul class="list">
<li class="item-0">first item</li>
<li class="item-1"><a href="link2.html">second item</a></li>
<li class="item-0 active"><a href="link3.html"><span class="bold">third item</span></a></li>
<li class="item-1 active"><a href="link4.html">fourth item</a></li>
<li class="item-0"><a href="link5.html">fifth item</a></li>
</ul>
</div>
</div>
'''
from pyquery import PyQuery as pq
doc = pq(html)
# 这里首先选择class为list的节点内部class为item-0和active的节点,也就是第三个li节点。
# 那么,很明显,它的兄弟节点有4个,那就是第一、二、四、五个li节点。
li = doc('.list .item-0.active')
print(li.siblings())
1.2 结果
E:\WebSpider\venv\Scripts\python.exe E:/WebSpider/4_3.py
<li class="item-1"><a href="link2.html">second item</a></li>
<li class="item-0">first item</li>
<li class="item-1 active"><a href="link4.html">fourth item</a></li>
<li class="item-0"><a href="link5.html">fifth item</a></li>
1.3 说明
可以看到,这正是我们刚才所说的4个兄弟节点。
2 筛选器
2.1 代码
html = '''
<div class="wrap">
<div id="container">
<ul class="list">
<li class="item-0">first item</li>
<li class="item-1"><a href="link2.html">second item</a></li>
<li class="item-0 active"><a href="link3.html"><span class="bold">third item</span></a></li>
<li class="item-1 active"><a href="link4.html">fourth item</a></li>
<li class="item-0"><a href="link5.html">fifth item</a></li>
</ul>
</div>
</div>
'''
from pyquery import PyQuery as pq
doc = pq(html)
# 这里首先选择class为list的节点内部class为item-0和active的节点,也就是第三个li节点。
# 那么,很明显,它的兄弟节点有4个,那就是第一、二、四、五个li节点。
li = doc('.list .item-0.active')
# 如果要筛选某个兄弟节点,我们依然可以向siblings方法传入CSS选择器,
# 这样就会从所有兄弟节点中挑选出符合条件的节点了
print(li.siblings('.active'))
2.2 结果
E:\WebSpider\venv\Scripts\python.exe E:/WebSpider/4_3.py
<li class="item-1 active"><a href="link4.html">fourth item</a></li>
2.3 说明
我们筛选了class为active的节点,通过刚才的结果可以观察到,class为active的兄弟节点只有第四个li节点,所以结果应该是一个。