Selenium headless browser

This example uses the Edge browser driver.

Use a headless browser to scrape the highest-grossing film of each year.


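from selenium.webdriver import Edge
from selenium.webdriver.edge.options import Options
from selenium.webdriver.edge.service import Service
from selenium.webdriver.support.select import Select
from selenium.webdriver.common.by import By
import time

# Run Edge in headless mode
opt = Options()
opt.add_argument("--headless")
opt.add_argument("--disable-gpu")

# Pass the options to the browser. The driver path goes through a Service object
# because recent Selenium 4 releases no longer accept executable_path on Edge().
web = Edge(service=Service(executable_path=r'C:\Program Files (x86)\Microsoft\Edge\Application\msedgedriver.exe'), options=opt)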

web.get('https://www.endata.com.cn/BoxOffice/BO/Year/index.html')

# Handle the <select></select> drop-down list
# Locate the drop-down list
sel_el = web.find_element(By.XPATH,r'//*[@id="OptionDate"]')
# Wrap the element in a Select object so it can be driven as a drop-down menu
sel = Select(sel_el)
"""
Drop-down structure:
<select>
    <option value="1">text</option>
</select>
"""
# Step the browser through every option
for i in range(len(sel.options)): # len(sel.options) is the number of options; i is the index of each option
    sel.select_by_index(i) # switch by index
    # sel.select_by_value() # switch by value
    # sel.select_by_visible_text() # switch by visible text
    time.sleep(2) # wait for the page to update after the selection
    # //*[@id="TableList"]/table/tbody/tr[1]/td[2]/a/p # 2022
    # //*[@id="TableList"]/table/tbody/tr[2]/td[2]/a/p # 2021
    name = web.find_element(By.XPATH, r'//*[@id="TableList"]/table/tbody/tr[1]/td[2]/a/p').text
    print(name)
    print("==============")

print("over")

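# Get the page source as shown in the Elements panel (the HTML after data loading and JavaScript execution)
print(web.page_source)

# Close the headless browser and stop the driver process
web.quit()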
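
The loop in the script prints each title without saying which year it belongs to. The sketch below is one possible way to pair every drop-down label with the title read after selecting it. The names `collect_top_titles`, `read_top_title`, and `settle_seconds` are hypothetical and not part of Selenium. The `select` argument is assumed to offer only the two members the script already uses, `.options` and `.select_by_index()`, and the fixed pause mirrors the `time.sleep(2)` in the original rather than a proper explicit wait.

```python
import time
from typing import Callable, Dict


def collect_top_titles(select, read_top_title: Callable[[], str],
                       settle_seconds: float = 2.0) -> Dict[str, str]:
    """Map each option's visible text to the title read after selecting it.

    `select` is assumed to behave like selenium's Select wrapper:
    it exposes `.options` (each with a `.text`) and `.select_by_index(i)`.
    `read_top_title` is a caller-supplied function that returns the text
    of the first table row at the moment it is called.
    """
    results: Dict[str, str] = {}
    # Read the labels up front so they are not re-queried after the page updates.
    labels = [option.text for option in select.options]
    for index, label in enumerate(labels):
        select.select_by_index(index)
        # Same fixed pause as the original script; an assumption, not a guarantee
        # that the table has finished reloading.
        time.sleep(settle_seconds)
        results[label] = read_top_title()
    return results
```

In the script above it could stand in for the loop, for example `collect_top_titles(sel, lambda: web.find_element(By.XPATH, r'//*[@id="TableList"]/table/tbody/tr[1]/td[2]/a/p').text)`, which returns a dictionary instead of printing each title. Passing the reader as a function keeps the helper free of any XPath, so it can be tried with stand-in objects before pointing it at a real browser.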
