Basics (1): Using Selenium and PhantomJS

1. Selenium

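Selenium is a web automation tool originally developed for testing websites. It drives a real browser directly and supports all major browsers, plus headless ones such as PhantomJS in Selenium 3 and earlier. It accepts commands that make the browser load pages, extract the data you need, and even take screenshots.
To use Selenium from Python, first install the package with pip: pip install selenium
The examples in this article use the Selenium 3 API (find_element_by_* and webdriver.PhantomJS), which Selenium 4 removed, so pin the version if you need them to run unchanged: pip install "selenium<4"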

Reference documentation (Chinese translation of the Selenium with Python docs): http://selenium-python-zh.readthedocs.io/en/latest/waits.html

1.1. Basic Selenium usage in Python

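# Import modules
import time

from selenium import webdriver

# Placeholder credentials; replace with real values
username = "your_username"
passwd = "your_password"

options = webdriver.ChromeOptions()
# Run Chrome in headless mode
options.add_argument('--headless')
options.add_argument('--disable-gpu')  # works around a bug in headless mode
options.add_argument('test-type')
options.add_argument('disable-infobars')  # hide the "Chrome is being controlled by automated test software" infobar
# options.add_argument("--proxy-server=http://223.166.247.206:9000")
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')

# Selenium 3.8+ accepts options=; older releases use chrome_options=
driver = webdriver.Chrome(options=options)
# Maximize the window
driver.maximize_window()

# Open a page (substitute the login page you are automating;
# the locators below are placeholders for that page)
driver.get("http://www.baidu.com")
# Find the username input and type into it
driver.find_element_by_id("login_username").send_keys(username)
time.sleep(1)
# Find the password input and type into it
driver.find_element_by_name("password").send_keys(passwd)
time.sleep(1)
# Refresh the page
# driver.refresh()
# Click the button whose text is "登录" (Log in)
driver.find_element_by_xpath("//button[text()='登录']").click()
time.sleep(10)

# Get the page source
html_str = driver.page_source
# Get the URL of the current page
current_url = driver.current_url
# Save a screenshot to a file
driver.save_screenshot("cn.png")


# Get the cookies and iterate over them
for cookie in driver.get_cookies():
    print(cookie)
# Delete one cookie by name
driver.delete_cookie("CookieName")
# Delete all cookies
driver.delete_all_cookies()

# Quit the browser
driver.quit()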
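
The fixed time.sleep() calls in the listing above either waste time or turn out too short when the page is slow. A minimal sketch of the alternative, polling until a condition holds, might look like the following. The helper name wait_until and its parameters are hypothetical and not part of Selenium; Selenium ships its own WebDriverWait class in selenium.webdriver.support.ui for this purpose, and the loop here only illustrates the idea.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Hypothetical helper for illustration: returns that truthy value, or
    raises TimeoutError once `timeout` seconds have elapsed.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
        except Exception:
            # Treat any error (e.g. element not found yet) as "not ready".
            result = None
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll_interval)
```

With a live driver, a call such as wait_until(lambda: driver.find_element_by_name("password")) would return the element as soon as it appears instead of sleeping for a fixed interval.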

1.2. Locating elements on a page

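Common methods for locating elements on a page:

find_element_by_id (look up by id; returns a single element)
find_elements_by_xpath (look up by XPath expression; returns a list)
find_elements_by_link_text (match links by their full text; returns a list)
find_elements_by_partial_link_text (match links whose text contains the given string; returns a list)
find_elements_by_tag_name (look up by tag name; returns a list)
find_elements_by_class_name (look up by class name; returns a list)
find_elements_by_css_selector (look up by CSS selector; returns a list)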

driver.find_element_by_id('kw')
driver.find_element_by_name('tj_trnews')
# For example, given <a class="xxx" href="http://www.a.com">新闻</a>
driver.find_element_by_link_text("新闻")
# Locate the input through the name attribute of the element three levels up
driver.find_element_by_xpath("//div[@name='q']/form/span/input")

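Two points to keep in mind:

(1) find_element returns a single element and raises NoSuchElementException if nothing matches, whereas find_elements returns a list, which is empty if nothing matches.

(2) The XPath passed to find_element_by_xpath must select an element, not an attribute or a text node. Read attributes with get_attribute() and text with .text on the element it returns.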
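
Both points can be sketched together in one small function. It assumes the Selenium 3 method find_elements_by_xpath used throughout this article; the function name extract_links is hypothetical and chosen only for illustration.

```python
def extract_links(driver):
    """Return (text, href) pairs for every <a> element on the current page."""
    links = []
    # find_elements_* returns a list, which is empty when nothing matches.
    for anchor in driver.find_elements_by_xpath("//a"):
        # The XPath selects the element; text and attribute are read afterwards.
        links.append((anchor.text, anchor.get_attribute("href")))
    return links
```

Because the function only relies on find_elements_by_xpath, .text, and get_attribute(), it works with any driver object from this article, Chrome or PhantomJS.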

1.3. Mouse action chains (optional)

# Import the ActionChains class
from selenium.webdriver import ActionChains

# Move the mouse over ac
ac = driver.find_element_by_xpath('element')
ActionChains(driver).move_to_element(ac).perform()


# Click on ac
ac = driver.find_element_by_xpath("elementA")
ActionChains(driver).move_to_element(ac).click(ac).perform()

# Double-click on ac
ac = driver.find_element_by_xpath("elementB")
ActionChains(driver).move_to_element(ac).double_click(ac).perform()

# Right-click on ac
ac = driver.find_element_by_xpath("elementC")
ActionChains(driver).move_to_element(ac).context_click(ac).perform()

# Press and hold the left mouse button on ac
ac = driver.find_element_by_xpath('elementF')
ActionChains(driver).move_to_element(ac).click_and_hold(ac).perform()

# Drag ac1 and drop it onto ac2
ac1 = driver.find_element_by_xpath('elementD')
ac2 = driver.find_element_by_xpath('elementE')
ActionChains(driver).drag_and_drop(ac1, ac2).perform()

1.4. Drop-down selection with Select

# Import the Select class
from selenium.webdriver.support.ui import Select

# Locate the <select> element by its name attribute
select = Select(driver.find_element_by_name('status'))

# Three ways to choose an option
select.select_by_index(1)
select.select_by_value("0")
select.select_by_visible_text("未审核")  # "未审核" means "pending review"
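These are the three ways to choose an option in a drop-down: by index, by value, or by visible text. Note:
(1) The index starts at 0.
(2) The value is the value attribute of the option tag, not the text displayed in the drop-down.
(3) The visible text is the text content of the option tag, which is what the drop-down displays.

To clear every selection, call select.deselect_all(). This only works on a select element that allows multiple selections.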

1.5. Alerts, window switching, back and forward

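[Handling alerts in Selenium]
When an action triggers an alert on the page, switch to it first, then read its message with alert.text or close it with alert.accept() or alert.dismiss(). The older driver.switch_to_alert() method is deprecated in favor of the switch_to.alert property:
alert = driver.switch_to.alert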

[Switching windows]
Open a new window: driver.execute_script("window.open('"+url+"')")
Method 1 (deprecated):
window_handles = driver.window_handles
driver.switch_to_window(window_handles[-1])
Method 2:
driver.switch_to.window(driver.window_handles[1])

Note that switch_to_window() is deprecated; prefer switch_to.window().
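
Indexing window_handles with a fixed position such as [1] or [-1] relies on the order of the handles. A slightly more defensive sketch compares the handles before and after opening the window. The function name open_in_new_window is hypothetical, and the sketch assumes the new handle is already listed once execute_script returns.

```python
def open_in_new_window(driver, url):
    """Open url in a new window and switch the driver to it.

    Returns the handle of the window that was active before, so the
    caller can switch back later with driver.switch_to.window(handle).
    """
    original = driver.current_window_handle
    known = set(driver.window_handles)
    # Pass the URL as a script argument instead of concatenating strings,
    # so quotes inside the URL cannot break the JavaScript.
    driver.execute_script("window.open(arguments[0]);", url)
    new_handles = [h for h in driver.window_handles if h not in known]
    if not new_handles:
        raise RuntimeError("no new window was opened")
    driver.switch_to.window(new_handles[0])
    return original
```

Keeping the returned handle makes it easy to close the new window and return to the original one afterwards.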

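[Executing JavaScript]

# js is a string containing the JavaScript to run in the page
driver.execute_script(js)

[Getting the page source]
# Get the page source
html_str = driver.page_source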

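[Getting the page's cookies]

import json

# Collapse the list of cookie dicts into a {name: value} mapping
cookies = {}
for cookie in driver.get_cookies():
    cookies[cookie["name"]] = cookie["value"]
print(cookies)
print("Cookies retrieved")
# Serialize to JSON, e.g. to store alongside the account's username
cookies_json = json.dumps(cookies)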
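
A common reason to collect cookies is to reuse a logged-in session outside the browser. The sketch below turns the list of dicts returned by get_cookies() into the two shapes an HTTP client usually wants. Both function names are hypothetical, and only the "name" and "value" keys of each cookie are used.

```python
def cookies_to_dict(cookie_list):
    """Collapse Selenium's list of cookie dicts into {name: value}."""
    return {cookie["name"]: cookie["value"] for cookie in cookie_list}


def cookies_to_header(cookie_list):
    """Build the value of an HTTP Cookie header, e.g. 'a=1; b=2'."""
    return "; ".join(
        "%s=%s" % (cookie["name"], cookie["value"]) for cookie in cookie_list
    )
```

Assuming the requests library is used for follow-up calls, the dict form can be passed as requests.get(url, cookies=cookies_to_dict(driver.get_cookies())).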

[Getting a token from the page]

# Read the "token" entry from the page's localStorage
token = driver.execute_script('return localStorage.getItem("token")')
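
The same call generalizes to any localStorage key. A small sketch, with the hypothetical name get_local_storage_item, passes the key as a script argument rather than embedding it in the JavaScript string:

```python
def get_local_storage_item(driver, key):
    """Return the localStorage value stored under key, or None if absent."""
    # The key arrives in the page as arguments[0], so it needs no quoting.
    return driver.execute_script(
        "return window.localStorage.getItem(arguments[0]);", key
    )
```

localStorage is scoped to the origin of the current page, so the driver has to be on a page of the target site before the lookup; a missing key comes back as None.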

2. PhantomJS

2.1. Installing PhantomJS

# Install dependencies
[root@wzy_woyun soft]# yum install -y fontconfig
# After downloading, unpack the archive (it is a .tar.bz2, so first decompress it to a .tar with bzip2, then extract it with tar)
[root@wzy_woyun soft]# bzip2 -d phantomjs-2.1.1-linux-x86_64.tar.bz2
[root@wzy_woyun soft]# tar xvf phantomjs-2.1.1-linux-x86_64.tar -C /usr/local/
# Rename the directory (a shorter path is easier to work with later)
[root@wzy_woyun soft]# mv /usr/local/phantomjs-2.1.1-linux-x86_64/ /usr/local/phantomjs
# Finally, create a symlink in /usr/bin/ so that phantomjs can be run without editing $PATH
[root@wzy_woyun soft]# ln -s /usr/local/phantomjs/bin/phantomjs /usr/bin/
# Test the installation
[root@wzy_woyun soft]# phantomjs --version

2.2. PhantomJS configuration

from selenium import webdriver
# Import the DesiredCapabilities configuration object
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
service_args = ['--ignore-ssl-errors=true', '--ssl-protocol=any']
dcap = dict(DesiredCapabilities.PHANTOMJS)
# Set a proxy
# service_args = ['--proxy=127.0.0.1:9999', '--proxy-type=socks5']
dcap["phantomjs.page.settings.userAgent"] = "Mozilla/4.0 (compatible; MSIE 5.5; windows NT)"
# Path to the phantomjs executable, if it is not on PATH
phantomjs_driver_path = ""
# Start a PhantomJS browser with the configuration above
# driver = webdriver.PhantomJS(phantomjs_driver_path, desired_capabilities=dcap, service_args=service_args)
driver = webdriver.PhantomJS(desired_capabilities=dcap, service_args=service_args)
# Implicit wait of 5 seconds; adjust as needed
driver.implicitly_wait(5)
# Set a 10-second page-load timeout, similar to the timeout option of requests.get(); driver.get() itself takes no timeout argument
# I once hit a case where driver.get(url) neither returned nor raised an error and the program hung; setting this timeout solves that.
driver.set_page_load_timeout(10)
# Set a 10-second script timeout
driver.set_script_timeout(10)