Integrating selenium + phantomjs into a project to simulate user behavior headlessly (without a browser UI). For reference, see https://github.com/Lixianshengchao/phanbedder.git
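    import java.io.File;
    import java.io.IOException;
    import java.util.List;

    import org.junit.Test;
    import org.openqa.selenium.By;
    import org.openqa.selenium.JavascriptExecutor;
    import org.openqa.selenium.WebElement;
    import org.openqa.selenium.phantomjs.PhantomJSDriver;
    import org.openqa.selenium.phantomjs.PhantomJSDriverService;
    import org.openqa.selenium.remote.DesiredCapabilities;
    // Phanbedder: import it from the package used by the phanbedder project linked above.

    @Test
    public void keyWorkGenerate() throws IOException, InterruptedException {
        // Unpack the bundled PhantomJS binary and point the driver at it.
        File phantomjs = Phanbedder.unpack();
        DesiredCapabilities dcaps = new DesiredCapabilities();
        dcaps.setCapability(PhantomJSDriverService.PHANTOMJS_EXECUTABLE_PATH_PROPERTY, phantomjs.getAbsolutePath());
        PhantomJSDriver driver = new PhantomJSDriver(dcaps);
        try {
            // Usual Selenium stuff...
            driver.get("https://www.taobao.com/");
            JavascriptExecutor js = (JavascriptExecutor) driver;

            // Step 1: type the keyword into the search box and click search.
            // The keyword is passed as a script argument rather than concatenated
            // into the script, so quotes in the keyword cannot break the JavaScript.
            String keyword = "奶粉";
            WebElement searchBox = driver.findElement(By.id("q"));
            js.executeScript("arguments[0].value = arguments[1]", searchBox, keyword);
            System.out.println(searchBox.getAttribute("value"));
            js.executeScript("arguments[0].click()", driver.findElement(By.className("btn-search")));
            Thread.sleep(3000); // crude wait for the result page to load
            System.out.println(driver.getCurrentUrl());

            // Step 2: find the brand link among the category filters and click it.
            WebElement brandRow = driver.findElement(By.id("J_NavCommonRowItems_0"));
            List<WebElement> brandLinks = brandRow.findElements(By.tagName("a"));
            for (WebElement link : brandLinks) {
                String title = link.getAttribute("title");
                if (title == null) {
                    continue;
                }
                title = title.trim();
                System.out.println(title);
                if (title.equals("Synutra/圣元")) {
                    js.executeScript("arguments[0].click()", link);
                    Thread.sleep(3000);
                    System.out.println(driver.getCurrentUrl());
                    // The click navigates away, so the remaining links are stale; stop here.
                    break;
                }
            }
        } finally {
            // Shut down the PhantomJS process even if a step fails.
            driver.quit();
        }
    }

[Figure: screenshot of the scraping result]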
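User behavior
① Enter 奶粉 (milk powder) in the search box and click search.
② Select Synutra/圣元 from the brand category filters.
③ Scrape the filtered product listings with a crawler framework. (Omitted.)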
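
The test above pauses for a fixed three seconds after each click, which is either too long or too short depending on how fast the page responds. One alternative is to poll for a condition until it holds or a timeout expires. The sketch below shows that idea in plain Java. `PollingWait` and `until` are hypothetical names introduced here for illustration; they are not part of Selenium or phanbedder. Selenium's own `WebDriverWait` class covers the same need and is probably the better choice in a real project.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper (not part of Selenium or phanbedder): polls a condition
// until it holds or the timeout elapses, instead of sleeping for a fixed time.
final class PollingWait {

    private PollingWait() {
    }

    // Returns true as soon as the condition holds; returns false if the
    // timeout elapses first or the waiting thread is interrupted.
    // The condition is always checked at least once.
    static boolean until(BooleanSupplier condition, long timeoutMillis, long intervalMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in for a page load: the condition becomes true about 200 ms after start.
        long start = System.currentTimeMillis();
        boolean met = until(() -> System.currentTimeMillis() - start >= 200, 3000, 50);
        System.out.println("condition met: " + met);
    }
}
```

In the test, a call such as `PollingWait.until(() -> !driver.getCurrentUrl().equals(oldUrl), 10000, 200)` could replace each fixed sleep, where `oldUrl` is assumed to be the URL captured just before the click.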