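At work you may need to scrape data from a website. Two simple tools cover most cases: HttpClient, which sends HTTP requests and returns the raw response, and Jsoup, which fetches a page and parses its HTML.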
Dependencies (Maven):
httpclient:
<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.5.1</version>
</dependency>
jsoup:
<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.8.3</version>
</dependency>
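1. Fetching data with HttpClient requests
The following custom utility class wraps GET and POST requests:
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.apache.http.HttpEntity;
import org.apache.http.HttpStatus;
import org.apache.http.NameValuePair;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.entity.StringEntity;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.message.BasicNameValuePair;
import org.apache.http.util.EntityUtils;

public class HttpClientUtil {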
    /**
     * Sends a GET request.
     *
     * @param url absolute URL, including the scheme
     * @return the response body as a string, or null if the request fails
     *         or the status is not 200
     */
    public static String doGet(String url) {
        // try-with-resources closes the client and the response automatically
        try (CloseableHttpClient httpClient = HttpClients.createDefault()) {
            HttpGet httpGet = new HttpGet(url);
            try (CloseableHttpResponse response = httpClient.execute(httpGet)) {
                // Only a 200 response is treated as success
                if (response.getStatusLine().getStatusCode() == HttpStatus.SC_OK) {
                    HttpEntity entity = response.getEntity();
                    // Fall back to UTF-8 when the server does not declare a charset
                    return entity != null ? EntityUtils.toString(entity, "UTF-8") : null;
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
            System.out.println("GET request failed: " + url);
        }
        return null;
    }
    /**
     * Sends a POST request with key-value (form) parameters.
     *
     * @param url    absolute URL, including the scheme
     * @param params form fields; each value is converted with String.valueOf
     * @return the response body as a string, or null if the request fails
     *         or the status is not 200
     */
    public static String doPost(String url, Map<String, Object> params) {
        // try-with-resources closes the client and the response automatically
        try (CloseableHttpClient httpClient = HttpClients.createDefault()) {
            HttpPost request = new HttpPost(url);
            // Convert each map entry into a form field
            List<NameValuePair> nvps = new ArrayList<NameValuePair>();
            for (Map.Entry<String, Object> entry : params.entrySet()) {
                nvps.add(new BasicNameValuePair(entry.getKey(), String.valueOf(entry.getValue())));
            }
            request.setEntity(new UrlEncodedFormEntity(nvps, "UTF-8"));
            try (CloseableHttpResponse response = httpClient.execute(request)) {
                int code = response.getStatusLine().getStatusCode();
                if (code == HttpStatus.SC_OK) { // request succeeded
                    HttpEntity entity = response.getEntity();
                    return entity != null ? EntityUtils.toString(entity, "UTF-8") : null;
                }
                // request failed
                System.out.println("Status code: " + code);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        return null;
    }
    /**
     * Sends a POST request with a JSON body.
     *
     * @param url    absolute URL, including the scheme
     * @param params the JSON text to send as the request body
     * @return the response body as a string, or null if the status is not 200
     * @throws IOException if the request cannot be sent or the response cannot be read
     */
    public static String doPost(String url, String params) throws IOException {
        HttpPost httpPost = new HttpPost(url);
        httpPost.setHeader("Accept", "application/json");
        httpPost.setHeader("Content-Type", "application/json");
        httpPost.setEntity(new StringEntity(params, "UTF-8"));
        // try-with-resources closes the response first, then the client
        try (CloseableHttpClient httpClient = HttpClients.createDefault();
             CloseableHttpResponse response = httpClient.execute(httpPost)) {
            int status = response.getStatusLine().getStatusCode();
            if (status == HttpStatus.SC_OK) {
                HttpEntity responseEntity = response.getEntity();
                return responseEntity != null ? EntityUtils.toString(responseEntity, "UTF-8") : null;
            }
            System.out.println("Request returned " + status + " (" + url + ")");
        }
        return null;
    }
}
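The two doPost overloads differ mainly in how the request body is encoded: the Map version sends a form body of name=value pairs, while the String version sends the JSON text as-is. As a rough illustration of what a form body looks like, here is a minimal sketch that uses only the JDK's URLEncoder. The class name FormBodySketch and the method toFormBody are hypothetical and not part of HttpClient, and HttpClient's own encoder may differ in small details.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only; not part of HttpClient.
class FormBodySketch {

    /** Joins the entries as name=value pairs separated by '&', URL-encoding both sides as UTF-8. */
    static String toFormBody(Map<String, Object> params) {
        StringBuilder body = new StringBuilder();
        try {
            for (Map.Entry<String, Object> entry : params.entrySet()) {
                if (body.length() > 0) {
                    body.append('&');
                }
                body.append(URLEncoder.encode(entry.getKey(), "UTF-8"))
                    .append('=')
                    .append(URLEncoder.encode(String.valueOf(entry.getValue()), "UTF-8"));
            }
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every JVM, so this should not happen
            throw new IllegalStateException(e);
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the output order is predictable
        Map<String, Object> params = new LinkedHashMap<String, Object>();
        params.put("keyword", "java crawler");
        params.put("page", 2);
        System.out.println(toFormBody(params)); // prints keyword=java+crawler&page=2
    }
}
```

A JSON body, by contrast, is sent unchanged, which is why the String overload also sets the Content-Type header to application/json.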
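To use the class, call the appropriate method with your arguments. It returns the response body as a String (JSON, if that is what the endpoint serves), and a JSON library can then extract the fields you need. The example below uses JSONObject.parseObject, which appears to come from Alibaba's fastjson (not listed in the dependencies above):
// Placeholder URL: use an endpoint that returns JSON, and include the scheme
String url = "https://www.baidu.com";
String res = HttpClientUtil.doGet(url);
if (res != null) { // doGet returns null when the request fails
    JSONObject data = JSONObject.parseObject(res);
    String value = data.getString("value");
}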
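One detail worth checking before calling these methods: both HttpClient and Jsoup expect an absolute URL that includes the scheme, so a bare host such as www.baidu.com will typically be rejected. A minimal sketch of a guard is shown below. The class UrlSchemeSketch and the method ensureScheme are hypothetical names, and defaulting to http:// is an assumption; pick https:// if the target site requires it.

```java
// Hypothetical helper, for illustration only.
class UrlSchemeSketch {

    /** Returns the URL unchanged if it starts with "scheme://", otherwise prepends "http://". */
    static String ensureScheme(String url) {
        String trimmed = url.trim();
        // A scheme is a letter followed by letters, digits, '+', '.' or '-', then "://"
        if (trimmed.matches("(?i)^[a-z][a-z0-9+.-]*://.*")) {
            return trimmed;
        }
        // Assumption: plain HTTP is an acceptable default
        return "http://" + trimmed;
    }

    public static void main(String[] args) {
        System.out.println(ensureScheme("www.baidu.com"));         // prints http://www.baidu.com
        System.out.println(ensureScheme("https://www.baidu.com")); // prints https://www.baidu.com
    }
}
```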
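2. Fetching page content with Jsoup
Jsoup downloads the whole page and parses it into a document tree. Once you have located the resource you want, you can select it by tag name, id, class, or attribute.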
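String url = "https://www.baidu.com"; // the scheme is required
Document doc = Jsoup.connect(url).get(); // throws IOException on network errors
// Look up by id (returns null if no element has this id)
Element e = doc.getElementById("id");
// Look up by tag name, searching inside e
Elements tag = e.getElementsByTag("tagName");
// Look up by class
Elements classes = e.getElementsByClass("className");
// Look up by attribute name
Elements attribute = e.getElementsByAttribute("key");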
For other jsoup features, see:
http://www.open-open.com/jsoup/