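Preface
Plain HTML-to-PDF conversion is well covered online. What I ran into was a whole set of problems and constraints at once:
- The HTML has to be fetched from an endpoint and then converted to PDF
- No plugins or external tools can be used, because nothing can be installed on the server
- The HTML is not well-formed XML, so any approach that parses (renders) it as XML fails outright
- The HTML contains images, and they do not show up after conversion
- Worse, the "images" are not image files at all but JSPs behind a login interceptor
- Fonts come out wrong in the converted PDF (not solved yet)
- The HTML contains JS that manipulates tags, which affects what is displayed
This article walks through how I handled each problem and the details involved. The assembled code is in the gitee repository linked at the end.
1. Download the HTML and its embedded JS locally
Request the main page first to obtain the cookie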
RestTemplate restTemplate = new RestTemplate();
String url = "http://ip:port/easp/easPrint?boeHeaderId=35687735&type=azBoe";
ResponseEntity<String> entity = restTemplate.getForEntity(url, String.class);
// get() returns null when the response carries no Set-Cookie header
List<String> cookies = entity.getHeaders().get(HttpHeaders.SET_COOKIE);
String cookie = cookies.get(0);
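A `Set-Cookie` value usually carries attributes such as `Path` or `HttpOnly` after the `name=value` pair, and a response may set more than one cookie. The following is a minimal sketch, not part of the original code, of how the values could be reduced to a single `Cookie` request header. The class name `CookieHeaderSketch`, the method `toCookieHeader`, and the sample values are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CookieHeaderSketch {
    // Builds a Cookie request header from Set-Cookie response values.
    // Only the leading name=value pair of each value is kept; attributes
    // such as Path or HttpOnly are dropped.
    static String toCookieHeader(List<String> setCookieValues) {
        List<String> pairs = new ArrayList<>();
        if (setCookieValues == null) {
            return ""; // no Set-Cookie header at all
        }
        for (String value : setCookieValues) {
            int end = value.indexOf(';');
            String pair = (end >= 0 ? value.substring(0, end) : value).trim();
            if (!pair.isEmpty()) {
                pairs.add(pair);
            }
        }
        return String.join("; ", pairs);
    }

    public static void main(String[] args) {
        // Sample values are made up for illustration.
        List<String> sample = Arrays.asList(
                "JSESSIONID=abc123; Path=/easp; HttpOnly",
                "lang=zh; Path=/");
        System.out.println(toCookieHeader(sample)); // JSESSIONID=abc123; lang=zh
    }
}
```

Whether the target server needs this depends on how strictly it parses the `Cookie` header. Many servers accept the raw `Set-Cookie` value as well.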
Parse the HTML and read the attribute values of every img tag
This uses jsoup, which is very capable and can also modify values in the HTML
<dependency>
<groupId>org.jsoup</groupId>
<artifactId>jsoup</artifactId>
<version>1.13.1</version>
</dependency>
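The loop below reads the attributes of each img tag. It is an excerpt from the analysis method shown in full later, which is where header, path, filePath and i are defined.
Document document = Jsoup.connect(url).get();
Elements elementsByClass = document.getElementsByTag("img");
for (Element byClass : elementsByClass) {
    String image = byClass.attr("src");
    String id = byClass.attr("id");
    System.out.println("image path::" + image);
    System.out.println("id::" + id);
    // Absolute URL of the image
    String allImage = header + image;
    // The "image" is really a JSP, so it is saved under a numbered .jsp name
    String realPath = path + "/" + i + ".jsp";
    download(allImage, realPath, cookie);
    // Swap the original src value in the saved HTML for the Base64 data
    replaceTxtByStr(filePath, image, file2Base64(realPath));
    i++;
}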
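The loop builds the absolute image URL by concatenating `header + image`, which works while every `src` is relative to the same directory. As a hedged alternative, `java.net.URI.resolve` can resolve each `src` against the page URL, which also covers root-relative and already-absolute values. The class name `ImageUrlSketch` and the `example.com` host are placeholders, not values from the original project.

```java
import java.net.URI;

class ImageUrlSketch {
    // Resolves an <img> src value against the URL of the page it came from.
    static String resolve(String pageUrl, String src) {
        return URI.create(pageUrl).resolve(src.trim()).toString();
    }

    public static void main(String[] args) {
        // Placeholder page URL, shaped like the one used in this article.
        String page = "http://example.com:8000/easp/easPrint?boeHeaderId=1&type=azBoe";
        System.out.println(resolve(page, "barcode.jsp?billCode=42"));
        System.out.println(resolve(page, "/static/logo.png"));
    }
}
```

Note that `URI.create` throws `IllegalArgumentException` for characters that are illegal in a URI, such as spaces, so a `src` like that would need encoding first.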
Download the remote file locally
Nothing special here. Just remember to send the cookie obtained earlier.
// Download a remote file to a local path, sending the session cookie
private void download(String httpUrl, String fileName, String cookie) throws Exception {
    File file = new File(fileName);
    System.out.println(file);
    if (file.exists()) {
        return; // already downloaded, nothing to do
    }
    // Note: non-ASCII characters in the URL are not encoded here
    URL url = new URL(httpUrl);
    HttpURLConnection http = (HttpURLConnection) url.openConnection();
    http.setConnectTimeout(3000);
    // Browser-like headers so the request is not blocked.
    // Host and Connection are omitted: HttpURLConnection manages them itself.
    // Accept-Encoding is omitted too: HttpURLConnection does not decompress
    // gzip responses, so asking for gzip could save compressed bytes to disk.
    http.setRequestProperty("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9");
    http.setRequestProperty("Accept-Language", "zh-CN,zh;q=0.9");
    http.setRequestProperty("Cache-Control", "max-age=0");
    http.setRequestProperty("Upgrade-Insecure-Requests", "1");
    http.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36");
    http.setRequestProperty("Cookie", cookie);
    // getInputStream() opens the connection; try-with-resources closes both streams
    try (InputStream inputStream = http.getInputStream();
         OutputStream out = new FileOutputStream(file)) {
        byte[] buff = new byte[1024 * 10];
        int len;
        while ((len = inputStream.read(buff)) != -1) {
            out.write(buff, 0, len);
        }
    } finally {
        http.disconnect();
    }
}
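If a request does advertise `Accept-Encoding: gzip`, `HttpURLConnection` returns the compressed bytes as they arrive, so the saved file would not be readable HTML. A minimal sketch of decoding the body based on the `Content-Encoding` response header follows. The class `GzipBodySketch` and its helper names are hypothetical and not part of the original code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipBodySketch {
    // Wraps the raw response stream when Content-Encoding says gzip;
    // otherwise returns the stream unchanged.
    static InputStream decode(InputStream raw, String contentEncoding) throws IOException {
        if (contentEncoding != null && contentEncoding.trim().equalsIgnoreCase("gzip")) {
            return new GZIPInputStream(raw);
        }
        return raw;
    }

    // Reads a stream to the end and returns its bytes.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buff = new byte[8192];
        int len;
        while ((len = in.read(buff)) != -1) {
            out.write(buff, 0, len);
        }
        return out.toByteArray();
    }

    // Demo helper: gzip-compresses bytes in memory to simulate a response body.
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] compressed = gzip("<html></html>".getBytes(StandardCharsets.UTF_8));
        byte[] body = readAll(decode(new ByteArrayInputStream(compressed), "gzip"));
        System.out.println(new String(body, StandardCharsets.UTF_8)); // <html></html>
    }
}
```

With a real connection the call would look like `decode(http.getInputStream(), http.getContentEncoding())`. This sketch does not handle `deflate`.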
2. Process the HTML file internally
Write a general-purpose method for replacing content in a file
jsoup can modify the HTML, but writing the modified document back overwrites the JS code, so a different route is needed
/**
 * Replaces every occurrence of a string in a file.
 *
 * @param filePath   path of the file to modify in place
 * @param oldStr     literal string to look for
 * @param replaceStr string to put in its place
 */
public void replaceTxtByStr(String filePath, String oldStr, String replaceStr) {
    File file = new File(filePath);
    StringBuilder buf = new StringBuilder();
    // Read with the platform default charset, the same one used for writing below
    try (BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(file)))) {
        String temp;
        while ((temp = br.readLine()) != null) {
            // replace() substitutes all matches on the line and is a no-op without a match
            buf.append(temp.replace(oldStr, replaceStr));
            buf.append(System.getProperty("line.separator"));
        }
    } catch (IOException e) {
        e.printStackTrace();
        return; // do not overwrite the file with partial content
    }
    try (PrintWriter pw = new PrintWriter(new FileOutputStream(file))) {
        pw.write(buf.toString());
    } catch (IOException e) {
        e.printStackTrace();
    }
}
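The method above reads and writes with the platform default charset and rebuilds line endings. As a sketch of an alternative, assuming the file fits in memory, the whole file can be read at once with `java.nio.file.Files` and an explicit charset. The class `ReplaceInFileSketch`, the method `replaceInFile`, and the choice of UTF-8 in the demo are assumptions, not taken from the original project.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ReplaceInFileSketch {
    // Reads the whole file, replaces every literal occurrence of oldStr,
    // and writes the result back. Line endings are left as they were.
    static void replaceInFile(Path file, String oldStr, String newStr, Charset charset)
            throws IOException {
        String content = new String(Files.readAllBytes(file), charset);
        Files.write(file, content.replace(oldStr, newStr).getBytes(charset));
    }

    public static void main(String[] args) throws IOException {
        // Temporary file standing in for the downloaded page.
        Path tmp = Files.createTempFile("orderPage", ".html");
        Files.write(tmp, "<img src=\"barcode.jsp?billCode=1\">".getBytes(StandardCharsets.UTF_8));
        replaceInFile(tmp, "barcode.jsp?billCode=1", "data:image/png;base64,AAAA",
                StandardCharsets.UTF_8);
        System.out.println(new String(Files.readAllBytes(tmp), StandardCharsets.UTF_8));
        Files.delete(tmp);
    }
}
```

The charset passed in should match the encoding the page was actually served in, otherwise non-ASCII text will be corrupted on the round trip.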
Convert the image to Base64
An image is only carried into the PDF if it is embedded in the HTML as Base64. One detail to watch: what the backend encoder produces and what HTML expects are not the same. HTML needs the data URI prefix in front of the raw Base64 string.
private String file2Base64(String filePath) {
    try {
        // Read the whole file; available() plus a single read() is not guaranteed to do so
        byte[] data = Files.readAllBytes(Paths.get(filePath));
        // Base64 here is org.apache.commons.codec.binary.Base64
        return "data:image/png;base64," + Base64.encodeBase64String(data);
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}
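The method above always labels the data as `image/png`. If the JSPs may return other formats, the MIME type could be guessed from the first bytes of the file. This is a minimal sketch under that assumption, using the JDK's `java.util.Base64` instead of Commons Codec. The class `DataUriSketch` and its methods are hypothetical, and only PNG, JPEG and GIF signatures are checked.

```java
import java.util.Base64;

class DataUriSketch {
    // Guesses the image MIME type from the leading magic bytes.
    static String sniffMimeType(byte[] data) {
        if (startsWith(data, 0x89, 0x50, 0x4E, 0x47)) return "image/png";
        if (startsWith(data, 0xFF, 0xD8, 0xFF)) return "image/jpeg";
        if (startsWith(data, 0x47, 0x49, 0x46, 0x38)) return "image/gif";
        return "application/octet-stream"; // unknown format
    }

    // True when data begins with the given unsigned byte values.
    static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    // Builds the data URI that goes into the src attribute.
    static String toDataUri(byte[] data) {
        return "data:" + sniffMimeType(data) + ";base64,"
                + Base64.getEncoder().encodeToString(data);
    }

    public static void main(String[] args) {
        // First three bytes of a JPEG file, used as a tiny sample.
        byte[] jpegStart = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF};
        System.out.println(toDataUri(jpegStart)); // data:image/jpeg;base64,/9j/
    }
}
```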
Change the src paths in the HTML to Base64
The order matters here: the HTML has to be downloaded before it can be processed, and each JSP file has to be downloaded before it can be converted to Base64
private String analysis(String path, String url, String cookie) throws Exception {
    // Location of the downloaded HTML
    String filePath = path + "/orderPage.html";
    String key = "/easp/";
    Document document = Jsoup.connect(url).get();
    Elements elementsByClass = document.getElementsByTag("img");
    download(url, filePath, cookie);
    // TODO: this removes the JS statement that rewrites the barcode img src at runtime
    replaceTxtByStr(filePath, "document.getElementById(\"barCodeId\").src=strs[0]+\"barcode.jsp?billCode=\"+boeNum;", "");
    // Everything up to and including /easp/ is the prefix for image URLs
    String[] arr = url.split(key);
    String header = arr[0] + key;
    // Walk the img tags collected above
    int i = 0;
    for (Element byClass : elementsByClass) {
        String image = byClass.attr("src");
        String id = byClass.attr("id");
        System.out.println("image path::" + image);
        System.out.println("id::" + id);
        String allImage = header + image;
        String realPath = path + "/" + i + ".jsp";
        download(allImage, realPath, cookie);
        replaceTxtByStr(filePath, image, file2Base64(realPath));
        i++;
    }
    return filePath;
}
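3. Convert to PDF
Just call the conversion method directly
One detail: the font directory is currently a Windows path, but the service runs on Linux, so this still needs to be handled
// Imports assume iText 7 html2pdf, which these class names match
import com.itextpdf.html2pdf.ConverterProperties;
import com.itextpdf.html2pdf.HtmlConverter;
import com.itextpdf.html2pdf.resolver.font.DefaultFontProvider;
import com.itextpdf.layout.font.FontProvider;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;

public class HtmlToPdf {
    public void htmlToPdf() throws Exception {
        // Backslashes in Windows paths must be escaped in Java string literals
        String path = "D:\\workspace\\demo\\src\\main\\resources\\1656666705064\\orderPage.html";
        String destPath = "D:\\workspace\\demo\\src\\main\\resources\\1656666705064\\template.pdf";
        ConverterProperties converterProperties = new ConverterProperties();
        FontProvider dfp = new DefaultFontProvider();
        // Register the system font directory
        dfp.addDirectory("C:/Windows/Fonts");
        converterProperties.setFontProvider(dfp);
        // Let failures propagate instead of swallowing them
        try (InputStream in = new FileInputStream(path); OutputStream out = new FileOutputStream(destPath)) {
            HtmlConverter.convertToPdf(in, out, converterProperties);
        }
    }
    public static void main(String[] args) throws Exception {
        new HtmlToPdf().htmlToPdf();
    }
}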
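One possible way to deal with the hard-coded Windows font directory is to pick the directory from the `os.name` system property at runtime. This is only a sketch: the class `FontDirSketch` and the non-Windows paths are assumptions (common defaults, not verified against the actual server), and the fonts the page needs still have to be present in whichever directory is chosen.

```java
import java.util.Locale;

class FontDirSketch {
    // Returns a font directory for the given os.name value.
    // The Linux and macOS paths are assumed defaults; adjust for the real server.
    static String fontDirFor(String osName) {
        String os = osName == null ? "" : osName.toLowerCase(Locale.ROOT);
        if (os.startsWith("windows")) {
            return "C:/Windows/Fonts";
        }
        if (os.startsWith("mac")) {
            return "/Library/Fonts";
        }
        return "/usr/share/fonts";
    }

    public static void main(String[] args) {
        // Prints the directory chosen for the machine running this code.
        System.out.println(fontDirFor(System.getProperty("os.name")));
    }
}
```

The result would then be passed to `addDirectory` in place of the literal path.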