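Copyright notice: this is an original post by the author and may not be reproduced without permission. https://blog.csdn.net/qq_31748587/article/details/84655332
A serious apprentice taking laid-back notes, no drama.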
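Based on the previous project: https://blog.csdn.net/qq_31748587/article/details/84550356
Methods added in this post:
- Convert images to PDF
- Read the text of a PDF
- Convert a PDF to images
- Merge multiple PDFs into one PDF and generate a table of contents (i.e. bookmarks)
Images to PDF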
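import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Image;
import com.itextpdf.text.pdf.PdfWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.List;

/**
 * @description Convert images to PDF, one image per page
 * @param imgList List<String> paths of the images
 * @return out FileOutputStream (already closed; the PDF is written to PDF_PATH)
 */
public static FileOutputStream imgsToPdf(List<String> imgList) {
    FileOutputStream out = null;
    try {
        out = new FileOutputStream(PDF_PATH);
        Document document = new Document();
        PdfWriter.getInstance(document, out);
        document.open();
        for (String img : imgList) {
            document.newPage();
            Image image = Image.getInstance(img);
            // Image width and height
            float imageWidth = image.getWidth();
            float imageHeight = image.getHeight();
            // Page width and height
            float pageWidth = document.getPageSize().getWidth();
            float pageHeight = document.getPageSize().getHeight();
            // If the image is wider than the page, shrink it to the page width,
            // scaling the image's own height by the same factor to keep the aspect ratio
            if (imageWidth > pageWidth) {
                imageHeight = imageHeight * pageWidth / imageWidth;
                imageWidth = pageWidth;
            }
            // Subtract 74pt of horizontal white space (the default margins are 36pt per side)
            image.scaleToFit(imageWidth - 74, imageHeight);
            document.add(image);
        }
        document.close();
        out.flush();
        out.close();
    } catch (DocumentException | IOException e) {
        // MalformedURLException is an IOException, so it is covered here too
        e.printStackTrace();
    }
    return out;
}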
Test:
List<String> imgList=new ArrayList<>();
imgList.add("D:\\pdf\\红彤彤.png");
imgList.add("D:\\pdf\\绿油油.png");
PDFUtil.imgsToPdf(imgList);
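The width check in imgsToPdf only keeps the picture undistorted if the height is multiplied by the same factor as the width, i.e. newHeight = imageHeight × pageWidth / imageWidth. (iText's Image.scaleToFit also preserves the aspect ratio on its own.) Below is a minimal, iText-free sketch of that arithmetic; ImageFit and fitToWidth are hypothetical names, and 595 pt is the width of iText's default A4 page.

```java
// Hypothetical helper illustrating the "shrink to page width, keep aspect ratio" arithmetic.
// Plain Java only; no iText classes are used.
final class ImageFit {
    private ImageFit() {}

    /** Returns {width, height}; the image is shrunk only when it is wider than maxWidth. */
    static float[] fitToWidth(float imageWidth, float imageHeight, float maxWidth) {
        if (imageWidth <= maxWidth) {
            return new float[] {imageWidth, imageHeight};
        }
        // Scale the height by the same ratio as the width so the image is not distorted
        float ratio = maxWidth / imageWidth;
        return new float[] {maxWidth, imageHeight * ratio};
    }

    public static void main(String[] args) {
        // A 1190 x 800 image on a 595pt-wide page is halved in both directions
        float[] size = fitToWidth(1190f, 800f, 595f);
        System.out.println(size[0] + " x " + size[1]); // 595.0 x 400.0
    }
}
```

An image that already fits is returned unchanged, which matches the `if (imageWidth > pageWidth)` guard in the method.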
Reading the text of a PDF
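import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfReaderContentParser;
import com.itextpdf.text.pdf.parser.SimpleTextExtractionStrategy;
import com.itextpdf.text.pdf.parser.TextExtractionStrategy;
import java.io.IOException;

/**
 * @description Read the text content of a PDF
 * @param headReader PdfReader the PDF to read
 * @return result StringBuffer with the text of all pages
 */
public static StringBuffer readPdfContent(PdfReader headReader) {
    StringBuffer result = new StringBuffer();
    try {
        PdfReaderContentParser parser = new PdfReaderContentParser(headReader);
        // iText page numbers start at 1
        for (int i = 1; i <= headReader.getNumberOfPages(); i++) {
            // A fresh strategy per page collects that page's text
            TextExtractionStrategy strategy = parser.processContent(i, new SimpleTextExtractionStrategy());
            result.append(strategy.getResultantText());
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    return result;
}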
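Test:
// headReader is an already opened PdfReader, e.g. new PdfReader("D:\\pdf\\test1.pdf")
StringBuffer result = readPdfContent(headReader);
System.out.println("Text content of the PDF file:");
System.out.println(result);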
Test result: [screenshot: original file] [screenshot: console output]
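Note: the results are good but not perfect. Some visually similar characters, such as “干” and “千”, are occasionally parsed incorrectly, and text inside images cannot be extracted.
If parsing fails with
java.lang.NoClassDefFoundError: org/bouncycastle/asn1/ASN1Encodable
add these two dependencies: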
<dependency>
<groupId>org.bouncycastle</groupId>
<artifactId>bcprov-jdk15on</artifactId>
<version>1.47</version>
</dependency>
<dependency>
<groupId>org.bouncycastle</groupId>
<artifactId>bcmail-jdk15on</artifactId>
<version>1.47</version>
</dependency>
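PDF to images
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.PDFRenderer;
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;

/**
 * @description Convert a PDF to images, one file per page
 * @param filePath String path of the PDF to convert
 * @param suff String image format and file suffix, e.g. "jpg" or "png"
 * @param scale float scale factor (1.0 = 72 dpi)
 * @return void
 */
public static void pdfToImage(String filePath, String suff, float scale) {
    // PDDocument.load parses the file; try-with-resources closes the document afterwards
    try (PDDocument document = PDDocument.load(new File(filePath))) {
        PDFRenderer renderer = new PDFRenderer(document);
        int pages = document.getNumberOfPages();
        // The larger the scale (dpi = 72 * scale), the sharper the images and the slower the conversion
        for (int i = 0; i < pages; i++) {
            // PDFBox page indexes start at 0, the file names start at 1
            File dstFile = new File(IMG_PATH + File.separator + (i + 1) + "." + suff);
            BufferedImage image = renderer.renderImage(i, scale);
            ImageIO.write(image, suff, dstFile);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}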
Test:
pdfToImage("D:\\pdf\\《Java NIO (中文版)》.pdf","jpg",1.5f);
This is implemented with PDFBox, so add the dependency:
<dependency>
<groupId>org.apache.pdfbox</groupId>
<artifactId>pdfbox</artifactId>
<version>2.0.7</version>
</dependency>
Result: [screenshot: generated page images]
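PDF user space has 72 units per inch, so the scale argument of PDFBox's PDFRenderer.renderImage(int, float) corresponds to 72 × scale dpi; PDFBox also offers renderImageWithDPI(int, float) if you prefer to pass the dpi directly. The sketch below only reproduces that conversion and the 0-based-index to 1-based-file-name mapping used in pdfToImage; RenderMath and its methods are hypothetical names.

```java
// Hypothetical helpers mirroring the arithmetic in pdfToImage (no PDFBox needed).
final class RenderMath {
    private RenderMath() {}

    /** PDF user space: 72 units per inch. */
    static final float POINTS_PER_INCH = 72f;

    /** Scale factor to pass to renderImage for a target dpi. */
    static float scaleForDpi(float dpi) {
        return dpi / POINTS_PER_INCH;
    }

    /** Resolution produced by a given scale factor. */
    static float dpiForScale(float scale) {
        return scale * POINTS_PER_INCH;
    }

    /** File name for a 0-based PDFBox page index: page 0 becomes "1.jpg". */
    static String imageName(int pageIndex, String suffix) {
        return (pageIndex + 1) + "." + suffix;
    }

    public static void main(String[] args) {
        System.out.println(dpiForScale(1.5f));   // 108.0, the scale used in the test above
        System.out.println(scaleForDpi(144f));   // 2.0
        System.out.println(imageName(0, "jpg")); // 1.jpg
    }
}
```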
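Merging multiple PDFs into one PDF with a table of contents (bookmarks)
Like Word's “View → Document Map”, a PDF can show its structure in the sidebar, and clicking a title jumps straight to it. The method below merges the PDFs and adds basic bookmarks.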
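import com.itextpdf.text.Chunk;
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.PdfAction;
import com.itextpdf.text.pdf.PdfContentByte;
import com.itextpdf.text.pdf.PdfImportedPage;
import com.itextpdf.text.pdf.PdfOutline;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.List;

/**
 * @description Merge several PDFs into one PDF and generate a table of contents (bookmarks)
 * @param readerList List<PdfReader> the PDFs to merge, in order
 * @param catalogArr String[] one bookmark title per PDF
 * @return out FileOutputStream (already closed; the PDF is written to PDF_PATH)
 */
public static FileOutputStream mergePdfsWithCatalog(List<PdfReader> readerList, String[] catalogArr) {
    Document document = new Document();
    FileOutputStream out = null;
    try {
        out = new FileOutputStream(PDF_PATH);
        PdfWriter writer = PdfWriter.getInstance(document, out);
        document.open();
        PdfContentByte cb = writer.getDirectContent();
        PdfOutline root = cb.getRootOutline();
        // Index of the current source PDF (1-based)
        int pdfNum = 1;
        // Current page number in the merged PDF
        int totalPage = 1;
        for (PdfReader reader : readerList) {
            for (int currentPage = 1; currentPage <= reader.getNumberOfPages(); currentPage++) {
                document.newPage();
                // On the first page of each source PDF, create a bookmark
                if (currentPage == 1) {
                    // The local destination is named after the target page number
                    document.add(new Chunk(catalogArr[pdfNum - 1]).setLocalDestination(String.valueOf(totalPage)));
                    // The PdfOutline constructor attaches the new bookmark to root
                    new PdfOutline(root, PdfAction.gotoLocalPage(String.valueOf(totalPage), false), catalogArr[pdfNum - 1]);
                }
                // Import the page and draw it at the origin of the new page
                PdfImportedPage page = writer.getImportedPage(reader, currentPage);
                cb.addTemplate(page, 0, 0);
                totalPage++;
            }
            pdfNum++;
        }
        document.close();
        out.flush();
        out.close();
    } catch (DocumentException | IOException e) {
        e.printStackTrace();
    }
    return out;
}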
Test:
List<PdfReader> readerList=new ArrayList<>();
// Bookmark titles
String[] catalogArr = new String[]{"材料1","材料2","材料3","材料4","材料5","材料6","材料7","材料8","材料9","材料10"};
for(int i=0;i<10;i++){
PdfReader head = new PdfReader(new FileInputStream("D:\\pdf\\test1.pdf"));
readerList.add(head);
}
mergePdfsWithCatalog(readerList,catalogArr);
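Result: [screenshot: merged PDF with bookmarks]
For multi-level bookmarks, here is an example; it has not yet been folded into the utility class.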
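The only bookkeeping in mergePdfsWithCatalog is totalPage: each source PDF's bookmark points at the page where that source starts in the merged file, which is 1 plus the page counts of all earlier sources. A minimal sketch of that calculation, without iText; BookmarkPages and startPages are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: compute the 1-based start page of each source PDF in the merged PDF.
final class BookmarkPages {
    private BookmarkPages() {}

    static List<Integer> startPages(int[] pageCounts) {
        List<Integer> starts = new ArrayList<>();
        int totalPage = 1; // same role as totalPage in mergePdfsWithCatalog
        for (int count : pageCounts) {
            starts.add(totalPage); // bookmark target for this source
            totalPage += count;    // skip over all of this source's pages
        }
        return starts;
    }

    public static void main(String[] args) {
        // Sources with 3, 1 and 5 pages start at pages 1, 4 and 5
        System.out.println(startPages(new int[] {3, 1, 5})); // [1, 4, 5]
    }
}
```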
// iText 5 classes: Document, Chunk, Paragraph (com.itextpdf.text); PdfWriter, PdfContentByte, PdfOutline, PdfAction (com.itextpdf.text.pdf)
Document document = new Document();
FileOutputStream out = new FileOutputStream(PDF_PATH);
PdfWriter writer = PdfWriter.getInstance(document, out);
document.open();
// Page 1: named destination "1"
document.newPage();
document.add(new Chunk("Chapter 1").setLocalDestination("1"));
// Page 2: destination "2" plus two sub-destinations on the same page
document.newPage();
document.add(new Chunk("Chapter 2").setLocalDestination("2"));
document.add(new Paragraph(new Chunk("Sub 2.1").setLocalDestination("2.1")));
document.add(new Paragraph(new Chunk("Sub 2.2").setLocalDestination("2.2")));
// Page 3: destination "3"
document.newPage();
document.add(new Chunk("Chapter 3").setLocalDestination("3"));
// Build the outline tree: each PdfOutline attaches itself to the parent passed to its constructor
PdfContentByte cb = writer.getDirectContent();
PdfOutline root = cb.getRootOutline();
new PdfOutline(root, PdfAction.gotoLocalPage("1", false), "Go to 1");
PdfOutline oline2 = new PdfOutline(root, PdfAction.gotoLocalPage("2", false), "Go to 2");
// Collapse "Go to 2" so its children stay hidden until expanded
oline2.setOpen(false);
new PdfOutline(oline2, PdfAction.gotoLocalPage("2.1", false), "Go to 2.1");
new PdfOutline(oline2, PdfAction.gotoLocalPage("2.2", false), "Go to 2.2");
new PdfOutline(root, PdfAction.gotoLocalPage("3", false), "Go to 3");
document.close();
Result: [screenshot: multi-level bookmarks]
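One possible way to generalise the multi-level example is to describe the bookmarks as a tree and walk it recursively, parent before children, because a child PdfOutline needs its parent object to exist first. The sketch below is only an assumption about how that could look: Bookmark and walk are hypothetical, and instead of calling iText it records "depth|title|destination" lines. In a real utility each visit would create `new PdfOutline(parent, PdfAction.gotoLocalPage(dest, false), title)` and pass the new outline down as the parent of its children.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical bookmark tree; the walk only records what would be created, no iText calls.
final class Bookmark {
    final String title;
    final String dest;
    final List<Bookmark> children;

    Bookmark(String title, String dest, Bookmark... children) {
        this.title = title;
        this.dest = dest;
        this.children = Arrays.asList(children);
    }

    /** Pre-order walk: a node is visited before its children, so a parent always exists first. */
    static void walk(Bookmark node, int depth, List<String> out) {
        out.add(depth + "|" + node.title + "|" + node.dest);
        for (Bookmark child : node.children) {
            walk(child, depth + 1, out);
        }
    }

    public static void main(String[] args) {
        List<Bookmark> roots = Arrays.asList(
                new Bookmark("Go to 1", "1"),
                new Bookmark("Go to 2", "2",
                        new Bookmark("Go to 2.1", "2.1"),
                        new Bookmark("Go to 2.2", "2.2")),
                new Bookmark("Go to 3", "3"));
        List<String> lines = new ArrayList<>();
        for (Bookmark root : roots) {
            walk(root, 0, lines);
        }
        // Prints 0|Go to 1|1, 0|Go to 2|2, 1|Go to 2.1|2.1, 1|Go to 2.2|2.2, 0|Go to 3|3 on separate lines
        lines.forEach(System.out::println);
    }
}
```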
Complete project:
https://pan.baidu.com/s/147XbP3wVu7lURggSfVBJIw extraction code: 4igc