Java PDF Operations: Extended Features

A serious apprentice, taking notes at my own pace, not looking for trouble.

This builds on my previous project: https://blog.csdn.net/qq_31748587/article/details/84550356

New methods added:

Image to PDF
Read PDF text
PDF to images
Merge multiple PDFs into one and generate a table of contents (i.e. bookmarks)

Image to PDF

    /**
     * @description Converts images to a PDF (one image per page) written to PDF_PATH
     * @param imgList List<String> paths of the images
     * @return out FileOutputStream (already closed when it is returned)
     */
    public static FileOutputStream imgsToPdf(List<String> imgList) {
        FileOutputStream out = null;
        try {
            out = new FileOutputStream(PDF_PATH);
            Document document = new Document();
            PdfWriter.getInstance(document, out);
            document.open();
            for (String img : imgList) {
                document.newPage();
                Image image = Image.getInstance(img);
                // image width
                float imageWidth = image.getWidth();
                // image height
                float imageHeight = image.getHeight();
                // page width
                float pageWidth = document.getPageSize().getWidth();
                // page height
                float pageHeight = document.getPageSize().getHeight();
                // If the image is wider than the page, shrink it to the page width and
                // scale the height by the same ratio so the aspect ratio is preserved
                if (imageWidth > pageWidth) {
                    imageHeight = imageHeight * pageWidth / imageWidth;
                    imageWidth = pageWidth;
                }
                // Leave 74pt for the margins (iText's default is 36pt per side)
                // and never exceed the page height
                image.scaleToFit(imageWidth - 74, Math.min(imageHeight, pageHeight - 74));
                document.add(image);
            }
            document.close();
            out.flush();
            out.close();
        } catch (DocumentException e) {
            e.printStackTrace();
        } catch (MalformedURLException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return out;
    }

Test code:

List<String> imgList=new ArrayList<>();
imgList.add("D:\\pdf\\红彤彤.png");
imgList.add("D:\\pdf\\绿油油.png");
PDFUtil.imgsToPdf(imgList);
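The scaling step in imgsToPdf boils down to a single proportional factor. Below is a minimal sketch of that arithmetic without iText, assuming the usable area is the page minus its margins. FitToPage and fit are hypothetical names, and the A4 size (595 x 842 pt) with iText's default 36 pt margins are the assumed inputs:

```java
/**
 * Sketch of fit-to-page scaling (hypothetical helper, no iText needed):
 * shrink an image proportionally so it fits inside a box, never enlarging it.
 */
public class FitToPage {
    /** Returns {width, height} of the image after fitting it into a boxW x boxH area. */
    static float[] fit(float imgW, float imgH, float boxW, float boxH) {
        // One factor for both axes keeps the aspect ratio; capping it at 1 avoids upscaling
        float factor = Math.min(1f, Math.min(boxW / imgW, boxH / imgH));
        return new float[] {imgW * factor, imgH * factor};
    }

    public static void main(String[] args) {
        // A4 is 595 x 842 pt; removing 36 pt margins on each side leaves 523 x 770
        float[] size = fit(1046f, 500f, 595f - 72f, 842f - 72f);
        System.out.println(size[0] + " x " + size[1]); // prints 523.0 x 250.0
    }
}
```

If you want exactly this "shrink only" behaviour in iText 5, the computed size could be passed to image.scaleAbsolute(width, height). As far as I know, scaleToFit also enlarges images that are smaller than the target box.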

Reading PDF text

    /**
     * @description Reads the text of every page of a PDF
     * @param headReader PdfReader for the PDF to read
     * @return result the text of all pages, concatenated
     */
    public static StringBuffer readPdfContent(PdfReader headReader) {
        StringBuffer result = new StringBuffer();
        try {
            PdfReaderContentParser parser = new PdfReaderContentParser(headReader);
            TextExtractionStrategy strategy;
            for (int i = 1; i <= headReader.getNumberOfPages(); i++) {
                strategy = parser.processContent(i,new SimpleTextExtractionStrategy());
                result.append(strategy.getResultantText());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        return result;
    }

Test code:

StringBuffer result=readPdfContent(headReader);
System.out.println("Text content of the PDF file:");
System.out.println(result);

Test result: [screenshots of the original file and of the extracted output]

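Note: the results are good but not perfect. Some similar-looking characters, such as "干" and "千", can be extracted incorrectly, and text inside images cannot be extracted at all.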
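readPdfContent appends every page into one buffer, so page boundaries are lost. Here is a small sketch that keeps one entry per page instead. PerPageText, PageTextExtractor, readPages and joinWithMarkers are hypothetical names, and the interface stands in for whatever library does the actual extraction:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch: keep each page's text separately instead of one concatenated buffer.
 * PageTextExtractor is a hypothetical stand-in for the PDF library call.
 */
public class PerPageText {
    @FunctionalInterface
    interface PageTextExtractor {
        // pageNumber is 1-based, like PdfReader page numbers
        String extract(int pageNumber) throws IOException;
    }

    /** Extracts pages 1..pageCount, one list entry per page. */
    static List<String> readPages(int pageCount, PageTextExtractor extractor) throws IOException {
        List<String> pages = new ArrayList<>(pageCount);
        for (int page = 1; page <= pageCount; page++) {
            pages.add(extractor.extract(page));
        }
        return pages;
    }

    /** Joins the pages with a visible marker so page boundaries survive in the output. */
    static String joinWithMarkers(List<String> pages) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < pages.size(); i++) {
            sb.append("--- page ").append(i + 1).append(" ---\n").append(pages.get(i)).append('\n');
        }
        return sb.toString();
    }
}
```

With iText 5, the extractor could be a lambda around PdfTextExtractor.getTextFromPage(reader, n) from com.itextpdf.text.pdf.parser, for example readPages(reader.getNumberOfPages(), n -> PdfTextExtractor.getTextFromPage(reader, n)), since that method takes the reader and a 1-based page number.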

If parsing fails with

java.lang.NoClassDefFoundError: org/bouncycastle/asn1/ASN1Encodable

add these two dependencies:

<dependency>
	<groupId>org.bouncycastle</groupId>
	<artifactId>bcprov-jdk15on</artifactId>
	<version>1.47</version>
</dependency>
<dependency>
	<groupId>org.bouncycastle</groupId>
	<artifactId>bcmail-jdk15on</artifactId>
	<version>1.47</version>
</dependency>

PDF to images

    /**
     * @description Renders every page of a PDF into IMG_PATH as 1.<suff>, 2.<suff>, and so on
     * @param filePath String path of the PDF to convert
     * @param suff String image format and file extension, e.g. "jpg" or "png"
     * @param scale float scale factor, where 1 = 72 DPI
     * @return void
     */
    public static void pdfToImage(String filePath, String suff, float scale) {
        // try-with-resources closes the PDDocument (and its file) when done
        try (PDDocument document = PDDocument.load(new File(filePath))) {
            PDFRenderer renderer = new PDFRenderer(document);
            // PDFBox can count the pages itself; no iText PdfReader is needed
            int pages = document.getNumberOfPages();
            // The larger the scale (i.e. the DPI), the sharper the image and the slower the conversion
            for (int i = 0; i < pages; i++) {
                File dstFile = new File(IMG_PATH + File.separator + (i + 1) + "." + suff);
                // renderImage takes a 0-based page index
                BufferedImage image = renderer.renderImage(i, scale);
                ImageIO.write(image, suff, dstFile);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

Test code:

pdfToImage("D:\\pdf\\《Java NIO (中文版)》.pdf","jpg",1.5f);

This is implemented with PDFBox, so the following dependency is needed:

<dependency>
	<groupId>org.apache.pdfbox</groupId>
	<artifactId>pdfbox</artifactId>
	<version>2.0.7</version>
</dependency>

Result: [screenshot of the generated images]
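In pdfToImage, ImageIO.write simply returns false when the JDK has no writer for suff, so a typo such as "jgp" produces no files and no error. Below is a sketch of a stricter write step using only the JDK. PageImageWriter, isWritableFormat, toRgb and savePage are hypothetical names. It also drops the alpha channel before writing JPEG, because ImageIO's JPEG writer may fail or produce wrong colours for ARGB images, which matters if you render with an alpha image type:

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

/**
 * Sketch of a stricter replacement for the bare ImageIO.write call in pdfToImage.
 * All names here are hypothetical helpers, not part of PDFBox.
 */
public class PageImageWriter {
    /** True if an ImageIO writer is registered under this format name, e.g. "png" or "jpg". */
    static boolean isWritableFormat(String format) {
        return ImageIO.getImageWritersByFormatName(format).hasNext();
    }

    /** Copies the pixels into a TYPE_INT_RGB image, dropping any alpha channel. */
    static BufferedImage toRgb(BufferedImage src) {
        if (src.getType() == BufferedImage.TYPE_INT_RGB) {
            return src;
        }
        BufferedImage dst = new BufferedImage(src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                // The RGB image ignores the alpha bits; transparent pixels keep their stored colour
                dst.setRGB(x, y, src.getRGB(x, y));
            }
        }
        return dst;
    }

    /** Writes the image, failing loudly instead of silently producing no file. */
    static void savePage(BufferedImage image, String format, File dst) throws IOException {
        if (!isWritableFormat(format)) {
            throw new IOException("No ImageIO writer for format: " + format);
        }
        boolean jpeg = format.equalsIgnoreCase("jpg") || format.equalsIgnoreCase("jpeg");
        BufferedImage toWrite = jpeg ? toRgb(image) : image;
        if (!ImageIO.write(toWrite, format, dst)) {
            throw new IOException("ImageIO could not write " + dst);
        }
    }
}
```

Inside the loop, ImageIO.write(image, suff, dstFile) would then become PageImageWriter.savePage(image, suff, dstFile).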

Merging multiple PDFs into one with a table of contents (i.e. bookmarks)

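Like Word's "View > Document Map", a PDF can show its structure in the sidebar, so clicking a heading jumps straight to that section. Here the merged PDF is generated with these basic bookmarks.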

    /**
     * @description Merges several PDFs into one and adds a bookmark for each source PDF
     * @param readerList List<PdfReader> the PDFs to merge, in order
     * @param catalogArr String[] one bookmark title per PDF, in the same order as readerList
     * @return out FileOutputStream (already closed when it is returned)
     */
    public static FileOutputStream mergePdfsWithCatalog(List<PdfReader> readerList, String[] catalogArr) {
        Document document = new Document();
        PdfWriter writer;
        FileOutputStream out = null;
        try {
            out = new FileOutputStream(PDF_PATH);
            writer = PdfWriter.getInstance(document, out);
            document.open();
            PdfContentByte cb = writer.getDirectContent();
            // 1-based index of the source PDF being copied
            int pdfNum = 1;
            // current page number in the merged PDF
            int totalPage = 1;
            for (PdfReader reader : readerList) {
                int currentPage = 1;
                while (currentPage <= reader.getNumberOfPages()) {
                    // make the next output page the same size as the source page (rotation is not handled)
                    document.setPageSize(reader.getPageSize(currentPage));
                    document.newPage();
                    // on the first page of each source PDF, create its bookmark
                    if (currentPage == 1) {
                        // the local destination is named after the page number to jump to
                        document.add(new Chunk(catalogArr[pdfNum - 1]).setLocalDestination(String.valueOf(totalPage)));
                        PdfOutline root = cb.getRootOutline();
                        // the constructor attaches the new entry to root
                        new PdfOutline(root, PdfAction.gotoLocalPage(String.valueOf(totalPage), false), catalogArr[pdfNum - 1]);
                    }
                    // import the page and draw it onto the output page
                    PdfImportedPage page = writer.getImportedPage(reader, currentPage);
                    cb.addTemplate(page, 0, 0);
                    currentPage++;
                    totalPage++;
                }
                pdfNum++;
            }
            document.close();
            out.flush();
            out.close();
        } catch (DocumentException e) {
            e.printStackTrace();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return out;
    }

Test code:

List<PdfReader> readerList=new ArrayList<>();
//bookmark titles, one per PDF
String[] catalogArr = new String[]{"Material 1","Material 2","Material 3","Material 4","Material 5","Material 6","Material 7","Material 8","Material 9","Material 10"};
for(int i=0;i<10;i++){
    PdfReader head = new PdfReader(new FileInputStream("D:\\pdf\\test1.pdf"));
    readerList.add(head);
}
mergePdfsWithCatalog(readerList,catalogArr);

Result: [screenshot of the merged PDF with its bookmarks]
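mergePdfsWithCatalog tracks totalPage while copying. The same bookkeeping can also be done up front from the page counts, for example to print a table of contents or check bookmark targets before merging. A minimal sketch: MergeOffsets and startPages are hypothetical names, and in the real method the counts would come from reader.getNumberOfPages():

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch: at which page of the merged output does each source PDF begin? */
public class MergeOffsets {
    /** Returns the 1-based start page of each input document inside the merged file. */
    static List<Integer> startPages(List<Integer> pageCounts) {
        List<Integer> starts = new ArrayList<>(pageCounts.size());
        int next = 1; // merged numbering starts at page 1, like totalPage in mergePdfsWithCatalog
        for (int count : pageCounts) {
            starts.add(next);
            next += count;
        }
        return starts;
    }

    public static void main(String[] args) {
        // three inputs with 3, 1 and 2 pages
        System.out.println(startPages(Arrays.asList(3, 1, 2))); // prints [1, 4, 5]
    }
}
```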

For multi-level bookmarks, here is a standalone example; I have not yet managed to turn it into a utility method.

Document document = new Document();
FileOutputStream out = new FileOutputStream(PDF_PATH);
PdfWriter writer = PdfWriter.getInstance(document, out);
document.open();
document.newPage();
document.add(new Chunk("Chapter 1").setLocalDestination("1"));
document.newPage();
document.add(new Chunk("Chapter 2").setLocalDestination("2"));
document.add(new Paragraph(new Chunk("Sub 2.1").setLocalDestination("2.1")));
document.add(new Paragraph(new Chunk("Sub 2.2").setLocalDestination("2.2")));
document.newPage();
document.add(new Chunk("Chapter 3").setLocalDestination("3"));
PdfContentByte cb = writer.getDirectContent();
PdfOutline root = cb.getRootOutline();
PdfOutline oline1 = new PdfOutline(root, PdfAction.gotoLocalPage("1", false), "Go to 1");
PdfOutline oline2 = new PdfOutline(root, PdfAction.gotoLocalPage("2", false), "Go to 2");
oline2.setOpen(false);
PdfOutline oline2_1 = new PdfOutline(oline2, PdfAction.gotoLocalPage("2.1", false), "Go to 2.1");
PdfOutline oline2_2 = new PdfOutline(oline2, PdfAction.gotoLocalPage("2.2", false), "Go to 2.2");
PdfOutline oline3 = new PdfOutline(root, PdfAction.gotoLocalPage("3", false), "Go to 3");
document.close();

Result: [screenshot of the multi-level bookmarks]
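The multi-level example hard-codes every PdfOutline. One way to make it reusable is to describe the bookmarks as a tree and create the outline entries recursively. This is a sketch under assumptions: BookmarkTree, Bookmark and addAll are hypothetical names, and the library call is passed in as a function so the example runs without iText:

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BiFunction;

/**
 * Sketch of a nested bookmark tree walked recursively.
 * T stands for the library's outline type (e.g. PdfOutline); all names are hypothetical.
 */
public class BookmarkTree {
    static final class Bookmark {
        final String title;
        final String dest; // local destination name, e.g. "2.1"
        final List<Bookmark> children;

        Bookmark(String title, String dest, Bookmark... children) {
            this.title = title;
            this.dest = dest;
            this.children = Arrays.asList(children);
        }
    }

    /**
     * Creates an entry for every bookmark under its parent, depth-first, in document order.
     * createChild receives (parentEntry, bookmark) and returns the newly created entry.
     */
    static <T> void addAll(T parent, List<Bookmark> bookmarks, BiFunction<T, Bookmark, T> createChild) {
        for (Bookmark b : bookmarks) {
            T entry = createChild.apply(parent, b);
            addAll(entry, b.children, createChild);
        }
    }

    public static void main(String[] args) {
        List<Bookmark> toc = Arrays.asList(
                new Bookmark("Go to 1", "1"),
                new Bookmark("Go to 2", "2",
                        new Bookmark("Go to 2.1", "2.1"),
                        new Bookmark("Go to 2.2", "2.2")),
                new Bookmark("Go to 3", "3"));
        // Stand-in "library": an entry is just its indentation prefix; each level adds two spaces
        addAll("", toc, (indent, b) -> {
            System.out.println(indent + b.title + " -> " + b.dest);
            return indent + "  ";
        });
    }
}
```

With iText 5 the call would look something like addAll(cb.getRootOutline(), toc, (parent, b) -> new PdfOutline(parent, PdfAction.gotoLocalPage(b.dest, false), b.title)). The matching setLocalDestination chunks still have to be added to the document, as in the example above.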

Full project download:

https://pan.baidu.com/s/147XbP3wVu7lURggSfVBJIw (extraction code: 4igc)
