Hot questions for using PDFBox in file I/O

Question:

I am developing a simple application that extracts text from PDF documents and writes it to an Excel file, using the PDFBox API for the PDFs and POIFSFileSystem (HSSFWorkbook) for the Excel files. I recently built a similar application that extracts text from .doc files and writes it to Excel, and I never faced any logger problems with it. This time the system prints several warnings (shown below). How do I find the specific org/slf4j/Logger jar file among the multiple bindings in the Apache zip? I read the Apache logging documentation, which says to configure logging, but I am not developing any web-related functionality in my application. Is adding the jar files not enough? I also read the error-handling article at https://www.slf4j.org/codes.html but could not find an error that matches my application.

log4j:WARN No appenders could be found for logger (org.apache.pdfbox.io.ScratchFileBuffer).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

Below is my code, which uses both the POI API and the PDFBox API.

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;

public class FIRST {

    // Captures a reference such as 123456/A12345 from a line; compiled once.
    private static final Pattern REFERENCE =
            Pattern.compile("^.*([0-9]{6}/[A-Z][0-9]{5}).*$");

    public static void main(String[] args) throws IOException {
        String target_dir = "C:\\Users";
        File dir = new File(target_dir);
        File[] files = dir.listFiles();
        if (files == null) {
            throw new IOException("Cannot list directory: " + target_dir);
        }

        // try-with-resources closes the workbook even if a file fails to parse.
        try (HSSFWorkbook workbook = new HSSFWorkbook()) {
            HSSFSheet sheet = workbook.createSheet("firstsheet");
            Row row0 = sheet.createRow(0);
            row0.createCell(0).setCellValue("S.NO");
            row0.createCell(1).setCellValue("DOCUMENT");
            row0.createCell(2).setCellValue("VALUE1");
            row0.createCell(3).setCellValue("VALUE2");
            row0.createCell(4).setCellValue("TEND");

            int j = 1;
            for (File file : files) {
                // Skip directories and non-PDF files, including the .xls written below.
                if (!file.isFile() || !file.getName().toLowerCase().endsWith(".pdf")) {
                    continue;
                }

                // Extract the text and close the document before moving on.
                String[] paragraph;
                try (PDDocument pdDoc = PDDocument.load(file)) {
                    String st = new PDFTextStripper().getText(pdDoc);
                    paragraph = st.split(System.lineSeparator());
                }

                Row row1 = sheet.createRow(j);

                // 1. Serial number
                row1.createCell(0).setCellValue(j);
                j++;

                // 2. File name
                row1.createCell(1).setCellValue(file.getName());

                // 3. VALUE1: characters 13 to 18 of the file name
                //    (assumes every file name is at least 19 characters long)
                row1.createCell(2).setCellValue(file.getName().substring(13, 19));

                // 4. VALUE2, 5. TEND, 6. pattern found on COMMENT: lines
                Cell cell_13 = row1.createCell(3);
                Cell cell_14 = row1.createCell(4);
                Cell cell_15 = row1.createCell(5);
                for (String p : paragraph) {
                    if (p.startsWith("VALUE2")) {
                        cell_13.setCellValue(p.substring(22));
                    }
                    if (p.contains("TEND")) {
                        cell_14.setCellValue(p);
                    }
                    if (p.startsWith("COMMENT:")) {
                        Matcher matcher = REFERENCE.matcher(p);
                        // group(1) is only valid after a successful match.
                        if (matcher.matches()) {
                            cell_15.setCellValue(matcher.group(1));
                        }
                    }
                }
            }

            // Write the workbook once, after all files have been processed.
            try (OutputStream out = new FileOutputStream("C:\\Users\\abc.xls")) {
                workbook.write(out);
            }
        }
    }
}

Answer:

I resolved it by declaring my project's dependencies in pom.xml, using the Maven coordinates listed on mvnrepository, and by adding a log4j.properties file in "src" that configures the root logger and the log4j appenders. This is what it took to parse PDF documents with the PDFBox API without the log4j warnings.
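For illustration, a minimal log4j.properties sketch for log4j 1.2 (the version that prints the warnings quoted in the question), assuming console output is enough. The appender name "stdout", the INFO level and the layout pattern are illustrative choices, not taken from the original answer. The file has to end up at the root of the classpath: the source folder in a plain IDE project, or typically src/main/resources in a standard Maven layout.

```properties
# Root logger: level INFO, sending output to one appender named "stdout"
log4j.rootLogger=INFO, stdout

# Console appender writing to standard output
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.Target=System.out
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n
```

Once an appender is attached to the root logger, the "No appenders could be found" warning should no longer appear.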

Question:

I have been assigned this task in my project. I receive a PDF as a byte array from a service, and I have to convert it into a JPG image and return that image as a byte array. Can anyone help me out, please?

I tried the solution below, which converts the PDF byte array to a JPG but does not return the JPG as a byte array.

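import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFImageWriter;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class DocumentService {

    // Assumption: the undeclared "log" in the original is an SLF4J logger.
    private static final Logger log = LoggerFactory.getLogger(DocumentService.class);

    public byte[] convertPDFtoImage(byte[] bytes) {
        InputStream targetStream = new ByteArrayInputStream(bytes);
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        PDDocument document = null;
        try {
            document = PDDocument.load(targetStream);
            PDFImageWriter writer = new PDFImageWriter();
            // Writes pages 1 to 2 as JPG files on disk, using the given path prefix;
            // nothing is written to baos.
            writer.writeImage(document, "jpg", null, 1, 2, "C:\\Shailesh\\aaa");
        } catch (Exception e) {
            log.error(e.getMessage(), e);
        } finally {
            if (document != null) {
                try {
                    document.close();
                } catch (IOException e) {
                    log.error(e.getMessage(), e);
                }
            }
        }
        // baos is still empty here, which is the problem described above.
        return baos.toByteArray();
    }
}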

Answer:

I found one solution, but the renderer.renderImageWithDPI(pageNumber, 300) method takes a page number (a zero-based index) as its argument, so it converts only one page of the PDF at a time. I need the full PDF converted to JPG in the form of a byte array.

import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;

import javax.imageio.ImageIO;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.PDFRenderer;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class DocumentService {

    // Assumption: the undeclared "log" in the original is an SLF4J logger.
    private static final Logger log = LoggerFactory.getLogger(DocumentService.class);

    public byte[] convertPDFtoImage(byte[] bytesPDF) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        // try-with-resources closes the stream and the document even on failure.
        try (InputStream targetStream = new ByteArrayInputStream(bytesPDF);
             PDDocument document = PDDocument.load(targetStream)) {
            PDFRenderer renderer = new PDFRenderer(document);
            // renderImageWithDPI takes a zero-based page index; 0 is the first page.
            int pageNumber = 0;
            BufferedImage bi = renderer.renderImageWithDPI(pageNumber, 300);
            ImageIO.write(bi, "jpg", baos);
        } catch (Exception e) {
            log.error(e.getMessage(), e);
        }
        log.info("End convert PDF to Images process");
        // Empty if rendering failed; otherwise the JPG bytes of the first page.
        return baos.toByteArray();
    }
}
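One possible way around the one-page-at-a-time limitation is to render every page separately and then stack the resulting images into a single JPG. The sketch below covers only the stacking step and uses nothing but the JDK. PageStitcher and stitchToJpeg are hypothetical names, not part of PDFBox. It assumes the list was filled by calling renderer.renderImageWithDPI(i, 300) for each i from 0 to document.getNumberOfPages() - 1.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.List;

import javax.imageio.ImageIO;

/** Hypothetical helper (not a PDFBox class): stacks page images into one JPG. */
final class PageStitcher {

    private PageStitcher() { }

    /** Stacks the given page images top to bottom and encodes the result as JPG. */
    static byte[] stitchToJpeg(List<BufferedImage> pages) throws IOException {
        if (pages.isEmpty()) {
            throw new IllegalArgumentException("no pages to stitch");
        }
        // The canvas is as wide as the widest page and as tall as all pages together.
        int width = 0;
        int height = 0;
        for (BufferedImage page : pages) {
            width = Math.max(width, page.getWidth());
            height += page.getHeight();
        }
        // TYPE_INT_RGB has no alpha channel, which suits the JPEG writer.
        BufferedImage canvas = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = canvas.createGraphics();
        try {
            // White background for the area beside narrower pages.
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, width, height);
            int y = 0;
            for (BufferedImage page : pages) {
                g.drawImage(page, 0, y, null);
                y += page.getHeight();
            }
        } finally {
            g.dispose();
        }
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        if (!ImageIO.write(canvas, "jpg", baos)) {
            throw new IOException("no JPEG writer available");
        }
        return baos.toByteArray();
    }
}
```

For long documents the stacked image can become very large (JPEG limits each dimension to 65,535 pixels), so returning one byte array per page may be the safer design.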