Hot questions for using PDFBox with BufferedImage

Question:

I'm trying to convert a PDF document to .png files using PDFBox. I followed this answer to get an idea of which dependencies were needed and to give me a starting point. When I loop through the pages and create the BufferedImage, I get a NoClassDefFoundError. It's looking for org/apache/fontbox/FontBoxFont, but despite extensive Googling I haven't found anything about FontBoxFont. Is this a separate jar that needs to be included? What is causing this error? The following jars are included in the project:

pdfbox-2.0.2.jar
levigo-jbig2-imageio-1.6.5.jar
pdfbox-tools-2.0.2.jar
jai-imageio-core-1.3.1.jar
commons-logging-1.2.jar

Here is the main method:

public static void main(String[] args) {

    String sourceDir = "C:/Dev/Workspace/PdfToPng/Stocks.pdf";
    String destinationDir = "C:/Dev/Workspace/PdfToPng/pages/";

    try {
        PDDocument document = PDDocument.load(new File(sourceDir));
        PDFRenderer pdfRenderer = new PDFRenderer(document);
        for(int page = 0; page < document.getNumberOfPages(); ++page) {
            BufferedImage bim = pdfRenderer.renderImageWithDPI(page, 300, ImageType.RGB);
            ImageIOUtil.writeImage(bim, destinationDir + (page+1) + ".png", 300);
        }
        document.close();
    } catch(Exception e) {
        e.printStackTrace();
    }
}

The error is thrown on BufferedImage bim = pdfRenderer.renderImageWithDPI(page, 300, ImageType.RGB);

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/fontbox/FontBoxFont
    at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:75)
    at org.apache.pdfbox.pdmodel.PDResources.getFont(PDResources.java:123)
    at org.apache.pdfbox.contentstream.operator.text.SetFontAndSize.process(SetFontAndSize.java:60)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:815)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:472)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:446)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:149)
    at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:189)
    at org.apache.pdfbox.rendering.PDFRenderer.renderPage(PDFRenderer.java:208)
    at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:139)
    at org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(PDFRenderer.java:94)
    at PdfToPng.main(PdfToPng.java:25)
Caused by: java.lang.ClassNotFoundException: org.apache.fontbox.FontBoxFont
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 12 more

Answer:

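You need to add Apache FontBox 2.0.2 (fontbox-2.0.2.jar) to your classpath. The missing class org.apache.fontbox.FontBoxFont lives in that jar, and PDFBox 2.0.2 needs it for font handling. You can get it from the Apache PDFBox download page.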
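
A quick way to confirm whether a jar is missing is to ask the class loader for the class before rendering. The following is a minimal sketch, not part of the original answer; the class name ClasspathCheck and the helper isOnClasspath are made up for illustration.

```java
public class ClasspathCheck {

    // Hypothetical helper: returns true if the named class can be found
    // by the class loader that loaded this class, without initializing it.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The class named in the stack trace above; it reports false
        // when the FontBox jar is not on the classpath.
        String needed = "org.apache.fontbox.FontBoxFont";
        System.out.println(needed + " present: " + isOnClasspath(needed));
    }
}
```

If the check reports false, the jar that contains the class is not visible to the application, whatever the IDE project settings appear to say.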

Question:

I'm writing a simple scanning application using jfreesane and Apache PDFBox.

Here is the scanning code:

InetAddress address = InetAddress.getByName("192.168.0.17");
SaneSession session = SaneSession.withRemoteSane(address);
List<SaneDevice> devices = session.listDevices();
SaneDevice device = devices.get(0);
device.open();
device.getOption("resolution").setIntegerValue(300);

BufferedImage bimg = device.acquireImage();
File file = new File("test_scan.png");
ImageIO.write(bimg, "png", file);

device.close();

And making PDF:

PDDocument document = new PDDocument();
float width = bimg.getWidth();
float height = bimg.getHeight();
PDPage page = new PDPage(new PDRectangle(width, height));
document.addPage(page);
PDImageXObject pdimg = LosslessFactory.createFromImage(document, bimg);
PDPageContentStream stream = new PDPageContentStream(document, page, PDPageContentStream.AppendMode.APPEND, true);
stream.drawImage(pdimg, 0, 0);
stream.close();

document.save(filename);
document.close();

And here is the result:

As you can see the PDF image is more "pale" (saturation? - sorry, I'm not good at color theory and don't know how to name it correctly).

What I have found out:

  1. Printing BufferedImage to JLabel using JLabel(new ImageIcon(bimg)) constructor produces the same result as with PDF ("pale" colors) so I guess PDFBox is not the reason.
  2. Changing scanning resolution - no effect.
  3. bimg.getTransparency() returns 1 (OPAQUE)
  4. bimg.getType() returns 0 (TYPE_CUSTOM)

PNG file:

http://s000.tinyupload.com/index.php?file_id=95648202713651192395

PDF file

http://s000.tinyupload.com/index.php?file_id=90369236997064329368


Answer:

There was an issue in JFreeSane with color spaces; it was fixed in version 0.97:

https://github.com/sjamesr/jfreesane/releases/tag/jfreesane-0.97
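
When an image looks different depending on how it is displayed, it can help to inspect its type and color space directly, in the same spirit as the getType() and getTransparency() checks in the question. The sketch below is only a diagnostic aid under that assumption; the class ImageColorInfo and its describe method are hypothetical names, and the sketch does not reproduce the JFreeSane fix.

```java
import java.awt.color.ColorSpace;
import java.awt.image.BufferedImage;

public class ImageColorInfo {

    // Hypothetical helper: summarizes the properties that usually matter
    // when colors look wrong (image type, color space, alpha).
    static String describe(BufferedImage img) {
        ColorSpace cs = img.getColorModel().getColorSpace();
        return "type=" + img.getType()
                + " sRGB=" + cs.isCS_sRGB()
                + " components=" + cs.getNumComponents()
                + " hasAlpha=" + img.getColorModel().hasAlpha();
    }

    public static void main(String[] args) {
        // A standard RGB image for comparison with the scanned one.
        BufferedImage reference = new BufferedImage(2, 2, BufferedImage.TYPE_INT_RGB);
        System.out.println(describe(reference));
    }
}
```

Calling describe(bimg) on the scanned image and comparing the line with the reference image shows whether the scanner library produced a non-sRGB color model.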

Question:

I am using zxing to scan barcodes in a PDF and split the PDF at them. Most barcodes are decoded, but a few aren't, although all of them are clearly visible and I can scan them with an Android barcode app. My code is:

    Boolean flag = Boolean.FALSE;
    PDDocument pdfDoc = null;
    Result prevResult = null;
    try{
        pdfDoc = PDDocument.load(new File(pathToReadPdf));
        log.debug("Total pdf pages : "+pdfDoc.getNumberOfPages());

        Reader reader = new MultiFormatReader();
        List<PDPage> pages = pdfDoc.getDocumentCatalog().getAllPages();
        for(PDPage page : pages) {
            PDResources resources = page.getResources();
            // Identify images from pdf
            Map images = resources.getImages();
            if( images != null ){
                Iterator imageIter = images.keySet().iterator();
                while( imageIter.hasNext()){
                    String key = (String)imageIter.next();
                    PDXObjectImage image = (PDXObjectImage)images.get( key );
                    if (image.getRGBImage()!=null){
                    Hashtable<DecodeHintType, Object> decodeHints = new Hashtable<DecodeHintType, Object>(3);
                    Vector<BarcodeFormat> barcodeFormats = new Vector<BarcodeFormat>();
                    barcodeFormats.add(BarcodeFormat.CODE_128);
                    decodeHints.put(DecodeHintType.POSSIBLE_FORMATS, barcodeFormats);
                    decodeHints.put(DecodeHintType.TRY_HARDER, Boolean.TRUE);
                    decodeHints.put(DecodeHintType.PURE_BARCODE, true);
                    //decodeHints.put(DecodeHintType.CHARACTER_SET, "ISO-8859-1");
                    LuminanceSource source = new BufferedImageLuminanceSource(image.getRGBImage(), 0, 0, image.getWidth(), image.getHeight());
                    BinaryBitmap bitmap = new BinaryBitmap(new HybridBinarizer(source));
                    Result result = null;

                    try{
                        result = reader.decode(bitmap, decodeHints);
                        splitPdf(page, result, loanApplicationId);
                        prevResult= result;
                        flag = Boolean.TRUE;
                    }catch(NotFoundException nfe){
                        if(prevResult!=null){
                            mergePDF(page, prevResult, loanApplicationId);
                        }
                        continue;
                    }
                    log.debug("Barcode text is " + result.getText());
                    }
                }
            }
        }
    }catch(Exception e){
        e.printStackTrace();
        log.error("Error while splitting PDF : " + e.getMessage(), e);
    }
    finally {
        try{
            if(pdfDoc != null){
                pdfDoc.close();
            }
        }catch (IOException ioe){
            ioe.printStackTrace();
            log.error("Error while closing PDF : " + ioe.getMessage(), ioe);
        }
    }
    return flag;

I think the error might be in the bitmap conversion. I get a com.google.zxing.NotFoundException at result = reader.decode(bitmap, decodeHints);

Barcode creation logic:

 public byte[] createBarCode128(String fileName) {

    byte[] imageInByte = new byte[1024];
    try {
        Code128Bean bean = new Code128Bean();
        final int dpi = 300;

        //Configure the barcode generator
        bean.setModuleWidth(UnitConv.in2mm(6.0f / dpi));
        bean.doQuietZone(false);

        BitmapCanvasProvider canvas = new BitmapCanvasProvider(null, "image/x-png", dpi, BufferedImage.TYPE_BYTE_BINARY, false, 0);

        //Generate the barcode
        bean.generateBarcode(canvas, fileName);

        //Signal end of generation
        canvas.finish();

        BufferedImage originalImage = canvas.getBufferedImage();
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        ImageIO.write(originalImage, "jpg", baos);
        baos.flush();
        imageInByte = baos.toByteArray();
        log.debug(imageInByte.toString());
        baos.close();
        log.debug(" Bar Code is generated successfully ");
    }
    catch (IOException ex) {
        ex.printStackTrace();
        log.error(ex.getMessage(),ex);
    }
    return imageInByte;
}

I am using below dependencies:

 <dependency>
        <groupId>com.google.zxing</groupId>
        <artifactId>core</artifactId>
        <version>2.3.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>fontbox</artifactId>
        <version>1.8.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>pdfbox</artifactId>
        <version>1.8.0</version>
    </dependency>
    <dependency>
        <groupId>com.google.zxing</groupId>
        <artifactId>javase</artifactId>
        <version>2.2</version>
    </dependency>

My Java version is 6, so zxing version 3 is not supported.

Please suggest a solution.


Answer:

Having looked at the sample file provided by the OP I can see no real differences between the barcodes zxing can scan and those it cannot scan. They all seem to be scanned at 300 dpi and embedded in the same fashion.

Zooming into the image one can see, though, that the scanning quality is quite poor for the purpose of barcode recognition:

The scanned bar code outlines are not sharply cut and have some saw tooth pattern. This makes the bars appear to be of differing width on differing scan lines.

I assume you simply are lucky with the codes you can recognize.

I'd propose changing the scan properties, maybe b&w instead of grayscale, maybe a different resolution...
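
If rescanning is not an option, a black-and-white image can be approximated in software before decoding. This is an assumption on top of the answer, which only suggests changing the scanner settings, and it may not help because zxing's HybridBinarizer already performs its own binarization. The class name Threshold, the method toBlackAndWhite, and the threshold value 128 are illustrative choices, not tested against the OP's file.

```java
import java.awt.image.BufferedImage;

public class Threshold {

    // Returns a 1-bit copy of src: pixels whose luminance is below the
    // threshold (0..255) become black, all others become white.
    static BufferedImage toBlackAndWhite(BufferedImage src, int threshold) {
        BufferedImage out = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_BYTE_BINARY);
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Integer approximation of the usual luma weights.
                int luminance = (r * 299 + g * 587 + b * 114) / 1000;
                out.setRGB(x, y, luminance < threshold ? 0xFF000000 : 0xFFFFFFFF);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        BufferedImage gray = new BufferedImage(2, 1, BufferedImage.TYPE_INT_RGB);
        gray.setRGB(0, 0, 0x404040); // dark gray
        gray.setRGB(1, 0, 0xC0C0C0); // light gray
        BufferedImage bw = toBlackAndWhite(gray, 128);
        System.out.println(Integer.toHexString(bw.getRGB(0, 0)));
        System.out.println(Integer.toHexString(bw.getRGB(1, 0)));
    }
}
```

The resulting image could then be passed to BufferedImageLuminanceSource in place of image.getRGBImage() to see whether the failing codes decode.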

Question:

I am trying to draw an image from a BufferedImage into a PDF using PDFBox, but it fails: I get black images, and Acrobat Reader warns with errors like "Out of memory" (although the PDF is displayed).

I use a BufferedImage because I need to draw a JavaFX Image object into the PDF. The Image comes from a call to Funciones.crearImagenDesdeTexto(), a function that converts text into an Image. The other images, which do not go through a BufferedImage, work fine.

    PDPixelMap img = null;
    BufferedImage bi;

    try {
        //If the item has an id, try to load the image with that id (this image shows correctly in the PDF)
        img = new PDPixelMap(documento, read(getClass().getResourceAsStream("/com/img/" + item.getId() + ".png")));
    }
    catch (Exception e) {
        //If the item has no id or the image fails to load, create the image on the fly (it contains the item name). This does not work in the PDF: it shows black images
        bi = new BufferedImage(alto, ancho, BufferedImage.TYPE_INT_ARGB);
        bi.createGraphics().drawImage(SwingFXUtils.fromFXImage(Funciones.crearImagenDesdeTexto(item.getNombre()), null), ancho, alto, null);
        img = new PDPixelMap(documento, bi);
    }
    finally {
        contenedor.drawXObject(img, x, y, alto, ancho);
    }

NOTE: crearImagenDesdeTexto() returns a JavaFX Image object that is created on the fly. I have tried this function in other parts of the program and it works well; it is taken from another Stack Overflow answer.


Answer:

Your code is confusing: you have three "new PDJpeg" calls, and one of them is in a catch block (which should only handle the error). And what does "read()" do? Does it pass a stream or a BufferedImage? If it is a stream, then it is wrong, because PDJpeg is for JPEGs, not for PNGs.

The second one

img = new PDJpeg(documento, (getClass().getResourceAsStream("/com/img/" + Byte.toString(item.getId()) + ".png")));

is definitely wrong for the same reason: PDJpeg is not for PNG files / streams.

If you want to create an image from a PNG file / stream, use PDPixelMap.

It is possible to create a PDJpeg object from a BufferedImage, but this is recommended only if the image wasn't encoded before. If you read a BufferedImage from a JPEG and then use PDJpeg for it, you'll have a slight loss of quality, because the image is decoded and encoded again (JPEG is a "lossy" compression format).

If my advice doesn't help, please upload the JPEG file and the PDF somewhere.

Also make sure that you're using the latest version, which is 1.8.7.

Update after comments: the x and y parameters passed to createGraphics().drawImage() should be 0, 0 and not width, height. The two parameters are a location, not a size.
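
The effect of the location parameters can be seen without PDFBox. The sketch below uses only java.awt; the class DrawAtOrigin and its helpers solid and copyOnto are hypothetical names for illustration. Drawing at (width, height) places the source entirely outside the target, so every target pixel keeps its initial value of 0x00000000 (fully transparent, with black color components), which is consistent with the black images described in the question.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

public class DrawAtOrigin {

    // Creates an ARGB image filled with a single color.
    static BufferedImage solid(int width, int height, Color color) {
        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = img.createGraphics();
        try {
            g.setColor(color);
            g.fillRect(0, 0, width, height);
        } finally {
            g.dispose();
        }
        return img;
    }

    // Draws src onto a new image of the same size with its top-left corner at (x, y).
    static BufferedImage copyOnto(BufferedImage src, int x, int y) {
        BufferedImage dst = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = dst.createGraphics();
        try {
            g.drawImage(src, x, y, null);
        } finally {
            g.dispose();
        }
        return dst;
    }

    public static void main(String[] args) {
        BufferedImage red = solid(4, 4, Color.RED);
        // Correct: location (0, 0) copies the whole image.
        System.out.println(Integer.toHexString(copyOnto(red, 0, 0).getRGB(0, 0)));
        // Wrong: location (width, height) draws outside the target, leaving it empty.
        System.out.println(Integer.toHexString(copyOnto(red, 4, 4).getRGB(0, 0)));
    }
}
```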