Hot questions for using PDFBox with PNG

Question:

I printed an image into a PDF using the following code:

InputStream in = new FileInputStream(new File("C:/"+imageName));
PDJpeg img = new PDJpeg(doc, in);
contentStream.drawXObject(img, 20, pageYaxis-120, 80, 80);

When imageName is "a.jpg" this works fine, but with imageName "b.png" it does not. Why is that? How can I print both formats, i.e. make the code format independent?


Answer:

In Apache PDFBox 1.8, use PDPixelMap for PNG images:

BufferedImage awtImage = ImageIO.read(new File(image));
ximage = new PDPixelMap(doc, awtImage);

See the ImageToPDF.java example in the PDFBox source code. This works with all files that ImageIO can read. However, it is still useful to keep using PDJpeg for JPG images, because the JPEG data is then put directly into the PDF without being converted into a lossless format.
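
To make the choice between the two classes independent of the file extension, one option is to ask ImageIO what the data actually contains. The following is a minimal sketch using only the JDK; the names ImageFormatSniffer and detectFormat are made up for illustration and are not part of PDFBox.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Iterator;
import java.util.Locale;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;
import javax.imageio.stream.MemoryCacheImageInputStream;

// Hypothetical helper, not part of PDFBox.
public class ImageFormatSniffer {

    /**
     * Returns the lower-cased ImageIO format name of the data (for example "png"),
     * or null if no installed ImageIO reader recognises it.
     */
    public static String detectFormat(InputStream in) throws IOException {
        try (ImageInputStream iis = new MemoryCacheImageInputStream(in)) {
            // ImageIO asks each registered reader whether it can decode the stream.
            Iterator<ImageReader> readers = ImageIO.getImageReaders(iis);
            if (!readers.hasNext()) {
                return null;
            }
            ImageReader reader = readers.next();
            try {
                return reader.getFormatName().toLowerCase(Locale.ROOT);
            } finally {
                reader.dispose();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a tiny PNG in memory so the demo needs no file on disk.
        ImageIO.setUseCache(false);
        ByteArrayOutputStream png = new ByteArrayOutputStream();
        ImageIO.write(new BufferedImage(2, 2, BufferedImage.TYPE_INT_RGB), "png", png);
        String format = detectFormat(new ByteArrayInputStream(png.toByteArray()));
        System.out.println("Detected format: " + format);
    }
}
```

With the detected name in hand, a JPEG result could be routed to PDJpeg and anything else that ImageIO can read to PDPixelMap, as described above.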

Question:

I'm using PDFBox 2 and trying to write a PNG image file to a new PDF file.

I saw there was already an answer that mentions this was fixed in PDFBox 2: How to add .png images to pdf using Apache PDFBox and https://issues.apache.org/jira/browse/PDFBOX-1990

This is my code:

package pdfProj;

import java.awt.image.BufferedImage;
import java.io.File;

import javax.imageio.ImageIO;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.graphics.image.LosslessFactory;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;

public class b {

    public static void main(String[] args) {
        PDDocument doc = null;
        doc = new PDDocument();        
        doc.addPage(new PDPage());
        try{
            BufferedImage awtImage = ImageIO.read( new File( "c://temp//line_chart.png" ) );
            PDImageXObject  pdImageXObject = LosslessFactory.createFromImage(doc, awtImage);
            PDPageContentStream contentStream = new PDPageContentStream(doc, new PDPage(), true, false);
            contentStream.drawImage(pdImageXObject, 200, 300, awtImage.getWidth() / 2, awtImage.getHeight() / 2);
            contentStream.close();
            doc.save( "c://temp//pdf//PDF_image.pdf" );
            doc.close();
        } catch (Exception io){
            System.out.println(" -- fail --" + io);
        }

    }
}

There is no exception. Just getting an empty PDF file created.


Answer:

The issue is that you add a new page to the document

doc.addPage(new PDPage());

but then create a content stream for yet another new page which you don't add to the document:

PDPageContentStream contentStream = new PDPageContentStream(doc, new PDPage(), true, false);

You should create the content stream for the page you added to the document, e.g. like this:

PDDocument doc = null;
doc = new PDDocument();
PDPage page = new PDPage();
doc.addPage(page);
try{
    BufferedImage awtImage = ImageIO.read( new File( "c://temp//line_chart.png" ) );
    PDImageXObject  pdImageXObject = LosslessFactory.createFromImage(doc, awtImage);
    PDPageContentStream contentStream = new PDPageContentStream(doc, page, true, false);
    contentStream.drawImage(pdImageXObject, 200, 300, awtImage.getWidth() / 2, awtImage.getHeight() / 2);
    contentStream.close();
    doc.save( "c://temp//pdf//PDF_image.pdf" );
    doc.close();
} catch (Exception io){
    System.out.println(" -- fail --" + io);
}

Question:

I'm trying to get a BufferedImage from PDXObjectImage that has png suffix with:

PDResources pdResources = pdPage.getResources();
Map<String, PDXObject> xobjects = (Map<String, PDXObject>) pdResources.getXObjects();
if (xobjects != null) {
    for (String key : xobjects.keySet()) {
        PDXObject xobject = xobjects.get(key);
        if (xobject instanceof PDXObjectImage) {
            PDXObjectImage imageObject = (PDXObjectImage) xobject;
            String suffix = imageObject.getSuffix();
            if (suffix != null) {
                BufferedImage image = imageObject.getRGBImage();
            }
        }
    }
}

This code works fine for JPG PDXObjectImages, but image is null for PNG images.

What is the right way to get a BufferedImage from a PDXObjectImage that has PNG suffix?

I also tried :

BufferedImage image = ImageIO.read(((PDPixelMap)imageObject).getPDStream().createInputStream());

But again image is null.

I'm using org.apache.pdfbox version 1.8.11.


Answer:

I finally moved to version 2.0 of PDFBox, which logged a clear warning that no JBIG2 decoder was installed. Adding the following Maven dependency solved the problem:

<dependency>
    <groupId>com.levigo.jbig2</groupId>
    <artifactId>levigo-jbig2-imageio</artifactId>
    <version>1.6.5</version>
</dependency>

@TilmanHausherr thanks.

Question:

I have a PDF that when I render it to a png it removes the horizontal and vertical lines. This is the PDF and what it should look like: https://drive.google.com/file/d/1sAXwnaoZ-QJn1Kbpw85hhzV_X5zwgfkA/view?usp=sharing

And here is the PNG of the PDF using PDFBox 2.0.13:

Why are those lines removed and how can I get them to be rendered in the PNG?


Answer:

The problem (most likely) is that you have no Java ImageIO plugin for the JBIG2 image format installed as the missing lines and headings are actually JBIG2 images.

When I run the PDFBox PDF Debugger without such a plugin and open your PDF in it, it does not display the missing parts either; having added such a plugin to its classpath, it suddenly does display them.

For more details on the PDFBox dependencies please read the PDFBox 2.0 Dependencies page. In particular

JAI Image I/O

PDF supports embedded image files, however support for some formats require third party libraries which are distributed under terms incompatible with the Apache 2.0 license:

These libraries are optional and will be loaded if present on the classpath, otherwise support for these image formats will be disabled and a warning will be logged when an unsupported image is encountered.

Maven dependencies for these components can be found in parent/pom.xml. Change the scope of the components if needed. Please make sure that any third party licenses are suitable for your project.

To include the JBIG2 library the following part can be included in your project pom.xml:

<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>jbig2-imageio</artifactId>
    <version>3.0.0</version>
</dependency>
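
A quick way to confirm that such a plugin really ended up on the classpath is to ask ImageIO whether a reader is registered for the format. This is a rough sketch using only the JDK; the class name ImageIoPluginCheck is made up, and the format names "JBIG2" and "JPEG2000" are assumptions about how the optional plugins register themselves.

```java
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;

// Hypothetical diagnostic helper, not part of PDFBox.
public class ImageIoPluginCheck {

    /** Returns true if at least one ImageIO reader is registered for the given format name. */
    public static boolean hasReaderFor(String formatName) {
        Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName(formatName);
        return readers.hasNext();
    }

    public static void main(String[] args) {
        // Pick up plugins that were added to the classpath.
        ImageIO.scanForPlugins();
        // "JBIG2" and "JPEG2000" are assumed names; "png" ships with the JDK.
        for (String format : new String[] {"JBIG2", "JPEG2000", "png"}) {
            System.out.println(format + " reader available: " + hasReaderFor(format));
        }
    }
}
```

If the JBIG2 line reports false, images in that format will most likely be skipped during rendering, which would match the missing lines and headings described above.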

Question:

I want to convert a PDF to a PNG file, but for some reason Java 10 produces a different PNG than Java 8:

private static void writeImageToPath(String sourcePath, String path, int pageWidth, int pageHeight) throws IOException 
{

    File sourceFile = new File(sourcePath);
    PDDocument document = PDDocument.load(sourceFile);
    PDFRenderer renderer = new PDFRenderer(document);
    BufferedImage buff= renderer.renderImage(0, 1, ImageType.ARGB);
    File outputfile = new File(path);
    Image image = buff.getScaledInstance(pageWidth, pageHeight, Image.SCALE_SMOOTH);
    BufferedImage bufferedImage = new BufferedImage(pageWidth, pageHeight, BufferedImage.TYPE_INT_ARGB);
    Graphics2D g2d = bufferedImage.createGraphics();
    g2d.drawImage(image, 0, 0, null);
    g2d.setColor(Color.BLACK);
    g2d.dispose();
    ImageIO.write(bufferedImage, "png", outputfile);
}

I read that the compression settings have changed in Java 9 PNG writer, so that might be why I'm seeing different results from Java 8. https://github.com/gredler/jdk9-png-writer-backport

Do you know how I can overcome this issue?

Thanks in advance!!


Answer:

tl;dr: accept it.

There are often slight differences in rendering between JDK versions. For JDK 8, using the Kodak CMS color management system is recommended (see Getting started) because the then-new Little CMS was very slow. Kodak CMS is no longer available in JDK 10, so Little CMS is used there. The rendering result has slightly different (usually better) colors. Curve drawing may also be slightly different.

I have run pixel diff tests on PDFBox for years (to detect regressions), and I'm used to small differences. See TestPDFToImage.java in the source code... pixel difference values up to 3 are ignored.

Even with that, there are still slight differences, which make regression tests difficult. When I test PDFBox with a new java version (to see if there is anything needing attention), I do a visual inspection of the visual differences files. This takes a lot of time (these tests are done on over 1000 PDF files).

There are also visual differences between different OS, or even different computers with the same OS, because of different fonts installed.
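
If you need your own regression check that tolerates such small differences, the idea of ignoring per-channel differences up to a threshold can be sketched as follows. This is not the code from TestPDFToImage.java; the class name TolerantImageDiff and its method are made up for illustration, and only the JDK is used.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper illustrating a tolerant pixel comparison.
public class TolerantImageDiff {

    /**
     * Counts pixels whose alpha, red, green or blue value differs by more
     * than the given tolerance between the two images.
     */
    public static int countDifferingPixels(BufferedImage a, BufferedImage b, int tolerance) {
        if (a.getWidth() != b.getWidth() || a.getHeight() != b.getHeight()) {
            throw new IllegalArgumentException("Images must have the same size");
        }
        int differing = 0;
        for (int y = 0; y < a.getHeight(); y++) {
            for (int x = 0; x < a.getWidth(); x++) {
                int p = a.getRGB(x, y);
                int q = b.getRGB(x, y);
                if (channelDiff(p, q, 24) > tolerance
                        || channelDiff(p, q, 16) > tolerance
                        || channelDiff(p, q, 8) > tolerance
                        || channelDiff(p, q, 0) > tolerance) {
                    differing++;
                }
            }
        }
        return differing;
    }

    // Absolute difference of one 8-bit channel of two packed ARGB values.
    private static int channelDiff(int p, int q, int shift) {
        return Math.abs(((p >> shift) & 0xFF) - ((q >> shift) & 0xFF));
    }
}
```

A call such as countDifferingPixels(java8Image, java10Image, 3) would then treat differences of up to 3 per channel as equal, in the spirit of the threshold mentioned above.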

Question:

So as the title says I am looking for a way to turn SVG to PNG with Apache Batik and then attach this image to PDF file using PDFBox without actually creating the svg and png anywhere.

Currently I have a web form that has SVG image with selectable parts of it. When the form is submitted I take the "html" part of the svg meaning I keep something like <svg bla bla> <path bla bla/></svg> in a string that Spring then uses to create a ".svg" file in a given folder, then Batik creates a PNG file in the same folder and then PDFBox attaches it to the PDF - this works fine(code below).

// Get the SVG data from the form and create the SVG file
String svg = formData.getSvg();
File svgFile = new File("image.svg");
BufferedWriter writer = new BufferedWriter(new FileWriter(svgFile));
writer.write(svg);
writer.close();
// Send to Batik to turn it into a PNG
PNGTranscoder pngTranscode = new PNGTranscoder();
InputStream in = new FileInputStream(svgFile);
TranscoderInput tIn = new TranscoderInput(in);
OutputStream os = new FileOutputStream("image.png");
TranscoderOutput tOut = new TranscoderOutput(os);
pngTranscode.transcode(tIn, tOut);
os.flush();
os.close();
in.close();
// Send to PDFBox to attach to the PDF
File pngfile = new File("image.png");
String path = pngfile.getAbsolutePath();
PDImageXObject pdImage = PDImageXObject.createFromFile(path, pdf);
PDPageContentStream contents = new PDPageContentStream(pdf, pdf.getPage(1));
contents.drawImage(pdImage, 0, pdf.getPage(1).getMediaBox().getHeight() - pdImage.getHeight());
contents.close();

As you can see there are a lot of files and stuff (need to tidy it up a bit), but is it possible to do this on the run without the creation and constant fetching of the svg and png files?


Answer:

Following the suggestion in the comments, I opted for ByteArrayOutputStream, ByteArrayInputStream, BufferedImage and LosslessFactory. It's a bit slower than saving to disk (stepping through it in the debugger, creating the BufferedImage seems to take a while). The sources I used are: How to convert SVG into PNG on-the-fly and Print byte[] to pdf using pdfbox

// The SVG string can be turned into bytes directly (needs java.nio.charset.StandardCharsets)
byte[] streamBytes = formData.getSvg().getBytes(StandardCharsets.UTF_8);
PNGTranscoder pngTranscoder = new PNGTranscoder();
ByteArrayOutputStream os = new ByteArrayOutputStream();                  
pngTranscoder.transcode(new TranscoderInput(new ByteArrayInputStream(streamBytes)), new TranscoderOutput(os));
InputStream is = new ByteArrayInputStream(os.toByteArray());
BufferedImage bim = ImageIO.read(is);
PDImageXObject pdImage = LosslessFactory.createFromImage(pdf, bim);
PDPageContentStream contents = new PDPageContentStream(pdf, pdf.getPage(1));
contents.drawImage(pdImage, 0, pdf.getPage(1).getMediaBox().getHeight() - pdImage.getHeight()); 
contents.close();

Question:

I have been trying to convert PDF to png file with transparency without success. I tried to solve it with many ways but I didn't succeed. I'm writing my ways hoping someone will find where it went wrong:

1.

try (final PDDocument document = PDDocument.load(new File(srcpath))){
                PDFRenderer pdfRenderer = new PDFRenderer(document);
                for (int page = 0; page < document.getNumberOfPages(); ++page)
                {
                    BufferedImage bim = pdfRenderer.renderImageWithDPI(page, 300, ImageType.RGB);
                    String fileName = imageConverted;

                    boolean hasAlpha = bim.getColorModel().hasAlpha();
                System.out.println(hasAlpha);

                    ImageIOUtil.writeImage(bim, fileName, 300);
                }
                document.close();
            } catch (IOException e){
                System.err.println("Exception while trying to create pdf document - " + e);
            }

2.

        RandomAccessFile raf;
        try {
            raf = new RandomAccessFile(file, "r");

            FileChannel channel = raf.getChannel();
            ByteBuffer buf = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            PDFFile pdffile = new PDFFile(buf);
            // draw the first page to an image
            int num=pdffile.getNumPages();
            for(int i=0;i<num;i++)
            {
                PDFPage page = pdffile.getPage(i);
    
                //get the width and height for the doc at the default zoom              
                int width=(int)page.getBBox().getWidth();
                int height=(int)page.getBBox().getHeight();             
    
                Rectangle rect = new Rectangle(0,0,width,height);
                int rotation=page.getRotation();
                Rectangle rect1=rect;
                if(rotation==90 || rotation==270)
                    rect1=new Rectangle(0,0,rect.height,rect.width);
    
                //generate the image
                BufferedImage img = (BufferedImage)page.getImage(
                            rect.width, rect.height, //width & height
                            rect1, // clip rect
                            null, // null for the ImageObserver
                            true, // fill background with white
                            true  // block until drawing is done
                    );
                 Graphics2D graphics = (Graphics2D)img.getGraphics();
                 graphics.setBackground( new Color( 255, 255, 255, 0 ) );
    
                ImageIO.write(img, "png", new File(imageConverted));
            }
        } 
        catch (FileNotFoundException e1) {
            System.err.println(e1.getLocalizedMessage());
        } catch (IOException e) {
            System.err.println(e.getLocalizedMessage());
        }
    

3.

// Instantiating the PDFRenderer class
        PDFRenderer renderer = new PDFRenderer(document);

        // Rendering an image from the PDF document
        BufferedImage image = null;
        try {
             image= renderer.renderImage(0);
        } catch (IOException e1) {
            return "N/A";
        }

        // Writing the image to a file
        try {
            ImageIO.write(image, "png", new File(imageConverted));
        } catch (IOException e) {
            return "N/A";
        }

But I get the png with white background... Any idea? Thanks in advance!!!


Answer:

Change this line

BufferedImage bim = pdfRenderer.renderImageWithDPI(page, 300, ImageType.RGB);

to

BufferedImage bim = pdfRenderer.renderImageWithDPI(page, 300, ImageType.ARGB);

ImageType.ARGB renders into an image with an alpha channel, so the page background stays transparent.
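
Whether the written PNG can be transparent at all depends on the BufferedImage having an alpha channel. The sketch below shows this with plain JDK classes and no PDFBox involved; the class name AlphaRoundTrip is made up for illustration, and it compares BufferedImage.TYPE_INT_ARGB with BufferedImage.TYPE_INT_RGB.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical demo class, not part of PDFBox.
public class AlphaRoundTrip {

    static {
        // Keep everything in memory; avoids temporary cache files for this small demo.
        ImageIO.setUseCache(false);
    }

    /** Encodes the image as PNG into a byte array and decodes it again. */
    public static BufferedImage pngRoundTrip(BufferedImage image) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (!ImageIO.write(image, "png", out)) {
            throw new IOException("No PNG writer available");
        }
        return ImageIO.read(new ByteArrayInputStream(out.toByteArray()));
    }

    public static void main(String[] args) throws IOException {
        // A fresh ARGB image is fully transparent; a fresh RGB image is opaque black.
        BufferedImage withAlpha = new BufferedImage(4, 4, BufferedImage.TYPE_INT_ARGB);
        BufferedImage withoutAlpha = new BufferedImage(4, 4, BufferedImage.TYPE_INT_RGB);

        System.out.println("ARGB keeps alpha: "
                + pngRoundTrip(withAlpha).getColorModel().hasAlpha());
        System.out.println("RGB keeps alpha: "
                + pngRoundTrip(withoutAlpha).getColorModel().hasAlpha());
    }
}
```

The same hasAlpha() check that the question already prints is a handy way to verify which kind of image the renderer returned before writing it.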

Question:

I am using PDFBox to create a PDF and am trying to print an image into it. It works for all formats except PNG. My code is as follows:

String image = "c:/image.png";
PDXObjectImage ximage = null;
if( image.toLowerCase().endsWith( ".jpg" ) )
{
    ximage = new PDJpeg(doc, new FileInputStream( image ) );
}
else if (image.toLowerCase().endsWith(".tif") || image.toLowerCase().endsWith(".tiff"))
{
    ximage = new PDCcitt(doc, new RandomAccessFile(new File(image),"r"));
}
else
{
    BufferedImage awtImage = ImageIO.read( new File( image ) );
    ximage = new PDPixelMap(doc, awtImage);
    throw new IOException( "Image type not supported:" + image );
}

PDPageContentStream contentStream = new PDPageContentStream(doc, page);
contentStream.drawImage( ximage, 20, 20 );

Whenever I pass a PNG image, execution reaches this branch:

else
{
    BufferedImage awtImage = ImageIO.read( new File( image ) );
    ximage = new PDPixelMap(doc, awtImage);
    //throw new IOException( "Image type not supported:" + image );
}

and fails with an image stream IOException saying it can't read the image file. What changes do I need to make so that PNG images are accepted as well?


Answer:

String image = "c:/"+rst.getString(8);
PDXObjectImage ximage = null;
if( image.toLowerCase().endsWith( ".jpg" ) )
{
    ximage = new PDJpeg(doc, new FileInputStream( image ) );
}
else if (image.toLowerCase().endsWith(".tif") || image.toLowerCase().endsWith(".tiff"))
{
    ximage = new PDCcitt(doc, new RandomAccessFile(new File(image),"r"));
}
else
{
    BufferedImage awtImage = ImageIO.read( new File( image ) );
    ximage = new PDPixelMap(doc, awtImage);
}

contentStream.drawXObject(ximage, 20, pageYaxis-120, 80, 80);
pageYaxis = pageYaxis-56;

Question:

I have imported pdfbox-2.0.4.jar, fontbox-2.0.4.jar and commons-logging-1.1.1.jar into Eclipse Kepler. The program runs on Windows 10. The console prints many warnings like this:

org.apache.pdfbox.pdmodel.font.PDCIDFontType2 <init>
WARNING: Using fallback font ArialUnicodeMS for CID-keyed TrueType font KaiTi_GB2312.

And the rendered image file does not contain the whole content. How can I fix it? My code is like this:

public class PdfboxTest {
    private static final String filePath = "xxx";
    private static final String outputFilePath = "xxx";

    public static void change(File inputFile, File outputFolder) throws IOException {

        String totalFileName = inputFile.getName();
        String fileName = totalFileName.substring(0,totalFileName.lastIndexOf("."));
        PDDocument doc = null;
        try {
            doc = PDDocument.load(inputFile);
            PDFRenderer pdfRenderer = new PDFRenderer(doc);
            int pageCounter = 0;
            for(PDPage page : doc.getPages())
            {
                BufferedImage bim = pdfRenderer.renderImageWithDPI(pageCounter, 300, ImageType.RGB);
                ImageIOUtil.writeImage(bim, outputFilePath + "\\" + fileName + (pageCounter++) +".png", 300);
            }
            doc.close();

        } finally {
            if (doc != null) {
                doc.close();
            }
        }
    }
    public static void main(String[] args) {
        File inputFile = new File(filePath);
        File outputFolder = new File(outputFilePath);
        if(!outputFolder.exists()){
            outputFolder.mkdirs();
        }
        try {
            change(inputFile, outputFolder);
        } catch (IOException e) {
            e.printStackTrace();
        }

    }
}

Answer:

As seen in the comments - the best solution is to install the missing font KaiTi_GB2312. The message Using fallback font means that the PDF references the mentioned font and didn't embed it, but can't find it on your computer, so PDFBox tried a fallback solution, in this case the ArialUnicodeMS font. Sadly such fallback solutions are not always perfect, which is why some glyphs were missing in the rendered image.

Question:

I'm using PDFBox (2.0.4), and for some reason it returns 0 pages for a document that was generated with Aspose (aspose.pdf-17.4). Does anybody know a workaround? I tried loading the document and resaving it with PDFBox, but that didn't seem to work.

The code is very simple:

PDDocument doc = new PDDocument();
doc.load(new File (file_path));
int p = doc.getNumberOfPages();
doc.close();

Any help is greatly appreciated!


Answer:

I was able to get the correct number of pages after updating to aspose.pdf-17.8 and PDFBox 2.0.8. Thanks to Tilman Hausherr for pointing out the versions.