Hot questions for Using PDFBox in pdf form

Question:

I programmatically fill in a form (AcroForm) in a PDF document and sign the document afterwards. I start with doc.pdf and create doc_filled.pdf using the setFields.java example of PDFBox. Then I sign doc_filled.pdf, creating doc_filled_signed.pdf, using code based on the signature examples, and open the PDF in Acrobat Reader. The entered field data is visible, and the signature panel tells me

"There are errors in the formatting or information contained in this signature (The signature byte array is invalid)"

So far, I know that:

  • the signature code applied alone (i.e. directly creating some doc_signed.pdf) creates a valid signature
  • the problem exists for "invisible signatures", for visible signatures, and for visible signatures added to existing signature fields
  • the problem even occurs if I do not fill the form but only open it and save it, i.e.:

    PDDocument doc = PDDocument.load(new File("doc.pdf"));
    doc.save(new File("doc_filled.pdf"));
    doc.close();
    

suffices to break the signing code applied afterwards.

On the other hand, if I take the same doc.pdf, enter the field's values manually in Adobe, the signing code produces valid signatures.

What am I doing wrong?

Update:

@mkl asked me to provide the files I am talking about (I currently do not have enough reputation to post all files as links, sorry for the inconvenience):

The last one was created by signing and filling the document in one go, using

    doc.saveIncremental(); 

As I already wrote in the comment, some

    setNeedToBeUpdate(true);

seems to be missing, though. With reference to @mkl's second comment, I found this SO question: "Saved Text Field value is not displayed properly in PDF generated using PDFBOX", which also covers entered text not being shown. I gave it a first try, applying

    setBoolean(COSName.getPDFName("NeedAppearances"), true); 

to the field's and the form's dictionaries, which then shows the field's content, but the signature does not get added in the end. I still have to look further into that.

Update: The story continues here: PDFBox 1.8.10: Fill and Sign Document, Filling again fails


Answer:

The cause of the OP's original problem, i.e. that after loading his PDF (for form fill-in) with PDFBox and then saving it, this new PDF cannot be successfully signed using PDFBox signing code, has already been explained in detail in this answer, in short:

  • When saving documents regularly, PDFBox does so using a cross reference table.

    • If the document to save regularly had been loaded from a PDF with a cross reference stream, all entries of the cross reference stream dictionary are saved in the trailer dictionary.
  • When saving documents in the process of applying a signature, PDFBox creates an incremental update; as such incremental updates require that the update uses the same kind of cross reference as the original revision, PDFBox in this case tries to use the same technique.

    • For recognizing the technique originally used PDFBox looks at the Type entry of the dictionary in its document representation into which trailer or cross reference stream dictionary had been loaded: If there is a Type entry with value XRef (which is so specified for cross reference streams), a stream is assumed, otherwise a table.

Thus, in the case of the OP's original PDF doc.pdf which has a cross reference stream:

  • After loading and form fill-in the document is saved regularly, i.e. using a cross reference table, but all the former cross reference stream entries, among them the Type, are copied to the trailer. (doc_filled.pdf)

  • After loading this saved PDF with a cross reference table for signing, it is saved again using an incremental update. PDFBox assumes (due to the Type trailer entry) that the existing file has a cross reference stream and, therefore, uses a cross reference stream at the end of the incremental update, too. (doc_filled_signed.pdf)

  • Thus, in the end the filled-in, then signed PDF has two revisions, the inner one with a cross reference table, the outer one with a cross reference stream.

  • As this is not valid, Adobe Reader, upon loading the PDF, repairs it in its internal document representation. Repairing changes the document bytes. Thus, the signature is broken in Adobe Reader's eyes.

  • Most other signature validators don't attempt such repairs but check the signature of the document as is. They validate the signature successfully.
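
To see which kind of cross reference the newest revision of a file really ends with (as opposed to what a stray Type entry in the trailer suggests), one can look at the bytes directly. The following is a minimal diagnostic sketch using only the JDK. The class XrefKindSniffer and its detect method are hypothetical names, not part of PDFBox, and the sketch assumes a well-formed file whose last startxref offset is correct:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic helper; not part of PDFBox.
final class XrefKindSniffer {

    enum Kind { TABLE, STREAM, UNKNOWN }

    static Kind detect(byte[] pdf) {
        // ISO-8859-1 maps every byte to exactly one char, so string indices equal byte offsets.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);

        // The last "startxref" keyword belongs to the newest revision.
        int marker = text.lastIndexOf("startxref");
        if (marker < 0) {
            return Kind.UNKNOWN;
        }

        // The number following the keyword is the byte offset of the cross reference section.
        String rest = text.substring(marker + "startxref".length()).trim();
        int end = 0;
        while (end < rest.length() && Character.isDigit(rest.charAt(end))) {
            end++;
        }
        if (end == 0 || end > 10) {
            return Kind.UNKNOWN;
        }
        long offset = Long.parseLong(rest.substring(0, end));
        if (offset >= text.length()) {
            return Kind.UNKNOWN;
        }
        int start = (int) offset;

        // A table starts with the keyword "xref"; a stream is an indirect object ("12 0 obj ...").
        if (text.startsWith("xref", start)) {
            return Kind.TABLE;
        }
        String head = text.substring(start, Math.min(text.length(), start + 32));
        if (head.matches("(?s)\\d+\\s+\\d+\\s+obj.*")) {
            return Kind.STREAM;
        }
        return Kind.UNKNOWN;
    }
}
```

Going by the explanation above, such a check would be expected to report TABLE for doc_filled.pdf and STREAM for doc_filled_signed.pdf, which is exactly the mismatch between the two revisions.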

The answer referenced above also offers some ways around this:

  • A: After loading the PDF for form fill-in, remove the Type entry from the trailer before saving regularly. If signing is applied to this file, PDFBox will assume a cross reference table (because the misleading Type entry is not there). Thus, the signature incremental update will be valid.

  • B: Use an incremental update for saving the form fill-in changes, too, either in a separate run or in the same run as signing. This also results in a valid incremental update.

Generally I would propose the latter option because the former option likely will break if the PDFBox saving routines ever are made compatible with each other.

Unfortunately, though, the latter option requires marking the added and changed objects as updated, including a path from the document catalog. If this is not possible or at least too cumbersome, the first option might be preferable.


In the case at hand the OP tried the latter option (doc_filled_and_signed.pdf):

"At the moment the text box's content is only visible, when the text box is selected (with Acrobat reader and Preview the same behaviour). I flag the PDField, all of its parents, the AcroForm, the Catalog as well as the page where it is displayed."

He marked the changed field as updated but not the associated appearance stream which automatically is generated by PDFBox when setting the form field value.

Thus, in the result PDF file the field has the new value but the old, empty appearance stream. Only when clicking into the field, Adobe Reader creates a new appearance based on the value for editing.

Thus, the OP also has to mark the new normal appearance stream (the form field dictionary contains an entry AP referencing a dictionary in which N references the normal appearance stream). Alternatively (if finding the changed or added entries becomes too cumbersome) he might try the other option.

Question:

I am using Apache PDFBox to read a fillable PDF form and fill the fields based on some data. I am using the code below (as per suggestions from other SO answers) to get the default appearance string and change it (as you can see below, I am changing the font size from 10 to 12 if the field name is "Field1").

  1. How do I bold the field? Any documentation on what order the /Helv 10 Tf 0 g are arranged? What I need to set to bold the field?
  2. If I understand right, there are 14 basic fonts that I can use in PDFBox out of the box (pun unintended). I would like to use one or more fonts that look like Signatures (cursive). Any out of the box fonts that do that? If not, if I have my own font, how do I set in the method to be written to the PDF?

Please note, the below code works fine by filling the specific 'value' passed in the method parameter in the specific 'name' field of the method parameter.

Thank you !

public static void setField(String name, String value) throws IOException {
    PDDocumentCatalog docCatalog = _pdfDocument.getDocumentCatalog();
    PDAcroForm acroForm = docCatalog.getAcroForm();
    PDField field = acroForm.getField( name );

    COSDictionary dict = ((PDField)field).getDictionary();
    COSString defaultAppearance = (COSString) dict.getDictionaryObject(COSName.DA);
    if (defaultAppearance != null)
    {
        dict.setString(COSName.DA, "/Helv 10 Tf 0 g");
        if(name.equalsIgnoreCase("Field1"))
        {
            dict.setString(COSName.DA, "/Helv 12 Tf 0 g");
        }
    }
    if(field instanceof PDTextbox)
    {
        field= new PDTextbox(acroForm, dict);
        ((PDField)field).setValue(value);
    }
}

As per mkl's answer, to use two fonts in the same PDF, I used the following method: I could not get the default font and a custom font working together, so I added two fonts to the resources and used them.

public List<String> prepareFont(PDDocument _pdfDocument) throws IOException
{
    PDDocumentCatalog docCatalog = _pdfDocument.getDocumentCatalog();
    PDAcroForm acroForm = docCatalog.getAcroForm();

    PDResources res = acroForm.getDefaultResources();
    if (res == null)
        res = new PDResources();

    InputStream fontStream = getClass().getResourceAsStream("LiberationSans-Regular.ttf");
    InputStream fontStream2 = getClass().getResourceAsStream("Font2.ttf");
    PDTrueTypeFont font = PDTrueTypeFont.loadTTF(_pdfDocument, fontStream);
    PDTrueTypeFont font2 = PDTrueTypeFont.loadTTF(_pdfDocument, fontStream2);
    String fontName = res.addFont(font);
    String fontName2 = res.addFont(font2);
    acroForm.setDefaultResources(res);

    // Return the resource names assigned to the two fonts
    List<String> fontList = new ArrayList<String>();
    fontList.add(fontName);
    fontList.add(fontName2);
    return fontList;
}

Answer:

(You can find a runnable example here: FillFormCustomFont.java)

Using poor-man's-bold
  1. How do I bold the field? ... What I need to set to bold the field?

In PDF you usually make text bold by using a font with bold glyphs, also see your second question. If you don't have such a bold font at hand, you may instead use some poor-man's-bold technique, e.g. not only filling the letter but also stroking a line along its borders:

public static void setFieldBold(String name, String value) throws IOException
{
    PDDocumentCatalog docCatalog = _pdfDocument.getDocumentCatalog();
    PDAcroForm acroForm = docCatalog.getAcroForm();
    PDField field = acroForm.getField(name);

    COSDictionary dict = ((PDField) field).getDictionary();
    COSString defaultAppearance = (COSString) dict
            .getDictionaryObject(COSName.DA);
    if (defaultAppearance != null)
    {
        dict.setString(COSName.DA, "/Helv 10 Tf 2 Tr .5 w 0 g");
        if (name.equalsIgnoreCase("Field1")) {
            dict.setString(COSName.DA, "/Helv 12 Tf 2 Tr .5 w 0 g");
        }
    }
    if (field instanceof PDTextbox)
    {
        field = new PDTextbox(acroForm, dict);
        ((PDField) field).setValue(value);
    }
}

(2 Tr .5 w = use rendering mode 2, i.e. fill and stroke, and use a line width of .5)
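
The operators in such a string are written in postfix order, operands first: /Helv 10 Tf selects the font resource Helv at size 10, 2 Tr sets the text rendering mode, .5 w sets the line width, and 0 g sets a black fill colour. A minimal sketch of a helper that assembles these pieces follows. The class DefaultAppearance and its build method are hypothetical names, not PDFBox API, and the sketch assumes ordinary font sizes (very small values would be printed in scientific notation, which PDF does not accept):

```java
// Hypothetical helper for composing default appearance (DA) strings; not part of PDFBox.
final class DefaultAppearance {

    static String build(String fontName, float size, boolean fauxBold) {
        StringBuilder da = new StringBuilder();
        // Tf takes two operands: the font resource name and the font size.
        da.append('/').append(fontName).append(' ').append(number(size)).append(" Tf");
        if (fauxBold) {
            // Rendering mode 2 fills and strokes each glyph; w sets the stroke width.
            da.append(" 2 Tr 0.5 w");
        }
        // g sets the gray level used for filling; 0 is black.
        da.append(" 0 g");
        return da.toString();
    }

    // Writes whole numbers without a fraction, e.g. 10 instead of 10.0.
    private static String number(float value) {
        return value == (long) value ? Long.toString((long) value) : Float.toString(value);
    }
}
```

The resulting string could then be passed to dict.setString(COSName.DA, ...) in place of the literals used in the method above.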


Using custom fonts
  2. If I understand right, there are 14 basic fonts that I can use in PDFBox out of the box (pun unintended). I would like to use one or more fonts that look like Signatures (cursive). Any out of the box fonts that do that? If not, if I have my own font, how do I set in the method to be written to the PDF?

If you want to use your own font, you first need to register it in the AcroForm default resources like this:

public String prepareFont(PDDocument _pdfDocument) throws IOException
{
    PDDocumentCatalog docCatalog = _pdfDocument.getDocumentCatalog();
    PDAcroForm acroForm = docCatalog.getAcroForm();

    PDResources res = acroForm.getDefaultResources();
    if (res == null)
        res = new PDResources();

    InputStream fontStream = getClass().getResourceAsStream("LiberationSans-Regular.ttf");
    PDTrueTypeFont font = PDTrueTypeFont.loadTTF(_pdfDocument, fontStream);
    String fontName = res.addFont(font);
    acroForm.setDefaultResources(res);

    return fontName;
}

This method returns the font name to use in

public static void setField(String name, String value, String fontName) throws IOException
{
    PDDocumentCatalog docCatalog = _pdfDocument.getDocumentCatalog();
    PDAcroForm acroForm = docCatalog.getAcroForm();
    PDField field = acroForm.getField(name);

    COSDictionary dict = ((PDField) field).getDictionary();
    COSString defaultAppearance = (COSString) dict
            .getDictionaryObject(COSName.DA);
    if (defaultAppearance != null)
    {
        dict.setString(COSName.DA, "/" + fontName + " 10 Tf 0 g");
        if (name.equalsIgnoreCase("Field1")) {
            dict.setString(COSName.DA, "/" + fontName + " 12 Tf 0 g");
        }
    }
    if (field instanceof PDTextbox)
    {
        field = new PDTextbox(acroForm, dict);
        ((PDField) field).setValue(value);
    }
}

You now get the field rendered in the custom font. The difference is not too big because the fonts are quite similar. Use the font of your choice for more effect.

Using /Helv, /HeBo, ...

The OP found a list of font names /Helv, /HeBo, ..., probably in the PDFBox issue PDFBOX-1234, which appear to be usable without defining them in any resource dictionary.

These names are not a PDF feature, i.e. the PDF specification does not know about them, on the contrary:

The default appearance string (DA) contains any graphics state or text state operators needed to establish the graphics state parameters, such as text size and colour, for displaying the field’s variable text. Only operators that are allowed within text objects shall occur in this string (see Figure 9). At a minimum, the string shall include a Tf (text font) operator along with its two operands, font and size. The specified font value shall match a resource name in the Font entry of the default resource dictionary (referenced from the DR entry of the interactive form dictionary; see Table 218).

(section 12.7.3.3 Field Dictionaries / Variable Text in ISO 32000-1)

Thus, the specification does not know those default font names.
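
The requirement quoted above can be checked mechanically: extract the font operand of the Tf operator from the DA string and look it up among the names in the DR Font dictionary. The following is a minimal sketch using only the JDK. DaFontCheck and its methods are hypothetical names, and the set of declared names is assumed to have been collected from the default resources by the caller:

```java
import java.util.Optional;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not part of PDFBox.
final class DaFontCheck {

    // Matches "/Name size Tf": a name operand, a numeric operand, then the Tf operator.
    private static final Pattern TF =
            Pattern.compile("/(\\S+)\\s+[-+]?(?:\\d+\\.?\\d*|\\.\\d+)\\s+Tf\\b");

    // Returns the font resource name used by the first Tf operator, if there is one.
    static Optional<String> fontName(String da) {
        Matcher m = TF.matcher(da);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // True only if the DA string names a font that is declared in the given set.
    static boolean isDeclared(String da, Set<String> drFontNames) {
        return fontName(da).map(drFontNames::contains).orElse(false);
    }
}
```

A DA string such as "/HeBo 12 Tf 0 g" would fail this check unless a font named HeBo is actually present in the default resources.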

Nonetheless, Adobe Reader/Acrobat seem to support them, most likely because at some time in the distant past some form generating tool assumed them to be there and support for those forms was kept due to compatibility reasons.

Using this feature, therefore, might not be the best choice but your mileage may vary.

Using custom and standard fonts

In his comments the OP indicated he wanted to use both custom and standard fonts in forms.

To do this I generalized the method prepareFont a bit and refactored the TTF import into a separate method:

public List<String> prepareFont(PDDocument _pdfDocument, List<PDFont> fonts) throws IOException
{
    PDDocumentCatalog docCatalog = _pdfDocument.getDocumentCatalog();
    PDAcroForm acroForm = docCatalog.getAcroForm();

    PDResources res = acroForm.getDefaultResources();
    if (res == null)
        res = new PDResources();

    List<String> fontNames = new ArrayList<String>();
    for (PDFont font: fonts)
    {
        fontNames.add(res.addFont(font));
    }

    acroForm.setDefaultResources(res);

    return fontNames;
}

public PDFont loadTrueTypeFont(PDDocument _pdfDocument, String resourceName) throws IOException
{
    try ( InputStream fontStream = getClass().getResourceAsStream(resourceName); )
    {
        return PDTrueTypeFont.loadTTF(_pdfDocument, fontStream);
    }
}

Using these methods you can mix custom and standard fonts like this:

PDDocument doc = PDDocument.load(originalStream);
List<String> fontNames = prepareFont(doc, Arrays.asList(loadTrueTypeFont(doc, "LiberationSans-Regular.ttf"), PDType1Font.HELVETICA_BOLD));

setField(doc, "FirstName", "My first name", fontNames.get(0));
setField(doc, "LastName", "My last name", fontNames.get(1));

doc.save(new File(RESULT_FOLDER, "acroform-setFieldCustomStandard.pdf"));
doc.close();

(FillFormCustomFont.testSetFieldCustomStandard_acroform)

PDType1Font has constants for all 14 standard fonts. Thus, like this you can use standard fonts (mixed with custom fonts if desired) in form fields in a way that generates the proper Font entries in the default resources, i.e. without relying on proprietary default font names like HeBo.

PS

Any documentation on what order the /Helv 10 Tf 0 g are arranged?

Yes, there is, cf. the specification ISO 32000-1.

Question:

I fill in a PDF form with PDFBox and flatten it before saving. The form uses a custom font for both regular text and form fields. When I open the output document (with flattened fields) on a device which does not have this custom font installed, the font of the normal text is still correct, but the flattened fields are displayed with a fallback (?) font. On a device which does have this custom font installed, everything looks as expected.

Is there a way to force using the same custom font for all text after flattening the form?

Code (simplified) used for filling in the PDF form with PDFBox:

public class App
{
    public static void main(String[] args) throws IOException {
        String formTemplate = "src/main/resources/fonts.pdf";
        String filledForm = "src/main/resources/fonts_out.pdf";
        PDDocument pdfDocument = PDDocument.load(new File(formTemplate));
        PDAcroForm acroForm = pdfDocument.getDocumentCatalog().getAcroForm();
        acroForm.getField("text").setValue("Same font in form text field (updated with PDFBox)");
        acroForm.setNeedAppearances(true);
        acroForm.refreshAppearances();
        acroForm.flatten();
        pdfDocument.save(filledForm);
        pdfDocument.close();
    }
}

(The question linked the input and output PDFs and showed two screenshots: the expected rendering, and the result when the font is not installed on the system.)


Answer:

Some observations for your PDF (the aforementioned encoding problems are none - just plain ignorance on my part):

  1. The DroidSans font is not embedded into the PDF. This is fixed by replacing the F2 font with the newly embedded F5 font.

  2. The NeedAppearances flag is set, meaning that there is no appearance for the form fields. Any reader must (re)create those. This is not done automatically by PDFBox before flattening, so I added this part.

  3. To not cause any more warnings about missing fonts, I removed the F2 font completely.

  4. I ran the original PDF through preflight and it gave me the following warning: "The required key /Subtype is missing. Path: ->Pages->Kids->[0]->Annots->[0]->AP->N". The key does exist, however, so the warning seems to indicate that there is an error in the appearance of the form field. If I remove the /N dict the error is gone. The stream is "/Tx BMC EMC" - maybe there is some EOL missing? But since the appearance is regenerated anyway, the error is gone afterwards.

With the following code the DroidSans font is embedded into the PDF:

File pdf = new File("Fonts.pdf");
final PDDocument document = PDDocument.load(pdf);

FileInputStream fontFile = new FileInputStream(new File("DroidSans.ttf"));
PDFont font = PDType0Font.load(document, fontFile, false);

//1. embed and register the font (Catalog dict)
PDAcroForm pDAcroForm = document.getDocumentCatalog().getAcroForm();
//create a new font resource
PDResources res = pDAcroForm.getDefaultResources();
if (res == null) res = new PDResources();
COSName fontName = res.add(font);
pDAcroForm.setDefaultResources(res);

//2. Now change the font of form field to the newly added font
PDField field = pDAcroForm.getField("text");
//field.setValue("Same font in form text field (updated with PDFBox)");

COSDictionary dict = field.getCOSObject();
COSString defaultAppearance = (COSString) dict.getDictionaryObject(COSName.DA);

if (defaultAppearance != null){
    String currentValue = dict.getString(COSName.DA);
    //replace the font - this should be improved with a more general version
    dict.setString(COSName.DA,currentValue.replace("F2", fontName.getName()));

    //remove F2 completely
    COSDictionary resources = res.getCOSObject();
    for(Entry<COSName, COSBase> resource : resources.entrySet()) {
        if(resource.getKey().equals(COSName.FONT)) {
            COSObject fonts = (COSObject)resource.getValue();
            COSDictionary fontDict = (COSDictionary)fonts.getObject();

            COSName toBeRemoved=null;
            for(Entry<COSName, COSBase> item : fontDict.entrySet()) {
                if(item.getKey().getName().equals("F2")) {
                    toBeRemoved = item.getKey();
                }
            }
            if(toBeRemoved!=null) {
                fontDict.removeItem(toBeRemoved);
            }
        }
    }
}

if(pDAcroForm.getNeedAppearances()) {
    pDAcroForm.refreshAppearances();
    pDAcroForm.setNeedAppearances(false);
}

//Flatten the document
pDAcroForm.flatten();

//Save the document
document.save("Form-Test-Result.pdf");
document.close();

Please note that the above code is quite static: searching for and replacing a font called F2 only works for the supplied PDF; in other cases it won't. You have to implement a more generic solution for that...
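
One possible direction for such a generic solution is to replace whatever font operand precedes the Tf operator, instead of searching for a fixed name. The following is a minimal sketch using only the JDK. DaFontReplacer and withFont are hypothetical names, not PDFBox API, and the sketch assumes the DA string contains at most one Tf operator:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not part of PDFBox.
final class DaFontReplacer {

    // Group 1 is the font name, group 2 the size operand plus the Tf operator.
    private static final Pattern TF =
            Pattern.compile("/(\\S+)(\\s+[-+]?(?:\\d+\\.?\\d*|\\.\\d+)\\s+Tf\\b)");

    // Returns the DA string with the font of its first Tf operator replaced; size and other operators are kept.
    static String withFont(String da, String newFontName) {
        Matcher m = TF.matcher(da);
        if (!m.find()) {
            return da; // no Tf operator found, leave the string untouched
        }
        return da.substring(0, m.start()) + "/" + newFontName + m.group(2) + da.substring(m.end());
    }
}
```

Unlike a plain currentValue.replace("F2", ...), this does not depend on the old font name and cannot accidentally rewrite part of a longer name such as F20. In the code above, the call would look like dict.setString(COSName.DA, DaFontReplacer.withFont(currentValue, fontName.getName())).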