Hot questions for Using PDFBox in acrofields

Question:

I need help adding a Cyrillic value to a field using the PDFBox API. Here is what I have so far:

PDDocument document = PDDocument.load(file);
PDDocumentCatalog dc = document.getDocumentCatalog();
PDAcroForm acroForm = dc.getAcroForm();
PDField naziv = acroForm.getField("naziv");
naziv.setValue("Наслов"); // this part right here
naziv.setValue("Naslov"); // it works like this

It works perfectly when my input is in the Latin alphabet, but I need to handle Cyrillic input as well. How can I do that?

P.S. This is the exception I get: Caused by: java.lang.IllegalArgumentException: U+043D ('afii10079') is not available in this font Helvetica encoding: WinAnsiEncoding


Answer:

The code below adds an appropriate font in the acroform default resource dictionary, and replaces the name in the default appearances. PDFBox recreates the appearance stream of the fields using the new font when you call setValue().

public static void main(String[] args) throws IOException
{
    PDDocument doc = PDDocument.load(new File("ZPe.pdf"));
    PDAcroForm acroForm = doc.getDocumentCatalog().getAcroForm();
    PDResources dr = acroForm.getDefaultResources();

    // Important: the font is Type0 (allows more than 256 glyphs) and NOT SUBSETTED
    PDFont font = PDType0Font.load(doc, new FileInputStream("c:/windows/fonts/arial.ttf"), false);

    COSName fontName = dr.add(font);
    Iterator<PDField> it = acroForm.getFieldIterator();
    while (it.hasNext())
    {
        PDField field = it.next();
        if (field instanceof PDTextField)
        {
            PDTextField textField = (PDTextField) field;
            String da = textField.getDefaultAppearance();

            // replace font name in default appearance string
            Pattern pattern = Pattern.compile("\\/(\\w+)\\s.*");
            Matcher matcher = pattern.matcher(da);
            if (!matcher.find())
            {
                // no font name found in the default appearance; leave this field unchanged
                continue;
            }
            String oldFontName = matcher.group(1);
            da = da.replaceFirst(oldFontName, fontName.getName());

            textField.setDefaultAppearance(da);
        }
    }
    acroForm.getField("name1").setValue("Наслов");
    doc.save("result.pdf");
    doc.close();
}
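
The font-name replacement step in the loop above can be tried in isolation, without a PDF at hand. The following is only a minimal sketch: the class and method names are illustrative and not part of PDFBox, and it assumes the default appearance string contains a font operator of the form "/Name size Tf".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DefaultAppearanceFont {

    // Matches "/FontName <size> Tf"; group 1 is the font name, group 2 the rest of the operator.
    // Assumption: the name contains no whitespace, which holds for typical DA strings.
    private static final Pattern FONT_OPERAND =
            Pattern.compile("/(\\S+)(\\s+\\S+\\s+Tf\\b)");

    /** Returns the default appearance with the font name swapped, or unchanged if no Tf operator is found. */
    static String replaceFontName(String defaultAppearance, String newFontName) {
        Matcher matcher = FONT_OPERAND.matcher(defaultAppearance);
        if (!matcher.find()) {
            return defaultAppearance;
        }
        // quoteReplacement guards against '$' or '\' in the new name
        return matcher.replaceFirst("/" + Matcher.quoteReplacement(newFontName) + "$2");
    }

    public static void main(String[] args) {
        System.out.println(replaceFontName("/Helv 12 Tf 0 g", "F1"));
        System.out.println(replaceFontName("0 g /Helv 0 Tf", "F1"));
    }
}
```

Anchoring the match on the Tf operator, rather than on the first slash in the string, keeps the replacement correct when color operators come before the font in the default appearance.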

Update 4.4.2019: to save some space, it may be useful to remove the appearance before calling setValue:

acroForm.getField("name1").getWidgets().get(0).setAppearance(null);

To check whether there are unused fonts in the AcroForm default resources, see this answer.

Update 7.4.2019: you may experience poor performance if the font is very large (e.g. ArialUni) and many fields are to be set (PDFBOX-4508). In that case, save and reload the file before calling setValue.

To find out whether a font supports an intended text, call PDFont.encode() and check for IllegalArgumentException.
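
A minimal sketch of that check follows. The TextEncoder interface and the supports() helper are illustrative names, not PDFBox API; the interface only has the same shape as PDFont.encode(String), so with PDFBox on the classpath one would pass font::encode. To keep the example runnable with the JDK alone, a hypothetical stand-in encoder is used that approximates WinAnsiEncoding with the windows-1252 charset.

```java
import java.io.IOException;
import java.nio.charset.Charset;

final class FontSupportCheck {

    /** Same shape as PDFont.encode(String); with PDFBox available, pass font::encode. */
    interface TextEncoder {
        byte[] encode(String text) throws IOException;
    }

    /** Returns true if the encoder accepts every character of the text. */
    static boolean supports(TextEncoder encoder, String text) {
        try {
            encoder.encode(text);
            return true;
        } catch (IllegalArgumentException e) {
            // a missing glyph is reported this way, as in the exception quoted in the question
            return false;
        } catch (IOException e) {
            // assumption: an I/O failure while encoding also means "do not use this font"
            return false;
        }
    }

    /** Hypothetical stand-in for a WinAnsi-encoded font, approximated by windows-1252. */
    static byte[] winAnsiStandIn(String text) {
        Charset cp1252 = Charset.forName("windows-1252");
        if (!cp1252.newEncoder().canEncode(text)) {
            throw new IllegalArgumentException("not available in this encoding: " + text);
        }
        return text.getBytes(cp1252);
    }

    public static void main(String[] args) {
        // Latin sample from the question
        System.out.println(supports(FontSupportCheck::winAnsiStandIn, "Naslov"));
        // Cyrillic sample from the question, written with Unicode escapes
        System.out.println(supports(FontSupportCheck::winAnsiStandIn, "\u041D\u0430\u0441\u043B\u043E\u0432"));
    }
}
```

Running the check before setValue() makes it possible to choose a fallback font per value instead of failing in the middle of filling the form.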

Question:

PDFBox setValue() does not set the value for every PDTextField; only a few fields are saved. It fails for fields whose names, as returned by getFullyQualifiedName(), look similar.

Note: field.getFullyQualifiedName() returns { customdutiesa, customdutiesb, customdutiesc }. Setting the value works for customdutiesa, but not for customdutiesb, customdutiesc, etc.

@Test
public void testb3Generator() throws IOException {
    File f = new File(inputFile);

    outputFile = String.format("%s_b3-3.pdf", "123");

    try (PDDocument document = PDDocument.load(f)) {

        PDDocumentCatalog catalog = document.getDocumentCatalog();
        PDAcroForm acroForm = catalog.getAcroForm();
        int i = 0;
        for (PDField field : acroForm.getFields()) {
            i=i+1;
            if (field instanceof PDTextField) {
                PDTextField textField = (PDTextField) field;
                textField.setValue(Integer.toString(i));
            }
        }

        document.getDocumentCatalog().getAcroForm().flatten();

        document.save(new File(outputFile));
        document.close();
    }
    catch (Exception e) {

        e.printStackTrace();
    }
}

Input PDF link: https://s3-us-west-2.amazonaws.com/kx-filing-docs/b3-3.pdf
Output PDF link: https://kx-filing-docs.s3-us-west-2.amazonaws.com/123_b3-3.pdf


Answer:

The problem is that under certain conditions PDFBox does not construct appearances for fields it sets the value of, and, therefore, during flattening completely forgets the field content:

// in case all tests fail the field will be formatted by acrobat
// when it is opened. See FreedomExpressions.pdf for an example of this.  
if (actions == null || actions.getF() == null ||
    widget.getCOSObject().getDictionaryObject(COSName.AP) != null)
{
    ... generate appearance ...
}

(org.apache.pdfbox.pdmodel.interactive.form.AppearanceGeneratorHelper.setAppearanceValue(String))

I.e. if there is a JavaScript action for value formatting associated with the field and no appearance stream is present yet, PDFBox assumes it does not need to create an appearance (and it would probably get it wrong anyway, as it does not use that formatting action).

When the form is flattened afterwards, that assumption is obviously wrong.
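
The quoted condition can be modelled as a plain boolean function to see which fields are skipped. This is only an illustrative sketch: the class, method, and parameter names are made up here, with hasFormatAction standing for "actions != null && actions.getF() != null" and hasAppearance for "the widget has an AP entry".

```java
final class AppearanceDecision {

    /** Mirrors the quoted test: actions == null || actions.getF() == null || AP != null. */
    static boolean generatesAppearance(boolean hasFormatAction, boolean hasAppearance) {
        return !hasFormatAction || hasAppearance;
    }

    public static void main(String[] args) {
        boolean[] values = {false, true};
        for (boolean hasFormatAction : values) {
            for (boolean hasAppearance : values) {
                System.out.println("format action: " + hasFormatAction
                        + ", appearance present: " + hasAppearance
                        + ", appearance generated: "
                        + generatesAppearance(hasFormatAction, hasAppearance));
            }
        }
    }
}
```

Only the combination of a format action and no existing appearance yields false, which matches the fields that come out empty after flattening.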

To force PDFBox to generate appearances for those fields, too, simply remove the actions before setting field values:

if (field instanceof PDTextField) {
    PDTextField textField = (PDTextField) field;
    textField.setActions(null);
    textField.setValue(Integer.toString(i));
}

(from FillAndFlatten test testLikeAbubakarRemoveAction)

Question:

I am trying to validate a self-created PDF file against the PDF/A-1b specification, but I get the errors below. For the validation I used the Apache PDFBox Preflight library; both Apache PDFBox and Preflight are version 2.0.15.

3.1.1 : Invalid Font definition, Helvetica: some required fields are missing from the Font dictionary: firstChar, lastChar, widths.

3.1.3 : Invalid Font definition, Helvetica: FontFile entry is missing from FontDescriptor

3.1.1 : Invalid Font definition, ZapfDingbats: some required fields are missing from the Font dictionary: firstChar, lastChar, widths.

3.1.3 : Invalid Font definition, ZapfDingbats: FontFile entry is missing from FontDescriptor

7.11.1 : Error on MetaData

How can I overcome these problems? Thank you in advance.

PDResources resources = new PDResources();
resources.put(COSName.getPDFName("Helv"), pdfPage.getText1Font());
String defaultAppearance = "/Helv 12 Tf 0 g";

form.setDefaultResources(resources);
form.setDefaultAppearance(defaultAppearance);
pdDocument.getDocumentCatalog().setAcroForm(form);

metadata.createAndAddPDFAExtensionSchemaWithDefaultNS();
metadata.getPDFExtensionSchema().addNamespace("http://www.aiim.org/pdfa/ns/schema#", "pdfaSchema");
metadata.getPDFExtensionSchema().addNamespace("http://www.aiim.org/pdfa/ns/property#", "pdfaProperty");
metadata.getPDFExtensionSchema().addNamespace("http://www.aiim.org/pdfa/ns/id/", "pdfaid");
XMPSchema uaSchema = new XMPSchema(XMPMetadata.createXMPMetadata(),
        "pdfaSchema", "pdfaSchema", "pdfaSchema");
uaSchema.setTextPropertyValue("schema", "PDF/A Accessibility Schema");
uaSchema.setTextPropertyValue("namespaceURI", "http://www.aiim.org/pdfa/ns/id/");
uaSchema.setTextPropertyValue("prefix", "pdfaid");
XMPSchema uaProp = new XMPSchema(XMPMetadata.createXMPMetadata(),
        "pdfaProperty", "pdfaProperty", "pdfaProperty");
uaProp.setTextPropertyValue("name", "part");
uaProp.setTextPropertyValue("valueType", "Integer");
uaProp.setTextPropertyValue("category", "internal");
uaProp.setTextPropertyValue("description", "Indicates, which part of ISO 14289 standard is followed");
uaSchema.addUnqualifiedSequenceValue("property", uaProp);
metadata.getPDFExtensionSchema().addBagValue("schemas", uaSchema);
metadata.getPDFExtensionSchema().setPrefix("pdfaid");
metadata.getPDFExtensionSchema().setTextPropertyValue("part", "1");

Answer:

The font-related messages appear because you used the standard 14 Type 1 font objects, e.g. PDType1Font.HELVETICA. PDF/A-1b requires all fonts to be embedded, so load your fonts with PDType0Font.load() instead. For AcroForm fields, use a load() overload that takes a third parameter and pass false for it to prevent subsetting.

The XMP-related message appears because you forgot to set the conformance to "B". See also CreatePDFA.java in the examples subproject of the source code download.