Hot questions for using PDFBox in Eclipse

Question:

I'm trying to use PDFBox in Eclipse, but I get this error when I run my program:

    Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory
        at org.apache.pdfbox.pdfparser.BaseParser.<clinit>(BaseParser.java:68)
        at com.pdf.util.PDFTextParser.<init>(PDFTextParser.java:26)
        at com.pdf.util.PDFTextParser.main(PDFTextParser.java:77)
    Caused by: java.lang.ClassNotFoundException: org.apache.commons.logging.LogFactory
        at java.net.URLClassLoader.findClass(Unknown Source)
        at java.lang.ClassLoader.loadClass(Unknown Source)
        at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
        at java.lang.ClassLoader.loadClass(Unknown Source)
        ... 3 more

The program stops at this line of code:

    parser = new PDFParser(new FileInputStream(file));

PDFParser comes from pdfbox.

I'm guessing there's something wrong with how I've attached the JAR files?

  • I moved all the jar files to a folder I created called "lib" which is part of the project.
  • Went into project Properties -> Java Build Path, and clicked "Add External JARs" for every JAR file
  • After doing this I noticed that it said "Source attachment: none" for each of the JARs, so I clicked edit and set the destination to its location in the lib folder.
  • When I go into Run Configuration, under Classpath, I can see the JAR files are there underneath my project.

Answer:

PDFBox requires Commons Logging (see this dependencies page from the project's website). The class named in the error, org.apache.commons.logging.LogFactory, lives in the Commons Logging JAR, so you need to put that JAR on the classpath along with the PDFBox JAR. If you use a build tool like Maven, it should download it for your project automatically.
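If you want to confirm what the running JVM can actually see, a small check like the following may help. It is only a sketch: the class name ClasspathCheck and the method isLoadable are made up for this example, and the two class names it probes are simply the ones that appear in the question.

```java
// Minimal sketch: report whether the classes from the stack trace are loadable.
class ClasspathCheck {

    /** Returns true if the named class can be located by this class's class loader. */
    static boolean isLoadable(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        } catch (LinkageError e) {
            // The class was found but could not be linked (for example, a broken JAR)
            return false;
        }
    }

    public static void main(String[] args) {
        String[] required = {
            "org.apache.commons.logging.LogFactory", // the class missing in the stack trace
            "org.apache.pdfbox.pdfparser.PDFParser"  // the class used in the question
        };
        for (String name : required) {
            System.out.println(name + " -> " + (isLoadable(name) ? "found" : "MISSING"));
        }
    }
}
```

Run it with the same run configuration as the failing program. Any line reporting MISSING points at a JAR that is not on the runtime classpath.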

Question:

I am using PDFBox as an external library in a Java project in Eclipse. Every time I use a class or method from PDFBox, a Java execution window appears, as if my program were launching another Java program (it is the same Java window I see when I use PDFBox from the terminal).

This does not happen when I use other libraries, and I suspect it slows my program down (though that may not be true). I also just do not like it. Does anyone know why this happens and how to control it?

See the rightmost icon? It appears every time I run my program with PDFbox involved.

Here is the code I use to extract text:

    PDDocument document = PDDocument.load(file_name);
    PDFTextStripper stripper = new PDFTextStripper();
    int num_of_pages = document.getNumberOfPages();
    int begin_page = num_of_pages - (num_of_pages/5+1);
    stripper.setStartPage(begin_page); 
    String all_text = stripper.getText(document);

Answer:

I know this question is old, but in case you are searching for a solution as I was: add the following to your run configuration under VM options:

-Djava.awt.headless=true -Djava.awt.headlessLib=true
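If changing the run configuration is inconvenient, the standard java.awt.headless property can also be set from code. This is a sketch under one assumption: the property is set before anything initializes AWT, so it should be the first thing main does. The class name HeadlessSetup and the method enableHeadless are hypothetical.

```java
import java.awt.GraphicsEnvironment;

// Minimal sketch: switch AWT to headless mode programmatically.
class HeadlessSetup {

    /** Must be called before any AWT class is initialized to take effect. */
    static void enableHeadless() {
        System.setProperty("java.awt.headless", "true");
    }

    public static void main(String[] args) {
        enableHeadless();
        // GraphicsEnvironment reads the property the first time headless mode is queried
        System.out.println("headless = " + GraphicsEnvironment.isHeadless());
        // ... PDFBox calls would go here ...
    }
}
```

The command-line flag is still the more robust option, because it applies no matter which class happens to load first.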

Question:

I am working on a plain Java project in Eclipse Juno using JRE 6/JDK 6 as the runtime/compiler. I want to use Apache PDFBox to generate some PDFs. I have downloaded PDFBox 1.8.9 and added it to my build path. I took a code sample from here and used it in my application, but it gives me multiple errors, which I think are related to some environment problem.

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
public class TestPdf {

PDDocument document = new PDDocument();
// Create a new blank page and add it to the document
PDPage blankPage = new PDPage();
document.addPage( blankPage );
// Save the newly created document
document.save("BlankPage.pdf");
// finally make sure that the document is properly
// closed.
document.close();
}

These are the errors I am getting:

Syntax error on token "blankPage", VariableDeclaratorId expected after this token
Syntax error on token ""BlankPage.pdf"", delete this token
Syntax error on token "close", Identifier expected after this token

Answer:

You should create a method and move the statements into it:

public class TestPdf {

    PDDocument document = new PDDocument();
    // Create a new blank page and add it to the document
    PDPage blankPage = new PDPage();

    public void createDocument() throws Exception {
        document.addPage(blankPage);
        // Save the newly created document
        document.save("BlankPage.pdf");
        // finally make sure that the document is properly
        // closed.
        document.close();
    }
}

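The code you posted in your question violates the syntax rules of the Java language: statements such as method calls cannot appear directly in a class body. Only declarations (fields, methods, constructors, nested types) and initializer blocks are allowed there. You can read more about the structure of a class here.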
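As a rough illustration of the same rule, here is a sketch that uses only JDK classes, so it compiles without PDFBox on the build path. The names ClassStructureDemo and addBlankPage are made up for the example, and the list of strings merely stands in for a document.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch: declarations go in the class body, statements go in methods.
class ClassStructureDemo {

    // Allowed at class level: a field declaration with an initializer.
    private final List<String> pages = new ArrayList<>();

    // Not allowed at class level: a bare statement such as
    //     pages.add("blank page");
    // It has to live inside a method, a constructor, or an initializer block.
    int addBlankPage() {
        pages.add("blank page");
        return pages.size();
    }

    public static void main(String[] args) {
        ClassStructureDemo demo = new ClassStructureDemo();
        System.out.println("pages: " + demo.addBlankPage());
    }
}
```

Moving the commented-out statement to class level produces the same kind of "Syntax error on token" messages as in the question.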

Question:

I am using pdfbox-0.7.3.jar. I know the missing class files belong to the pdfbox-0.7.3 JAR, but even when I attach the source file, it keeps showing missing .class files. I am looking for suggestions on the error below.

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import org.pdfbox.cos.COSDocument;
    import org.pdfbox.pdfparser.PDFParser;
    import org.pdfbox.pdmodel.PDDocument;
    import org.pdfbox.util.PDFTextStripper;
    import java.lang.NoClassDefFoundError;
    import java.util.Scanner;
        public class ggg{
        public static void main(String args[]) {
           // PDFTextStripper pdfStripper = null;
               // PDDocument pdDoc = null;
           // COSDocument cosDoc = null;
            File file = new File("C:\\Users\\firstfile.pdf");
            try {
                PDFParser parser = new PDFParser(new FileInputStream(file));
                parser.parse();
                COSDocument   cosDoc = parser.getDocument();
                PDFTextStripper   pdfStripper = new PDFTextStripper();
                PDDocument pdDoc = new PDDocument(cosDoc); 
                pdfStripper.setStartPage(1);
                pdfStripper.setEndPage(5);
                String parsedText = pdfStripper.getText(pdDoc);
                System.out.println(parsedText);
            } catch (IOException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            } 
        }
    }

Exception in thread "main" java.lang.NoClassDefFoundError: org/fontbox/afm/FontMetric
    at org.pdfbox.pdmodel.font.PDFont.getAFM(PDFont.java:334)
    at org.pdfbox.pdmodel.font.PDSimpleFont.getFontHeight(PDSimpleFont.java:104)
    at org.pdfbox.util.PDFStreamEngine.showString(PDFStreamEngine.java:336)
    at org.pdfbox.util.operator.ShowTextGlyph.process(ShowTextGlyph.java:80)
    at org.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:452)
    at org.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:215)
    at org.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:174)
    at org.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:336)
    at org.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:259)
    at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:216)
    at org.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:149)
    at ggg.main(ggg.java:30)

Answer:

It seems that you are not using a build tool.

Unfortunately, this library has additional dependencies.

org.fontbox.afm.FontMetric is a class located in fontbox-0.1.0.jar, so that JAR must be on your classpath as well.

You can go to the PDFBox page on Maven Central, download all the libraries listed under its dependencies, and add them to your project.

Alternatively, you can set up a Maven project and add the dependency to your pom.xml. To do this:

  1. Install Maven.
  2. Create a project from the Maven command line:

    mvn -B archetype:generate \
      -DarchetypeGroupId=org.apache.maven.archetypes \
      -DgroupId=com.mycompany.app \
      -DartifactId=my-app

  3. Add the PDFBox dependency to the <dependencies> section of the pom.xml file:

    <dependency>
      <groupId>pdfbox</groupId>
      <artifactId>pdfbox</artifactId>
      <version>0.7.3</version>
    </dependency>

  4. Open the generated project as a Maven project in your IDE (in your case, Eclipse).

  5. Refresh the project in the IDE and let Eclipse download the library and all its dependencies for you.
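Whichever route you take, it can be useful to check where a class is actually being loaded from. The following is a sketch only: JarLocator and locationOf are hypothetical names, and the default class name is just the one from the stack trace in the question.

```java
import java.security.CodeSource;

// Minimal sketch: print the JAR (or directory) a class was loaded from.
class JarLocator {

    /** Returns the location a class was loaded from, or null if it is not known. */
    static String locationOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        // Core JDK classes typically have no code source at all
        if (source == null || source.getLocation() == null) {
            return null;
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "org.fontbox.afm.FontMetric";
        try {
            // initialize=false: locate the class without running its static initializers
            Class<?> type = Class.forName(name, false, JarLocator.class.getClassLoader());
            System.out.println(name + " loaded from " + locationOf(type));
        } catch (ClassNotFoundException e) {
            System.out.println(name + " is not on the classpath");
        }
    }
}
```

If the class is reported as not on the classpath, the JAR that contains it still needs to be added.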

Question:

I have these imports (among others):

import org.apache.pdfbox.*;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

I have this dependency in my pom.xml:

<dependency>
  <groupId>org.apache.pdfbox</groupId>
  <artifactId>pdfbox</artifactId>
  <version>2.0.4</version>
</dependency>

I see this line in my eclipse maven dependencies:

pdfbox-2.0.4.jar - C:\Users\Paul\.m2\repository\org\apache\pdfbox\pdfbox\2.0.4\pdfbox-2.0.4.jar

I check the build path in eclipse, and see pdfbox-2.0.4.jar in the Maven Dependencies part.

I run mvn clean compile in a command prompt (Windows).

I get the error "package org.apache.pdfbox does not exist"

I run mvn dependency:build-classpath -Dmdep.outputFile=cp.txt

The following lines are listed in the class path (at the front of the class path):

C:\Users\Paul\.m2\repository\org\apache\pdfbox\pdfbox\2.0.4\pdfbox-2.0.4.jar;
C:\Users\Paul\.m2\repository\org\apache\pdfbox\fontbox\2.0.4\fontbox-2.0.4.jar;

I look in C:\Users\Paul\.m2\repository\org\apache\pdfbox\pdfbox\2.0.4\ and I see pdfbox-2.0.4.jar.

So what am I missing? Why is the pdfbox jar not being found?


Answer:

Remove this line:

import org.apache.pdfbox.*;

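because that package does indeed not exist as far as the compiler is concerned. A wildcard import only covers classes declared directly in the named package, not in its subpackages, and the compiler finds no classes directly in org.apache.pdfbox. The other imports (the ones with deeper levels) are OK.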
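The same behaviour can be seen with JDK packages alone. In this sketch (ImportDemo and countItems are made-up names), the wildcard import of java.util does not bring in anything from java.util.concurrent.atomic, so that class needs its own import.

```java
import java.util.*;                                // covers java.util.List, java.util.Arrays, ...
import java.util.concurrent.atomic.AtomicInteger;  // NOT covered by java.util.* above

// Minimal sketch: wildcard imports are not hierarchical.
class ImportDemo {

    /** Counts the items with an AtomicInteger, just to use a class from a subpackage. */
    static int countItems(List<String> items) {
        AtomicInteger counter = new AtomicInteger();
        for (String ignored : items) {
            counter.incrementAndGet();
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(countItems(Arrays.asList("pdfbox", "fontbox")));
    }
}
```

Deleting the second import makes the compiler reject AtomicInteger even though java.util.* is still imported.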