
Read PDF File in Java

Reading a PDF file in a Java program is not the same as reading a text file. The JDK does not provide any class for reading PDF files, so we depend on a third-party library. Several such libraries are available; in this section, we will use the Apache Tika library, which provides a generic API for parsing files. To access and read the PDF file, we will use the following classes.
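To illustrate why a PDF cannot be treated like a text file, the following minimal sketch uses only the JDK to inspect the first bytes of a file. A PDF typically begins with the signature %PDF- followed by a version number, and the rest of the file is a structured, often compressed, format rather than readable lines of text. The class and method names here are illustrative and are not part of any library.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class PdfSignatureCheck {
    // The ASCII signature that normally opens a PDF file.
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** Returns true if the given leading bytes start with the PDF signature. */
    public static boolean startsWithPdfSignature(byte[] head) {
        if (head.length < PDF_MAGIC.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(head, PDF_MAGIC.length), PDF_MAGIC);
    }

    /** Reads only the first few bytes of the file and checks them. */
    public static boolean isPdf(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return startsWithPdfSignature(in.readNBytes(PDF_MAGIC.length));
        }
    }

    public static void main(String[] args) throws IOException {
        // sample.pdf is an assumed default; pass another path as the first argument.
        Path file = Path.of(args.length > 0 ? args[0] : "sample.pdf");
        System.out.println(file + " looks like a PDF: " + isPdf(file));
    }
}
```

This check only identifies the file type. Extracting the actual text still requires a parser such as the one Tika provides.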

BodyContentHandler: A Tika class (in the org.apache.tika.sax package) that creates a content handler for text. It receives the character events of the XHTML body produced by the parser and stores them in an internal string buffer. The parent class of the BodyContentHandler class is the ContentHandlerDecorator class.

PDFParser: Another class provided by Apache Tika (not by the JDK). It parses the contents of PDF files or documents, extracting the text held in tables, strings, and paragraphs (without preserving the table boundaries). The PDFParser can also parse encrypted files, provided the password is supplied to the parser.

ParseContext: The ParseContext class is part of the org.apache.tika.parser package. It is used to pass context information to the Tika parsers.
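As an illustration of how context information can reach the parser, the sketch below registers a password through the ParseContext so that an encrypted PDF can be parsed. It assumes Tika's PasswordProvider interface from the org.apache.tika.parser package; the class name ReadEncryptedPdf, the file name, and the password are hypothetical placeholders.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.PasswordProvider;
import org.apache.tika.parser.pdf.PDFParser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;

public class ReadEncryptedPdf {

    /** Extracts the text of a password-protected PDF (illustrative helper). */
    static String extractText(String path, String password)
            throws IOException, TikaException, SAXException {
        BodyContentHandler handler = new BodyContentHandler();
        Metadata metadata = new Metadata();
        ParseContext context = new ParseContext();

        // The parser asks this provider for the password of an encrypted document.
        context.set(PasswordProvider.class, new PasswordProvider() {
            @Override
            public String getPassword(Metadata md) {
                return password;
            }
        });

        try (InputStream input = new FileInputStream(path)) {
            new PDFParser().parse(input, handler, metadata, context);
        }
        return handler.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical file name and password, for illustration only.
        System.out.println(extractText("encrypted.pdf", "secret"));
    }
}
```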

Steps to Read a PDF File

Step 1: Create a content handler.

Step 2: Create a PDF file locally on your system.

Step 3: Create a FileInputStream using the path where the PDF file resides.

Step 4: Create a Metadata object and a parser for the PDF file.

Step 5: Parse the PDF file with the PDFParser class.

Step 6: Display the content of the PDF file.

Implementation

The following program shows how to read and display the content of a PDF file.

FileName: ReadPDFFile.java
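A minimal sketch of such a program, following the six steps above. It assumes the Tika classes are on the classpath and that a file named sample.pdf exists in the working directory; it is an illustration of the approach and not necessarily identical to the original listing.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.pdf.PDFParser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;

public class ReadPDFFile {
    public static void main(String[] args)
            throws IOException, TikaException, SAXException {
        // Step 1: the content handler collects the extracted text.
        // Note: the no-argument constructor limits output to 100,000
        // characters; new BodyContentHandler(-1) removes that limit.
        BodyContentHandler handler = new BodyContentHandler();

        // Step 4: metadata, context, and parser objects for the PDF file.
        Metadata metadata = new Metadata();
        ParseContext context = new ParseContext();
        PDFParser parser = new PDFParser();

        // Steps 2 and 3: open the PDF file (assumed to be sample.pdf).
        try (InputStream input = new FileInputStream("sample.pdf")) {
            // Step 5: parse the PDF file.
            parser.parse(input, handler, metadata, context);
        }

        // Step 6: display the content of the PDF file.
        System.out.println("Extracting the contents from the file:\n");
        System.out.println(handler.toString());
    }
}
```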

Let's execute the above program by using the following commands:

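To Compile: javac -cp .;tika.jar ReadPDFFile.java

To Run: java -cp .;tika.jar ReadPDFFile

The semicolon is the classpath separator on Windows. On Linux and macOS, use a colon instead (for example, -cp .:tika.jar).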

Output:

Extracting the contents from the file:

A Simple PDF File
This is a small demonstration .pdf file -

just for use in the Virtual Mechanics tutorials. More text. And more
text. And more text. And more text. And more text.

And less text. And less text. And less text. And less text. And less
text. And less text. interesting, zzzzz. And less text. And less text. And
less text. And less text. And less text. And less text. And less text.
And less text. And less text.

And less text. And less text. And less text. And less text. And less
text. And less text. And less text. Even less. Continued on page 2 ...



Simple PDF File 2
...continued from page 1. Yet less text. And less text. And less text.
And less text. And less text. And less text. And less text. And less
text. Oh, how interesting typing this stuff. But not as interesting as watching
paint dry. And less text. And less text. And less text. And less text.
interesting.  less, a little less text. The end, that can be well or can't be well. However, we need to keep moving. That is how one should live the life. The content of the file ends here

Note: Before executing the above program, ensure that the tika.jar file is on the classpath; otherwise, you will get a compile-time or run-time error. Update the classpath for both the compilation and the execution steps. Also note that we have used the sample.pdf file; you can substitute the name of the file that you want to read.
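One way to avoid editing the source each time the file name changes is to take it from the command line. The sketch below uses only the JDK and shows just the path-handling part; the class name PdfPathResolver and the fallback to sample.pdf are assumptions made for illustration.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class PdfPathResolver {
    // Assumed default, matching the file name used in this section.
    static final String DEFAULT_FILE = "sample.pdf";

    /** Returns the first argument as a path, or the default if none is given. */
    public static Path resolve(String[] args) {
        String name = (args.length > 0 && !args[0].isBlank()) ? args[0] : DEFAULT_FILE;
        return Paths.get(name);
    }

    public static void main(String[] args) {
        Path pdf = resolve(args);
        if (!Files.isReadable(pdf)) {
            // Fail early with a clear message instead of a stack trace.
            System.err.println("Cannot read file: " + pdf.toAbsolutePath());
            System.exit(1);
        }
        System.out.println("File to parse: " + pdf.toAbsolutePath());
    }
}
```

The resolved path can then be passed to the FileInputStream constructor in place of a hard-coded name.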






