PDF documents are increasingly popular among individuals, businesses, and organizations. The format has many benefits, but it also has flaws. Fraudsters have learned to exploit the readability and interoperability of PDF documents: for example, they can alter the original text of a document and change financial information in order to embezzle millions of dollars' worth of assets. Two PDF documents can look the same visually, so you cannot always tell by eye whether they differ. Fortunately, there are tools that solve this problem. This article looks at how to compare two PDF documents online for free, how to compare two PDF files in Java, and the easiest way to compare two PDF files for changes for business purposes.
With PDFBox, you can compare two PDF files, or two portions of the same file, for differences. You can extract the text from a PDF file and compare it with the copy stored in your database to see whether anything has changed. This can be a good way to check a PDF file for malicious content, or to back up information in case the file becomes corrupted.
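The baseline check described above can be sketched as follows. This is a minimal illustration using only the Java standard library, and it assumes the text has already been extracted from the PDF (for example with PDFBox, which is not shown here). The class name `TextFingerprint`, the whitespace normalization, and the idea of storing a SHA-256 digest instead of the full text are assumptions made for the example, not something the original prescribes.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: fingerprints extracted PDF text so it can be checked against a stored baseline.
final class TextFingerprint {
    private TextFingerprint() {}

    /** Collapses runs of whitespace so layout-only differences do not change the digest. */
    static String normalize(String text) {
        return text.trim().replaceAll("\\s+", " ");
    }

    /** Returns the SHA-256 digest of the normalized text as lowercase hex. */
    static String sha256Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] hash = md.digest(normalize(text).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this should not happen.
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    /** True if the freshly extracted text still matches the digest stored earlier. */
    static boolean matchesBaseline(String extractedText, String storedDigest) {
        return sha256Hex(extractedText).equalsIgnoreCase(storedDigest);
    }
}
```

A digest only tells you that something changed, not what changed, so it suits a quick tamper check better than a review of the differences.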
Is there a way to programmatically compare two PDF files so that it returns some sort of percentage match? I'm mostly interested in comparing text content, but if it can compare images as well, that would be even better. I'm using Java, but a standalone package that I could run as part of a process would also work.
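One possible way to get a percentage match for text content is an edit-distance ratio. The sketch below is an assumption-laden illustration, not a feature of any library mentioned in this article: it takes two strings that are assumed to have been extracted from the PDFs already, computes the Levenshtein distance, and scales it by the longer length. The class and method names are hypothetical.

```java
// Hypothetical helper: percentage match between two already-extracted texts.
final class TextSimilarity {
    private TextSimilarity() {}

    /** Classic Levenshtein edit distance, computed with two rolling rows. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Returns a value from 0 to 100; two empty texts count as a full match. */
    static double percentMatch(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        if (longest == 0) {
            return 100.0;
        }
        return 100.0 * (longest - levenshtein(a, b)) / longest;
    }
}
```

The running time grows with the product of the two text lengths, so for large documents it is probably more practical to compare page by page or line by line.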
Sometimes you need to compare two or more PDF files (for example, to make sure that two versions of an invoice look the same, or to check whether two documents contain the same information). One way to do it is with the PDFBox library, which offers some facilities for working with PDF. This article gives a step-by-step comparison of two PDF files in Java. Many programs can open, display, and otherwise process PDF files, but very few let you compare them. Here is a simple solution that allows both text and binary comparison to bring out the differences between two PDF files in Java. Can you spot the differences?
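The binary half of that comparison can be sketched with the standard library alone. The example below assumes Java 12 or later, where `Files.mismatch` is available; the class name `BinaryCompare` is hypothetical. Keep in mind that two PDFs can render identically and still differ byte for byte, so a binary mismatch does not by itself mean the visible content changed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: byte-for-byte comparison of two files.
final class BinaryCompare {
    private BinaryCompare() {}

    /**
     * Returns -1 if the files are byte-identical; otherwise the offset of the
     * first differing byte (or the size of the shorter file if one is a prefix
     * of the other).
     */
    static long firstDifference(Path a, Path b) {
        try {
            return Files.mismatch(a, b);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long offset = firstDifference(Path.of(args[0]), Path.of(args[1]));
        System.out.println(offset < 0
                ? "Files are byte-identical"
                : "First difference at byte " + offset);
    }
}
```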
Java Document Comparison API
As a prerequisite, get GroupDocs.Comparison for Java from the downloads section. Alternatively, for Maven-based applications, add the following to your pom.xml:
Repository
<repository>
    <id>GroupDocsJavaAPI</id>
    <name>GroupDocs Java API</name>
    <url>http://repository.groupdocs.com/repo/</url>
</repository>
Dependency
<dependency>
    <groupId>com.groupdocs</groupId>
    <artifactId>groupdocs-comparison</artifactId>
    <version>20.4</version>
</dependency>
Compare Word Files and Show Differences using Java
The steps below show how to compare any two Word documents in just a few lines of Java code. The result is a document that highlights the identified changes.
- Initialize the Comparer object with the source document path.
- Add the second document to compare using the add method.
- Call the compare method to get the result of the comparison. The compare method takes the name of the output document as a parameter.
// Compare two Word files from the provided location on disk
Comparer comparer = new Comparer("source.docx");
try {
    comparer.add("target.docx");
    comparer.compare("comparison.docx");
} finally {
    comparer.dispose();
}
Here is the resultant Word document generated by the above code, with the differences between the two compared Word documents highlighted. Deleted content is marked in red, added content in blue, and modified content in green.
Compare Word Files for Text using Stream
You can similarly pass the document as a stream to the Comparer class to get it compared with the second document. Here is the Java code to give you a clear idea:
// Compare two Word files using streams
Comparer comparer = new Comparer(new FileInputStream("source.docx"));
try {
    comparer.add(new FileInputStream("target.docx"));
    comparer.compare(new FileOutputStream("result.docx"));
} finally {
    comparer.dispose();
}
Accept or Reject the Compared Changes in Word File using Java
After the differences have been identified and highlighted, you can accept or reject each change. As an example, the code below accepts and rejects the changes alternately. With similar code, you can display the changes one by one and decide whether to accept or reject each according to your requirements.
// Accept or reject the identified changes of a Word document in Java
Comparer comparer = new Comparer(source);
try {
    comparer.add(target);
    comparer.compare();
    ChangeInfo[] changes = comparer.getChanges();
    System.out.println("changes.length: " + changes.length + ".");
    // Accept or reject the changes
    for (int n = 0; n < changes.length; n++) {
        if (n % 2 == 0) {
            changes[n].setComparisonAction(ComparisonAction.ACCEPT);
        } else {
            changes[n].setComparisonAction(ComparisonAction.REJECT);
        }
    }
    // Apply your decisions to get the resultant document.
    comparer.applyChanges(outputFileName, new SaveOptions(), new ApplyChangeOptions(changes));
} finally {
    comparer.dispose();
}
Compare Text Files and Show Differences using Java
The Comparer class can also compare text files. The code below compares two text files in Java; the steps are exactly the same as for any other pair of documents:
- Start with passing the text file to the Comparer class.
- Add the second file using the add method.
- Call the compare method.
// Compare two text files to identify and highlight changes.
Comparer comparer = new Comparer("source.txt");
try {
    comparer.add("target.txt");
    comparer.compare("comparison.txt");
} finally {
    comparer.dispose();
}
Here is the output document that shows the comparison result of matching two text files using the above code.
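To illustrate what a text comparison computes, here is a small standard-library sketch of a line diff based on the longest common subsequence. It is not how GroupDocs.Comparison works internally (the source does not say), and the class name `LineDiff` and the `- ` / `+ ` markers are assumptions chosen for the example. Unlike the colour-coded output described earlier, this sketch reports a modified line as a deletion followed by an addition.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: line-by-line diff using a longest-common-subsequence table.
final class LineDiff {
    private LineDiff() {}

    /** Returns the lines prefixed with "  " (unchanged), "- " (deleted) or "+ " (added). */
    static List<String> diff(List<String> source, List<String> target) {
        int n = source.size();
        int m = target.size();
        // lcs[i][j] = length of the longest common subsequence of source[i..] and target[j..]
        int[][] lcs = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                lcs[i][j] = source.get(i).equals(target.get(j))
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }
        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < n && j < m) {
            if (source.get(i).equals(target.get(j))) {
                out.add("  " + source.get(i));
                i++;
                j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                out.add("- " + source.get(i++)); // line exists only in the source
            } else {
                out.add("+ " + target.get(j++)); // line exists only in the target
            }
        }
        while (i < n) {
            out.add("- " + source.get(i++));
        }
        while (j < m) {
            out.add("+ " + target.get(j++));
        }
        return out;
    }
}
```

The input lists could come from `Files.readAllLines` for the two text files; the table uses memory proportional to the product of the two line counts, so this is meant for modest file sizes.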
Compare PDF Files for Text Difference using Java
We can compare PDF files with the same code as above, just by changing the file extensions to ".pdf". The code below compares two PDF files in Java and shows the differences.
// Compare two PDF files using streams
Comparer comparer = new Comparer(new FileInputStream("source.pdf"));
try {
    comparer.add(new FileInputStream("target.pdf"));
    comparer.compare(new FileOutputStream("result.pdf"));
} finally {
    // Release the comparer, as in the earlier examples
    comparer.dispose();
}
Below is the outcome after comparing the PDF files.
How to Compare PDF Files Using Java
To compare PDF files, we'll use the Aspose.Words for Java API, a feature-rich, powerful, and easy-to-use comparison API for the Java platform. You can download its latest version directly from Maven and install it in your Maven-based project by adding the following configuration to the pom.xml.
Repository
<repository>
<id>AsposeJavaAPI</id>
<name>Aspose Java API</name>
<url>https://repository.aspose.com/repo/</url>
</repository>
Dependency
<dependency>
<groupId>com.aspose</groupId>
<artifactId>aspose-words</artifactId>
<version>version of aspose-words API</version>
<classifier>jdk17</classifier>
</dependency>
Steps for Comparing PDF Files in Java
Developers can easily integrate code to compare two versions of a document to get the difference.
- Load two PDF files with full path for comparison.
- Assign one file as the base file.
- Call getRevisions() and check the count to detect differences.
System Requirements
Before integrating the code, make sure that you have the following prerequisites.
- Microsoft Windows or a compatible OS with a Java Runtime Environment for JSP/JSF and desktop applications.
- Get the latest version of Aspose.Words for Java directly from Maven.
Compare PDF Files – Java
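import com.aspose.words.Document;
import java.util.Date;

// dataDir is the folder that contains the two PDF files
Document docA = new Document(dataDir + "DocumentA.pdf");
Document docB = new Document(dataDir + "DocumentB.pdf");
// After compare(), docA holds one revision per detected difference
docA.compare(docB, "user", new Date());
if (docA.getRevisions().getCount() == 0)
    System.out.println("Documents are equal");
else
    System.out.println("Documents are not equal");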
Aspose.Words for Java can load, view, and convert Microsoft Word and OpenDocument formats such as DOC, DOCX, and ODT to PDF, XPS, HTML, and various other formats. You can also create new documents from scratch and save them in the supported formats. It is a standalone API suited to server-side and back-end systems where high performance is required, and it does not depend on other software such as Microsoft Word or OpenOffice.
jPDFEditor has a Side by Side PDF Comparison tool that allows you to open two PDF documents in the same window next to each other to compare visually. This can be useful when comparing PDF documents that may have text or image revisions.
Note: This feature may not be available depending on what has been enabled/disabled in your distribution. See the specific documentation of your application for additional instructions.
Instructions:
- Open the first document in jPDFEditor. This will become Document A in the Side by Side comparison.
- Start the Side by Side comparison mode.
- Note: See the specific documentation of your application for detailed instructions.
- Select the second document that you wish to compare with Document A. This will become Document B in the Side by Side comparison.
- You will now see Document A (on the left) and Document B (on the right) opened side by side in the same jPDFEditor window.
- At any time during the comparison, the tools on the toolbar annotate or mark up Document A only.
- You can mark up Document B by right-clicking on Document B and selecting the tool that you wish to use. The tools available for Document B are:
- Sticky Note
- Pencil
- Select Text
- Once the text is selected, you can right-click on the highlighted text and choose a text markup action.
- If you would like, you can switch to Overlay Comparison mode by clicking the Overlay button in the top right corner above Document B.
Note: When switching to Overlay Comparison, any changes made to Document B must first be saved. jPDFEditor will prompt you to do so when you switch to Overlay Comparison.
A frequent question on the PDF forums is how to compare 2 versions of a PDF file to see what has changed. This is one of those cases where the person generally means something slightly different.
Usually, this means 'how can I see what has changed visually'. PDF is a flexible file format in which you can do things in many different ways. So you could create 2 different PDF versions of a file using Acrobat and Ghostscript (as an example). The files would (hopefully) look identical, but they would be different sizes and the internal structure of each would be very different.
As part of developing a PDF library, we want to do an awful lot of regression testing to make sure that we do not break anything. So we need to compare a lot of files. We also like to test each change individually so we can investigate any problems.
So the way we compare PDF files is to extract the text and to convert the PDF to a PNG. We compare this against a baseline. You still need a human to verify any changes, but it does provide very quick regression tests. If the results are identical, we can be confident that the file has not changed. And doing the same with 2 PDF files allows you to quickly review any changes, especially if you get the comparison to highlight the area on the PNG which has changed.
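The step of locating the changed area on the PNG could look roughly like the sketch below. It is not the authors' code, which is not reproduced here; it is a standard-library illustration that assumes both pages have already been rendered to images of the same size (for example, loaded with `ImageIO.read`). The class name `ImageDiff` is hypothetical.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;

// Hypothetical helper: finds the area in which two rendered pages differ.
final class ImageDiff {
    private ImageDiff() {}

    /** Returns the bounding box of all differing pixels, or null if the images are identical. */
    static Rectangle changedRegion(BufferedImage a, BufferedImage b) {
        if (a.getWidth() != b.getWidth() || a.getHeight() != b.getHeight()) {
            throw new IllegalArgumentException("Images must have the same dimensions");
        }
        int minX = Integer.MAX_VALUE;
        int minY = Integer.MAX_VALUE;
        int maxX = -1;
        int maxY = -1;
        for (int y = 0; y < a.getHeight(); y++) {
            for (int x = 0; x < a.getWidth(); x++) {
                if (a.getRGB(x, y) != b.getRGB(x, y)) {
                    minX = Math.min(minX, x);
                    minY = Math.min(minY, y);
                    maxX = Math.max(maxX, x);
                    maxY = Math.max(maxY, y);
                }
            }
        }
        if (maxX < 0) {
            return null; // no pixel differs
        }
        return new Rectangle(minX, minY, maxX - minX + 1, maxY - minY + 1);
    }
}
```

The returned rectangle can then be drawn onto a copy of the baseline image to highlight the change for a human reviewer. An exact pixel comparison is strict; if the renderer applies anti-aliasing, a small per-pixel tolerance may be needed.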
We find that a very good way to compare PDF file results. What works for you?
Conclusion
There is a simple and free comparison tool named Compare PDF. It can not only compare PDF files, but also perform addition and subtraction operations between two PDF files. The advanced version of the web-based tool, Pdf Diff Software, can help you compare multiple PDF files and merge or watermark them easily.
Automated tools help us compare files, but some files may be beyond their capabilities. In that case we can write our own tools to achieve the same objective: they offer the same functionality but let us handle the files in whatever way suits us best. To compare PDF files, you can use third-party software or write your own tool in Java.