Merge PDF Files in Selenium with Java

Last Updated On

April 28, 2025

In this article, we will see how to merge 2 PDF files into a single PDF file using Selenium with Java.

Merging PDF documents in Selenium using Java requires an additional library because Selenium itself does not provide direct support for working with PDFs. A commonly used library for working with PDFs in Java is Apache PDFBox.

Sample PDF File1

Sample PDF File2

1. Add the dependencies

<dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-java</artifactId>
    <version>4.24.0</version>
</dependency>

<dependency>
    <groupId>commons-io</groupId>
    <artifactId>commons-io</artifactId>
    <version>2.16.1</version>
</dependency>

<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>3.0.3</version>
</dependency>

2. WebDriver SetUp and navigate to the page

Configure WebDriver for Chrome and set ChromeOptions so that PDFs are downloaded automatically, without prompts, into a known folder.

// Same folder name as in the complete program below
String downloadFilepath = System.getProperty("user.dir") + File.separator + "merge_downloads";

ChromeOptions options = new ChromeOptions();
Map<String, Object> prefs = new HashMap<>();
prefs.put("plugins.always_open_pdf_externally", true);
prefs.put("download.default_directory", downloadFilepath);
options.setExperimentalOption("prefs", prefs);

WebDriver driver = new ChromeDriver(options);
driver.manage().window().maximize();
driver.get("https://freetestdata.com/document-files/pdf/");

3. Download the pdfs

We are using Selenium to click the download link of each PDF file. PDFBox is not involved in this step.

We are using WebDriverWait to wait until each PDF file appears in the download folder.

// Download first PDF
WebElement downloadLink1 = driver.findElement(By.xpath("//*[@class='elementor-button-text']"));
downloadLink1.click();

// Wait for first PDF download to complete
File downloadedFile1 = new File(downloadFilepath + "/Free_Test_Data_100KB_PDF.pdf");
WebDriverWait wait1 = new WebDriverWait(driver, Duration.ofSeconds(30));
wait1.until((ExpectedCondition<Boolean>) wd -> downloadedFile1.exists());
System.out.println("PDF file1 is downloaded successfully.");

// Download second PDF
WebElement downloadLink2 = driver.findElement(By.xpath("//*[@id=\"post-81\"]/div/div/section[3]/div/div[1]/div/section[2]/div/div[2]/div/div/div/div/a/span/span"));
downloadLink2.click();

// Wait for second PDF download to complete
File downloadedFile2 = new File(downloadFilepath + "/260KB.pdf");
WebDriverWait wait2 = new WebDriverWait(driver, Duration.ofSeconds(30));
wait2.until((ExpectedCondition<Boolean>) wd -> downloadedFile2.exists());
System.out.println("PDF file2 is downloaded successfully.");
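The exists() check above only confirms that a file with the expected name is present. As a minimal sketch of a stricter check, the helper below also requires a non-empty file and no leftover Chrome temporary file (Chrome writes in-progress downloads with a .crdownload extension). The class and method names (DownloadWaiter, isComplete, waitForDownload) are hypothetical and not part of Selenium or PDFBox; only the Java standard library is used.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper, not part of Selenium or PDFBox.
final class DownloadWaiter {

    // True when the target exists, is non-empty, and no *.crdownload file remains beside it.
    static boolean isComplete(Path target) throws IOException {
        if (!Files.isRegularFile(target) || Files.size(target) == 0) {
            return false;
        }
        Path folder = target.toAbsolutePath().getParent();
        try (DirectoryStream<Path> partials = Files.newDirectoryStream(folder, "*.crdownload")) {
            return !partials.iterator().hasNext();
        }
    }

    // Polls every 250 ms until the download looks complete or the timeout expires.
    static boolean waitForDownload(Path target, Duration timeout)
            throws IOException, InterruptedException {
        Instant deadline = Instant.now().plus(timeout);
        while (Instant.now().isBefore(deadline)) {
            if (isComplete(target)) {
                return true;
            }
            Thread.sleep(250);
        }
        return isComplete(target);
    }

    public static void main(String[] args) throws Exception {
        // Demo with a temporary file standing in for a finished download.
        Path folder = Files.createTempDirectory("downloads");
        Path pdf = Files.write(folder.resolve("sample.pdf"), new byte[] {1, 2, 3});
        System.out.println(waitForDownload(pdf, Duration.ofSeconds(5)));
    }
}
```

In the tutorial's flow, a call such as DownloadWaiter.waitForDownload(downloadedFile1.toPath(), Duration.ofSeconds(30)) could stand in for the WebDriverWait block, since the check does not need the driver at all.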

4. Merging the PDFs

We are using PDFMergerUtility from Apache PDFBox to merge the downloaded PDF files into a single PDF.

public static void mergePDFFiles(String pdf1Path, String pdf2Path, String mergedPdfPath) throws IOException {
    PDFMergerUtility pdfMerger = new PDFMergerUtility();

    pdfMerger.addSource(new File(pdf1Path));
    pdfMerger.addSource(new File(pdf2Path));
    pdfMerger.setDestinationFileName(mergedPdfPath);

    // Merge PDFs
    pdfMerger.mergeDocuments(null);
}

Step 1 – Creates an instance of PDFMergerUtility which is a utility class in Apache PDFBox for merging multiple PDF files.

PDFMergerUtility pdfMerger = new PDFMergerUtility();

Step 2 – Adds the first PDF file (pdf1Path) and the second PDF file (pdf2Path) as sources to the PDFMergerUtility instance.

pdfMerger.addSource(new File(pdf1Path));
pdfMerger.addSource(new File(pdf2Path));

Step 3 – setDestinationFileName is a method provided by PDFMergerUtility to set the output file’s path.

 pdfMerger.setDestinationFileName(mergedPdfPath);

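Step 4 – The mergeDocuments() method performs the merge and writes the result to the destination file. In PDFBox 3.x its parameter is a StreamCacheCreateFunction, which controls how intermediate data is buffered. Passing null means that the default settings are used.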

pdfMerger.mergeDocuments(null);
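Before the files are handed to PDFMergerUtility, it can help to confirm that each download really is a PDF, because an error page saved under a .pdf name would only fail at merge time. The sketch below checks for the "%PDF-" marker that PDF files start with. PdfSniffer and looksLikePdf are hypothetical names, the check is deliberately strict (it expects the marker at the very first byte), and readNBytes requires Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper, not part of PDFBox.
final class PdfSniffer {

    private static final byte[] MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // True when the file exists and its first five bytes are "%PDF-".
    static boolean looksLikePdf(Path file) throws IOException {
        if (!Files.isRegularFile(file)) {
            return false;
        }
        byte[] head = new byte[MAGIC.length];
        try (InputStream in = Files.newInputStream(file)) {
            int read = in.readNBytes(head, 0, head.length);
            return read == MAGIC.length && Arrays.equals(head, MAGIC);
        }
    }

    public static void main(String[] args) throws IOException {
        // Demo with a temporary file that carries only a PDF header line.
        Path sample = Files.createTempFile("sample", ".pdf");
        Files.write(sample, "%PDF-1.7\n".getBytes(StandardCharsets.US_ASCII));
        System.out.println(looksLikePdf(sample)); // prints true
        Files.delete(sample);
    }
}
```

A caller could run this on pdf1Path and pdf2Path and skip the merge with a clear message when either check fails.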

5. Delete old PDF files

The method below checks whether each file exists before attempting to delete it, and prints a success or failure message based on the result of the deletion.

public static void deleteOldPDFFiles(File... files) {
    for (File file : files) {
        if (file.exists()) {
            if (file.delete()) {
                System.out.println(file.getName() + " was deleted successfully.");
            } else {
                System.out.println("Failed to delete " + file.getName() + ".");
            }
        }
    }
}
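File.delete() reports failure only as false, without a reason. As an alternative sketch, the java.nio version below uses Files.deleteIfExists, which throws an IOException describing the cause, and returns the paths that could not be removed so the caller can decide what to print. PdfCleanup and deleteAll are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, shown as an alternative to the File-based method above.
final class PdfCleanup {

    // Deletes each path if it exists and returns the paths that could not be deleted.
    static List<Path> deleteAll(Path... paths) {
        List<Path> failed = new ArrayList<>();
        for (Path path : paths) {
            try {
                boolean deleted = Files.deleteIfExists(path);
                System.out.println(path.getFileName() + (deleted ? " was deleted." : " did not exist."));
            } catch (IOException e) {
                // The exception carries the reason, e.g. a non-empty directory or a locked file.
                System.out.println("Failed to delete " + path.getFileName() + ": " + e);
                failed.add(path);
            }
        }
        return failed;
    }

    public static void main(String[] args) throws IOException {
        Path temp = Files.createTempFile("old", ".pdf");
        System.out.println(deleteAll(temp).isEmpty()); // prints true after the per-file message
    }
}
```

With this shape, a final "deleted successfully" message can be printed only when the returned list is empty.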

6. Quit the browser

Quit the browser after all the operations are finished to free up the resources.

driver.quit();

The entire program can be seen below:

package com.example;

import org.apache.pdfbox.multipdf.PDFMergerUtility;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;
import org.openqa.selenium.support.ui.ExpectedCondition;
import org.openqa.selenium.support.ui.WebDriverWait;
import java.io.File;
import java.io.IOException;
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;

public class PDFMerge_Demo {

    public static void main(String[] args) throws IOException {

        String downloadFilepath = System.getProperty("user.dir") + File.separator + "merge_downloads";

        ChromeOptions options = new ChromeOptions();
        Map<String, Object> prefs = new HashMap<>();
        prefs.put("plugins.always_open_pdf_externally", true);
        prefs.put("download.default_directory", downloadFilepath);
        options.setExperimentalOption("prefs", prefs);

        WebDriver driver = new ChromeDriver(options);
        driver.manage().window().maximize();
        driver.get("https://freetestdata.com/document-files/pdf/");

        // Download first PDF
        WebElement downloadLink1 = driver.findElement(By.xpath("//*[@class='elementor-button-text']"));
        downloadLink1.click();

        //Wait for first PDF download to complete
        File downloadedFile1 = new File(downloadFilepath + "/Free_Test_Data_100KB_PDF.pdf");
        WebDriverWait wait1 = new WebDriverWait(driver, Duration.ofSeconds(30));
        wait1.until((ExpectedCondition<Boolean>) wd -> downloadedFile1.exists());
        System.out.println("PDF file1 is downloaded successfully.");

        // Download second PDF
        WebElement downloadLink2 = driver.findElement(By.xpath("//*[@id=\"post-81\"]/div/div/section[3]/div/div[1]/div/section[2]/div/div[2]/div/div/div/div/a/span/span"));
        downloadLink2.click();

        //Wait for second PDF download to complete
        File downloadedFile2 = new File(downloadFilepath + "/260KB.pdf");
        WebDriverWait wait2 = new WebDriverWait(driver, Duration.ofSeconds(30));
        wait2.until((ExpectedCondition<Boolean>) wd -> downloadedFile2.exists());
        System.out.println("PDF file2 is downloaded successfully.");

        String pdf1Path = downloadedFile1.getAbsolutePath();
        String pdf2Path = downloadedFile2.getAbsolutePath();

        //Check if both PDF files exist
        if (downloadedFile1.exists() && downloadedFile2.exists()) {

            // Merge the PDF files
            mergePDFFiles(pdf1Path, pdf2Path, downloadFilepath + "/Merged_PDF.pdf");

            // Print success message
            System.out.println("PDF files merged successfully.");

            // Delete old PDFs
            deleteOldPDFFiles(downloadedFile1, downloadedFile2);

            // Per-file results are printed inside deleteOldPDFFiles
            System.out.println("Cleanup of old PDF files finished.");
        } else {
            System.out.println("One or both of the PDF files are missing.");
        }

        // Close the browser
        driver.quit();

    }

    public static void mergePDFFiles(String pdf1Path, String pdf2Path, String mergedPdfPath) throws IOException {
        PDFMergerUtility pdfMerger = new PDFMergerUtility();

        pdfMerger.addSource(new File(pdf1Path));
        pdfMerger.addSource(new File(pdf2Path));
        pdfMerger.setDestinationFileName(mergedPdfPath);

        // Merge PDFs
        pdfMerger.mergeDocuments(null);

    }

    public static void deleteOldPDFFiles(File... files) {
        for (File file : files) {
            if (file.exists()) {
                if (file.delete()) {
                    System.out.println(file.getName() + " was deleted successfully.");
                } else {
                    System.out.println("Failed to delete " + file.getName() + ".");
                }
            }
        }
    }

}

The output of the above program is

We can see that the merged pdf is placed in the merge_downloads folder.

Congratulations on making it through this tutorial and hope you found it useful! Happy Learning!!

September 27, 2024 vibssingh

How to Write in PDF with Selenium and Java

In this article, we will see how to write to a PDF file using Selenium with Java.

Writing to a PDF document in Selenium using Java requires an additional library because Selenium itself does not provide direct support for working with PDFs. A commonly used library for working with PDFs in Java is Apache PDFBox.

Below is an example of sample PDF we will use.

Sample PDF File

1. Add the dependencies

<dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-java</artifactId>
    <version>4.24.0</version>
</dependency>

<dependency>
    <groupId>commons-io</groupId>
    <artifactId>commons-io</artifactId>
    <version>2.16.1</version>
</dependency>

<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>3.0.3</version>
</dependency>

2. Setup ChromeOptions and open the browser

This includes setting the browser to always download PDFs instead of opening them in the browser, and specifying the default download directory.

String downloadFilepath = System.getProperty("user.dir") + File.separator + "downloads";

ChromeOptions options = new ChromeOptions();
Map<String, Object> prefs = new HashMap<>();
prefs.put("plugins.always_open_pdf_externally", true);
prefs.put("download.default_directory", downloadFilepath);
options.setExperimentalOption("prefs", prefs);

WebDriver driver = new ChromeDriver(options);
driver.manage().window().maximize();
driver.get("https://freetestdata.com/document-files/pdf/");

3. Download the PDF Document

Use Selenium WebDriver to navigate to the PDF URL and download it to a desired location.

// Download the PDF
WebElement downloadLink = driver.findElement(By.xpath("//*[@class='elementor-button-text']"));
downloadLink.click();

// Wait for PDF download to complete
File downloadedFile = new File(downloadFilepath + "/Free_Test_Data_100KB_PDF.pdf");
WebDriverWait wait = new WebDriverWait(driver, Duration.ofSeconds(30));
wait.until((ExpectedCondition<Boolean>) wd -> downloadedFile.exists());
System.out.println("PDF File is downloaded successfully.");
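Because this program does not delete the downloaded file, a copy from an earlier run may still be in the folder. In that case the exists() check passes immediately, and Chrome typically saves the new download under a different name, such as one with a "(1)" suffix. A minimal sketch of a cleanup step to run before opening the browser is shown below. DownloadDir and prepare are hypothetical names, and the method removes every .pdf in the folder, including earlier output, so it assumes the folder is used only by the test.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper; assumes the folder is dedicated to test downloads.
final class DownloadDir {

    // Creates the folder if needed, deletes leftover *.pdf files, and returns how many were removed.
    static int prepare(Path dir) throws IOException {
        Files.createDirectories(dir);
        int removed = 0;
        try (DirectoryStream<Path> pdfs = Files.newDirectoryStream(dir, "*.pdf")) {
            for (Path pdf : pdfs) {
                if (Files.isRegularFile(pdf)) {
                    Files.delete(pdf);
                    removed++;
                }
            }
        }
        return removed;
    }

    public static void main(String[] args) throws IOException {
        // Demo on a temporary folder holding one stale PDF.
        Path dir = Files.createTempDirectory("downloads");
        Files.createFile(dir.resolve("stale.pdf"));
        System.out.println(prepare(dir)); // prints 1
    }
}
```

In the tutorial's code this would be called with the same location as downloadFilepath before new ChromeDriver(options).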

4. Create a content stream to write to the PDF

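We are using Apache PDFBox to write to the downloaded PDF file.

Step 1 – The PDPageContentStream class is used to insert data into the document. Its constructor takes the document object and the page object as parameters. AppendMode.APPEND adds the new content after the page's existing content instead of replacing it, and the final argument (true) compresses the new content stream.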

PDPageContentStream contentStream = new PDPageContentStream(doc, page, PDPageContentStream.AppendMode.APPEND, true);

Step 2 – Text must be written between a beginText() call and an endText() call. The beginText() method of the PDPageContentStream class starts the text content.

contentStream.beginText();

Step 3 – We can set the font style and font size of the text by using setFont() method of the PDPageContentStream class.

//Setting the font to the Content stream
PDFont pdfFont=  new PDType1Font(TIMES_BOLD_ITALIC);
contentStream.setFont(pdfFont, 20);

Step 4 – We can set the position of the text by using the newLineAtOffset() method of the PDPageContentStream class, as shown in the following code.

//Setting the position for the line
contentStream.newLineAtOffset(40, 450);
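The two arguments are an x and y offset in PDF units (points, 1/72 of an inch), and by default the origin is the bottom-left corner of the page, so y = 450 is measured upwards from the bottom edge. The small sketch below converts a position expressed from the top edge, or in millimetres, into those values. PdfUnits and its methods are hypothetical names, and the sketch assumes an unrotated page whose media box starts at (0, 0). With PDFBox the page height can be read from page.getMediaBox().getHeight().

```java
// Hypothetical helper for working out newLineAtOffset() arguments.
final class PdfUnits {

    static final float POINTS_PER_INCH = 72f;
    static final float MM_PER_INCH = 25.4f;

    // Converts millimetres to PDF points.
    static float mmToPoints(float mm) {
        return mm / MM_PER_INCH * POINTS_PER_INCH;
    }

    // Converts a distance from the top edge into a y coordinate measured from the bottom edge.
    static float yFromTop(float pageHeightPoints, float distanceFromTopPoints) {
        return pageHeightPoints - distanceFromTopPoints;
    }

    public static void main(String[] args) {
        float letterHeight = 792f; // US Letter is 8.5 x 11 in, i.e. 612 x 792 points
        System.out.println(yFromTop(letterHeight, 342f)); // prints 450.0
    }
}
```

So on a US Letter page, the y value 450 used above places the text baseline 342 points (4.75 inches) below the top edge.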

Step 5 – We can insert text content in the PDF document by using the showText() method of the PDPageContentStream class.

//Adding text in the form of string
String text = "Hi!!! Added text to the existing PDF document.";
contentStream.showText(text);

Step 6 – When we insert text in the PDF document, we have to provide the end point of the text. endText() method of the PDPageContentStream class is used to end the text content.

contentStream.endText();

Step 7 – We close the PDPageContentStream object by using the close() method. This must happen before the document is saved.

//Closing the content stream
contentStream.close();

Step 8 – After adding the required content, we have to save the document to our desired location. The save() method is used to save the document.

//Saving the document
doc.save(new File("downloads/Updated_PDF.pdf"));

Step 9 – After completing the task, we need to close the PDDocument class object by using the close() method.

//Closing the document
doc.close();
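Steps 7 to 9 depend on order: the content stream is closed first, then the document is saved, then the document is closed. Both PDPageContentStream and PDDocument are Closeable, so nested try-with-resources blocks can enforce that order even when an exception is thrown. The sketch below shows only the ordering, using a stand-in Resource class (hypothetical, not a PDFBox type) that records each event.

```java
import java.util.ArrayList;
import java.util.List;

// Ordering sketch only: Resource is a stand-in for PDDocument and PDPageContentStream.
final class CloseOrderSketch {

    static final class Resource implements AutoCloseable {
        private final String name;
        private final List<String> log;

        Resource(String name, List<String> log) {
            this.name = name;
            this.log = log;
            log.add("open " + name);
        }

        void use(String action) {
            log.add(action + " " + name);
        }

        @Override
        public void close() {
            log.add("close " + name);
        }
    }

    // Runs the open / write / close stream / save / close document sequence and returns the event log.
    static List<String> run() {
        List<String> log = new ArrayList<>();
        try (Resource doc = new Resource("document", log)) {
            try (Resource stream = new Resource("contentStream", log)) {
                stream.use("write to");
            } // the content stream is closed here, before the save
            doc.use("save");
        } // the document is closed here, after the save
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

Applied to the tutorial's code, the outer block would hold Loader.loadPDF(downloadedFile), the inner block the PDPageContentStream, and doc.save(...) would sit between the two closing braces.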

The complete program can be seen below:

package com.example;

import org.apache.pdfbox.Loader;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType1Font;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;
import org.openqa.selenium.support.ui.ExpectedCondition;
import org.openqa.selenium.support.ui.WebDriverWait;

import java.io.File;
import java.io.IOException;
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;

import static org.apache.pdfbox.pdmodel.font.Standard14Fonts.FontName.TIMES_BOLD_ITALIC;

public class WritePDF_Chrome_Demo {

    public static void main(String[] args) throws IOException {

        String downloadFilepath = System.getProperty("user.dir") + File.separator + "downloads";

        ChromeOptions options = new ChromeOptions();
        Map<String, Object> prefs = new HashMap<>();
        prefs.put("plugins.always_open_pdf_externally", true);
        prefs.put("download.default_directory", downloadFilepath);
        options.setExperimentalOption("prefs", prefs);

        WebDriver driver = new ChromeDriver(options);
        driver.manage().window().maximize();
        driver.get("https://freetestdata.com/document-files/pdf/");

        // Download first PDF
        WebElement downloadLink = driver.findElement(By.xpath("//*[@class='elementor-button-text']"));
        downloadLink.click();

        //Wait for PDF download to complete
        File downloadedFile = new File(downloadFilepath + "/Free_Test_Data_100KB_PDF.pdf");
        WebDriverWait wait = new WebDriverWait(driver, Duration.ofSeconds(30));
        wait.until((ExpectedCondition<Boolean>) wd -> downloadedFile.exists());
        System.out.println("PDF File is downloaded successfully.");

        driver.quit();

        //Load the document and retrieve the page to write on.
        //getPage() uses a zero-based index, so 2 is the third page.
        PDDocument doc = Loader.loadPDF(downloadedFile);
        PDPage page = doc.getPage(2);
        PDPageContentStream contentStream = getPdPageContentStream(doc, page);
        System.out.println("New Text Content is added in the PDF Document.");

        //Closing the content stream
        contentStream.close();

        //Saving the document
        doc.save(new File("downloads/Updated_PDF.pdf"));

        //Closing the document
        doc.close();
    }

    private static PDPageContentStream getPdPageContentStream(PDDocument doc, PDPage page) throws IOException {

        PDPageContentStream contentStream = new PDPageContentStream(doc, page, PDPageContentStream.AppendMode.APPEND, true);

        //Begin the Content stream
        contentStream.beginText();

        //Setting the font to the Content stream
        PDFont pdfFont=  new PDType1Font(TIMES_BOLD_ITALIC);
        contentStream.setFont(pdfFont, 20);

        //Setting the position for the line
        contentStream.newLineAtOffset(10, 450);

        String text = "Hi!!! Added text to the existing PDF document.";

        //Adding text in the form of string
        contentStream.showText(text);

        //Ending the content stream
        contentStream.endText();

        return contentStream;
    }

}

The output of the above program is

We can see that the updated PDF is placed in the downloads folder.

We can see that the text content is added to the PDF document.

Summary:

Setup WebDriver: Configure the browser to handle automatic downloads.
Trigger Download: Navigate to the webpage and trigger the download.
Wait for Completion: Implement a waiting mechanism to ensure the download completes.
Write to PDF: Use PDPageContentStream to write the data to PDF.

That’s it! Congratulations on making it through this tutorial and hope you found it useful! Happy Learning!!

September 25, 2024 vibssingh

Read PDF Files with Selenium in Java

In this article, we will see how to read a PDF file using Selenium with Java.

Organizations frequently generate various types of PDF reports, such as mobile bills, electricity bills, financial reports, and revenue reports. Quality Assurance (QA) teams are then tasked with verifying the information contained in these reports. Typically, this process involves manually downloading the reports and reading the data they contain. To automate this process, the test framework must be capable of automatically downloading PDF reports and extracting the data without any human intervention.

Reading a PDF document in Selenium using Java requires some additional libraries because Selenium itself does not provide direct support for reading PDFs. The most commonly used library for reading PDFs in Java is Apache PDFBox.

Sample PDF File

1. Add the dependencies

Add the Selenium, Commons IO, and PDFBox dependencies to the project. To find the latest version of each dependency, search a Maven repository index such as https://mvnrepository.com/.

<dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-java</artifactId>
    <version>4.24.0</version>
</dependency>

<dependency>
    <groupId>commons-io</groupId>
    <artifactId>commons-io</artifactId>
    <version>2.16.1</version>
</dependency>

<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>3.0.3</version>
</dependency>

2. Download the PDF Document

Use Selenium WebDriver to navigate to the PDF URL and download it to a desired location.

String downloadFilepath = System.getProperty("user.dir") + File.separator + "downloads";

ChromeOptions options = new ChromeOptions();
Map<String, Object> prefs = new HashMap<>();
prefs.put("plugins.always_open_pdf_externally", true);
prefs.put("download.default_directory", downloadFilepath);
options.setExperimentalOption("prefs", prefs);

WebDriver driver = new ChromeDriver(options);
driver.manage().window().maximize();
driver.get("https://freetestdata.com/document-files/pdf/");

// Locate and click the download link or button if necessary
WebElement downloadLink = driver.findElement(By.xpath("//*[@class=\"elementor-button-text\"]"));
downloadLink.click();

// Wait for download to complete
File downloadedFile = new File(downloadFilepath + "/Free_Test_Data_100KB_PDF.pdf");
WebDriverWait wait = new WebDriverWait(driver, Duration.ofSeconds(30));
wait.until((ExpectedCondition<Boolean>) wd -> downloadedFile.exists());

// Check if the file exists
if (downloadedFile.exists()) {
    System.out.println("File is downloaded!");
} else {
    System.out.println("File is not downloaded.");
}

To know more about the PDF download, please refer to this tutorial – Download PDF in Chrome with Selenium Java

3. Read the PDF Content

We are using the Apache PDFBox to read the downloaded PDF file and extract text.

Step 1 – Load PDF Document

File file = new File("Path of Document");   
PDDocument doc = Loader.loadPDF(file);

Step 2 – Retrieve the text

The PDFTextStripper class is used to retrieve text from a PDF document. We can instantiate this class as follows:

PDFTextStripper pdfStripper = new PDFTextStripper();

getText() method is used to read the text contents from the PDF document. In this method, we need to pass the document object as a parameter.

String text = pdfStripper.getText(doc);
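Once the text is available as a String, a test usually needs to check it against expected values rather than only print it. The sketch below is one way to do that with plain Java: it collapses runs of whitespace (extracted text often contains line breaks in unexpected places) and returns the expected phrases that were not found. PdfTextCheck, normalize, and findMissing are hypothetical names, and the sample text in main is made up for illustration; it is not the content of the sample PDF.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for asserting on text returned by PDFTextStripper.getText().
final class PdfTextCheck {

    // Collapses every run of whitespace (spaces, tabs, line breaks) into a single space.
    static String normalize(String text) {
        return text.replaceAll("\\s+", " ").trim();
    }

    // Returns the expected phrases that do not occur in the extracted text.
    static List<String> findMissing(String pdfText, List<String> expectedPhrases) {
        String haystack = normalize(pdfText);
        List<String> missing = new ArrayList<>();
        for (String phrase : expectedPhrases) {
            if (!haystack.contains(normalize(phrase))) {
                missing.add(phrase);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Made-up sample standing in for extracted PDF text.
        String text = "Invoice No:  1042\r\nTotal  Due:\n$ 99.50";
        List<String> expected = Arrays.asList("Invoice No: 1042", "Total Due: $ 99.50", "Paid");
        System.out.println(findMissing(text, expected)); // prints [Paid]
    }
}
```

An empty result means every expected phrase was found, which maps directly onto a test assertion.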

The complete program can be seen below:

import org.apache.pdfbox.Loader;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;
import org.openqa.selenium.support.ui.ExpectedCondition;
import org.openqa.selenium.support.ui.WebDriverWait;

import java.io.File;
import java.io.IOException;
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;

public class ReadPDF_Chrome_Demo {

    public static void main(String[] args) {

        String downloadFilepath = System.getProperty("user.dir") + File.separator + "chrome_downloads";

        ChromeOptions options = new ChromeOptions();
        Map<String, Object> prefs = new HashMap<>();
        prefs.put("plugins.always_open_pdf_externally", true);
        prefs.put("download.default_directory", downloadFilepath);
        options.setExperimentalOption("prefs", prefs);

        WebDriver driver = new ChromeDriver(options);
        driver.manage().window().maximize();
        driver.get("https://freetestdata.com/document-files/pdf/");

        // Locate the download link/button and click and wait for the download to complete
        WebElement downloadLink = driver.findElement(By.xpath("//*[@class='elementor-button-text']"));
        downloadLink.click();

        //Wait for download to complete
        File downloadedFile = new File(downloadFilepath + "/Free_Test_Data_100KB_PDF.pdf");
        WebDriverWait wait = new WebDriverWait(driver, Duration.ofSeconds(30));
        wait.until((ExpectedCondition<Boolean>) wd -> downloadedFile.exists());
        
        // Check if the file exists
        if (downloadedFile.exists()) {
            System.out.println("File is downloaded!");
        } else {
            System.out.println("File is not downloaded.");
        }

        driver.quit();

        // Read the downloaded PDF using PDFBox.
        // try-with-resources closes the document even if text extraction fails.
        try (PDDocument document = Loader.loadPDF(downloadedFile)) {
            PDFTextStripper pdfStripper = new PDFTextStripper();
            String text = pdfStripper.getText(document);

            // Print the PDF text content
            System.out.println("Text in PDF: ");
            System.out.println(text);
        } catch (IOException e) {
            System.err.println("An error occurred while loading or reading the PDF file: " + e.getMessage());
            e.printStackTrace();
        }

    }

}

The output of the above program is

Summary:

1. Setup WebDriver: Configure the browser to handle automatic downloads.
2. Trigger Download: Navigate to the webpage and trigger the download.
3. Wait for Completion: Implement a waiting mechanism to ensure the download completes.
4. Verify Content: Use a library like Apache PDFBox to read the content of the downloaded PDF.

That’s it! Congratulations on making it through this tutorial and hope you found it useful! Happy Learning!!
