Open source Java projects: SwingLabs PDF Renderer

feature
Jun 12, 2008 | 17 mins

View and render PDF content from your Java programs

Adobe’s Portable Document Format (PDF) is a popular choice for exchanging documents over the Internet, but how do you access and present PDF files in your Java applications? Jeff Friesen introduces this month’s Open Source Java Project: a PDF rendering tool that you can easily integrate with your Java applications.

Java-based products such as Bruno Lowagie’s iText make it easy for Java programs to create PDF documents. For example, a servlet could extract product data from a database, and then use iText to dynamically create a PDF-based product catalog based on this data. iText and products like it aren’t designed to render a PDF document’s content, however.

A few years back, Sun Labs researchers developed the all-Java PDF Renderer project because they needed a PDF viewer for content created by OpenOffice. When this project was no longer needed, Sun Labs offered PDF Renderer to Josh Marinacci and Richard Bair (of SwingLabs) who set out to get the project open sourced. Tom Oke signed on to head up future work on the project, and Josh announced the release of the open source PDF Renderer project in December 2007.

In this installment in the Open Source Java Projects series, I’ll introduce you to the SwingLabs PDF Renderer. After you’ve downloaded the project source and run a short demo, we’ll explore the PDF Renderer API. Following application exercises in rendering PDF content to your application screen and sending it to a printer, I’ll wrap up with a more complex example that demonstrates PDF Renderer’s usefulness.

This project’s license
PDF Renderer adheres to the GNU Lesser General Public License (LGPL). See the Resources section to learn more about this open source license.

Get started with PDF Renderer

The pdf-renderer project, hosted on Java.net, introduces PDF Renderer and provides access to its executable and source code. Point your browser to the site’s “Documents & files” section and download the most recent version of PDF Renderer’s distribution archive. For example, I downloaded PDFRenderer-2008_05_18-bin.zip.

The distribution archive contains a JAR file that stores the API’s class files. The archive also stores the class files for a PDF viewer demo that demonstrates the API’s usefulness for viewing arbitrary PDF documents. You can run this demo at the command line by specifying the following command (assuming that the current directory contains PDFRenderer-2008_05_18.jar):

java -jar PDFRenderer-2008_05_18.jar

Figure 1 shows the demo presenting the first page of a sample PDF document. The left pane of the demo’s split-pane GUI reveals this document’s thumbnail images, which you can click to select the page that appears in the right pane. The right pane presents the current page; you can zoom into this page by selecting the toolbar’s magnifying tool and selecting a rectangular area.

Figure 1. A separate outline window appears if the PDF document supports an outline.

The distribution archive also contains Javadoc-based documentation, which describes PDF Renderer’s API. This documentation is separately available in its own archive — PDFRenderer-2008_05_18-javadoc.zip, for example. Also in its own archive is the source code to the API and demo. For the previously mentioned distribution archive, PDFRenderer-2008_05_18-src.zip is the equivalent source archive.

PDF Renderer’s API

PDF Renderer presents an API that’s organized into several packages, with com.sun.pdfview serving as the main package. This package’s PDFFile class is the entry point into the API. The following code fragment shows you how to use this class’s solitary public PDFFile(ByteBuffer buf) constructor to create a PDFFile instance initialized to a specific PDF document:

RandomAccessFile raf = new RandomAccessFile (new File ("sample.pdf"), "r");
FileChannel fc = raf.getChannel ();
ByteBuffer buf = fc.map (FileChannel.MapMode.READ_ONLY, 0, fc.size ());
PDFFile pdfFile = new PDFFile (buf);

According to PDFFile’s Javadoc, the constructor’s java.nio.ByteBuffer argument is derived from a java.io.RandomAccessFile instance (via a java.nio.channels.FileChannel intermediary). The constructor throws java.io.IOException if it cannot find the document’s cross-reference table or trailer dictionary, and com.sun.pdfview.PDFParseException if it has trouble parsing these items.
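The mapping step can also be tried on its own, without PDF Renderer on the classpath. The following is a minimal sketch that assumes Java 7 or later (for try-with-resources and java.nio.file); the MappedFileDemo class and its mapReadOnly() helper are hypothetical names introduced for illustration. The returned buffer is the kind of value you would hand to the PDFFile constructor.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MappedFileDemo
{
   // Hypothetical helper: maps an entire file into memory as a read-only
   // buffer. A mapping, once established, does not depend on the channel
   // that created it, so the channel can be closed before returning.
   static ByteBuffer mapReadOnly (Path path) throws IOException
   {
      try (FileChannel fc = FileChannel.open (path, StandardOpenOption.READ))
      {
         return fc.map (FileChannel.MapMode.READ_ONLY, 0, fc.size ());
      }
   }

   public static void main (String [] args) throws IOException
   {
      // Write a tiny stand-in file; a real program would map an actual PDF.
      Path tmp = Files.createTempFile ("sample", ".pdf");
      Files.write (tmp, "%PDF-1.3\n".getBytes (StandardCharsets.US_ASCII));

      ByteBuffer buf = mapReadOnly (tmp);
      System.out.println ("Mapped bytes = "+buf.remaining ());
      System.out.println ("Read-only = "+buf.isReadOnly ());
   }
}
```

Closing the channel in the helper keeps file handles from leaking, which the shorter fragment above leaves to the caller.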

Obtaining document information

The PDFFile class presents several methods that return information about the PDF document. For example, public String getVersionString() returns the document’s version number as a string, whereas public int getMajorVersion() and public int getMinorVersion() return the version number’s major and minor components as integers.

Version checking
Version methods are useful for determining whether PDF Renderer can parse certain documents — for instance, whether a document was created with a more recent version of Acrobat and contains features that cannot be parsed or rendered by PDF Renderer.
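As a rough illustration of such a check, the sketch below compares a major/minor pair (the kind of values getMajorVersion() and getMinorVersion() return) against a ceiling. The VersionCheck class, the isAtMost() helper, and the 1.4 ceiling used in main() are assumptions made for this example; the highest PDF version that PDF Renderer handles is not stated here.

```java
class VersionCheck
{
   // Hypothetical helper: true if major.minor is no newer than
   // maxMajor.maxMinor.
   static boolean isAtMost (int major, int minor, int maxMajor, int maxMinor)
   {
      return major < maxMajor || (major == maxMajor && minor <= maxMinor);
   }

   public static void main (String [] args)
   {
      // The 1.4 ceiling is an arbitrary value chosen for illustration.
      System.out.println ("1.3 acceptable = "+isAtMost (1, 3, 1, 4));
      System.out.println ("1.6 acceptable = "+isAtMost (1, 6, 1, 4));
   }
}
```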

The public boolean isPrintable() and public boolean isSaveable() methods are useful for determining whether the document’s owner has given permission to print or save a copy of the document. It’s a good idea to respect the owner’s wishes in this regard by disabling any printing and/or saving features in your own code should these methods return false.

Another useful method is public OutlineNode getOutline(), which returns the root node of the PDF document’s outline hierarchy as a com.sun.pdfview.OutlineNode instance. Because OutlineNode subclasses javax.swing.tree.DefaultMutableTreeNode, you can invoke inherited methods such as public Enumeration preorderEnumeration() to enumerate this hierarchy.

Listing 1 presents the source code to an application that invokes these methods to obtain the version number, print and save status, and outline hierarchy of a PDF document identified via a command-line argument.

Listing 1. PDFInfo.java

// PDFInfo.java

import java.io.*;

import java.nio.*;
import java.nio.channels.*;

import java.util.*;

import javax.swing.tree.*;

import com.sun.pdfview.*;

public class PDFInfo
{
   public static void main (String [] args) throws IOException
   {
      if (args.length != 1)
      {
          System.err.println ("usage: java PDFInfo pdfspec");
          return;
      }

      RandomAccessFile raf = new RandomAccessFile (new File (args [0]), "r");
      FileChannel fc = raf.getChannel ();
      ByteBuffer buf = fc.map (FileChannel.MapMode.READ_ONLY, 0, fc.size ());
      PDFFile pdfFile = new PDFFile (buf);

      System.out.println ("Major version = "+pdfFile.getMajorVersion ());
      System.out.println ("Minor version = "+pdfFile.getMinorVersion ());
      System.out.println ("Version string = "+pdfFile.getVersionString ()+"\n");

      System.out.println ("Is printable = "+pdfFile.isPrintable ());
      System.out.println ("Is saveable = "+pdfFile.isSaveable ()+"\n");

      OutlineNode oln = pdfFile.getOutline ();
      if (oln != null)
      {
          System.out.println ("Outline\n");

          Enumeration e = oln.preorderEnumeration ();
          while (e.hasMoreElements ())
          {
             DefaultMutableTreeNode node;
             node = (DefaultMutableTreeNode) e.nextElement ();
             System.out.println (node);
          }
      }
   }
}

Invoke the following command to compile PDFInfo.java:

javac -cp PDFRenderer-2008_05_18.jar PDFInfo.java

Figure 1 revealed some content from a PDF document stored in langspec-3.0.pdf. Assuming a Windows platform, invoke the following command to obtain and output this document’s version number and more:

java -cp PDFRenderer-2008_05_18.jar;. PDFInfo langspec-3.0.pdf

PDFInfo generates the output shown in Listing 2, which I’ve abbreviated. <top> is the textual value assigned to the outline hierarchy’s root node and is technically not part of the hierarchy.

Listing 2. Output of PDFInfo

Major version = 1
Minor version = 3
Version string = 1.3

Is printable = true
Is saveable = true

Outline

<top>
The Java™ Language Specification
Preface
Preface to the Second Edition
Preface to the Third Edition
Introduction
1.1 Example Programs
1.2 Notation
1.3 Relationship to Predefined Classes and Interfaces
1.4 References
Grammars
2.1 Context-Free Grammars
2.2 The Lexical Grammar
2.3 The Syntactic Grammar
2.4 Grammar Notation
...
Syntax
18.1 The Grammar of the Java Programming Language
Colophon
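The flat output in Listing 2 hides how deeply each outline entry is nested. Because the nodes are DefaultMutableTreeNode instances, the inherited getLevel() method can supply that depth. The following sketch uses only JDK classes and builds a small stand-in tree by hand; OutlineIndentDemo and format() are hypothetical names, and the same loop should apply to the node returned by getOutline().

```java
import java.util.Enumeration;

import javax.swing.tree.DefaultMutableTreeNode;

class OutlineIndentDemo
{
   // Walks the tree in preorder and indents each entry by two spaces per
   // level of depth (the root is at level 0).
   static String format (DefaultMutableTreeNode root)
   {
      StringBuilder sb = new StringBuilder ();
      Enumeration<?> e = root.preorderEnumeration ();
      while (e.hasMoreElements ())
      {
         DefaultMutableTreeNode node = (DefaultMutableTreeNode) e.nextElement ();
         for (int i = 0; i < node.getLevel (); i++)
            sb.append ("  ");
         sb.append (node).append ('\n');
      }
      return sb.toString ();
   }

   public static void main (String [] args)
   {
      // Hand-built stand-in for a document outline.
      DefaultMutableTreeNode top = new DefaultMutableTreeNode ("<top>");
      DefaultMutableTreeNode intro = new DefaultMutableTreeNode ("Introduction");
      intro.add (new DefaultMutableTreeNode ("1.1 Example Programs"));
      intro.add (new DefaultMutableTreeNode ("1.2 Notation"));
      top.add (intro);
      top.add (new DefaultMutableTreeNode ("Grammars"));

      System.out.print (format (top));
   }
}
```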

Accessing document pages

PDF documents are organized into sequences of pages. You can obtain the number of pages in the current document by invoking PDFFile’s public int getNumPages() method. Furthermore, you can access these pages by invoking PDFFile’s public PDFPage getPage(int pagenum) and public PDFPage getPage(int pagenum, boolean wait) methods.

The former getPage() method invokes the latter getPage() method, which creates and returns a com.sun.pdfview.PDFPage instance. This instance contains the document commands for rendering the page identified by pagenum. Pass to pagenum any value from 1 through the value returned by getNumPages().

Behind the scenes, getPage() uses a com.sun.pdfview.PDFParser to parse the commands. Because parsing takes time, pass false to wait to have the parser offload parsing to a background thread — the former getPage() method passes false to wait. Pass true to wait to have parsing take place on the current thread.

Terminate page parsing
If you need to stop a page’s background thread that is in the midst of parsing the page’s document commands, invoke PDFFile’s public void stop(int pageNum) method with the page’s number specified by pageNum.

PDFParser invokes PDFPage methods such as public void addCommands(PDFPage page), public void addPush(), and public void addImage(PDFImage image) to add commands to the PDFPage instance. Instead of working with these methods, your programs will work with other PDFPage methods, which you’ll see at work in the next few sections.

Render to images

Once you’ve obtained a PDFPage instance, you can render page content to a java.awt.Image by invoking PDFPage’s public Image getImage(int width, int height, Rectangle2D clip, ImageObserver observer) and public Image getImage(int width, int height, Rectangle2D clip, ImageObserver observer, boolean drawbg, boolean wait) methods:

  • width and height identify the pixel dimensions of the created Image.
  • clip identifies that portion of the page rendered to the Image. Each of its field values is specified in terms of user-space units, as the PDF specification explains.
  • observer identifies an image observer that’s notified as the image changes. Pass null to observer if you don’t need to be notified.
  • drawbg specifies whether a white background (true) or no background (false) should be rendered behind the image. The former getImage() method, which invokes the latter getImage() method, passes true to drawbg.
  • wait specifies whether this method should not return until the image is fully rendered (true) or return immediately (false). The former getImage() method passes false to wait.
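If you pass false to wait, an observer is one way to learn when rendering has finished. The sketch below implements the JDK's java.awt.image.ImageObserver interface and records completion. It assumes the renderer follows the standard AWT convention of signaling a finished image with the ALLBITS flag; the exact flags that PDF Renderer passes are not documented here, so treat that as an assumption. CompletionObserver is a hypothetical name.

```java
import java.awt.Image;
import java.awt.image.ImageObserver;

class CompletionObserver implements ImageObserver
{
   // Set once the producer reports that all bits are available.
   private volatile boolean complete;

   public boolean imageUpdate (Image img, int infoflags, int x, int y,
                               int width, int height)
   {
      if ((infoflags & (ALLBITS | ERROR | ABORT)) != 0)
      {
         complete = (infoflags & ALLBITS) != 0;
         return false; // finished or failed: no further updates needed
      }
      return true; // still rendering: keep the notifications coming
   }

   boolean isComplete ()
   {
      return complete;
   }

   public static void main (String [] args)
   {
      // Simulate the notifications an image producer might send.
      CompletionObserver observer = new CompletionObserver ();
      observer.imageUpdate (null, SOMEBITS, 0, 0, 100, 50);
      System.out.println ("Complete after partial update = "+observer.isComplete ());
      observer.imageUpdate (null, ALLBITS, 0, 0, 100, 100);
      System.out.println ("Complete after final update = "+observer.isComplete ());
   }
}
```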

Before invoking getImage(), you’ll need to obtain a suitable java.awt.geom.Rectangle2D value that you can pass to clip. For example, you can obtain a bounding box for the entire page (allowing you to render the entire page to an Image) by invoking PDFPage’s public Rectangle2D getBBox() method.

Application example: PDFViewer

User space
The coordinates of the various graphics objects that contribute to a page in a PDF document are specified in a device-independent coordinate system known as user space. This is analogous to working with the various methods in the java.awt.Graphics2D class, where coordinates are also specified in user space. Prior to version 1.6 of the PDF specification, each unit along a page’s x and y axes had a length of 1/72 inch. Although version 1.6 and later versions let you specify a different default length by defining a UserUnit entry in a document’s page dictionary, I assume that documents adhere to a 1/72 inch unit length in this article (for simplicity).

Listing 3 presents the source code to an application that uses getBBox() with the second getImage() method to render an arbitrary page from an arbitrary document to an Image, which is subsequently displayed on the screen.

Listing 3. PDFViewer.java

// PDFViewer.java

import java.awt.*;
import java.awt.geom.*;

import java.io.*;

import java.nio.*;
import java.nio.channels.*;

import javax.swing.*;

import com.sun.pdfview.*;

public class PDFViewer extends JFrame
{
   static Image image;

   public PDFViewer (String title)
   {
      super (title);
      setDefaultCloseOperation (EXIT_ON_CLOSE);

      JLabel label = new JLabel (new ImageIcon (image));
      label.setVerticalAlignment (JLabel.TOP);

      setContentPane (new JScrollPane (label));

      pack ();
      setVisible (true);
   }

   public static void main (final String [] args) throws IOException
   {
      if (args.length < 1 || args.length > 2)
      {
          System.err.println ("usage: java PDFViewer pdfspec [pagenum]");
          return;
      }

      int pagenum = (args.length == 1) ? 1 : Integer.parseInt (args [1]);
      if (pagenum < 1)
          pagenum = 1;

      RandomAccessFile raf = new RandomAccessFile (new File (args [0]), "r");
      FileChannel fc = raf.getChannel ();
      ByteBuffer buf = fc.map (FileChannel.MapMode.READ_ONLY, 0, fc.size ());
      PDFFile pdfFile = new PDFFile (buf);

      int numpages = pdfFile.getNumPages ();
      System.out.println ("Number of pages = "+numpages);
      if (pagenum > numpages)
          pagenum = numpages;

      PDFPage page = pdfFile.getPage (pagenum);
              
      Rectangle2D r2d = page.getBBox ();

      double width = r2d.getWidth ();
      double height = r2d.getHeight ();
      width /= 72.0;
      height /= 72.0;
      int res = Toolkit.getDefaultToolkit ().getScreenResolution ();
      width *= res;
      height *= res;

      image = page.getImage ((int) width, (int) height, r2d, null, true, true);

      Runnable r = new Runnable ()
                   {
                       public void run ()
                       {
                          new PDFViewer ("PDF Viewer: "+args [0]);
                       }
                   };
      EventQueue.invokeLater (r);
   }
}

PDFViewer takes one or two command-line arguments. The first argument is the name of a PDF document, and the optional second argument is the number of the page to render — the page number defaults to 1 if the second argument isn’t present. If you specify a page number less than 1, the number is clamped to 1. Similarly, if this number exceeds the number of document pages, this number is clamped to the page count.
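The clamping logic can be isolated into a small helper. The following sketch mirrors the behavior just described, with one deliberate difference: Listing 3 lets Integer.parseInt() throw on a non-numeric argument, whereas this hypothetical PageArg.parse() falls back to page 1. That fallback is an assumption made for this example, not part of PDFViewer.

```java
class PageArg
{
   // Parses a one-based page number and clamps it to the range
   // 1 through numPages.
   static int parse (String arg, int numPages)
   {
      int pagenum;
      try
      {
         pagenum = Integer.parseInt (arg.trim ());
      }
      catch (NumberFormatException e)
      {
         pagenum = 1; // assumption: default to the first page on bad input
      }
      return Math.max (1, Math.min (pagenum, numPages));
   }

   public static void main (String [] args)
   {
      int numPages = 10; // stand-in for the value returned by getNumPages()
      System.out.println (parse ("0", numPages));
      System.out.println (parse ("5", numPages));
      System.out.println (parse ("99", numPages));
      System.out.println (parse ("abc", numPages));
   }
}
```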

After retrieving the appropriate PDFPage object via getPage(), PDFViewer retrieves the page’s bounding box via getBBox(). It converts the box’s width and height from user-space coordinates to equivalent screen device-space coordinates. This width and height, along with the bounding box, are passed to getImage() to render the entire page.
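The conversion is plain arithmetic: divide by 72 to get inches, then multiply by the device resolution in dots per inch. Here is a minimal sketch with the resolution passed in as a parameter, so that it runs without a display; UserSpace and toPixels() are hypothetical names, and the 1/72 inch unit length is the same simplifying assumption made throughout this article.

```java
class UserSpace
{
   // Converts a length in user-space units (1/72 inch each) to device
   // pixels at the given resolution, truncating as Listing 3 does.
   static int toPixels (double userUnits, int dpi)
   {
      double inches = userUnits/72.0;
      return (int) (inches*dpi);
   }

   public static void main (String [] args)
   {
      // A US Letter page measures 612 by 792 user-space units
      // (8.5 by 11 inches).
      int dpi = 96; // stand-in for Toolkit's getScreenResolution() value
      System.out.println ("Width in pixels = "+toPixels (612.0, dpi));
      System.out.println ("Height in pixels = "+toPixels (792.0, dpi));
   }
}
```

At 96 dpi, the Letter page works out to 816 by 1056 pixels (8.5 × 96 and 11 × 96).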

Invoke the following command to compile PDFViewer.java:

javac -cp PDFRenderer-2008_05_18.jar PDFViewer.java

Assuming a Windows platform, invoke the following command to try out PDFViewer with Adobe’s pdf_reference.pdf document, which contains the PDF specification:

java -cp PDFRenderer-2008_05_18.jar;. PDFViewer pdf_reference.pdf

This command line tells PDFViewer to render this document’s first page (by default) — specify a page number after pdf_reference.pdf to render another page. Figure 2 reveals the rendering result.

Figure 2. PDF Viewer renders a page at the same 100% zoom level as rendered by Adobe Acrobat.

Render to arbitrary graphics contexts

PDF Renderer makes it possible to render page content to an arbitrary graphics context (perhaps to a printer graphics context). Accomplish this task with the help of the com.sun.pdfview.PDFRenderer class and its public PDFRenderer(PDFPage page, Graphics2D g, Rectangle imgbounds, Rectangle2D clip, Color bgColor) constructor:

  • page identifies the PDF content to render
  • g identifies the graphics context onto which the PDF content will be rendered
  • imgbounds identifies the pixel dimensions of the rendered content
  • clip identifies that portion (in user space) of the page to render — pass null if you want to render the entire page
  • bgColor identifies the color of the rendered content’s background — pass null if you don’t want to render a background

After creating a PDFRenderer instance, invoke its inherited public void run() method to perform the rendering operation. If the PDFPage is being obtained on a background thread, you’ll need to invoke PDFPage’s public void waitForFinish() method before invoking run(), to ensure that the page is completely loaded prior to rendering.

Application example: PDFPrinter

Listing 4 presents the source code to an application that uses PDFRenderer to render a user-specified range of a PDF document’s pages to the default printer.

Listing 4. PDFPrinter.java (click to view)

PDFPrinter responds to the user selecting the Print menu item by invoking its doPrint() method. This method begins by prompting the user to select a PDF document via a file chooser. Assuming that the user doesn’t cancel the chooser, doPrint() uses the selected file as the basis for a new PDFFile instance.

Moving on, doPrint() creates a new printer job, specifying the PDFPrinter object as the job’s printable. This method then initializes the “range of pages” print attribute to the number of pages in the PDF document. Assuming that the user doesn’t cancel the print dialog box that is subsequently displayed, doPrint() initiates the printing task.

The most interesting code resides in the printable’s print() method, which is called for each page that is to be printed. Following a quick conversion of the zero-based index argument to a one-based page number, this method saves the printer context’s transformation matrix because PDFRenderer’s constructor modifies this matrix.

After retrieving a PDFPage object corresponding to the page number, print() invokes this object’s public Dimension getUnstretchedSize(int width, int height, Rectangle2D clip) method to obtain the best possible dimensions for rendering the page, at the proper aspect ratio, within the printer context’s imageable area.
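To make the idea of aspect-preserving dimensions concrete, the sketch below computes the largest size that fits inside a bounding area without distorting the page. It is an independent illustration of the general technique and does not claim to reproduce what getUnstretchedSize() does internally (for instance, it ignores any clip rectangle); AspectFit and fit() are hypothetical names.

```java
import java.awt.Dimension;

class AspectFit
{
   // Returns the largest dimensions that fit within maxW by maxH while
   // keeping the page's width-to-height ratio.
   static Dimension fit (double pageW, double pageH, int maxW, int maxH)
   {
      // The smaller of the two scale factors is the one that fits both axes.
      double scale = Math.min (maxW/pageW, maxH/pageH);
      return new Dimension ((int) Math.round (pageW*scale),
                            (int) Math.round (pageH*scale));
   }

   public static void main (String [] args)
   {
      // A portrait page of 600 by 800 units fitted into a 700 by 700 area
      // is limited by its height.
      Dimension d = fit (600.0, 800.0, 700, 700);
      System.out.println (d.width+" x "+d.height);
   }
}
```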

These dimensions are used to construct a boundary rectangle (within the imageable area) in which the entire page is rendered. This rectangle is passed as one of the arguments to PDFRenderer’s constructor, along with the PDFPage, printer graphics context, and a pair of null arguments to specify that the entire page be rendered and to not render a background color.

Because the PDFParser that works with PDFPage to parse page commands may still be running on its own thread, print() invokes waitForFinish(). This method returns after parsing finishes, making it possible to start the rendering-to-printer-context task via the call to run(). After restoring the saved transformation matrix, print() renders a border around the imageable area.
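The overall shape of such a print() method can be sketched with JDK classes alone. The following is not Listing 4: PrintSkeleton is a hypothetical class, the page count is a stand-in for the value returned by getNumPages(), and a pair of placeholder transform calls stands in for the PDF Renderer step that modifies the transformation matrix. What it does show is the index conversion, the save and restore of the transform, and the border around the imageable area.

```java
import java.awt.Graphics;
import java.awt.Graphics2D;
import java.awt.geom.AffineTransform;
import java.awt.geom.Rectangle2D;
import java.awt.image.BufferedImage;
import java.awt.print.PageFormat;
import java.awt.print.Printable;

class PrintSkeleton implements Printable
{
   private final int numPages; // stand-in for the document's page count

   PrintSkeleton (int numPages)
   {
      this.numPages = numPages;
   }

   public int print (Graphics g, PageFormat pf, int index)
   {
      // The printing system passes a zero-based index; pages are one-based.
      int pagenum = index+1;
      if (pagenum < 1 || pagenum > numPages)
         return NO_SUCH_PAGE;

      Graphics2D g2 = (Graphics2D) g;
      AffineTransform saved = g2.getTransform ();

      // Placeholder for the rendering step, which changes the transform.
      g2.translate (pf.getImageableX (), pf.getImageableY ());
      g2.scale (0.5, 0.5);

      // Restore the saved transform so the border is drawn in page space.
      g2.setTransform (saved);
      g2.draw (new Rectangle2D.Double (pf.getImageableX (),
                                       pf.getImageableY (),
                                       pf.getImageableWidth (),
                                       pf.getImageableHeight ()));
      return PAGE_EXISTS;
   }

   public static void main (String [] args)
   {
      // Draw into an off-screen image instead of a real printer context.
      BufferedImage img = new BufferedImage (612, 792,
                                             BufferedImage.TYPE_INT_RGB);
      Graphics2D g = img.createGraphics ();
      PrintSkeleton ps = new PrintSkeleton (3);
      System.out.println (ps.print (g, new PageFormat (), 0) == PAGE_EXISTS);
      System.out.println (ps.print (g, new PageFormat (), 3) == NO_SUCH_PAGE);
      g.dispose ();
   }
}
```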

Invoke the following command to compile PDFPrinter.java:

javac -cp PDFRenderer-2008_05_18.jar PDFPrinter.java

Assuming a Windows platform, invoke the following command to launch PDFPrinter — unlike PDFViewer, PDFPrinter takes no command-line arguments:

java -cp PDFRenderer-2008_05_18.jar;. PDFPrinter

PDF Renderer and a Java disassembler

The pdf-renderer project site lists several uses for PDF Renderer, one of them being the ability to view PDFs in the context of an application. For example, as an alternative to working with JavaHelp, you could store your application’s help content in a PDF document, and then use PDF Renderer with a custom dialog box to render and present this content to the user.

I’ve created a Java disassembler application (essentially a GUI front-end for the JDK’s javap tool, which does the actual work of disassembling a class file) to demonstrate PDF Renderer’s usefulness in a help context. After you’ve selected a class file via the application’s menu, the application reveals the resulting disassembly in its scrollable content area — see Figure 3.

Figure 3. Along with the mouse, you can use the up and down arrow keys to scroll through the disassembly.

Move the mouse pointer and click the left mouse button. The application responds by identifying the text below the pointer. If this text refers to an instruction mnemonic, the application creates and shows a help dialog box that uses PDF Renderer to render mnemonic-specific pages from a PDF document. Figure 4 shows this dialog box presenting the first page of the invokevirtual instruction’s help content.

Figure 4. Click the buttons to transition from page to page in multiple-page help content.

The application obtains its help pages from a jvmins.pdf file, whose content originates from a jvmins.doc Microsoft Word document — after modifying the Word-based content, you can use a tool such as Virtual PDF Printer to save this content to a PDF file. These files and the application’s JD.java source code (see Listing 5) are located in this article’s code archive.

Listing 5. JD.java (click to view)

PDF Renderer does its work in the HelpViewer class’s getImages() method. It reads all of the pages associated with the selected instruction mnemonic, and renders their content to an array of Images after removing half an inch of empty border space (for aesthetic reasons) — jvmins.pdf has a default unit length of 1/72 inch.
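With a unit length of 1/72 inch, half an inch corresponds to 36 user-space units, so trimming the border amounts to shrinking the clip rectangle by 36 units on each side. The sketch below shows that arithmetic with JDK classes only. ClipInset and inset() are hypothetical names, and the code illustrates the idea rather than reproducing what JD.java does.

```java
import java.awt.geom.Rectangle2D;

class ClipInset
{
   // Shrinks a user-space rectangle by the given margin (in inches) on
   // every side, assuming 72 user-space units per inch.
   static Rectangle2D inset (Rectangle2D box, double inches)
   {
      double d = inches*72.0;
      return new Rectangle2D.Double (box.getX ()+d,
                                     box.getY ()+d,
                                     Math.max (0.0, box.getWidth ()-2*d),
                                     Math.max (0.0, box.getHeight ()-2*d));
   }

   public static void main (String [] args)
   {
      // Stand-in for the rectangle returned by getBBox() on a Letter page.
      Rectangle2D page = new Rectangle2D.Double (0.0, 0.0, 612.0, 792.0);
      System.out.println (inset (page, 0.5));
   }
}
```

For a 612-by-792 page, the trimmed clip starts at (36, 36) and measures 540 by 720 units.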

The getImages() method employs PDFPage’s getUnstretchedSize() method to size the page so that an Image doesn’t exceed 700 pixels horizontally by 700 pixels vertically while maintaining the correct aspect ratio. The result should be readable and vertically scrollable (without also being horizontally scrollable) within the confines of a 600-by-600-pixel dialog box.

Invoke the following command to compile JD.java:

javac -cp PDFRenderer-2008_05_18.jar JD.java

Assuming a Windows platform, invoke the command below to start the GUI (you might want to modify the source code to support passing the name of a class file as a command-line argument):

java -cp PDFRenderer-2008_05_18.jar;. JD

In conclusion

PDF Renderer fills an important niche in the Java developer’s open source toolbox, making it much easier to access and render PDF files using Java libraries. This tool was developed for in-house use at Sun Labs and has only recently been announced as an open source project under SwingLabs. Its youthful weaknesses include not being able to render every possible PDF document; rendering text in fonts that aren’t identical to the fonts used when the document was created; and a Javadoc that is sparse in explanatory information. Given the relevance of the project and the proven track record of its contributors, however, I’m confident that PDF Renderer will continue to improve as it matures.

See the Resources section to learn more about PDF Renderer and SwingLabs.

Jeff Friesen is a freelance software developer and educator who specializes in Java technology. Check out his javajeff.mb.ca website to discover all of his published Java articles and more.