by Bruno Lowagie

Write your own Twitter application

how-to
Mar 10, 2009 | 27 mins

Archive your tweets with Apache Commons HttpClient, dom4j, and iText

That buzz you’ve been hearing is the sound of millions of Twitter updates (tweets) careening around cyberspace. Even cooler than the Twitter social-networking service itself is the fact that Twitter data is exposed in an open API that your applications can tap into. With iText creator Bruno Lowagie as your guide, find out how to use three open source Java libraries to archive tweets dynamically in a PDF document. You’ll write standalone Java code, then integrate it into a servlet so that you can offer the application as a service to other Twitter users.

The Twitter social-networking and micro-blogging service has become immensely popular since it launched in 2006. Twitter users send tweets — real-time, text-based updates of up to 140 characters — and read other users’ tweets via the Twitter Website, Short Message Service (SMS), Really Simple Syndication (RSS), or Twitter applications. Tweets are displayed on the user’s profile page and delivered to other users who have signed up to receive them. In this article you’ll learn how to build your own Twitter service: an application that accesses tweets via the Twitter API and archives them in the form of a PDF file.

You’ll build your application with the help of three open source Java libraries:

  • HttpClient 3.x from the Apache Commons library, along with the Commons Logging and Codec components it depends on. You’ll use this API to obtain an XML stream of tweets from the Twitter API.
  • dom4j to parse the XML and extract specific data from each tweet.
  • iText to create the PDF document.

This is a hands-on tutorial, so download those libraries if you haven’t already done so. I’ll remind you which JARs you need from them as we go along. If you don’t already have a Twitter account, set one up now and start using the service so that you have some tweets to work with.

You’ll begin by writing your Twitter application as a standalone Java program. Eventually you’ll integrate the code in a servlet, so that you can offer the application on your site as a service available to other Twitterati.

Getting started

Every Twitter application makes use of the Twitter API, which is well documented on the Twitter API wiki. Let’s suppose you’re just interested in read-only access for now: you want to retrieve tweets and visualize them in some way. You don’t know yet that you’re going to produce a PDF document; you only know that you’re going to “consume” a number of tweets. That’s why you start by writing an interface for such a consumer, as shown in Listing 1.

Listing 1. TweetConsumer.java

import org.dom4j.Element;
public interface TweetConsumer {
    public String tweet(Element element) throws TweetException;
}

The Twitter API can provide tweets in XML, JSON (JavaScript Object Notation), and RSS. For this example you choose to work with XML, and you’ll use dom4j to parse the XML. (Don’t forget to add the dom4j JAR to your classpath.) You’re importing the org.dom4j.Element interface. This XML element will contain plenty of information: a date, some text, a user ID, and also a tweet ID. The tweet ID is the String you’ll use as the return value for the tweet() method.

You throw a typed exception when something goes wrong. That’s the TweetException class shown in Listing 2.

Listing 2. TweetException.java

public class TweetException extends Exception {
    private static final long serialVersionUID = 7577136074623618615L;
    public TweetException(Exception e) {
        super(e);
    }
}

This is all standard stuff, but now comes the interesting part: the TweetProducer class.

Connecting to the Twitter API

Make sure the commons-httpclient.jar, commons-codec.jar, and commons-logging.jar from the Apache Commons project are in your classpath. These will help you establish a connection with Twitter. Listing 3 shows the TweetProducer class, which you’ll use to retrieve a user’s tweets.

Listing 3. TweetProducer.java

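import java.io.IOException;
import java.util.Iterator;

import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpException;
import org.apache.commons.httpclient.UsernamePasswordCredentials;
import org.apache.commons.httpclient.auth.AuthScope;
import org.apache.commons.httpclient.methods.GetMethod;
import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.Element;
import org.dom4j.io.SAXReader;

public class TweetProducer {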
    protected HttpClient client;
    protected TweetConsumer consumer;

    public void createClient(String username, String password) {
        client = new HttpClient();
        client.getState().setCredentials(
            new AuthScope("twitter.com", 80, AuthScope.ANY_REALM),
            new UsernamePasswordCredentials(username, password));
        client.getParams().setAuthenticationPreemptive(true);
    }

    public void setConsumer(TweetConsumer consumer) {
        this.consumer = consumer;
    }

    public String execute(String since_id) throws TweetException {
        String id = null;
        String tmp;
        GetMethod get = new GetMethod("http://twitter.com/statuses/friends_timeline.xml");
        if (since_id != null && since_id.length() > 0) {
            get.setQueryString("?count=200&since_id=" + since_id);
        }
        else {
            get.setQueryString("?count=200");
        }
        get.setDoAuthentication(true);
        try {
            client.executeMethod(get);
            SAXReader reader = new SAXReader();
            Document document = reader.read(get.getResponseBodyAsStream());
            Element root = document.getRootElement();
            for (Iterator i = root.elementIterator(); i.hasNext(); ) {
                tmp = consumer.tweet((Element)i.next());
                if (id == null) id = tmp;
            }
        } catch (HttpException e) {
            throw new TweetException(e);
        } catch (IOException e) {
            throw new TweetException(e);
        } catch (DocumentException e) {
            throw new TweetException(e);
        } finally {
            get.releaseConnection();
        }
        return id;
    }
}

TweetProducer has three methods:

  • In createClient(), you create an HttpClient object. You need to be logged in to get the tweets of the people you are following. So you must provide your username and password. Twitter uses HTTP Basic Authentication. You attempt preemptive authentication right after creating the HttpClient object.
  • With the setConsumer() method, you pass a TweetConsumer to the TweetProducer. You’re using an interface because at this moment, you might not have decided yet in which form you’ll present the tweets.
  • The execute() method will do the actual work. This method takes an id as parameter and, when finished, returns another id. Now let’s take a closer look at what happens in this method.
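To see what Basic Authentication actually puts on the wire, here is a minimal sketch that uses only the JDK. HttpClient builds this header for you, so you never need this code in the Twitter application; the class name BasicAuthHeader and its build() method are hypothetical and exist only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, not part of the article's code: it shows what an
// HTTP Basic Authentication header contains.
class BasicAuthHeader {

    // Builds the value of the Authorization header for the given credentials.
    static String build(String username, String password) {
        String pair = username + ":" + password;
        byte[] bytes = pair.getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(bytes);
    }

    public static void main(String[] args) {
        // Sample credentials commonly used in HTTP documentation.
        System.out.println("Authorization: " + build("Aladdin", "open sesame"));
    }
}
```

The credentials are only Base64-encoded, not encrypted, which is worth keeping in mind when you send them over plain HTTP.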

Retrieving tweets

The standard way to retrieve data (using read-only access) through the Twitter API is via the GET method. You’re interested in getting the statuses for a specific user (status is another word for tweet), and you want to receive them in the form of an XML file. To achieve this, you need this GetMethod object:

GetMethod get = new GetMethod("http://twitter.com/statuses/friends_timeline.xml");

If you execute this method, you’ll receive an XML document containing the 20 most recent statuses for the authenticated user. You can get 20 more if you use the page parameter. You can go up to 160 pages; that’s 3,200 statuses if you hit the server 160 times with an incrementing page number. For this demo application it’s sufficient to use the count parameter, which lets you specify how many tweets you want to receive. The default is 20; the maximum value allowed by the Twitter API is 200. There’s another reason to prefer count in this context: as documented on the API wiki, you need to be gentle on the server; don’t abuse Twitter by hitting the server too often without a valid reason!

Another optional parameter is since_id. Every tweet or status has an ID, and if you use the since_id, only statuses with an ID greater than (that is, more recent than) the specified ID will be returned. If you provide such an ID, your TweetProducer sets the query string like this:

get.setQueryString("?count=200&since_id=" + since_id);
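The query-string logic can also be isolated in a small helper, which makes it easy to test without a network connection. The following is only a sketch: the TimelineQuery class and its build() method are hypothetical names, and the clamping of count to the range 1 to 200 is an assumption based on the limit described above.

```java
// Hypothetical helper: builds the query string that Listing 3 assembles inline.
class TimelineQuery {

    // Upper limit for the count parameter, as described in the article.
    static final int MAX_COUNT = 200;

    // Returns "?count=N", followed by "&since_id=ID" when an id is provided.
    static String build(int count, String sinceId) {
        // Assumption: out-of-range values are clamped rather than rejected.
        int clamped = Math.max(1, Math.min(count, MAX_COUNT));
        StringBuilder query = new StringBuilder("?count=").append(clamped);
        if (sinceId != null && sinceId.length() > 0) {
            query.append("&since_id=").append(sinceId);
        }
        return query.toString();
    }

    public static void main(String[] args) {
        System.out.println(build(200, null));
        System.out.println(build(500, "1158488197"));
    }
}
```

You could pass the result straight to get.setQueryString() in the execute() method.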

Once the query string is set, you can execute the query:

client.executeMethod(get);

Now you can feed the response to a SAXReader and get the root element of the resulting document:

SAXReader reader = new SAXReader();
Document document = reader.read(get.getResponseBodyAsStream());
Element root = document.getRootElement();

Finally, you iterate over every element inside the <statuses> tags:

for (Iterator i = root.elementIterator(); i.hasNext(); ) {
    tmp = consumer.tweet((Element)i.next());
    if (id == null) id = tmp;
}

Observe that the first tweet you receive is the most recent one. You store this tweet’s ID in the id variable. That’s the String returned by the execute() method.
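If you want to experiment with this "keep the first ID" logic before wiring up HttpClient and dom4j, the sketch below reproduces it with the DOM parser that ships with the JDK. This is a deliberate substitution for dom4j so that the example runs without extra JARs. The class name NewestIdDemo, its helper methods, and the sample XML are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical demo class; uses the JDK's DOM parser instead of dom4j.
class NewestIdDemo {

    // Returns the text of the first direct child element with the given name, or null.
    static String childText(Element parent, String name) {
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE && name.equals(child.getNodeName())) {
                return child.getTextContent();
            }
        }
        return null;
    }

    // Walks the child elements of the root in document order and keeps the first id.
    static String newestId(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        String id = null;
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() != Node.ELEMENT_NODE) continue;
            String tmp = childText((Element) children.item(i), "id");
            if (id == null) id = tmp; // same rule as in TweetProducer.execute()
        }
        return id;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample document; element names follow the Twitter XML format.
        String xml = "<statuses>"
                + "<status><id>1158488197</id><text>Sample tweet</text>"
                + "<user><id>1</id></user></status>"
                + "<status><id>1157836999</id><text>Hello! Good morning!</text>"
                + "<user><id>9737242</id></user></status>"
                + "</statuses>";
        System.out.println("Newest id: " + newestId(xml));
    }
}
```

The helper reads only direct children, so the id nested inside the user element is never mistaken for the tweet ID.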

So far, so good, but where’s the main method? How can you test this code? Maybe you should write something simple before you begin producing a PDF …

My first TweetConsumer

You have written a TweetProducer class, but you haven’t been able to test it yet, because you haven’t implemented the TweetConsumer interface. Listing 4 creates a simple standalone application that sends the XML to System.out. You can use it to find out what’s inside the <status> tag of the XML that the Twitter API provides.

Listing 4. SystemOutTweets

import org.dom4j.Element;

public class SystemOutTweets implements TweetConsumer {
    public String tweet(Element tweet) {
        System.out.println(tweet.asXML());
        return tweet.elementText("id");
    }

    public static void main(String[] args) throws TweetException {
        TweetProducer t = new TweetProducer();
        t.createClient("bruno1970", "myfakepassword");
        t.setConsumer(new SystemOutTweets());
        System.out.println("Start tweeting:");
        String id = t.execute(null);
        System.out.println("The End; last tweet: " + id);
    }
}

If you replace bruno1970 (my user ID) with yours and myfakepassword with your real password, you’ll receive your 200 most recent <status> elements. Listing 5 is a nice example of rrradiogirrrl saying “Hello! Good morning!” to her followers:

Listing 5. XML snippet containing one status

<status>
  <created_at>Thu Jan 29 07:01:55 +0000 2009</created_at>
  <id>1157836999</id>
  <text>Hello! Good morning!</text>
  <source><a href="https://www.ping.fm/">Ping.fm</a></source>
  <truncated>false</truncated>
  <in_reply_to_status_id/>
  <in_reply_to_user_id/>
  <favorited>false</favorited>
  <in_reply_to_screen_name/>
  <user>
    <id>9737242</id>
    <name>Sigrid</name>
    <screen_name>rrradiogirrrl</screen_name>
    <location>Belgium</location>
    <description>happy,internet addict,radio geeky, web 2.0 fan, reader, jr. product manager @ a Belgian television channel</description>
    <profile_image_url>http://s3.amazonaws.com/twitter_production/profile_images/58441671/nospam_gmail.com_cbf8aad8_normal.jpg</profile_image_url>
    <url>http://www.sigridschrijft.be/blog</url>
    <protected>false</protected>
    <followers_count>465</followers_count>
  </user>
</status>

In my case the output written to System.out ends with:

The End; last tweet: 1158488197

I can now replace the null argument of the execute() method with this ID:

t.execute("1158488197");

Instead of the initial 200 <status> elements, I now get only 9 XML snippets. That means that the Twitterati I am following have sent 9 tweets to Twitter since I last ran SystemOutTweets. Twitter is happy because I use less bandwidth. I’m happy because I don’t need to read the same tweets twice.

Now that you know the TweetProducer works, it’s time to think about a TweetConsumer that produces PDF.

A TweetConsumer that produces PDFs

To create a PDF with iText, you follow five steps, as shown in the “Hello World” example in Listing 6. (Don’t forget to put the iText JAR in your classpath.)

Listing 6. HelloWorld.java

import java.io.FileOutputStream;
import com.lowagie.text.Document;
import com.lowagie.text.Paragraph;
import com.lowagie.text.pdf.PdfWriter;

public class HelloWorld {
    public static void main(String args[]) {
        try {
            // step 1: create a Document object
            Document document = new Document();
            // step 2: connect the Document with an OutputStream using a PdfWriter
            PdfWriter.getInstance(document, new FileOutputStream("hello.pdf"));
            // step 3: open the document
            document.open();
            // step 4: add content
            document.add(new Paragraph("Hello World"));
            // step 5: close the document
            document.close();
        } catch (Exception e) {
            System.out.println(e);
        }
    }
}

More examples

You can find a plethora of other “Hello World” examples in Chapter 2 of my book iText in Action (Manning Publications, 2007); if you want a sample, you can download this chapter for free.

Steps 1 to 3

For your first PdfTwitter application, you need more than just a Paragraph containing the words Hello World. Let’s split up the PdfTwitter class into different parts and define a Document object as a member variable, along with a counter (to count the tweets). Combine the first three steps of the PDF-generation process in the constructor, as is done in Listing 7:

Listing 7. Member variables and constructor of PdfTwitter.java

protected int counter = 0;
protected Document document;

public PdfTwitter(String file) throws DocumentException, IOException {
    document = new Document(PageSize.A4, 72, 36, 36, 36);
    PdfWriter.getInstance(document, new FileOutputStream(file));
    document.open();
}

As you can see, you are passing a path to the resulting PDF file as a parameter. That’s where you’ll find the PDF with your tweets once you’ve executed the standalone application.

Step 4

Your PdfTwitter class implements TweetConsumer, so it must implement the tweet() method. This is an ideal method for implementing Step 4 in the PDF-creation process.

Listing 8. Implementing the tweet() method of PdfTwitter.java

public String tweet(org.dom4j.Element tweet) throws TweetException {
    try {
        org.dom4j.Element user = tweet.element("user");
        // we create the table
        PdfPTable table = new PdfPTable(new float[]{ 1, 5, 6 });
        table.setWidthPercentage(100);
        table.setSpacingBefore(8);
        // first row
        table.getDefaultCell().setPadding(5);
        // first column = empty
        table.addCell("");
        // second column = screen name
        table.addCell(user.elementText("screen_name"));
        // third column = date
        table.getDefaultCell().setHorizontalAlignment(Element.ALIGN_RIGHT);
        table.addCell(tweet.elementText("created_at"));
        // second row
        table.getDefaultCell().setHorizontalAlignment(Element.ALIGN_LEFT);
        table.getDefaultCell().setRotation(90);
        table.getDefaultCell().setFixedHeight(34);
        // first column = counter
        table.addCell(String.valueOf(++counter));
        table.getDefaultCell().setRotation(0);
        // second and third column = tweet text
        table.getDefaultCell().setColspan(2);
        Phrase p = new Phrase();
        p.add(new Phrase(user.elementText("name")));
        p.add(new Phrase(" tweets: "));
        p.add(new Phrase(tweet.elementText("text")));
        table.addCell(p);
        // we add the table
        document.add(table);
        // we return the tweet id
        return tweet.elementText("id");
    } catch (DocumentException e) {
        throw new TweetException(e);
    }
}

Listing 8 probably needs some more explanation. Let’s start with the Twitter-related stuff. Suppose you want to show the tweeted text and the date; this information can be accessed like this:

tweet.elementText("created_at");

tweet.elementText("text");

You’ll also want to show some information about the user, more specifically his or her name and screen name (or nickname):

org.dom4j.Element user = tweet.element("user");
user.elementText("name");
user.elementText("screen_name");

This is dom4j at work for you. Now let’s use iText. It would be nice to organize this information in a tabular structure where every tweet or status is shown in a different table. Your best choice is to use PdfPTable to achieve this:

PdfPTable table = new PdfPTable(new float[]{ 1, 5, 6 });
table.setWidthPercentage(100);
table.setSpacingBefore(8);
table.getDefaultCell().setPadding(5);

You create a table with three columns. Observe that the values 1, 5, and 6 passed as an array of floats aren’t the absolute widths of these columns. They are relative values, indicating that the third column will take half of the available width (because 6 is half of 1 + 5 + 6). Column 1 will fit five times in column 2. By setting the width percentage to 100, you tell the table it should take the complete available width on the page. In practice, this means there will be a left margin of 1 inch and a right margin of half an inch, because those were the margins you used in Listing 7 when you created your Document object:

document = new Document(PageSize.A4, 72, 36, 36, 36);
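To make the relative widths concrete, the following back-of-the-envelope sketch computes the absolute column widths for this page setup. It assumes an A4 page that is 595 points wide and uses the fact that there are 72 points in an inch; the ColumnWidths class and its absoluteWidths() method are hypothetical and do not reproduce iText's internal layout code.

```java
// Hypothetical helper: estimates absolute column widths from relative ones.
class ColumnWidths {

    // Splits the available width over the columns in proportion to the relative widths.
    static float[] absoluteWidths(float pageWidth, float marginLeft,
                                  float marginRight, float[] relative) {
        float available = pageWidth - marginLeft - marginRight;
        float total = 0;
        for (float r : relative) {
            total += r;
        }
        float[] result = new float[relative.length];
        for (int i = 0; i < relative.length; i++) {
            result[i] = available * relative[i] / total;
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumption: A4 is 595 points wide; margins of 72 and 36 points as in Listing 7.
        float[] widths = absoluteWidths(595, 72, 36, new float[]{ 1, 5, 6 });
        for (float w : widths) {
            System.out.println(w + " pt = " + (w / 72) + " in");
        }
    }
}
```

With a table width of 100 percent, the three columns share the 487 points that remain between the margins, and the third column gets half of that.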

Note that PDF measurements are expressed in user units. A user unit corresponds to a point (pt); there are 72 points in one inch. When you add your table to the document, it will start with a vertical offset of 8pt because you defined it like this in Listing 8 using the setSpacingBefore method. This way, the tables aren’t glued to one another if you add several tables in a row. You can now create PdfPCell objects, and add them to the PdfPTable object. The advantage of using PdfPCell is that you can tune the properties and parameters of every single cell. This isn’t always necessary. You can also set the properties of the cell, using table.getDefaultCell(). For instance, you told the table it should use a padding of 5pt with the method setPadding. As a result, this padding will be used for every cell that is added:

table.addCell(user.elementText("screen_name"));
Phrase p = new Phrase();
p.add(new Phrase(user.elementText("name")));
p.add(new Phrase(" tweets: "));
p.add(new Phrase(tweet.elementText("text")));
table.addCell(p);

In other words, when you use addCell without a full-blown PdfPCell object, but rather with a String or Phrase, the properties of the “default cell” are used. While adding cells, you can change these properties as much as you like. In Listing 8, you change the horizontal alignment, the rotation, and the colspan, and you even set a fixed height for a cell. After adding five cells, one of which has a colspan of two, you have filled six column positions, which gives a table with two rows (the number of column positions divided by the number of columns). You can now add the table to the document:

document.add(table);

That was Step 4, and it’s iterated by the TweetProducer as many times as necessary, with a maximum of 200 times because the Twitter API limits the count parameter to 200.

Step 5

You mustn’t forget about Step 5:

public void close() {
    document.close();
}

This step finalizes the PDF file. Your example is almost complete. There’s only one thing missing in your PdfTwitter class if you want to find out what the end result looks like: you need to write a main() method.

The main() method

You have gone through the five steps in the PDF-creation process. Now it’s time to produce the actual PDF document. Listing 9 shows PdfTwitter’s main() method.

Listing 9. PdfTwitter.java’s main() method

public static void main(String[] args)
    throws DocumentException, org.dom4j.DocumentException,
        FileNotFoundException, IOException, TweetException {
    PdfTwitter pdf = new PdfTwitter("tweets.pdf");
    TweetProducer t = new TweetProducer();
    t.createClient("bruno1970", "myfakepassword");
    t.setConsumer(pdf);
    t.execute(null);
    pdf.close();
}

If you execute this standalone example, a PDF file named tweets.pdf is generated in your working directory. Figure 1 shows a screen shot of the resulting PDF.

Figure 1. First attempt to create a PDF containing tweets

Are you happy? Hmm … not really. Why can’t that PDF look more like Figure 2?

Figure 2. Second attempt to create a PDF containing tweets

Do you see the URL tw.1t3xt.com on the side of Figure 2? Do you see the Twitter-style rectangles with the rounded corners? How can you achieve this? The answer is simple: by using page events and table events.

Pimp your PDF!

You won’t need to change much in the PdfTwitter class you’ve already written. Before looking at what has to change, consider another way to add content to a page using iText. Time for the second “Hello World” example, in Listing 10.

Listing 10. HelloWorldAbsolute.java

import java.io.FileOutputStream;
import com.lowagie.text.Document;
import com.lowagie.text.Element;
import com.lowagie.text.pdf.BaseFont;
import com.lowagie.text.pdf.PdfContentByte;
import com.lowagie.text.pdf.PdfWriter;

public class HelloWorldAbsolute {
    public static void main(String[] args) {
        try {
            // step 1: create a Document object
            Document document = new Document();
            // step 2: connect the Document with an OutputStream using a PdfWriter
            PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream("world.pdf"));
            // step 3: open the document
            document.open();
            // step 4: add content
            PdfContentByte cb = writer.getDirectContent();
            BaseFont bf = BaseFont.createFont(BaseFont.HELVETICA,
                    BaseFont.CP1252, BaseFont.NOT_EMBEDDED);
            cb.beginText();
            cb.setFontAndSize(bf, 12);
            cb.showTextAligned(Element.ALIGN_LEFT, "Hello World", 36, 788, 0);
            cb.endText();
            // step 5: close the document
            document.close();
        } catch (Exception e) {
            System.out.println(e);
        }
    }
}

The difference from Listing 6’s “Hello World” example is that you no longer depend on iText to decide where to put a paragraph, or where to put a table. You add text at an absolute coordinate (in this case: [x=36; y=788]) and define an angle (in this example: 0 degrees). The showTextAligned() method is the ideal way to put the URL of your online Twitter application in the margin of each page. This is what you do in the TwitterPage class.
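Absolute coordinates are easier to reason about once you remember that the default PDF coordinate system has its origin in the lower-left corner of the page, with y increasing upward. The sketch below converts "distance from the top of the page" into a y coordinate. It assumes an A4 page that is 842 points high; the PageCoordinates class and its methods are hypothetical.

```java
// Hypothetical helper for converting familiar measurements to PDF coordinates.
class PageCoordinates {

    static final float POINTS_PER_INCH = 72f;
    static final float A4_HEIGHT = 842f; // assumption: A4 page height in points

    // Converts inches to points (user units).
    static float inchesToPoints(float inches) {
        return inches * POINTS_PER_INCH;
    }

    // Converts a distance measured from the top edge into a y coordinate
    // measured from the bottom edge.
    static float yFromTop(float pageHeight, float distanceFromTop) {
        return pageHeight - distanceFromTop;
    }

    public static void main(String[] args) {
        float x = inchesToPoints(0.5f);                       // 36.0
        float y = yFromTop(A4_HEIGHT, inchesToPoints(0.75f)); // 788.0
        System.out.println("x=" + x + "; y=" + y);
    }
}
```

Under these assumptions, the coordinate [x=36; y=788] is half an inch from the left edge and three quarters of an inch below the top edge of the page.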

Page events

Listing 11 shows the TwitterPage class.

Listing 11. TwitterPage.java

import java.io.IOException;

import com.lowagie.text.Document;
import com.lowagie.text.DocumentException;
import com.lowagie.text.Element;
import com.lowagie.text.pdf.BaseFont;
import com.lowagie.text.pdf.PdfContentByte;
import com.lowagie.text.pdf.PdfPageEventHelper;
import com.lowagie.text.pdf.PdfWriter;

public class TwitterPage extends PdfPageEventHelper {

    protected BaseFont bf;

    public TwitterPage() throws DocumentException, IOException {
        bf = BaseFont.createFont();
    }

    public void onEndPage(PdfWriter writer, Document document) {
        PdfContentByte cb = writer.getDirectContent();
        cb.saveState();
        cb.beginText();
        cb.setRGBColorStroke(0x9a, 0xe4, 0xe8);
        cb.setTextRenderingMode(PdfContentByte.TEXT_RENDER_MODE_STROKE);
        cb.setFontAndSize(bf, 36);
        cb.showTextAligned(Element.ALIGN_RIGHT, "tw.1t3xt.com", 68, 806, 90);
        cb.endText();
        cb.restoreState();
    }

}

TwitterPage extends PdfPageEventHelper. PdfPageEvent is an interface with a plethora of methods, but because you use PdfPageEventHelper (a class that has empty implementations of every method), you don’t need to provide these implementations in the TwitterPage class. This is convenient, because you’re only interested in the onEndPage() method for this application.

A BaseFont object is created when the TwitterPage instance is constructed. You’re overriding the empty onEndPage() method so that it adds tw.1t3xt.com to each page at the coordinate [x=68; y=806], right aligned and rotated 90 degrees. Observe that you also changed the render mode. Normally letters are filled; in this case only the outlines of the glyphs will be drawn, in the color that is so typical of Twitter: #9ae4e8.

To make this work, you must declare the event after you’ve created a PdfWriter instance:

writer.setPageEvent(new TwitterPage());

Now the onEndPage() method will be called every time iText is done with a page.

Table events

Table events are similar to page events. Listing 12 shows a possible implementation of the PdfPTableEvent interface.

Listing 12. TableBackground.java

import java.io.IOException;

import com.lowagie.text.DocumentException;
import com.lowagie.text.Element;
import com.lowagie.text.pdf.BaseFont;
import com.lowagie.text.pdf.PdfContentByte;
import com.lowagie.text.pdf.PdfPTable;
import com.lowagie.text.pdf.PdfPTableEvent;

public class TableBackground implements PdfPTableEvent {

    protected BaseFont bf;

    public TableBackground() throws DocumentException, IOException {
        bf = BaseFont.createFont(BaseFont.TIMES_BOLDITALIC, BaseFont.WINANSI, BaseFont.NOT_EMBEDDED);
    }

    public void tableLayout(PdfPTable table, float[][] width, float[] height,
            int headerRows, int rowStart, PdfContentByte[] canvas) {
        PdfContentByte cb = canvas[PdfPTable.BACKGROUNDCANVAS];
        cb.saveState();
        cb.setRGBColorFill(0x9a, 0xe4, 0xe8);
        cb.roundRectangle(width[0][0], height[2], width[0][3] - width[0][0], height[0] - height[2], 4);
        cb.roundRectangle(width[0][1], height[2] + 3, width[0][3] - width[0][1] - 3, height[1] - height[2] - 6, 4);
        cb.eoFill();
        cb.beginText();
        cb.setRGBColorStroke(0x9a, 0xe4, 0xe8);
        cb.setFontAndSize(bf, 36);
        cb.setTextRenderingMode(PdfContentByte.TEXT_RENDER_MODE_STROKE);
        cb.showTextAligned(Element.ALIGN_LEFT, "\"", width[0][1] - 4, height[1] - 29, 0);
        cb.endText();
        cb.restoreState();
    }

}

When implementing page events, you received a PdfWriter instance from which you could obtain a canvas using the getDirectContent() method. With table events, you receive an array with PdfContentByte objects. Because you want to draw the background for each table, you need the BACKGROUNDCANVAS. Along with an array of canvases, you get parameters that contain the coordinates of every cell. These are provided in the form of arrays of floats. You can use these coordinates to draw two rectangles with rounded corners. If you fill them, using the even/odd rule, the second rounded rectangle acts as a hole in the first one. As a final touch, you could also add a double-quote character inside the second rectangle. Don’t forget to declare this table event to each table:

table.setTableEvent(new TableBackground());
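If the even/odd rule is new to you, you can try it out with the geometry classes in the JDK, which support the same fill rule. The sketch below is not iText code: it uses java.awt.geom.Path2D to show that a point inside both rounded rectangles is not filled, while a point inside only the outer one is. The EvenOddDemo class and the dimensions are hypothetical.

```java
import java.awt.geom.Path2D;
import java.awt.geom.RoundRectangle2D;

// Hypothetical demo of the even/odd fill rule using Java 2D geometry.
class EvenOddDemo {

    // Builds a path made of an outer rounded rectangle and a smaller inner one.
    static Path2D frame() {
        Path2D path = new Path2D.Double(Path2D.WIND_EVEN_ODD);
        path.append(new RoundRectangle2D.Double(0, 0, 200, 60, 8, 8), false);
        path.append(new RoundRectangle2D.Double(20, 3, 177, 30, 8, 8), false);
        return path;
    }

    public static void main(String[] args) {
        Path2D path = frame();
        // Inside both rectangles: two boundary crossings, so the point lies in the hole.
        System.out.println("In the hole, filled?  " + path.contains(100, 15));
        // Inside the outer rectangle only: one crossing, so the point is filled.
        System.out.println("In the frame, filled? " + path.contains(100, 50));
    }
}
```

The same reasoning applies to the two roundRectangle() calls followed by eoFill() in Listing 12.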

Now that you’ve got the code to draw text and background, you can rewrite PdfTwitter to produce a PDF that looks like the one shown in Figure 2.

Let’s rename PdfTwitter to TwitterPdf and add some changes. The most important changes are marked with "// new" comments in Listing 13.

Listing 13. TwitterPdf.java

import java.awt.Color;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

import com.lowagie.text.Document;
import com.lowagie.text.DocumentException;
import com.lowagie.text.Element;
import com.lowagie.text.Font;
import com.lowagie.text.PageSize;
import com.lowagie.text.Phrase;
import com.lowagie.text.Rectangle;
import com.lowagie.text.pdf.PdfPTable;
import com.lowagie.text.pdf.PdfWriter;

public class TwitterPdf implements TweetConsumer {

    protected int counter = 0;
    protected Document document;
    // new: the PDF is kept in memory
    protected ByteArrayOutputStream baos;

    // new: fonts for the different phrases
    // (small_white is given Color.WHITE here, as its name implies)
    protected Font small_white = new Font(Font.HELVETICA, 8, Font.BOLD, Color.WHITE);
    protected Font normal_white = new Font(Font.HELVETICA, 11, Font.BOLD, Color.WHITE);
    protected Font normal = new Font(Font.HELVETICA, 11);
    protected Font bold = new Font(Font.HELVETICA, 11, Font.BOLD);

    public TwitterPdf() throws DocumentException, IOException {
        document = new Document(PageSize.A4, 72, 36, 36, 36);
        baos = new ByteArrayOutputStream();                      // new
        PdfWriter writer = PdfWriter.getInstance(document, baos); // new: write to baos
        writer.setPageEvent(new TwitterPage());                  // new: page event
        document.open();
    }

    public void close() {
        document.close();
    }

    // new: the number of bytes written so far
    public int size() {
        return baos.size();
    }

    // new: copies the finished PDF to an OutputStream of your choice
    public void writeTo(OutputStream os) throws IOException {
        if (document.isOpen()) return;
        baos.writeTo(os);
        os.close();
    }

    public String tweet(org.dom4j.Element tweet) throws TweetException {
        try {
            org.dom4j.Element user = tweet.element("user");
            PdfPTable table = new PdfPTable(new float[]{ 1, 10, 12 });
            table.setTableEvent(new TableBackground());              // new: table event
            table.setWidthPercentage(100);
            table.setSpacingBefore(8);
            table.getDefaultCell().setPadding(5);
            table.getDefaultCell().setBorder(Rectangle.NO_BORDER);   // new
            table.addCell("");
            table.addCell(new Phrase(user.elementText("screen_name"), small_white)); // new: font
            table.getDefaultCell().setHorizontalAlignment(Element.ALIGN_RIGHT);
            table.addCell(new Phrase(tweet.elementText("created_at"), small_white)); // new: font
            table.getDefaultCell().setHorizontalAlignment(Element.ALIGN_LEFT);
            table.getDefaultCell().setRotation(90);
            table.getDefaultCell().setFixedHeight(38);
            table.addCell(new Phrase(String.valueOf(++counter), normal_white));
            table.getDefaultCell().setRotation(0);
            table.getDefaultCell().setColspan(2);
            Phrase p = new Phrase();
            p.add(new Phrase(user.elementText("name"), bold));       // new: font
            p.add(new Phrase(" tweets: ", bold));                    // new: font
            p.add(new Phrase(tweet.elementText("text")));
            table.addCell(p);
            document.add(table);
            return tweet.elementText("id");
        } catch (DocumentException e) {
            throw new TweetException(e);
        } catch (IOException e) {
            throw new TweetException(e);
        }
    }

}

What has changed? For starters, you no longer write the PDF to a file; you’re using a ByteArrayOutputStream instead of a FileOutputStream. In other words, the PDF is kept in memory instead of on disk. You also add methods to get the number of bytes and to write the bytes to an OutputStream of your choice. This doesn’t necessarily have to be a FileOutputStream. Observe that you also defined different fonts with different colors. You’re going to use these fonts when creating the different phrases. Finally, you can see the lines where you declare the events: first the page event on the writer, then the table event on the table.
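The buffering pattern itself doesn't depend on iText. Here is a minimal sketch that uses only the JDK, with plain text standing in for PDF content. The InMemoryBuffer class is hypothetical; it mimics the close(), size(), and writeTo() methods of TwitterPdf, except that its writeTo() returns a boolean and leaves the target stream open.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for TwitterPdf: collects content in memory first,
// then copies it to any OutputStream once it has been closed.
class InMemoryBuffer {

    private final ByteArrayOutputStream baos = new ByteArrayOutputStream();
    private boolean open = true;

    // Appends content while the buffer is still open.
    void add(String text) {
        if (!open) throw new IllegalStateException("buffer already closed");
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        baos.write(bytes, 0, bytes.length);
    }

    void close() {
        open = false;
    }

    // The number of bytes collected so far.
    int size() {
        return baos.size();
    }

    // Copies the bytes to the target; refuses (returns false) while still open.
    boolean writeTo(OutputStream os) throws IOException {
        if (open) return false;
        baos.writeTo(os);
        return true;
    }

    public static void main(String[] args) throws IOException {
        InMemoryBuffer buffer = new InMemoryBuffer();
        buffer.add("Hello World");
        buffer.close();
        System.out.println("Bytes in memory: " + buffer.size());
        buffer.writeTo(System.out);
        System.out.println();
    }
}
```

Knowing the size before you start writing is what makes this pattern attractive when the target is a response stream rather than a file.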

You could take the main() method from the PdfTwitter example and adapt it for this TwitterPdf variant, but … wouldn’t it be really cool to use all of this in a servlet and provide the functionality as an online service?

Creating a PDF from a servlet

If you go to https://tw.1t3xt.com/, you’ll find some PHP pages that allow you to register. When you do, you automatically receive an email message containing an ID. I’m not going to elaborate on this PHP code. It’s sufficient to know that a custom id is generated. This id is used in a form with a POST method that submits your data to a servlet named TwiText. The source code of this servlet is in Listing 14.

Listing 14. TwiText.java

import java.io.IOException;
import java.net.URLEncoder;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

import javax.servlet.Servlet;
import javax.servlet.ServletConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletOutputStream;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class TwiText extends HttpServlet implements Servlet {

    private static final long serialVersionUID = 6820723126200418459L;

    private String db_database;
    private String db_host;
    private String db_username;
    private String db_password;

    public void init(ServletConfig config) throws ServletException {
        super.init(config);
        db_database = config.getInitParameter("db_name");
        db_host = config.getInitParameter("db_host");
        db_username = config.getInitParameter("db_username");
        db_password = config.getInitParameter("db_password");
    }

    protected void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
        String id = request.getParameter("id");
        String password = request.getParameter("password");
        String since_id = request.getParameter("since_id");
        try {
            try {
                Long.parseLong(since_id);
            }
            catch(Exception e) {
                since_id = "";
            }
            Class.forName("com.mysql.jdbc.Driver");

            String url = "jdbc:mysql://" + db_host + "/" + db_database;
            Connection conn = DriverManager.getConnection(url, db_username, db_password);
            PreparedStatement stmt = conn.prepareStatement("SELECT user_id, credits FROM subscribers WHERE id=?");
            stmt.setString(1, id);
            ResultSet rs;

            rs = stmt.executeQuery();
            if( rs.next() ) {
                String user_id = rs.getString("user_id");
                int credits = rs.getInt("credits");
                if (credits == 0) {
                    response.sendRedirect("http://tw.1t3xt.com/twitext.php?message=You+don't+have+any+credits+left.");
                }
                else {
                    TwitterPdf pdf = new TwitterPdf();
                    TweetProducer t = new TweetProducer();
                    t.createClient(user_id, password);
                    t.setConsumer(pdf);
                    since_id = t.execute(since_id);
                    pdf.close();

                    response.setHeader("Expires", "0");
                    response.setHeader("Cache-Control",
                        "must-revalidate, post-check=0, pre-check=0");
                    response.setHeader("Pragma", "public");
                    response.setContentType("application/pdf");
                    response.setContentLength(pdf.size());
                    ServletOutputStream out = response.getOutputStream();
                    pdf.writeTo(out);
                    out.flush();

                    stmt = conn.prepareStatement("UPDATE subscribers SET since_id=?, credits = credits - 1 WHERE id=?");
                    stmt.setString(1, since_id);
                    stmt.setString(2, id);
                    stmt.executeUpdate();
                }
            }
            else {
                response.sendRedirect("http://tw.1t3xt.com/twitext.php?message=no+user+found+with+id+" + id);
            }
            conn.close();
        } catch (Exception e) {
            if (e.getMessage() != null) {
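                // The one-argument URLEncoder.encode(String) is deprecated; name the encoding explicitly.
                response.sendRedirect("http://tw.1t3xt.com/twitext.php?message=" + URLEncoder.encode(e.getMessage(), "UTF-8"));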
            }
            else {
                response.sendRedirect("http://tw.1t3xt.com/twitext.php?message=An+error+has+occurred.+Did+you+provide+the+correct+password?");
            }
        }
    }
}

Now let’s have a look at the database-related code.

Database access

For simplicity, you’ll use plain old JDBC. The parameters needed to connect to the database are fetched from the web.xml file: database name, hostname, username, and password. The first query fetches the Twitter screen name (user_id) and the user’s credits. The id in the WHERE clause is the custom TwiText id that the user received in the mail that was automatically sent upon registering:

SELECT user_id, credits FROM subscribers WHERE id=?
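The servlet in Listing 14 closes its connection only when everything goes well. As a minimal sketch, assuming Java 7 or later, the lookup could be wrapped in try-with-resources so that the statement and result set are always closed. The SubscriberLookup class, its nested Subscriber class, and the method names are hypothetical and not part of the TwiText code; only the query and the JDBC URL format come from the listing.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical helper, not part of the article's TwiText servlet.
final class SubscriberLookup {

    // Holds the two columns that the first query returns.
    static final class Subscriber {
        final String userId;
        final int credits;

        Subscriber(String userId, int credits) {
            this.userId = userId;
            this.credits = credits;
        }
    }

    // Builds the same JDBC URL the servlet assembles from its init parameters.
    static String jdbcUrl(String host, String database) {
        return "jdbc:mysql://" + host + "/" + database;
    }

    // Returns the subscriber for the given TwiText id, or null if no row matches.
    static Subscriber find(Connection conn, String id) throws SQLException {
        String sql = "SELECT user_id, credits FROM subscribers WHERE id=?";
        // Both resources are closed automatically, even if an exception is thrown.
        try (PreparedStatement stmt = conn.prepareStatement(sql)) {
            stmt.setString(1, id);
            try (ResultSet rs = stmt.executeQuery()) {
                if (!rs.next()) {
                    return null;
                }
                return new Subscriber(rs.getString("user_id"), rs.getInt("credits"));
            }
        }
    }
}
```

The caller still owns the Connection and is responsible for closing it, for instance in its own try-with-resources block.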

The credits are introduced to limit the number of times a user can hit your server. Every time a PDF is generated and sent to the user’s browser, a second query is executed to subtract one credit from that user in your database:

UPDATE subscribers SET since_id=?, credits = credits - 1 WHERE id=?

This update also keeps track of the ID of the most recent status (tweet) that was “PDFed” by TwiText.
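Before the since_id reaches Twitter or the database, the servlet checks that it parses as a number and replaces anything else with an empty string. The following sketch isolates that check in a small method so that it can be tested on its own; the SinceId class and the normalize name are hypothetical, and the behavior is meant to mirror the nested try/catch in Listing 14.

```java
// Hypothetical helper that isolates the servlet's since_id check.
final class SinceId {

    // Returns the value unchanged when it parses as a long, else an empty string.
    static String normalize(String raw) {
        try {
            Long.parseLong(raw);
            return raw;
        } catch (NumberFormatException e) {
            // Covers null, empty, and non-numeric input.
            return "";
        }
    }
}
```

With a helper like this, the start of doPost() could be reduced to a single call such as since_id = SinceId.normalize(request.getParameter("since_id")).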

Creating the PDF

Note that we’ve skipped some code. Surely you recognize this snippet:

TwitterPdf pdf = new TwitterPdf();
TweetProducer t = new TweetProducer();
t.createClient(user_id, password);
t.setConsumer(pdf);
since_id = t.execute(since_id);
pdf.close();

It’s almost identical to what you had in your PdfTwitter class’s main() method in Listing 9.

Avoiding known browser issues

Notice that you’re also setting plenty of response headers:

response.setHeader("Expires", "0");
response.setHeader("Cache-Control",
    "must-revalidate, post-check=0, pre-check=0");
response.setHeader("Pragma", "public");
response.setContentType("application/pdf");
response.setContentLength(pdf.size());
ServletOutputStream out = response.getOutputStream();
pdf.writeTo(out);
out.flush();

In a perfect world, you could do something like:

PdfWriter.getInstance(document, response.getOutputStream());

By doing this, you make sure that every byte produced by the PdfWriter is immediately sent to the browser: no need to create a file on disk, no need to keep the file in memory. However, some browsers can’t deal with this byte stream; they need to know in advance how many bytes you’ll be sending. You anticipated this problem by storing the bytes in a ByteArrayOutputStream in your TwitterPdf class. You also provided the size() method, so that you can set the content length of the response. Only after you’ve set the headers (the list is based on the experience of the iText community) can you write the bytes to the ServletOutputStream.
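The buffering idea can be shown without iText or a servlet container. The sketch below is a hypothetical stand-in for the buffering part of TwitterPdf: the InMemoryDocument class and its sink() method are assumptions made for illustration, while size() and writeTo() follow the calls used in the servlet.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical stand-in for the buffering part of TwitterPdf.
final class InMemoryDocument {

    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

    // The stream a PDF writer would be pointed at instead of the response.
    OutputStream sink() {
        return buffer;
    }

    // Number of bytes collected so far; this is the value for Content-Length.
    int size() {
        return buffer.size();
    }

    // Copies the buffered bytes to any target stream.
    void writeTo(OutputStream out) throws IOException {
        buffer.writeTo(out);
    }
}
```

In the servlet, the order matters: call response.setContentLength(pdf.size()) first, and only then pdf.writeTo(response.getOutputStream()). The price is that the whole document sits in memory until it has been sent.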

To see this servlet in action, go to https://tw.1t3xt.com/ and register, or take the code and turn it into a servlet that runs on your application server.

In conclusion

This article has covered a lot of territory. Not only did it explain how simple it is to create an application for a popular social-networking site; it also gave you some insight into the possibilities of some free and open source software libraries, especially iText. You’ve learned the simplest way to create a PDF document, using basic building blocks, and you’ve found out how to add text and graphical elements at absolute coordinates. You started off with a standalone application, but eventually you integrated the code of your main() method in a servlet. The end result is a Twitter application that is ready for you to use. Surely you can think of ways to expand this example and offer other Twitter-related services on your site.

Bruno Lowagie is the original developer and one of the current maintainers of iText. He works for Ghent University, 1T3XT BVBA (Europe) and iText Software LLC (US). He lives in Ghent, Belgium with his wife and two sons.