by Paul Feuer

The Op Framework: A better Java database framework

News | Jan 30, 2007 | 19 min read

A holistic approach to database access with debugging, testing, tracing and usage reporting

After glancing at the title of this article, perhaps you’re thinking, “yet another database access article, blah, blah, blah.” But you’ll find the approach described here very useful. And I’m hoping that not only will you, as a programmer, find it useful, but so will your quality assurance team, your help desk and your database administrator.

At the company I work for, our original database framework was somewhat problematic, dynamically building its own SQL statements and executing and logging them inconsistently. The statements were executed, and the results processed and returned, but in a larger sense, we didn’t have much of an idea of how much work was being triggered by a single request, nor much of a chance to optimize our SQL. Our Oracle DBA would just see some chunk of a statement in his “top offenders list” and send out an e-mail saying “Does anyone recognize this?” Many times, we wouldn’t know what parameters were used when calling the statement, complicating our ability to track down why Oracle’s cost-based optimizer was making poor decisions.

Now that we’ve begun to migrate to the Op Framework, we can quickly identify the worst-performing pages (in terms of number of queries, connections used, time in the database). What’s more, when fielding a slow-page complaint, we can now see which calls are taking a long time for that particular request or return other debug information related to our request.

This tracing is made possible by taking a consistent approach to object creation, logging and resource usage in a way that each log statement can be associated back to a specific request. In this article, we look at the implementation of this database access framework and its tracing functionality.

These are some of the cool things the Op Framework does:

  • Dynamically turns on “dev-mode” request tracing, which is returned in hidden HTML to the developer’s browser.
  • Logs database resource usage to the filesystem for automated graphing, monitoring and “worst-offenders” reports to support smart tuning.
  • Dynamically turns on “database capture mode,” which saves database result sets to the filesystem for later reuse (even in the absence of a database).
  • Decouples execution from Enterprise JavaBeans (EJB) and Java Naming and Directory Interface to make unit testing more straightforward.

In describing this framework, I begin at the servlet level (Struts, in our example) by discussing the Op object. Next, we look at database call delegation, then at OpExecutors, retrieval of DataSources, and navigation of EJB (or not). Then we look at the advanced tracing and data-stubbing features. After examining these levels, we return to the servlet to see what trace information we collected.

The Op

The Op object is a simple implementation of a Command pattern. “Op” is, of course, short for “operation.” The word captures the simplicity of the object and is more convenient to type. Ops are meant to encapsulate some piece of logic that will execute in the subclass’s implementation of the abstract _execute(CallExecutor) method. The underscore is meant to convey that this is not a method the developer should be calling directly. This method is called by an OpExecutor.

The OpExecutor interface specifies a simple method for executing an Op. This is an interface because in different situations, we’ll want to execute this Op differently. In an EJB environment, we would want to navigate EJB for execution, while in my IDE, I would just run it directly. The Op base class allows us to specify our own OpExecutor, but by default, we rely on the OpExecutorFactory to provide us one.
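To make the contract concrete, here is a minimal sketch of the Op/OpExecutor relationship. Only the names Op, OpExecutor, _execute(CallExecutor) and executeOp(Op) come from the article; the plain String trace key, the field names, and the fallback to a direct in-process executor are my assumptions standing in for the real TraceKey and OpExecutorFactory types.

```java
// Executes DatabaseCalls during an Op's run; its methods are elided here.
interface CallExecutor { }

// One method, as described above: different environments execute Ops differently.
interface OpExecutor {
    void executeOp(Op op);
}

// A "direct" executor: runs the Op in-process, as one might in an IDE or test.
class DirectOpExecutor implements OpExecutor {
    public void executeOp(Op op) {
        op._execute(new CallExecutor() { });
    }
}

abstract class Op {
    private final String traceKey;   // stands in for the framework's TraceKey
    private OpExecutor executor;     // optional per-Op override

    public Op(String traceKey) { this.traceKey = traceKey; }

    protected Op(Op parent) { this(parent.traceKey); }  // child Ops inherit the key

    public String getTraceKey() { return traceKey; }

    public void setOpExecutor(OpExecutor e) { this.executor = e; }

    // Application code calls execute(); only the executor calls _execute().
    public void execute() {
        OpExecutor e = (executor != null) ? executor : new DirectOpExecutor();
        e.executeOp(this);
    }

    // Implemented by subclasses; invoked by an OpExecutor, never directly.
    public abstract void _execute(CallExecutor exec);
}

// A trivial Op used to demonstrate the flow.
class PingOp extends Op {
    boolean ran = false;
    PingOp(String key) { super(key); }
    public void _execute(CallExecutor exec) { ran = true; }
}
```

In a JBoss deployment the default executor would instead route through EJB, as discussed in the next section; the point here is that the Op itself never knows the difference.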

Notice in Figure 1 that there are two Op constructors: a public one requiring a TraceKey and a protected one requiring another Op. The TraceKey, which I’ll discuss in more detail below, uniquely identifies every request so we can see inside every Op what request caused this Op to be created and executed. The TraceKeyFactory is responsible for creating or retrieving the TraceKey.

Example Op instantiation and invocation:

    TraceKey traceKey = TraceKeyFactory.getTraceKey( httpRequest );
    DoSomethingOp doSomething = new DoSomethingOp( traceKey );
    doSomething.setFooId( fooId );
    doSomething.setBarId( barId );
    doSomething.execute();
    List results = doSomething.getMyDbResults();

During the Op’s execution, in addition to the usual business logic programming, we can call other Ops and access the database.

Example execute implementation for an Op:

    public void _execute( CallExecutor exec )
    {
        // use Op's call creation convenience method
        // "sp_my_proc" turns into "{call sp_my_proc (?,?)}"
        DatabaseCall callOne = makeProcedure( "sp_my_proc" );
        callOne.setParameters( new Object[] { "Hello", "World" } );
        exec.executeDatabaseCall( callOne );

        // do more processing . . .

        // Oracle function call "f_my_function" turns into
        // "{?=call f_my_function (?,?)}"
        DatabaseCall callTwo = makeFunction( "f_my_function" );
        callTwo.setParameters( Types.VARCHAR
                             , new Object[] { "So", "Long" } );
        exec.executeDatabaseCall( callTwo );

        String returnedString = callTwo.getParameter( 1 ).getValue();
    }
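The convenience methods in the listing above boil down to simple string building in the JDBC call escape syntax. A sketch of that transformation (the method names follow the article's comments; taking the parameter count as an argument is my simplification, since the real framework presumably derives the placeholders from the parameters registered on the DatabaseCall):

```java
// Builds the call text the listing's comments describe:
//   "sp_my_proc"    -> "{call sp_my_proc (?,?)}"     (procedure, 2 params)
//   "f_my_function" -> "{?=call f_my_function (?,?)}" (function, return + 2 inputs)
class CallText {
    static String procedure(String name, int paramCount) {
        return "{call " + name + " (" + placeholders(paramCount) + ")}";
    }

    static String function(String name, int inParamCount) {
        // the leading "?=" binds the function's return value
        return "{?=call " + name + " (" + placeholders(inParamCount) + ")}";
    }

    private static String placeholders(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) sb.append(i == 0 ? "?" : ",?");
        return sb.toString();
    }
}
```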

With the Op Framework, parameter setting is simpler than with the verbose Java Database Connectivity (JDBC) statement methods. The Parameter constructors carry implicit behavior: a single-argument Object constructor represents implicit input (with an attempt to look up common JDBC type mappings), and a single-argument int constructor represents implicit output. DatabaseCall’s setParameters() methods iterate the object array, and if an element is not already a Parameter, it is wrapped into one using the single-argument constructor. For null or other atypical parameters, you must use the full three-argument Parameter constructor identifying the jdbcType, direction, and value.
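Here is a sketch of that implicit behavior. The constant names, the exact type-mapping table, and the field names are my assumptions; the article only states that a bare object implies an input parameter, a bare JDBC type int implies an output parameter, and nulls need the full three-argument form.

```java
import java.sql.Types;
import java.util.LinkedHashMap;
import java.util.Map;

class Parameter {
    static final int IN = 0, OUT = 1;
    final int jdbcType, direction;
    final Object value;

    Parameter(Object value) {                 // implicit input
        this(guessJdbcType(value), IN, value);
    }
    Parameter(int jdbcType) {                 // implicit output
        this(jdbcType, OUT, null);
    }
    Parameter(int jdbcType, int direction, Object value) {  // explicit / nulls
        this.jdbcType = jdbcType; this.direction = direction; this.value = value;
    }

    private static int guessJdbcType(Object v) {
        if (v instanceof String)  return Types.VARCHAR;
        if (v instanceof Integer) return Types.INTEGER;
        if (v instanceof Long)    return Types.BIGINT;
        return Types.OTHER;       // the real framework maps more types
    }
}

class DatabaseCall {
    final Map<Integer, Parameter> parameters = new LinkedHashMap<>();

    // Iterate the array; anything that isn't already a Parameter gets wrapped
    // via the single-argument constructor, as described above.
    void setParameters(Object[] values) {
        int index = 1;
        for (Object v : values) {
            parameters.put(index++,
                (v instanceof Parameter) ? (Parameter) v : new Parameter(v));
        }
    }
}
```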

OpExecutors and CallExecutors

The OpExecutor interface is simply one method: executeOp(Op). In our default JBoss EJB environment, the EjbOpExecutorFactory creates the EjbOpExecutor objects for us. These EjbOpExecutors are initialized with a simple EJB path, where we find an EJB component that can run our Op. In our IDE, or in a continuous-testing environment, it would probably suffice to use DirectOpExecutor, which directly invokes the Op’s _execute(CallExecutor) method. Figure 2 is the sequence diagram depicting the instantiation and execution of an Op: its use of the OpExecutorFactory abstract factory, the navigation of EJB, and its invocation of the methods on the CallExecutor interface.

Figure 2. Op creation and execution. Click on thumbnail to view full-sized image.

The DirectOpExecutor instantiates the CallExecutorDB, which receives DatabaseCall objects, transforms them into JDBC statements, executes them, and puts any statement outputs back into the DatabaseCall’s registered parameters (this sequence is discussed below).

Making database calls

Many frameworks delegate calls to the database by writing SQL for you. The Op Framework does not do this. In our company, much of our business logic resides in the database and is accessed through packages, functions and procedures. We also have a complex enough data model that the thought of trying to set up the XML to describe it and query against it gives me a headache. We can send parameterized SQL statements, but usually we just use procedure names. In using this approach, we not only receive the benefit of the database’s usefulness in operating over sets, but we can also tune the SQL without redeploying our application server. It also means our Java code is now less coupled to our database schema, as the functions and procedures act as an API.

The Op Framework does delegate calls, because, when we delegate calls, we also receive the benefit of consistent logging, execution, result-set transformation and resource closing. Using the DatabaseCall object to describe our interaction with the database, we get a leaner and simpler interface to prepared and callable statements, without coupling ourselves to JDBC connections and implementations. The pieces of Statement and ResultSet interfaces that we lose using the DatabaseCall object shouldn’t be accessed in high-performance applications; rather, we should implement database procedures. Furthermore, in the kinds of result sets we’re returning, which may contain information across several tables, looking for updates and scrolling could really burn us.

The DatabaseCall object used by the Op is a container class holding:

  • The text of the call (procedure name or parameterized SQL query).
  • The Op responsible for its instantiation.
  • A mapping of parameter index to parameter object.
  • ResultSets returned by a multi-ResultSet returning call.
  • The name of the datasource used to look up a connection in the MappedConnectionFactory.
  • Options from Op used to enable special functionality.
  • Information about the call’s execution (timing, connection).

Figure 3 shows a class diagram of the DatabaseCall and Parameter classes.

Figure 3. DatabaseCall and Parameter classes. Click on thumbnail to view full-sized image.

DatabaseCalls, as I’ve said, are merely containers and don’t have any behavior in and of themselves; they have the information required to articulate some JDBC call. Implementations of the CallExecutor interface, passed in through the _execute(CallExecutor) method, handle DatabaseCalls. And the usual one is the CallExecutorDB, which, as you might guess from the name, executes DatabaseCalls against a database.

The CallExecutorDB transforms DatabaseCall objects into JDBC statement objects and consistently executes them against the database: processing pre-call options; setting up the statement with its parameters; executing the statement; processing any output; recording timing and row-count information; logging the call with its parameters; and finally, processing any post-call options. Figure 4 shows how the CallExecutor uses auxiliary classes to build JDBC statements and record their outputs.

Figure 4. Database call execution sequence. Click on thumbnail to view full-sized image.

The CallOutputHandler handles the various output types, including, in our case, ResultSets (and Oracle RefCursors, which implement ResultSet). Our ResultSets, for example, are transformed into collections of maps. After this transformation, we can track how many rows we processed and how long that task took. These values are available in the DatabaseCall’s CallExecutionInfo object.

When finished, we’ve achieved execution of a database call delegated using an interface to an object that sends the described call through JDBC; the execution of the call is consistent, logged and timed.
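The bookkeeping that surrounds every execution can be sketched as follows. The real CallExecutorDB drives actual JDBC statements; here the statement execution is abstracted behind a StatementRunner interface of my own so that the consistent timing, row counting, and logging are visible. Only the CallExecutorDB and CallExecutionInfo names come from the article.

```java
import java.util.List;
import java.util.Map;

// Stands in for "build the JDBC statement and execute it".
interface StatementRunner {
    List<Map<String, Object>> run(String callText) throws Exception;
}

class CallExecutionInfo {
    long elapsedMillis;
    int rowCount;
    boolean ok;
}

class CallExecutorDB {
    private final StatementRunner runner;

    CallExecutorDB(StatementRunner runner) { this.runner = runner; }

    CallExecutionInfo executeDatabaseCall(String callText) {
        CallExecutionInfo info = new CallExecutionInfo();
        long start = System.currentTimeMillis();       // record timing info
        try {
            // execute the statement; ResultSets arrive as collections of maps
            List<Map<String, Object>> rows = runner.run(callText);
            info.rowCount = rows.size();               // record row count
            info.ok = true;
        } catch (Exception e) {
            info.ok = false;                           // failures are recorded too
        } finally {
            info.elapsedMillis = System.currentTimeMillis() - start;
            // every call is logged the same way, success or failure
            System.out.println(callText + " -> " + info.rowCount
                + " rows in " + info.elapsedMillis + " ms, ok=" + info.ok);
        }
        return info;
    }
}
```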

Activating behavior using parameter triggers

Some of the Op Framework’s features, such as data capture and return, and tracing (both to be discussed below), are activated by using special request parameters in the URL. The ParamTriggerFilter examines HttpServletRequest parameters for these special settings. The Filter is initialized by TriggerInitializer instances that are registered in init-params of the Filter‘s configuration in the web.xml file.

Each TriggerInitializer registers ParamTrigger objects in the Filter. When the Filter sees a new URL coming in, it looks to see if there are ParamTriggers that respond to each name. If a ParamTrigger responds to the parameter name, the trigger is fired, which causes any listeners on the trigger to execute; they typically set options that Ops in the request will use during their execution. The Op Framework implements several triggers that fall into two categories: tracing and data capture, both of which are described below.
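The registration-and-firing cycle can be sketched like this. The real filter reads HttpServletRequest parameters inside a servlet Filter; here a plain Map stands in for the request parameters, and the listener interface name is my own.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Fired when a registered parameter name appears in a request;
// listeners typically set options that Ops will use during execution.
interface TriggerListener {
    void triggered(String name, String value);
}

class ParamTriggerFilter {
    private final Map<String, List<TriggerListener>> triggers = new HashMap<>();

    // Called by TriggerInitializers at filter-init time.
    void register(String paramName, TriggerListener listener) {
        triggers.computeIfAbsent(paramName, k -> new ArrayList<>()).add(listener);
    }

    // Called for each incoming request's parameter map.
    void examine(Map<String, String> requestParams) {
        requestParams.forEach((name, value) -> {
            List<TriggerListener> listeners = triggers.get(name);
            if (listeners != null) {                              // a trigger responds
                listeners.forEach(l -> l.triggered(name, value)); // fire it
            }
        });
    }
}
```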

Tracing in the Op Framework

The Op Framework uses the Apache Foundation’s popular log4j library for logging. Tracing was added to allow us to collect logging statements for each user’s HttpServletRequest and provide debugging information to a user. In my company’s case, that user was a developer responding to a bug and wanting to quickly log into our site to see what was happening without establishing various Secure Shell tunnels and tailing a massive log file to find the right statements across a heavily used Web site. The Op Framework’s tracing functionality allows us to see our requests’ log statements in HTML output.

Developers using the Op Framework to create new Ops can log debug statements just as they normally would, except that to associate them with the request, they need to include the TraceKey:

 Logger.getLogger( getClass() ).debug( traceKey + " Loggable condition" );

When tracing is on, these statements are collected and made available to you so that you can output them to your generated HTML. They are collected with a custom Appender and LogEvent filter that only accepts messages containing the TraceKey.
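The framework's collector is a custom log4j Appender plus a LogEvent filter; to keep this sketch dependency-free, the same filtering idea is shown as a plain class of my own devising: accept only messages containing this request's TraceKey, and hand them back for inclusion in the generated HTML.

```java
import java.util.ArrayList;
import java.util.List;

class TraceCollector {
    private final String traceKey;
    private final List<String> collected = new ArrayList<>();

    TraceCollector(String traceKey) { this.traceKey = traceKey; }

    // Called for every log message; only this request's messages are kept,
    // which is why developers prefix their debug statements with the TraceKey.
    void onLogMessage(String message) {
        if (message.contains(traceKey)) {
            collected.add(message);
        }
    }

    // Later rendered into the hidden trace div of the generated HTML.
    List<String> getCollected() { return collected; }
}
```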

Some events in the framework are automatically logged, such as Op timing and DatabaseCall timing.

This tracing is not always on, but can be turned on using a special URL parameter, “trace” by default. Thus, a call to http://mycompany.com/Hello.do?trace=1 would turn on tracing for that request and any subsequent request in the session, or until you specify “trace=0” in the URL.

In my company’s environment, we have a hidden div with the trace output in it. If the trace parameter is there, a JavaScript method attaches a node containing the string “Trace,” and a click handler that pops up an IFrame with the contents of that hidden div. This code is included in the source file that accompanies this article.

Capturing and stubbing database calls

Another feature available through the use of special URL parameters is the capturing and stubbing of data. Let’s say you’ve meticulously put test case data in your database to exercise some boundary conditions in your code. The procedure returning this data is called “sp_project_time,” and is used by a Struts action registered at “FinanceTimesheets.do.” Let’s call this test case Project07. We can invoke our action with the URL and the special parameter: http://localhost:8080/op-example/FinanceTimesheets.do?persist.SaveSet=Project07.

The “persist.SaveSet” parameter triggers the call to execute normally, except the framework injects some data capture into the data fetching. When the call finishes, the framework serializes the output to an XML file on the filesystem. Later, if you want to recall that same data, you can use this URL: http://localhost:8080/op-example/FinanceTimesheets.do?persist.LoadSet=Project07.

The Op code does not change. It creates the DatabaseCall normally, executes it normally, and gets the data out of the DatabaseCall normally; it doesn’t know the difference. What is different is the CallExecutor coming into the _execute(CallExecutor) method. Instead of a CallExecutorDB, it’s a CallExecutorDBUnit. The DirectOpExecutor, upon seeing a particular option in your Op, instantiates the CallExecutorDBUnit. The loading mechanism can also be used to decouple your Web development from your database development, as the XML files can be created by hand and even deployed with your application server until the database is done and has data in it.

The framework uses two types of sets and two types of XML file path creation. The “persist” in the above URL examples instructs the framework to store or look up the captured XML at a file location outside of the application server’s deployment path. This allows the XML to persist between redeployments of an ear file. The other style is “deploy” (such as “deploy.SaveSet” and “deploy.LoadSet”), which instructs the framework to look within the WEB-INF directory. This means that you can actually keep sets in your source tree and even deploy them to your application servers for use by automated testing packages, such as Selenium. You would just instruct Selenium to include the parameter “deploy.LoadSet” in its requests.

The two types of XML file path creation are simple and session. If you turn on data capturing for an entire session (collecting data for a view-level contractor, for example), the framework must generate a file path that includes information about the parameters used by the calls to disambiguate among several calls of the same function. The simple case merely uses the name of the set. Session capture and loading is invoked by prepending “session-” to the aforementioned parameter names, as in “session-persist.SaveSet.” The code below demonstrates the difference in paths between a session-based deploy setting and a non-session-based persist setting:

 /jboss/server/wonderland/deploy/myear.ear/mywar.war/WEB-INF/op_data/FINANCE
     /Project07/FinanceTimesheets/_projectSid=1/sp_project_time.xml
     /Project07/FinanceTimesheets/_projectSid=3/sp_project_time.xml

/opt/op_data/FINANCE/Project07/sp_project_time.xml
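The two path styles above reduce to straightforward string assembly. In this sketch, the base directories and the parameter-encoding segment are taken from the examples; the method and argument names are my own.

```java
class StubPaths {
    // Simple: base directory, module, set name, and call name.
    static String simplePath(String baseDir, String module, String set, String call) {
        return baseDir + "/op_data/" + module + "/" + set + "/" + call + ".xml";
    }

    // Session: the action and the call's parameters are folded into the path
    // to disambiguate repeated calls of the same function within one session.
    static String sessionPath(String baseDir, String module, String set,
                              String action, String params, String call) {
        return baseDir + "/op_data/" + module + "/" + set + "/" + action
             + "/" + params + "/" + call + ".xml";
    }
}
```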

Tracking usage: OpModule, TraceKey, WebAppStats and Snapshot

When tracking what our application is doing, we want to know:

  • Who is responsible for causing the application to do something (such as the user).
  • Who is responsible for writing the code that does it.
  • Where the executing code resides.
  • What is executing.
  • How quickly and efficiently it executed.

In addition to identifying the user (see TraceKeyFactory Javadocs for how this is done), the TraceKey also carries with it the OpModule that has “claimed” the request. An OpModule represents a logical subcomponent of a larger application. For example, at our company, we generally have a separate module for each product we offer. Modules claim requests by evaluating the URL and stating how much of it they’re responsible for. For example, in an intranet that has a module for each department, the FINANCE module may claim URLs that start with /intranet/finance/, while the HR module may claim URLs that start with /intranet/hr/.
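A minimal sketch of the claiming mechanism, using the intranet example above. The article says modules evaluate the URL and state how much of it they are responsible for; the longest-match rule and these method names are my interpretation of that.

```java
import java.util.ArrayList;
import java.util.List;

class OpModule {
    final String name;
    private final String urlPrefix;

    OpModule(String name, String urlPrefix) {
        this.name = name; this.urlPrefix = urlPrefix;
    }

    // Returns how many characters of the URL this module claims (0 = none).
    int claim(String url) {
        return url.startsWith(urlPrefix) ? urlPrefix.length() : 0;
    }
}

class OpModuleRegistry {
    private final List<OpModule> modules = new ArrayList<>();

    void register(OpModule m) { modules.add(m); }

    // The module claiming the most of the URL wins; null if nobody claims it.
    OpModule claimingModule(String url) {
        OpModule best = null;
        int bestLen = 0;
        for (OpModule m : modules) {
            int len = m.claim(url);
            if (len > bestLen) { best = m; bestLen = len; }
        }
        return best;
    }
}
```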

Modules are loosely coupled to the deployment and startup process. As part of the deployment process, a build file will copy .module.properties files into the WEB-INF directory. At startup, the OpModuleInitializerServlet will look for them, instantiating each module and registering it in the OpModuleRegistry. This registry is then used by the TraceKey to find a module to claim the incoming URL.

Now that we know who is responsible for the request and for the code handling it, let’s look at the Op Framework’s two usage-tracking mechanisms. The first is event-based, implemented by the WebAppStats class, which logs DatabaseCall and HttpServletRequest completions to its logger. Here is an example of DB and REQ events being logged, along with a note about what each field means:

 // Database Call Completion
DEBUG - [ 2007-01-09 09:03:52 ]-[ DB ]-[ 87987 ]-[ 87987-3-200793 ]-
[ 5 ms ]-[ 11 rows ]-[ OK ]-[ MODULE_A ]-[ SUBMODULE_X ]-[ DBUSER ]-[ F ]-
[ com.mycompany.ops ]-[ ClassSummaryOp ]-[ pkg_web.f_class_summary ]-
[ {OUT-1} {IN-2 2987987} {IN-3 "Foo class"} ]-[ callHash-403329426 ]-
[  ]-[ {?=call pkg_web.f_class_summary(2987987,'Foo class')} ]

// which corresponds to
[ Date ]-[ Event Tag ]-[ UserId ]-[ TraceKey ]-[ Elapsed Time ]-[ Total Rows ]-
[ OK | ERR ]-[ Responsible Module ]-[ Responsible SubModule ]-[ DB Connection User ]-
[ Call Type (F/P/Q/U) ]-[ Op's package ]-[ Op's name ]-[ Call's Text ]-
[ Parameter List ]-[ Hash of the Call's Text ]-[ Error Message (if any) ]-
[ Call's parameter-populated text ]

// Request Completion
WARN - [ 2007-01-09 09:06:23:801 ]-[ REQ ]-[ 87987 ]-[ 87987-3-200793 ]-
[ 3766 ms ]-[ 11 rows ]-[ OK ]-[ MODULE_A ]-[ SUBMODULE_X ]-[ 1 cx ]-[ 1 ops ]-
[ 1 calls ]-[ 5 inDB ]-[ 48 getCx ]-[ 1blghqx32w1ps session ]-[ /ClassSummary.do ]

// which corresponds to
[ Date ]-[ Event Tag ]-[ UserId ]-[ TraceKey ]-[ Total Time ]-[ Total Rows ]-
[ OK | ERR ]-[ Responsible Module ]-[ Responsible SubModule ]-[ Total Connections ]-
[ Total Ops ]-[ Total Calls ]-[ Total Time in DB ]-[ Time Waiting for Connections ]-
[ Session Id ]-[ Request URL ]-[ Error Message (if any) ]

We download these log files, parse them into a database, and generate daily reports about the worst-performing requests; those reports drive our tuning efforts.
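Parsing these event lines is easy because of the consistent "]-[" separator; a sketch of the field extraction we run before loading them into the reporting database (the class and method names here are my own):

```java
class StatsLineParser {
    // Splits a WebAppStats event line into its bracketed fields.
    // Field positions follow the legend above.
    static String[] fields(String line) {
        // trim the log-level prefix up to the first '[' and the trailing ']'
        String body = line.substring(line.indexOf('[') + 1, line.lastIndexOf(']'));
        String[] raw = body.split("\\]-\\[");
        for (int i = 0; i < raw.length; i++) raw[i] = raw[i].trim();
        return raw;
    }
}
```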

The other tracking mechanism is time-based and implemented by the Snapshot class, which is updated by WebAppStats as calls and requests pass through. Periodically (every five minutes, for example), it dumps its numbers for each OpModule to its log file. This log file can then be tailed and parsed to feed real-time charting applications like Cacti. Below are two examples of Snapshot log statements from the FINANCE and HR OpModules:

DEBUG - [ 2007-01-12 00:00:08:251 ]-[ 2007-01-11 23:58:08:249 ]-
[ 120002 ]-[ FINANCE ]-[ 278 rows ]-[ 200 ops ]-[ 287 calls ]-[ 208 cx ]-
[ 28 reqs ]-[ 0 tmpKeys ]-[ 0 dbErr ]-[ 0 reqErr ]-[ 207820 dbTime ]-
[ 298729 reqTime ]-[ 0 traceReq ]

DEBUG - [ 2007-01-12 00:00:08:485 ]-[ 2007-01-11 23:58:08:249 ]-[ 120236 ]-[ HR ]-
[ 297 rows ]-[ 298 ops ]-[ 287 calls ]-[ 287 cx ]-[ 298 reqs ]-[ 0 tmpKeys ]-
[ 0 dbErr ]-[ 0 reqErr ]-[ 20902 dbTime ]-[ 298722 reqTime ]-[ 0 traceReq ]
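The accumulate-then-dump cycle behind those log lines can be sketched as follows. The counter set is trimmed to a few of the fields visible in the statements above, and the method names are my own; the remaining counters would follow the same pattern.

```java
import java.util.Map;
import java.util.TreeMap;

class Snapshot {
    static class ModuleCounters {
        long rows, calls, dbTimeMillis;
    }

    private final Map<String, ModuleCounters> byModule = new TreeMap<>();

    // Called by WebAppStats as each DatabaseCall completes.
    synchronized void recordCall(String module, int rows, long dbTimeMillis) {
        ModuleCounters c = byModule.computeIfAbsent(module, k -> new ModuleCounters());
        c.rows += rows;
        c.calls++;
        c.dbTimeMillis += dbTimeMillis;
    }

    // Called by the periodic task: emits one line for the module and
    // resets its counters so the next interval starts from zero.
    synchronized String dumpAndReset(String module) {
        ModuleCounters c = byModule.remove(module);
        if (c == null) return null;
        return "[ " + module + " ]-[ " + c.rows + " rows ]-[ " + c.calls
             + " calls ]-[ " + c.dbTimeMillis + " dbTime ]";
    }
}
```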

What about Spring and Hibernate?

The Spring and Hibernate frameworks are not adequate for my company’s purposes. We have multiple services (some non-Java-based) writing data into our database. This data would circumvent the object-relational persistence functionality of Spring and Hibernate. We have a large Oracle database that would be very hard to describe in XML. To make queries efficient, we rely on Oracle-specific features to squeeze all the performance out of the database that we can. This also means that we use mostly prepared statements instead of long, parameter-including query strings to take advantage of Oracle’s statement caching. CallableStatement executions of procedures and functions are more locatable and easier to update without redeploying an ear file.

In addition, we had previously tried to model SQL this way and found that it introduced many problems, the most sinister being developers treating the database like an object repository instead of a database. Laziness crept in: once we introduced the logging, we saw that some developers were making hundreds, sometimes thousands, of database calls during a single request.

Conclusion

My IT team developed the Op Framework to solve numerous problems that were plaguing us in development, maintenance, deployment and production. Some of these problems stemmed from a hard-to-manage set of shared XML configuration files and some from dynamic SQL-writing frameworks. In the end, we wanted to develop something that added “good magic” and reduced “bad magic.”

The Op Framework reduces bad magic by not relying on its own XML configuration files, and by decoupling modules and their configurations through OpModules’ properties files. It also decouples code from EJB and even the database itself, making it easier to learn, develop, test, maintain and deploy.

It adds good magic by providing tracing, tracking, logging and data stubbing for free. This gives developers and your help desk visibility into live production situations. It gives quality assurance the ability to set up and run test cases, and allows off-site contractors to work disconnected from your database. Plus, it gives your DBA and operations staff information about site and database usage, allowing you to focus on tuning the code that most urgently needs it.

Paul Feuer has developed large-scale intranet and extranet commercial Web applications for the past nine years. He is currently a senior software engineer for Wireless Generation, an education technology and services company serving 100,000 teachers and 2.5 million students. He has a BA from Columbia University and an MS in computer science from New York University.