by Laurence Vanhelsuwé

Speed up batch file processing using generic programming and core reflection

Nov 1, 1998, 13 mins

Don't let batch file processing slow you down -- even if it is the computer that's doing all the repetitive monkey work

If you’re at all like me, you don’t only use Java to write applications, applets, and JavaBeans. You also use Java to write everyday file and system utilities. You know, those myriad tools designed to make our lives as programmers easier. (Things like multiple-file search and replace, file analysis tools, file format massagers, and so on.) I’ve written several utilities that take a single file argument, and I often need to run such a utility on a list of files; in other words, do some batch processing (typically on Java or HTML files).

For example, say I’ve written a TAB-to-space file-converting utility (called Tab2Spc), and I need to apply this to all Java source files in a directory by creating a plain batch file like the following:

java Tab2Spc Angle.java
java Tab2Spc Border.java
java Tab2Spc Circle.java
 :             :
java Tab2Spc Value.java
java Tab2Spc World.java

Because Java executables only start running after the Java interpreter has loaded and initialized itself, you’re looking at a multi-second delay before your utility can do any useful work. If that work itself only takes seconds (or less, as in the case of a simple TAB-to-space converter), using this utility in a batch script will waste an inordinate amount of time, by repeatedly loading and initializing the Java virtual machine (JVM).

What we really want is to have the JVM load and initialize just once per entire batch-processing session. The intuitive solution seems to be to create a ForEach Java utility that takes two parameters:

  1. The name of a file containing filenames to execute the argument program on

  2. The name of another Java program

Our previous batch file could then be reduced to a simpler ASCII file containing only the filenames to operate on:

Angle.java
Border.java
Circle.java
    :
Value.java
World.java

Starting the batch processing could then be achieved as follows:

java ForEach files.lst Tab2Spc

(Read this as: “For each file in files.lst, execute the utility Tab2Spc.”)

Assuming that Tab2Spc had the standard application main() entry point,

public static void main(String[] args)

how would you invoke this main() from within the ForEach utility?

Because main() is a static method, you simply couldn’t invoke it with the 1.0 release of Java! You could dynamically load the argument program easily enough with Class.forName()... and then instantiate an object of the program’s class with Class.newInstance(), but those steps still wouldn’t give you any way to invoke the main() method of the utility. Intuitively, we would like to do something like the following:

Class externalProgram = Class.forName("Tab2Spc");
Object instance = externalProgram.newInstance();
instance.main(someArgs);

That last line is where intuition and reality part company. Method main() isn’t defined for objects of class Object, so we get the compile-time error:

Method main( String[] ) not found in class java.lang.Object.

Fine. More stubborn (but misplaced) Java programmer’s intuition tells me I should use a cast to solve this problem, but a cast to what? The logical answer is a cast to an interface type of Application, where Application defines a single method main(), with the well-known signature. But here’s where we encounter another obstacle: interfaces can’t be used to define static methods. In the prehistoric days of pre-reflection Java, my less-than-perfect workaround solution was to create a very similar Application interface and have all my utilities implement this:

public interface Application {
    void applicationMain (String[] args);
} // End of interface definition

Now, the main() of all my utilities needed to read as follows:

public class XYZ implements Application {
// static main() delegates to non-static main()
    public static void main(String[] args) {
        new XYZ().applicationMain(args);
    }
    public void applicationMain(String[] args) {
        // previous content of static main() goes here..
    }
}

This solution allowed the ForEach program to cast the freshly created instance to Application and invoke applicationMain() on it without any problems, thus executing the argument utility from within Java itself and thereby avoiding a reload of the JVM. For your own library of Unix-style command-line utilities, it takes 30 seconds to make any utility implement the Application interface, which is no great hardship. On the other hand, if you use third-party utilities you don’t have the source to, this solution would leave you stuck under Java 1.0.2.
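To make the workaround concrete, here is a minimal sketch of the loop such a pre-reflection ForEach might run. The class names ApplicationLoop and EchoUtility and the method runOnEach() are hypothetical, invented for this illustration, and the code uses modern Java syntax rather than Java 1.0.2. It loads the utility once, casts the instance to Application, and calls applicationMain() once per file name.

```java
import java.util.ArrayList;
import java.util.List;

public class ApplicationLoop {
    /** The workaround interface from the article: an instance-level stand-in for static main(). */
    public interface Application {
        void applicationMain(String[] args);
    }

    /** Hypothetical utility used only for this demo; it records the file names it receives. */
    public static class EchoUtility implements Application {
        public static final List<String> seen = new ArrayList<>();

        // static main() delegates to the non-static applicationMain()
        public static void main(String[] args) {
            new EchoUtility().applicationMain(args);
        }

        public void applicationMain(String[] args) {
            seen.add(args[0]);
        }
    }

    /** Loads the named class once, then calls applicationMain() once per file name. */
    public static int runOnEach(String className, String[] fileNames) {
        try {
            Class<?> externalProgram = Class.forName(className);
            // getDeclaredConstructor().newInstance() is the modern spelling of Class.newInstance()
            Object instance = externalProgram.getDeclaredConstructor().newInstance();
            // This cast is what makes the call below compile; plain Object has no such method.
            Application app = (Application) instance;
            for (String fileName : fileNames) {
                app.applicationMain(new String[] { fileName });
            }
            return fileNames.length;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not load " + className, e);
        }
    }

    public static void main(String[] args) {
        runOnEach(EchoUtility.class.getName(), new String[] { "Angle.java", "Border.java" });
        System.out.println(EchoUtility.seen);
    }
}
```

The cast fails with a ClassCastException for any program that does not implement Application, which is exactly the limitation with third-party utilities described above.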

With the 1.1 release of Java comes a very powerful new API that allows us to do away with my Application interface: Core Reflection. I’m not going to explain all the possibilities and uses of this API here (and they are very, very numerous), except to say that, with it, you can home in on any method of any class and execute that method. This method-invocation capability of the API was really designed to invoke instance methods (that is, invoke methods on objects), but luckily for us, it also allows invocation of static methods (that is, class methods). This new possibility allows you to write a ForEach program that can take plain argument applications with the standard main(), and nothing more.
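As a small illustration of that capability, the following sketch looks up a static main(String[]) with Class.getMethod() and calls it with Method.invoke(). The names StaticMainInvoker, Target, and invokeMain() are hypothetical, chosen only for this example, and the syntax is modern Java rather than Java 1.1.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

public class StaticMainInvoker {
    /** Hypothetical target program; it only remembers the first argument it received. */
    public static class Target {
        public static String lastArg;

        public static void main(String[] args) {
            lastArg = args[0];
        }
    }

    /** Looks up public static void main(String[]) on the class and calls it with args. */
    public static void invokeMain(Class<?> program, String[] args) {
        try {
            Method main = program.getMethod("main", String[].class);
            // The receiver is null because a static method has no object to act on.
            // The String[] is wrapped in an Object[] so that it is passed as ONE parameter.
            main.invoke(null, new Object[] { args });
        } catch (NoSuchMethodException | IllegalAccessException e) {
            throw new IllegalStateException("No accessible main(String[]) in " + program.getName(), e);
        } catch (InvocationTargetException e) {
            // The invoked main() itself threw; its exception is carried as the cause.
            throw new IllegalStateException("main() threw an exception", e.getCause());
        }
    }

    public static void main(String[] args) {
        invokeMain(Target.class, new String[] { "Circle.java" });
        System.out.println(Target.lastArg);
    }
}
```

Note that no instance of Target is ever created and no shared interface is needed, which is what makes this approach work for programs whose source you cannot change.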

Now, if you’ve ever done Postscript programming, you may be asking: Why is he calling this utility “foreach”, and not “forall”? (Tcl or Dylan hacks, on the other hand, won’t be complaining.) The reason is that I use the Applying.forEach() generic iteration algorithm from ObjectSpace Inc.’s Java Generic Library (JGL) to take care of the central batch iteration loop. (JGL is a data structures and algorithms framework. See Resources below for links to other articles and sites dealing with JGL.) This necessarily leads us, via a brief but hopefully valuable detour, through the truly awesome world of generic programming (using JGL). For those who still think design patterns are the coolest thing in Programming Land, wake up: there’s something else out there that’s almost as cool: generic programming. (If you aren’t familiar with the concept of generic programming, I humbly advise you to check out some literature on it.)

The power of generic programming

Generic programming is all about taking abstraction, that beloved object-oriented technique, to its logical conclusion when applied to the field of algorithms and data structures. A QuickSort is a QuickSort, whatever underlying data structure it operates on, right? Same for a BinarySearch, and so on. On the other hand, a linked list should be independent of any of the algorithms that act on it. It’s the same with trees, stacks, sets, and so on, right? So why do 97 percent of all programmers reimplement these things from scratch, over and over again? Generic programming aims to eliminate this perpetual (and IT-budget-squandering) wheel-reinventing by completely decoupling algorithms from data structures, and making these decoupled entities available in perfectly reusable forms. All you need to do is reuse them, add some glue and application-specific code, and you’ll never implement a sorting routine or linked list again!
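The decoupling idea can be illustrated without JGL. The sketch below uses only the standard java.util collections and modern Java generics, neither of which is what the article's code relies on, and the class name GenericLargest is hypothetical. A single largest() algorithm is written once against Iterable and then reused, unchanged, over a linked list, a deque, and a tree-based set.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Deque;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class GenericLargest {
    /** Returns the largest element of any Iterable under the given order, or null if empty. */
    public static <T> T largest(Iterable<T> elements, Comparator<? super T> order) {
        Iterator<T> it = elements.iterator();
        if (!it.hasNext()) {
            return null;
        }
        T best = it.next();
        while (it.hasNext()) {
            T candidate = it.next();
            if (order.compare(candidate, best) > 0) {
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> names = new LinkedList<>(Arrays.asList("Angle", "Border", "Value"));
        Deque<Integer> numbers = new ArrayDeque<>(Arrays.asList(3, 9, 4));
        Set<Character> letters = new TreeSet<>(Arrays.asList('x', 'b', 'q'));

        // The algorithm knows nothing about how each container stores its elements.
        System.out.println(largest(names, Comparator.comparingInt(String::length)));
        System.out.println(largest(numbers, Comparator.<Integer>naturalOrder()));
        System.out.println(largest(letters, Comparator.<Character>naturalOrder()));
    }
}
```

The only contract between algorithm and data structure is the iterator, so adding a new container type requires no change to largest(), and adding a new algorithm requires no change to the containers.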

But let’s not digress too much. If you pop the current item off your mental stack, we’re back with our ForEach project. Once the generic-programming bug has bitten you, you start feeling sick at the thought of implementing loops the old way (meaning manually using language-level loop constructs). So I didn’t want to write a silly little for or while loop to drive the batch iteration in my utility. Instead I wanted to rely on JGL’s public static UnaryFunction forEach(Container container, UnaryFunction function).

The forEach() method does something to every element of the argument Container (this is the JGL Container, mind you, not to be confused with the Abstract Window Toolkit’s Container). That something is determined by the UnaryFunction argument. For my ForEach utility, I needed a function that invokes main() on a specified class. Or better still, a function to invoke any static method on a specified class. Relying heavily on Core Reflection, here it is:

//========================================================================
// GenericUnaryFunction (C) Mar 1998 Laurence Vanhelsuwe - All Rights Reserved
// --------------------
// UnaryFunction that lets you invoke any STATIC method in any class that
// takes a single Object (or subclass) argument.
//
// History:
// --------
// 30-Mar-98: started this file from ClassForName.java
//
// Author e-mail: lv@telework.demon.co.uk
//========================================================================
package telework.utilities.jgl.functions;
import COM.objectspace.jgl.*;
import java.lang.reflect.*;
/***************************************************************************
 * UnaryFunction that lets you invoke any STATIC method in any class that
 * takes a single Object (or subclass) argument.
 *
 * For example, the ClassForName unary function can be replaced by using
 * the following GenericUnaryFunction constructor:
 *
 * <code>
 * GenericUnaryFunction("forName", java.lang.Class.class, String.class)
 * </code>
 *
 * One of the interesting applications of this class is to invoke the
 *
 * <code>
 * public static void main (String[] args)
 * </code>
 *
 * methods of applications, for example, to create a forall CLI command.
 * See class telework.utilities.files.ForAll for an implementation.
 *
 * @version 1.0 30/03/98
 * @author  Laurence Vanhelsuwe (utils@telework.demon.co.uk)
 * @see telework.utilities.files.ForAll
 **************************************************************************/
public class GenericUnaryFunction implements UnaryFunction {
    protected Method theMethod;     // we cache the target Method for execute()

    public GenericUnaryFunction(String methodName, Class motherClass, Class singleArgumentType) {
        Class[] argListTypes = new Class[1];
        argListTypes[0] = singleArgumentType;
        // try to find the Method object that corresponds to the method
        // the user is specifying. The method can only take a single
        // argument.
        // **!! single arg should not be primitive
        // **!! method should return an arg !?
        try {
            theMethod = motherClass.getMethod(methodName, argListTypes);
        } catch (NoSuchMethodException noMethod) {
            String errMsg = "GenericUnaryFunction: couldn't find method '";
            errMsg += motherClass.getName() + "." + methodName + "(";
            errMsg += singleArgumentType.getName() + ")'";
            System.err.println( errMsg );
            theMethod = null;
        } catch (SecurityException noAccess) {
            String errMsg = "GenericUnaryFunction: not allowed to access method '";
            errMsg += motherClass.getName() + "." + methodName + "(";
            errMsg += singleArgumentType.getName() + ")'";
            System.err.println( errMsg );
            theMethod = null;
        }
    }

    /**
     * Pass the argument object to the method dynamically specified at
     * construction time. Return the object that that method returned.
     *
     * @param anObject the object that's going to be passed to the method
     * specified at constructor time.
     * @return the return value of the method, if any (null if void).
     */
    public Object execute(Object anObject) {
        Object returnValue = null;
        Object[] singleArgument = new Object[1];
        singleArgument[0] = anObject;
        // the constructor leaves theMethod null when the lookup failed
        if (theMethod == null) {
            return null;
        }
        try {
            // null receiver: the target method is static
            returnValue = theMethod.invoke(null, singleArgument);
        } catch (IllegalAccessException illegalAccessException) {
            System.err.println("GenericUnaryFunction: "+ illegalAccessException);
        } catch (IllegalArgumentException illegalArgumentException) {
            System.err.println("GenericUnaryFunction: "+ illegalArgumentException);
        } catch (InvocationTargetException invocationTargetException) {
            System.err.println("GenericUnaryFunction: "+ invocationTargetException);
        }
        return returnValue;
    }
} // End of Class GenericUnaryFunction

Class GenericUnaryFunction marries the currently under-exploited potentials of Core Reflection and JGL to form one hot little class! (By the way, the class shown here is not yet industrial strength, as you may have deduced from the comments tagged with a **!! marker.) The way GenericUnaryFunction works is by using Core Reflection to find the method that needs to be called by the function (using Class.getMethod()) and to dynamically invoke that method (using Method.invoke()). For performance reasons, these two aspects are split evenly between the class constructor and the UnaryFunction’s execute() method.

There remains one minor obstacle to implementing the ForEach batch processing utility: if we store every line of the batch file (the list of filenames) as String elements in a JGL container, our Applying.forEach() won’t work because GenericUnaryFunction.execute() requires a single argument of type String[] (that’s what it needs to pass on to main(String[])), and not simply an argument of type String. Here, another JGL generic algorithm comes to the rescue: Transforming.collect(). Using it, we can apply a transforming function to turn our Strings into one-element arrays of String. This transforming function is rather specialized, so I used an anonymous inner class to avoid having to create a proper class for it.
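The transformation step is easy to picture with standard collections. The sketch below is not JGL code; it is a plain-Java stand-in, with the hypothetical names WrapLines and toArgArrays(), that wraps each line in a one-element String[] so that each element has the shape main(String[]) expects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WrapLines {
    /** Turns each line into a one-element String[], the argument shape of main(String[]). */
    public static List<String[]> toArgArrays(List<String> lines) {
        List<String[]> wrapped = new ArrayList<>();
        for (String line : lines) {
            wrapped.add(new String[] { line });
        }
        return wrapped;
    }

    public static void main(String[] args) {
        List<String[]> wrapped = toArgArrays(Arrays.asList("Angle.java", "Border.java"));
        for (String[] argArray : wrapped) {
            // Each element is now a complete argument list for one invocation.
            System.out.println(Arrays.toString(argArray));
        }
    }
}
```

Keeping the wrapping as its own pass means the invoking function can stay fully generic: it simply passes along whatever single object it is handed.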

By now you should understand all the pieces necessary to implement a generic main()-invoking ForEach batch-processing utility. Using JGL to do most of the grunt work for us (and isn’t that the whole point of class libraries?), here’s the final implementation:

//==========================================================================
// ForEach            (C) Mar 1998 Laurence Vanhelsuwe - All Rights Reserved
// --------
// Java replacement for the MS-DOS "for" command.
// By avoiding the repeated reloading of the JVM, you can execute Java
// applications on lists of files much faster.
//
// History:
// --------
// 16-Dec-96: started this file
// 21-Mar-97: added support for passing arguments to external program
// 30-Mar-98: rewrite using GenericUnaryFunction and JGL
//
// Author e-mail: lva@telework.demon.co.uk
//==========================================================================
package telework.utilities.files;
import COM.objectspace.jgl.*;               // for Array, UnaryFunction
import COM.objectspace.jgl.algorithms.*;    // for Applying, Transforming
import telework.utilities.*;
import telework.utilities.files.*;
import telework.utilities.jgl.functions.*;
/***************************************************************************
 * Java replacement for the MS-DOS "for" command. By avoiding the repeated
 * reloading of the JVM, you can perform Java-based batch processing much
 * faster.
 * @version 1.0 31/03/98
 * @author  Laurence Vanhelsuwe (utils@telework.demon.co.uk)
 **************************************************************************/
public class ForEach {
final static boolean debug = false;
public static void main (String[] args) throws Exception {
String[] execArgs = {"dummy"};
boolean verbose = false;
    if ( args.length < 2 ) {
        System.out.println("Usage: ForEach <fileList> <Java class> <arguments>");
        System.exit(10);
    }
    String listOfFiles = args[0];
    String programName = args[1];
    CLIKit.printUnderlined("ForEach v1.0 (C) Laurence Vanhelsuwe");
    System.out.println(
        "Batch run of '"+ programName +
        "' using command line arguments in file '"+ listOfFiles +"'."
        );
    // find the Class (i.e. program) that we will have to load dynamically
    // and invoke its main()
    Class externalProgram = null;
    try {
        externalProgram = Class.forName(programName);
    } catch (ClassNotFoundException notFound) {
        System.out.println("Could not find "+ programName +" ("+ notFound +")");
        System.exit(10);
    }
    // Cache entire list of argument lines
    Array list = new Array();
    FileKit.loadLines(listOfFiles, list);
    // Now transform all these Strings into a 1-el array of String[]
    // so that Applying.forEach() will pass the correct argument type
    // to the utility's main(String[])
    list = (Array) Transforming.collect(list, new UnaryFunction() {
                        // **!! inner class
                        public Object execute (Object s) {
                            String string = (String) s;
                            String[] strArray = new String[1];
                            strArray[0] = string;
                            return strArray;
                        } // end of method body
            } );
    UnaryFunction func =
        new GenericUnaryFunction("main", externalProgram, execArgs.getClass() );
    // And now...the heart of the program. Invoke the external program
    // repeatedly, passing different command line arguments (typically
    // file names) to effect some batch processing.
    Applying.forEach(list, func);
}
} // End of Class ForEach

The CLIKit and FileKit classes contain various support routines for command-line-interface programs and for file manipulation. Their trivial printUnderlined() and loadLines() methods won’t be presented here.
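For readers without JGL or the telework support classes at hand, the same flow can be sketched with the standard library alone. Everything named here is an assumption made for illustration: PlainForEach, CountingUtility, and runBatch() are invented, and loadLines() and underlined() are only guesses at what the unpublished FileKit.loadLines() and CLIKit.printUnderlined() might do. An ordinary for loop stands in for Applying.forEach().

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.lang.reflect.Method;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class PlainForEach {
    /** Hypothetical demo target: counts how often its main() was called. */
    public static class CountingUtility {
        public static int calls;

        public static void main(String[] args) {
            calls++;
        }
    }

    /** Assumed stand-in for FileKit.loadLines(): collects the non-blank, trimmed lines. */
    public static List<String> loadLines(Reader source) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            for (String line = in.readLine(); line != null; line = in.readLine()) {
                if (!line.trim().isEmpty()) {
                    lines.add(line.trim());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    /** Assumed stand-in for CLIKit.printUnderlined(): the title plus a row of dashes. */
    public static String underlined(String title) {
        StringBuilder rule = new StringBuilder();
        for (int i = 0; i < title.length(); i++) {
            rule.append('-');
        }
        return title + System.lineSeparator() + rule;
    }

    /** Resolves main(String[]) once, then invokes it once per line. Returns the call count. */
    public static int runBatch(String programName, List<String> lines) {
        try {
            Method main = Class.forName(programName).getMethod("main", String[].class);
            for (String line : lines) {
                // One-element String[] per call, wrapped in Object[] for invoke()
                main.invoke(null, new Object[] { new String[] { line } });
            }
            return lines.size();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Batch run of " + programName + " failed", e);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("Usage: PlainForEach <fileList> <Java class>");
            return;
        }
        System.out.println(underlined("PlainForEach sketch"));
        List<String> lines = loadLines(Files.newBufferedReader(Paths.get(args[0])));
        System.out.println(runBatch(args[1], lines) + " invocations");
    }
}
```

It would be started the same way as the original, for example java PlainForEach files.lst Tab2Spc, and like the original it pays the JVM start-up cost only once for the whole batch.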

When I have to evaluate the quality of source code, one characteristic that I find very important is the level of “fit” or “match” between what the source code says and what the fundamental steps are that embody the solution to the problem at hand. In the case of our ForEach utility, the expressiveness and level of abstraction of the

Applying.forEach(list, func);

statement leaves me with no doubt that the ForEach utility is logically implemented in a way that’s not all that far from perfection.

Conclusion

A problem as innocent-sounding as the need to speed up batch file processing (by eliminating time-wasting JVM reloading) led us to some state-of-the-art implementation techniques: JGL-based generic programming and Core Reflection. The resulting code may (initially!) look and read somewhat hostile because of the use of these high-tech tools, but this code is in fact very elegant. The coded guts of the presented program map one-to-one to the abstract steps of the batch-processing problem’s trivial solution algorithm. In my experience, such perfect mappings between program statements and algorithmic steps point to a high-quality implementation.

Laurence Vanhelsuwé is an independent software engineer. Self-taught, he dropped out of college and immediately started a professional career writing arcade games. He has worked on such diverse technologies as X.25 WAN routers, virtual reality flight simulation, Postscript, and realtime digitized video-based traffic analysis.