Introducing Jamaica, a JVM macro assembler language

Most Java programmers, at one time or another, have wondered how the JVM works. Java bytecode programming gives considerable insight into the JVM and helps developers write better Java. The ability to produce bytecode at runtime is also a great asset that opens the door to new possibilities. Historically, various language systems have invented their own runtime systems; today, many want to switch to Java for a good reason: to leverage the free hard work of others who port and optimize the virtual machine on numerous platforms. For pure-Java system software, dynamically generated bytecode may provide performance that is impossible otherwise. For example, suppose an RDBMS (relational database management system) supported functions in queries like this:

    SELECT GetLastName(name) FROM emp WHERE CalcAge(birthday) > 30;

It is possible and desirable for the database engine to create and use a native Java method rather than simply interpret the query.

A JVM is a simple stack-based CPU with no general-purpose programmable registers. Its instruction set is called bytecode. Code and data are organized in JVM classes (to which Java classes are mapped), but the JVM does not support all Java language features directly. The JVM Specification also defines many verification and security rules. Even so, programming at the bytecode level proves error-prone and risky, as I personally witnessed in Jamaica testing. Also, bytecode programs tend to be longer than other CPU assembly programs because JVM instructions mostly operate on the stack top. Jamaica tries to address some of these issues, adopting a Java-ish approach: it uses Java syntax for class infrastructure declaration and symbolic names in instructions for references to variables, fields, and labels.
Moreover, Jamaica defines numerous macros for common patterns, making JVM assembly programs much easier to read and write.

Because the JVM Specification does not define an assembly language, several efforts have been made to create one; the best known so far is Jasmin, and Jamaica is the latest. This article introduces the Jamaica language with many examples, details the instruction set’s more complicated instructions, and elaborates on all of Jamaica’s macros. An equally important part of Jamaica is the underlying abstract API for creating JVM classes; this API and its close relationship with Jamaica are also introduced in this article. Assembly programming is closely related to the CPU architecture, but this article does not cover JVM architecture extensively. The article closes with a summary of Jamaica’s benefits and limitations. Let’s start by looking at an example:

    public class CHelloWorld {
      public static void main(String[] args) {
        getstatic System.out PrintStream
        ldc "Hello, World!"
        invokevirtual PrintStream.println(String)void
      }
    }

The code above looks quite familiar to Java programmers, except for the method body, where executable code is written in the Jamaica bytecode instruction format. All class names are in Java format rather than JVM format. As a “Jamaican convenience,” Java classes in java.lang, java.io, and java.util are automatically imported, so you can use the class names directly without package prefixes. If we use macros, we can reduce the code above to a single statement and easily do more:

    public class CHelloWorld {
      public static void main(String[] args) {
        %println "Hello, World!"
        %println <out> "Hello, World!"
        %println <err> "This is NOT an error!"
      }
    }

The %println macro is probably the most used macro for debugging. It prints to either System.out (by default) or System.err. With macros, reading (and writing) JVM assembly code is much easier.
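For comparison, here is a plain Java sketch of what the three %println lines above do. The class name HelloWorldJava is illustrative and not part of Jamaica. Compiling it with javac and inspecting it with javap -c is one way to see the getstatic, ldc, and invokevirtual sequence that the first Jamaica example spells out by hand.

```java
// Plain Java counterpart of the macro-based CHelloWorld example.
// HelloWorldJava is a hypothetical name chosen for this sketch.
final class HelloWorldJava {
    public static void main(String[] args) {
        System.out.println("Hello, World!");          // %println "Hello, World!"
        System.out.println("Hello, World!");          // %println <out> "Hello, World!"
        System.err.println("This is NOT an error!");  // %println <err> "This is NOT an error!"
    }
}
```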
The following example is slightly juicier:

    public class CHelloWorld {
      public static void main(String[] args) {
        Date d;
        %set d = %object Date
        %println "Hello, World!\nIt is ", d, '.'
      }
    }

Let’s compile and test this program. First, download and install Jamaica. Then run this:

    % java com.judoscript.jamaica.Main CHelloWorld.ja

If everything goes well, a file named CHelloWorld.class will be generated in the current directory. (If the class belongs to a package, you need to move it to the appropriate directory.) To verify, run the class with java (even if the class does not have a main() method). If the Java verifier reports problems, employ javap -c, a commonly used tool that disassembles a Java class and prints its content, including bytecode instructions. Javap’s output format differs from Jamaica’s syntax but is close enough. In fact, the easiest way to do JVM assembly programming is to reverse-engineer Java classes with javap or similar tools.

Experienced JVM bytecode programmers might have found a “fraud” in the above examples: there are no return instructions at the end of those methods. In the Java language, explicit return statements are not required for methods returning void; at the JVM level, however, they are required. Jamaica checks the code and automatically inserts a return instruction if needed. Now that you have a sense of what Jamaica programs look like, let’s move on to the specifics.

Define classes and interfaces

Java identifiers, keywords, and comments are also Jamaica’s. In Jamaica’s method bodies, bytecode instruction mnemonics are considered reserved words and should not be used as variable or label names.

To define a JVM class or interface in Jamaica, use the exact Java syntax, including the package statement for the class package prefix and the extends and/or implements clauses. The import statements can be used to introduce Java class-name shortcuts.
Java class names used in programs are in Java format (e.g., java.lang.String) rather than JVM format (e.g., java/lang/String). Inner class names use a dollar sign ($) between their own names and their enclosing class names. Class names also follow the Java import rules, and, as mentioned above, java.lang.*, java.io.*, and java.util.* are implicitly imported (at the end of the import list).

Fields and symbolic constants

Class data fields are declared in Jamaica as they are in Java, but cannot be initialized, except for static final fields of primitive types, which must be initialized. Initializations must happen either within constructors for nonstatic members or class initialization blocks for static ones.

Static final primitive fields are initialized with constant values. A constant value can be a number, string, or a symbolic constant defined either explicitly through the %const statement or as a static final value of this or other classes. Symbolic constants are quoted by { } in the code. They can be used anywhere a constant may occur and are converted to the intended types. For example:

    %const MAX_COUNT = 10000
    %const MONTH_FLD = java.text.DateFormat.MONTH_FIELD

    public class ConstantTest extends java.sql.Types {
      static final double MY_MAX_COUNT = { MAX_COUNT };

      public static void main(String[] args) {
        long var = { MAX_COUNT };
        double dvar = { MY_MAX_COUNT };
        %println "var = ", var
        %println "dvar = ", dvar
        %println "java.text.DateFormat.YEAR_FIELD = ", { java.text.DateFormat.YEAR_FIELD }
        %println "java.text.DateFormat.MONTH_FIELD = ", { MONTH_FLD }
        %println "java.sql.Types.ARRAY = ", { ARRAY } // in parent class
      }
    }

Methods and exception handling

Methods, including constructors and class initialization blocks, are declared in Jamaica using Java syntax. Within the method bodies, local variables are declared with Java syntax; they can take constant initializations. Executable code is written with bytecode instructions.
Instructions are not terminated with any character such as ;, and multiple instructions can appear on the same line, although one instruction per line is recommended. Each instruction has a mnemonic and its own format, and the operands follow a consistent convention. Instructions can be prefixed with labels. Variable declarations and bytecode instructions can intermingle.

At the method’s end, exception catch clauses can be added. Jamaica does not have an explicit finally mechanism as in Java because the JVM doesn’t either. For example:

    public class ExceptionTest {
      public static void main(String[] args) {
        Writer w;
        PrintWriter out;

      label_start:
        %set w = %object FileWriter(String) (args[0])
        %set out = %object PrintWriter(Writer) (w)
        aload out
        invokevirtual PrintWriter.close()void
        goto label_finally
      label_io:
        %println "Caught IOException."
        invokevirtual Exception.printStackTrace()void
        goto label_finally
      label_any:
        %println "Caught an Exception."
        invokevirtual Exception.printStackTrace()void
      label_finally:
        %println "Finally."

        catch IOException (label_start label_io) label_io
        catch Exception (label_start label_io) label_any
      }
    }

Every catch clause uses three labels. Enclosed in parentheses are the start label (inclusive) and end label (exclusive) of the block in which the specified exception is caught; the trailing label marks the handling code.

Default constructor

If a class needs a default constructor simply to call the superclass’s constructor, a class-level macro, %default_constructor, does the trick easily. It can be followed by <public>, <protected>, or <private>. Here is an example:

    class Block extends HashMap {
      %default_constructor <public>
      // ...
    }

Bytecode programming and instructions

To do JVM bytecode assembly programming, you must understand the static structure of JVM classes and the runtime method invocation. (You do not need to concern yourself with thread execution and synchronization.
The JVM has two instructions, monitorenter and monitorexit, and they are all you can do at the bytecode level.)

Each Java class has a constant pool, which holds all the class’s constant parts, including constant numbers and strings, Java class identifiers, method identifiers, field identifiers, and so on. Bytecode instructions use constant pool indices to reference those entries. Jamaica uses Java syntax to define a Java class structure and symbolic names in instructions, so the constant pool and other bits and pieces are completely hidden.

Each running thread has a “frame” stack; when a method is called, the JVM allocates a new frame on the frame stack to store state information during the method execution; the frame is popped and discarded when the method returns. The frame maintains information such as local variables and the operand stack. JVM instructions receive values, return results, and pass parameters to method calls on the operand stack. The operand stack is one word (32 bits) wide; values of types long and double occupy two entries.

Parameters, variables, and “this”

When a method is invoked, parameters are added as initialized local variables, with the this reference as the first if the method is not static. Local variables are one-word (32-bit) slots, which fit most JVM data values including object references; values of types long and double take two slots. In Jamaica, most instructions use names to reference variables, but a few instructions can reference variables via their indices, such as aload_0 and istore_2. Look at this example:

    public void foo(long a, int b) {
      aload_0 // Loads 'this' onto the stack
      lload_1 // Loads the long value of 'a' onto the stack
      iload_3 // Loads the int value of 'b' onto the stack
    }

In Jamaica, there is little reason to use those instructions.
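The slot numbering behind aload_0, lload_1, and iload_3 can be sketched in a few lines of Java. This is only an illustration of the rule described above (this first for instance methods, two slots for long and double); the class LocalSlots and its assign method are hypothetical helpers, not part of Jamaica or the JDK.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that mirrors how the JVM lays out parameters
// in local variable slots at method entry.
final class LocalSlots {
    /** Maps each parameter name to its first slot index, in declaration order. */
    static Map<String, Integer> assign(boolean isStatic, String[] names, String[] types) {
        Map<String, Integer> slots = new LinkedHashMap<>();
        int next = 0;
        if (!isStatic) {
            slots.put("this", next++);   // instance methods receive 'this' in slot 0
        }
        for (int i = 0; i < names.length; i++) {
            slots.put(names[i], next);
            boolean twoWords = types[i].equals("long") || types[i].equals("double");
            next += twoWords ? 2 : 1;    // long and double occupy two slots
        }
        return slots;
    }

    public static void main(String[] args) {
        // foo(long a, int b) from the example above
        System.out.println(assign(false,
                new String[] { "a", "b" },
                new String[] { "long", "int" }));
        // prints {this=0, a=1, b=3}
    }
}
```

For a static method with the same parameters, a would start at slot 0 and b at slot 2.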
The following is recommended instead:

    public void foo(long a, int b) {
      aload this // Becomes aload_0
      lload a    // Becomes lload_1
      iload b    // Becomes iload_3
    }

Bytecode instruction basics

Jamaica supports most bytecode instructions in the JVM Specification except for the quick and debug instructions, and wide is simply ignored. Some instructions have “wide” versions (such as ldc_w and goto_w); you can use their short forms instead. For a complete description of all instructions, refer to the language user’s guide. I discuss only those whose syntax is not obvious from the JVM Specification.

Constant loading instructions

In the JVM, small numbers and null can be loaded onto the stack directly via bipush, sipush, and the various xconst_n instructions. Other constant values, including strings, are stored in the constant pool and loaded onto the stack by ldc (and its variations, ldc_w and ldc2_w, where the "_w" suffix indicates a wide index of two bytes). In Jamaica, there is no direct access to the constant pool; instead, ldc is the universal instruction for adding and loading any constants:

    ldc 129832     // Integer
    ldc (long)232  // Long and becomes ldc2_w
    ldc 5.5        // Double and becomes ldc2_w
    ldc (float)5.5 // Float
    ldc "ABCD"
    ldc "ABCD"     // Only one entry for "ABCD" in the constant pool
    ldc 1234       // Jamaica optimizes this to "sipush 1234"
    ldc 123        // Jamaica optimizes this to "bipush 123" (bipush takes a signed byte)
    ldc 2          // Jamaica optimizes this to "iconst_2"
    ldc -1         // Jamaica optimizes this to "iconst_m1"

Field access instructions

In the JVM, class fields are accessed via constant pool indices by these instructions: getfield, putfield, getstatic, and putstatic. For security reasons, these instructions also take a field descriptor, containing class name, field name, and field type, which may seem redundant.
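To make the descriptor idea concrete, the following Java sketch builds the JVM-format text of a field reference from Java-format names, in the form that javap prints in its comments. The class Descriptors and its methods are illustrative helpers, not part of Jamaica or the JDK, and fully qualified class names are assumed because the sketch does not model Jamaica's implicit imports.

```java
// Hypothetical helper: converts Java-format type names to JVM descriptors.
final class Descriptors {
    /** Converts a type such as "int[]" or "java.lang.String" to its JVM descriptor. */
    static String toDescriptor(String javaType) {
        String t = javaType.trim();
        StringBuilder sb = new StringBuilder();
        while (t.endsWith("[]")) {               // one '[' per array dimension
            sb.append('[');
            t = t.substring(0, t.length() - 2).trim();
        }
        switch (t) {
            case "boolean": sb.append('Z'); break;
            case "byte":    sb.append('B'); break;
            case "char":    sb.append('C'); break;
            case "short":   sb.append('S'); break;
            case "int":     sb.append('I'); break;
            case "long":    sb.append('J'); break;
            case "float":   sb.append('F'); break;
            case "double":  sb.append('D'); break;
            case "void":    sb.append('V'); break;
            default:        // class type: dots become slashes, wrapped in L...;
                sb.append('L').append(t.replace('.', '/')).append(';');
        }
        return sb.toString();
    }

    /** Builds "owner.name:descriptor" with the owner in JVM (slash) format. */
    static String fieldRef(String owner, String name, String javaType) {
        return owner.replace('.', '/') + "." + name + ":" + toDescriptor(javaType);
    }

    public static void main(String[] args) {
        System.out.println(fieldRef("java.lang.System", "out", "java.io.PrintStream"));
        // prints java/lang/System.out:Ljava/io/PrintStream;
        System.out.println(toDescriptor("int[]"));
        // prints [I
    }
}
```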
In Jamaica, these instructions take this format:

    getstatic System.out PrintStream
    putfield myFld int

In the second form, no class name is specified for the field name; the field must be in either this class or the parent. (If the parent class does not exist in the current classpath, you must explicitly specify that parent class name.)

Invoke instructions

Similar to field access, method invocation in the JVM takes a method descriptor containing class name, method name, and a method signature, which contains parameter types and the return type. There are four invoke instructions: invokevirtual, invokestatic, invokespecial, and invokeinterface. In the JVM, invokeinterface takes one more operand in addition to the parameter list for some obscure reason; in Jamaica, invokeinterface uses the same syntax as other invoke instructions:

    %object HashMap
    ldc "ABC"
    invokeinterface Map.containsKey(Object)boolean

The invokespecial instruction is for special-purpose calls, including constructor calls:

    new FileWriter
    dup
    ldc "foo.txt"
    invokespecial FileWriter.<init>(String)void

Here <init> denotes the constructor method name of the class; it can be abbreviated to just <>.

For constructors that call other constructors of this or the parent class, this and super can be used as the method name:

    public class MyStringWriter extends StringWriter {
      String header;

      public MyStringWriter() {
        aload this
        bipush 100
        ldc "%%% "
        invokespecial this(int,String)void
      }

      public MyStringWriter(int initSize, String hdr) {
        aload this
        %load initSize
        invokespecial super(int)void
        %set header = hdr
      }
    }

An interesting note: Constructors in Java must start with a call to this() or super(); at the bytecode level, there is no such constraint.

Switch instructions

The JVM has two switch instructions: tableswitch and lookupswitch. The latter is for enhanced performance if the constants are consecutive.
Jamaica also defines switch as a synonym for tableswitch, and internally, Jamaica may optimize this to lookupswitch if the constants are consecutive. In any case, the default handler is required. Here is a demonstration:

    public static void translate(int val) {
      iload val
      switch // Or tableswitch
        1: label_1
        2: label_2
        3: label_3
        default: label_a
    label_1:
      %println "un"
      goto label_x
    label_2:
      %println "dos"
      goto label_x
    label_3:
      %println "tres"
      goto label_x
    label_a:
      %println "que?"
    label_x:
      return
    }

    public static void translate1(int val) {
      iload val
      lookupswitch 1
        label_1
        label_2
        label_3
        default: label_a
      // Same as in translate().
    }

Other JVM instructions are simple and intuitive; please refer to the language user’s guide. Next, I introduce Jamaica’s macros. In a way, the JVM is like a RISC (reduced instruction set computer) architecture; with Jamaica macros, the JVM, in effect, becomes a CISC (complex instruction set computer) system.

Macros

Jamaica macros are designed for commonly used patterns in JVM assembly programming. They can take field and variable names (without types) and array-access expressions as parameters. Like instructions, macros are not terminated by any terminating character.

The print macros

There are three print macros: %println, %print, and %flush. They print to java.lang.System's out (by default) or err. The target is specified between < and >. (In the future, the target may be expanded to a variable or field holding a java.io.PrintWriter or java.io.PrintStream.) The %flush macro does exactly the same as %print, and it also flushes the stream at the end.
All three macros can take any number of arguments:

    public class InputTest {
      public static String getInput() {
        %flush "Type something: "
        invokestatic getBufferedReader()BufferedReader
        invokevirtual BufferedReader.readLine()String
        areturn
      }

      static BufferedReader getBufferedReader() {
        InputStream is;
        getstatic System.in InputStream
        astore is
        InputStreamReader isr;
        %set isr = %object InputStreamReader(InputStream)(is)
        %object BufferedReader(Reader)(isr)
        areturn
      }

      public static void main(String[] args) {
        invokestatic getInput()String
        String line;
        astore line
        %println "You typed: ", line
      }
    }

The object and array creation macros

%object and %array create new objects and arrays, and put them on the stack. Both macros, along with %concat, are called assignable macros because they can appear on the right-hand side of the %set macro (see below).

%object uses a class constructor signature (minus the return type, which is always void) and a parameter list. If no parameters are involved, the signature and parameter list can be omitted:

    %object HashMap
    %object FileWriter(String) ("foo.txt")
    %object FileWriter(String) (args[0])

%array has two forms. It can create arrays of one or more dimensions, or an initialized single-dimensional array. The following demonstrates the first form:

    %array int[4]
    %array Object [ firstDim ] [ nextDim[0] ] [] []

As seen in the example above, array dimensions can be variables or array-access expressions.

This code illustrates %array's second form:

    %array int[] { 1, 2, 4, 8, sixteen, twospowers[5] }

The load and set macros

The %load macro is convenient for loading a variable or a field:

    public class Foo {
      static int a;
      int b[];

      void bar(String x) {
        %load a // becomes: getstatic a int
        %load b // becomes: aload this getfield b int[]
        %load x // becomes: aload x
      }
    }

%set takes this general format: %set left = right, where left can be a variable or field name, or an array-access expression (which must resolve to a single cell, of course).
right can be a constant, a variable or field name, an array-access expression, or one of the assignable macros:

    %set x = 1
    %set y[1] = y[0]
    %set tmp = %object HashMap
    %set str = %concat "Length is: ", length, '.'

The nice thing about %set is that it also performs the necessary type conversions for the values:

    int i;
    long l;
    %set l = 9
    %set i = l

This code segment expands into:

    ldc (long)9
    lstore l     // %set l = 9
    lload l
    l2i
    istore i     // %set i = l

%set also initializes variables (note that assignable macros are not allowed as values), so the above code fragment can shrink to:

    long l = 9;
    int i = l;

The string concatenation macro

The %concat macro is another assignable macro; it can take any number of parameters of all types. It expands into StringBuffer operations and puts the resultant string on the stack.

The if-else macros

Jamaica’s %if, %else, and %end_if macros are highly welcome for a few reasons. First of all, an if-else situation requires many goto instructions (and their corresponding labels), resulting in ugly code. Secondly, the JVM has many ways to compare values depending on their types, and programmers have to memorize all these differences. Jamaica’s %if macro is quite abstract; it supports all Java comparison expressions:

    %if ivar >= 5
      %println "ivar >= 5"
    %else
      %println "NOT ivar >= 5"
    %end_if

Jamaica implicitly generates the labels and goto instructions, and the Boolean expression converts to the appropriate comparison instruction streams based on the operand types and the operation. Try this: create a few if/else statements, generate the Java classfile, and disassemble it with javap -c; you will appreciate how much mess the %if-family macros save you!

The iteration macros

The other flow-control macros (other than %if) are %iterate and %array_iterate. Macro %iterate can iterate through a java.util.Enumeration or java.util.Iterator instance.
Macro %array_iterate iterates through Java arrays:

    Object tmp;
    %iterate x tmp // x is either an Enumeration or Iterator
      %println tmp
    %end_iterate

    int idx;
    %array_iterate arr idx // arr is an array
      %println arr[idx]
    %end_iterate

The JavaClassCreator API

The JavaClassCreator API’s main class, com.judoscript.jamaica.JavaClassCreator, is an abstract API for dynamically creating Java classes at runtime. This API closely mimics the Jamaica language and supports all the Jamaica macros. The flow of Java class creation naturally follows Jamaica’s. Currently the API supports ASM and the Jakarta Byte Code Engineering Library (BCEL), meaning that one of the two must be present at runtime. The following is sample Java code that generates a Java class on the fly and uses it (the corresponding Jamaica pseudocode is listed in the comments where appropriate):

    import java.lang.reflect.Modifier;
    import com.judoscript.jamaica.*;

    /**
     * This demo will dynamically create a fictitious
     * "event handler" and use it.
     */
    public class JCCTest extends ClassLoader {
      static final JCCTest classLoader = new JCCTest();

      private JCCTest() {}

      /**
       * Any (dynamic) event handler must implement this.
       */
      public static interface EventHandler {
        public void event(String e);
        public void event(int e);
        public void event(int[] e);
      }

      public static void main(String[] args) {
        try {
          // Create the dynamic class
          Class cls = generateHandlerClass();

          // Create an instance of the dynamic class and use it.
          EventHandler eh = (EventHandler)cls.newInstance();
          eh.event("Cool!");
          eh.event(5);
          eh.event(new int[] { 3, 6, 9 });
          eh.event((int[])null);
        } catch(Exception e) {
          e.printStackTrace();
        }
      }

      static Class generateHandlerClass() throws JavaClassCreatorException {
        String[] paramNames = new String[] { "e" };

        JavaClassCreator jcc = JavaClassCreator.getJavaClassCreator();
        jcc.startClass(Modifier.PUBLIC, "DynaHandler", null,
                       new String[]{ "JCCTest$EventHandler" });
        jcc.addDefaultConstructor(Modifier.PUBLIC);

        // Create method:
        //   public void event(String e) {
        //     %println "String event: ", e
        //   }
        jcc.startMethod(Modifier.PUBLIC, "event",
                        new String[]{ "java.lang.String" }, paramNames, "void", null);
        jcc.macroPrint("println", null,
                       new Object[]{ "String event: ", jcc.createVarAccess("e") });
        jcc.endMethod();

        // Create method:
        //   public void event(int e) {
        //     %println "int event: ", e
        //   }
        jcc.startMethod(Modifier.PUBLIC, "event",
                        new String[]{ "int" }, paramNames, "void", null);
        jcc.macroPrint("println", null,
                       new Object[]{ "int event: ", jcc.createVarAccess("e") });
        jcc.endMethod();

        // Create method:
        //   public void event(int[] e) {
        //     int i = 0;
        //     %if e != null
        //       aload e
        //       arraylength
        //       istore i
        //     %end_if
        //
        //     %println "Array event: length=", i
        //
        //     %if i > 0
        //       %array_iterate e i
        //         %println "e[", i, "]=", e[i]
        //       %end_array_iterate
        //     %end_if
        //   }
        jcc.startMethod(Modifier.PUBLIC, "event",
                        new String[]{ "int[]" }, paramNames, "void", null);
        jcc.addLocalVariable("i", "int", new Integer(0));
        jcc.macroIf("!=", jcc.createVarAccess("e"), null, "?if_1", false);
        jcc.inst_aload("e");
        jcc.inst_arraylength();
        jcc.inst_istore("i");
        jcc.macroEndIf("?if_1");
        jcc.macroPrint("println", null,
                       new Object[]{ "Array event: length=", jcc.createVarAccess("i") });
        jcc.macroIf(">", new JavaClassCreator.VarAccess("i"), new Integer(0), "?if_2", false);
        jcc.macroArrayIterate(jcc.createVarAccess("e"), "i", "?iter_1");
        jcc.macroPrint("println", null,
                       new Object[]{ " e[", jcc.createVarAccess("i"), "]=",
                                     jcc.createArrayAccess("e", jcc.createVarAccess("i")) });
        jcc.macroEndArrayIterate("i", "?iter_1");
        jcc.macroEndIf("?if_2");
        jcc.endMethod();

        byte[] bytes = jcc.endClass();
        return classLoader.defineClass("DynaHandler", bytes, 0, bytes.length);
      }
    }

All class names must be full Java class names, and no imports or shortcuts exist. Symbolic constant names are not needed because you can use Java static final values for this purpose. A parameter list must be passed as an Object[], and flow-control macros take a unique identifier (in contrast, Jamaica handles that automatically).

The close correspondence between Jamaica and the JavaClassCreator API is no accident: Jamaica uses this very interface to create classes! Since Jamaica adopts a visitor-based architecture, implementing another visitor that takes a Jamaica source file and generates a Java classfile that calls the JavaClassCreator API is extremely easy, should such a need arise.

Conclusion

Jamaica is an abstract assembly language for the JVM. The JVM defines a sophisticated class structure, including a bytecode instruction set; structurally, such classes are identical to Java classes (inheritance, methods, fields, etc.). Exposing class structure details to assembly programmers is neither desired nor needed (because you can hardly add any variations to the class structure). Jamaica uses the Java language syntax to define those class structures and uses symbolic names in instructions so JVM assembly programmers can focus on the business at hand, bytecode programming, and be totally shielded from the unnecessary complexity underneath.

Traditionally, JVM bytecode programs are long, ugly, and hard to read. Jamaica introduces numerous macros for commonly used patterns, from conveniences such as printing, loading, and string concatenation, to automatic data type conversion, to flow controls such as if-else and collection/array iterations.
These macros can take recursive array-access expressions, which is a big line-saver for array access. The language has other conveniences as well. JVM assembly programs written in Jamaica are intuitive, readable, and maintainable.

Strictly speaking, a “macro assembler” should allow users to define their own macros, which Jamaica doesn’t support. User macros would make Jamaica significantly more complex and would probably introduce many non-obvious concepts and constructs. Bytecode programming is only complementary to Java (as Java the language is already so close to the underlying JVM); such complexity is not warranted. When more patterns emerge, simply defining more built-in macros in later Jamaica revisions would prove a better approach.

The Jamaica language currently does not support inner class definitions (using inner classes is supported). Local variables lack scope, so variable slots cannot be reused as the JVM Specification allows. Jamaica is a strongly typed language, so each slot is designated a specific type (otherwise macros may not operate correctly). These are deliberate restrictions relative to the JVM Specification, which may or may not change in the future.

As of this writing, the Jamaica language software generates one class at a time. Since Jamaica depends on textual information much more than on runtime classes, generating multiple classes is not a problem even if they have interdependencies. The only problematic situation arises when two classes use each other’s static final values, because the Jamaica compiler would not be able to get constants in one class while compiling the other. This, however, is in the realm of software tools and is separate from Jamaica the language.

Jamaica programs are ultimately mapped to calls of an abstract Java class creation API, JavaClassCreator. Like Jamaica, this abstract API hides much of the JVM complexity that Java programmers neither care about nor are able to meddle with.
It focuses on class generation, implements all the Jamaica macros, and delegates Java class creation (including bytecode instructions) to its implementation. Currently it includes built-in implementations using ASM and Jakarta BCEL. You can use Jamaica to quickly experiment with dynamically creating classes (a risky and arduous process itself), then mechanically convert the Jamaica source code into JavaClassCreator API calls.

James Jianbo Huang is the author of the Jamaica language. He has also authored JudoScript, an intuitive and powerful Java scripting and functional language. Huang holds an MS in electronics, and favors creating software and solutions; many of the solutions he has created over the years are now features in JudoScript. He enjoys music and sports but does not practice judo.