by Venkat Subramaniam

Creating DSLs in Java, Part 3: Internal and external DSLs

how-to
Aug 19, 2008 · 20 mins

Parse and refine an external DSL with openArchitectureWare's Xtext

You understand the basics of domain-specific languages and now you’re ready to begin creating and refining them for your projects. In this third article in his series Venkat Subramaniam shows you how to create both internal and external DSLs using Java code. He explains the difference between the two types of DSL and why Java is a better choice for creating one type than the other. He also introduces the various options for parsing external DSLs — as plain text, as XML, or using an industrial-strength language recognition tool such as ANTLR or openArchitectureWare.

Using semicolons or a pair of parentheses in code is second nature to those of us who are accustomed to C-derived languages. Our pinkies place semicolons involuntarily, and our eyes easily gloss over them. In DSLs, however, the accumulation of parentheses and semicolons is just so much noise, and it has a negative effect on fluency.

One way to decrease the noise level in your DSLs is to trade in your semicolons for dots (.) and use method chaining. In addition to making your code more fluent, method chaining also helps with context, by removing the need to repeat an object reference. The JSON example in Part 2 of this series illustrates the value of method chaining. (See “Creating DSLs in Java, Part 2,” Listing 9.) You can also use the static import feature in Java 5 to eliminate object or class references. The EasyMock example in “Creating DSLs in Java, Part 2,” Listing 5, illustrates this.

Internal vs external DSLs

Internal DSLs ride on a host language, so an internal DSL’s syntax is both influenced and restricted by the host language. External DSLs can be built from the ground up but doing so requires mental muscle, as well as a very good parser.

In this article we’ll see what happens when we apply these techniques to create a fluent, context-aware interface using Java code. The first DSL we’ll build is an internal DSL written using the Java language syntax.

Internal DSLs in Java

Let’s assume you want to send out some emails. You need a class that will simply allow you to specify the from, to, subject, and message data. You don’t want to bother about object state or even create objects. You can start out with a traditional Java API. Study the API first, then we’ll refactor it for more fluency. (Only the interface is shown because the implementation details aren’t relevant to the example.)

Listing 1. Mailer: A traditional Java API

package dsl.example;
public class Mailer
{
  public void from(String fromAddress) {}
  public void to(String toAddress) {}
  public void subject(String theSubject) {}
  public void message(String body) {}
  public void send() {}
}

To use this class, you would write something like this:

Listing 2. Creating a new Mailer and sending mail

Mailer mailer = new Mailer();
mailer.from("build@example.com");
mailer.to("example@example.com");
mailer.subject("build notification");
mailer.message("some details about build status");
mailer.send();

A more fluent Mailer

So far there are still quite a few steps between you and sending mail. That may be fine for a programmer, but not for a DSL user. What can we do to make this API fluent, and more like a DSL?

First, each of the methods of Mailer, except the send() method, can return itself, so we can chain the calls, as shown in Listing 3.

Listing 3. Method chaining in the Mailer API

package dsl.example;
public class Mailer
{
  public Mailer from(String fromAddress) { return this; }
  public Mailer to(String toAddress) { return this; }
  public Mailer subject(String theSubject) { return this; }
  public Mailer message(String body) { return this; }
  public void send() {}
}

You would use this modified version as shown in Listing 4.

Listing 4. Sending mail with a new Mailer

new Mailer()
  .from("build@example.com")
  .to("example@example.com")
  .subject("build notification")
  .message("some details about build status")
  .send();

That’s a notch better; however, it would be nice to eliminate the use of new. After all, as a user of this DSL your main interest is in sending out emails, not in creating objects. We want to eliminate as much verbosity from the DSL as possible.

Listing 5. Mailer DSL, third iteration

package dsl.example;
public class Mailer
{
  public static Mailer mail() 
  { 
    return new Mailer();
  }
  public Mailer from(String fromAddress) { return this; }
  public Mailer to(String toAddress) { return this; }
  public Mailer subject(String theSubject) { return this; }
  public Mailer message(String body) { return this; }
  public void send() {}
}

Here’s how the above DSL works:

Listing 6. Mailer sends mail!

Mailer.mail()
  .from("build@example.com")
  .to("example@example.com")
  .subject("build notification")
  .message("some details about build status")
  .send();

If you’re using Java 5 or later, you can perform a static import of the mail method, like so:

import static dsl.example.Mailer.mail;

Once you add the static import, you don’t need the class reference to call the mail method (assuming there is no collision with any other mail method). The code distills to the following:

Listing 7. Mailer: A fluent, context-aware DSL

mail()
  .from("build@example.com")
  .to("example@example.com")
  .subject("build notification")
  .message("some details about build status")
  .send();

The three-times refactored DSL in Listing 7 shows exactly how much fluency you can realize using Java code today. You are stuck with the parentheses and the ending semicolon. You can go a lot further with dynamic languages like JRuby and Groovy — and not (as you might think) because these languages are typeless or have relaxed typing. On the contrary, it is the relaxed syntax of these languages that makes them more fluent. The syntax of dynamic and more modern languages requires less ceremony. I’ll show you what I mean by that in the final article in this series.
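The listings above leave the method bodies empty. As a rough sketch of what could sit behind such an API, the class below stores each value and returns this so the calls chain. The name FluentMailer, the private constructor, and the choice to have send() return the composed text instead of delivering mail are all assumptions made for illustration; they are not part of the Mailer class shown in this article.

```java
// Hypothetical sketch, not the article's Mailer: shows how chaining and a
// static factory fit together once the methods carry state.
final class FluentMailer {
    private String from = "";
    private String to = "";
    private String subject = "";
    private String body = "";

    // Private constructor: callers go through mail(), never through new.
    private FluentMailer() {}

    // Each call creates a fresh object, so users never manage object state.
    static FluentMailer mail() { return new FluentMailer(); }

    // Every setter records its value and returns this to allow chaining.
    FluentMailer from(String fromAddress) { this.from = fromAddress; return this; }
    FluentMailer to(String toAddress) { this.to = toAddress; return this; }
    FluentMailer subject(String theSubject) { this.subject = theSubject; return this; }
    FluentMailer message(String theBody) { this.body = theBody; return this; }

    // Stand-in for real delivery: returns the text that would be sent.
    String send() {
        return "From: " + from + "\nTo: " + to + "\nSubject: " + subject + "\n\n" + body;
    }

    public static void main(String[] args) {
        System.out.println(
            mail()
                .from("build@example.com")
                .to("example@example.com")
                .subject("build notification")
                .message("some details about build status")
                .send());
    }
}
```

Because mail() hands out a new object on every call, two chains never share state, which is what lets the DSL user forget about objects altogether.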

External DSLs in Java

External DSLs aren’t based on a host language, so their syntax is not restricted like the syntax of internal DSLs. You can realize high fluency, less clutter, and appropriate context when writing an external DSL. However, you will have to parse the DSL. For this you can choose to parse plain old text, parse XML, or use tools to facilitate parsing. I’ll discuss these options in the next sections.

Parsing plain old text

You can capture DSL input from a file or other source into a String using the Java API. Once you have a String form, you can parse it using regular expressions (using the classes in the java.util.regex package). You can then execute the DSL within the context of a processing class. You can verify whether the syntax or commands presented in the DSL input are valid methods or properties using Java reflection, and then invoke them dynamically, also using reflection. Using certain design patterns like the Command pattern will ease the actual coding required to implement a plain-text-based DSL.

The advantage of this approach is that it uses plain old Java. The effort involved is an obvious disadvantage, however. Furthermore, you can’t easily verify the syntax of your DSL input until you start processing it. This approach also becomes tedious once your DSL syntax evolves beyond a very small, simple format.
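To make the plain-text approach concrete, here is a minimal sketch that combines java.util.regex with reflection. The one-command-per-line input format and the MailCommands processing class are assumptions invented for this illustration; a real DSL would need its own format and much better error reporting.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlainTextDsl {
    // Assumed format: a command word, whitespace, then the argument.
    private static final Pattern LINE = Pattern.compile("(\\w+)\\s+(.+)");

    // Parses each line and invokes the matching method on the processing object.
    static void run(String input, Object target) {
        for (String raw : input.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty()) continue;
            Matcher m = LINE.matcher(line);
            if (!m.matches()) {
                throw new IllegalArgumentException("Cannot parse: " + line);
            }
            try {
                // Reflection doubles as validation: no such method, no such command.
                Method command = target.getClass().getMethod(m.group(1), String.class);
                command.invoke(target, m.group(2));
            } catch (NoSuchMethodException e) {
                throw new IllegalArgumentException("Unknown command: " + m.group(1), e);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Command failed: " + line, e);
            }
        }
    }

    public static void main(String[] args) {
        MailCommands commands = new MailCommands();
        run("from build@example.com\nto example@example.com\nsubject build notification", commands);
        System.out.println(commands.log);
    }
}

// Hypothetical processing class: each public one-argument method is a DSL command.
class MailCommands {
    final List<String> log = new ArrayList<>();
    public void from(String address) { log.add("from=" + address); }
    public void to(String address) { log.add("to=" + address); }
    public void subject(String text) { log.add("subject=" + text); }
}
```

Note that an unknown command is only discovered when its line is reached, which is the late syntax checking described above.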

Parsing plain old XML

Another option is to parse your DSL using the same tools you would use to parse an XML file. Here you would validate the syntax of the XML using a DTD, XML Schema, RELAX NG, or some other tool. You could then use various XML parsers and APIs (like DOM and SAX) to parse the content of the document and take actions.

This approach allows you to rely on the solid tools and facilities associated with XML processing. From the point of view of processing and parsing a DSL, this approach makes your life relatively easy.

Processing your DSL this way does nothing to increase or preserve its fluency, however. The resulting DSL will have the pointy syntax and clutter that come with XML. (James Duncan Davidson, the creator of Ant, has commented on the use of XML in Ant: “XML format becomes more and more burdensome to actually getting something done.”) One of the core design requirements of a DSL is that it should be easy to use. In most cases, using an XML parser damages the fluency of the DSL and makes it harder to use. If you’re serious about writing fluent, useable DSLs, you’ll need a better option than the XML parser.
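The trade-off is easy to see in a small sketch. Assuming the mail example were expressed as XML (the mail, from, to, and subject element names are invented here), the standard DOM API reads it in a few lines, but the input itself is all angle brackets.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class XmlMailDsl {
    // Reads each child element of the root as a name/value pair.
    static Map<String, String> parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> fields = new LinkedHashMap<>();
            NodeList children = doc.getDocumentElement().getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                // Skip whitespace text nodes between the elements.
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    fields.put(child.getNodeName(), child.getTextContent().trim());
                }
            }
            return fields;
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid mail document", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical XML form of the mail DSL.
        String xml = "<mail>"
            + "<from>build@example.com</from>"
            + "<to>example@example.com</to>"
            + "<subject>build notification</subject>"
            + "</mail>";
        System.out.println(parse(xml));
    }
}
```

The parsing code is short and needs nothing beyond the JDK, but compare the input with Listing 7: every value is wrapped in an opening and a closing tag.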

Industrial strength parsing

Realistically, if you decide not to parse your DSL as XML, you will have to use an industrial strength parser.

One option in this area is ANTLR (Another Tool for Language Recognition). ANTLR is a powerful parser generator. You can use it to define your grammar and generate a parser for your DSL. If you have a lot of heavy lifting to do, then ANTLR can help you a great deal. It does have a learning curve, however, and you have to work with a number of Java classes for code generation and parsing.

Another tool that is currently in development is MPS or Meta Programming System. It is being developed by JetBrains (the people behind IntelliJ IDEA) and is expected to be open source.

In this article I will show you how to parse your DSL using the tool openArchitectureWare (OAW). This tool provides a parser, code generator, interpreter, and the tools/facilities necessary to create editors for your DSL.

Parsing your DSL with Xtext

Using OAW’s Xtext you can define your grammar and specify rules and constraints. If you’re interested in creating a code generator, you can use the OAW extension tools. You can generate code by mapping your DSL into Java code or some other language. Alternately, you can interpret your DSL, meaning you can read it in and take actions dynamically.

Code generation is a good fit if your DSL expresses structure or rules that need to be transformed into corresponding structures or rules in your program. If your objective is not to express structure, but to express behavior or actions to be performed, interpretation is a better option. Interpretation allows you to parse your DSL input, navigate through the input (like you would using a DOM parser on an XML file), and take actions based on the semantics the input maps to.

I’ll focus on interpretation. Before we get to the DSL to be interpreted, you must have the OAW tools installed and set up. Assuming you have the latest version of the Eclipse IDE, point it to https://www.openarchitectureware.org/updatesite/milestone/site.xml to update the OAW plugin for Eclipse. Ensure that the plugin installation completes successfully before you proceed.

The OAW Workflow API

A general interpretation API is currently being developed for OAW. When it is done, I am hoping it will provide a SAX-like facility where you can receive events as different parts of the DSL are parsed. In the meantime, if you’re interested in parsing, you can navigate the structure of your DSL model using the OAW Workflow API, as I’ll demonstrate.

Let’s start with a very simple example of a DSL to be parsed. I have intentionally kept the DSL trivial so that we can focus on the DSL’s grammar and on how the OAW Workflow API works.

Listing 8. A simple DSL, waiting to be parsed

players James John Jake
James 12
John 14
Jake 9

This looks like a simple data input file, and it is. I list the names of players on the first line, and for each player I have their name and score on separate lines. We’ll spend the remainder of the article working on this DSL. You’ll learn how to define a grammar for this DSL so that you can list game scores, how to write constraints, how to use the editor, and finally how to interpret the DSL.

Creating a DSL grammar

Start Eclipse and create a new project by clicking on File -> New -> Other. Select “Xtext Project” under OpenArchitectureWare. Click on Next and fill in the details for the project as shown in Figure 1.

Figure 1. Project detail

Once you click Finish, the Eclipse Package Explorer should show two projects, game.dsl and game.dsl.editor, as in Figure 2.

In the game.dsl project, double-click on the gamedsl.xtxt file and modify it as shown in Listing 9. (Note that I have left the two auto-generated comment lines at the top.)

Listing 9. Modified gamedsl.xtxt

//specify your DSL grammar rules here ...
//IMPORTANT: You should change the property 'overwrite.pluginresources=true' in the properties file to 'overwrite.pluginresources=false' AFTER first generation

Model:
  PlayersList
  (scores+=Score)+;

PlayersList:
 "players" (names+=Name)+;

Name:
  name=ID;

Score:
 name=ID score=INT;

Listing 9 defines a grammar for this DSL. The file contains rules defining how the parser will reduce tokens as it parses the DSL file. The DSL model contains two things, a PlayersList and a collection of scores. scores is a variable in the model that holds a collection of Score items. Variables like scores will be used later when we specify constraints and write the code to interpret the DSL.

The syntax (scores+=Score)+ says that items of Score will be added (hence the +=) to the scores collection in the model. The trailing “+” says that you expect to see at least one score. I further define PlayersList to contain a static keyword “players” followed by a collection of names.

The collection of names will be made up of one or more name values, where each name follows the syntax as constrained by ID. A value constrained by ID is allowed to start with either an underscore or any letter, followed by any number of letters, digits, or underscores.
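The ID rule described above can be approximated with a regular expression. This is only a sketch of the rule as stated in this article, limited to ASCII letters; the actual ID terminal in Xtext may differ in details, and the IdRule class is invented for illustration.

```java
import java.util.regex.Pattern;

class IdRule {
    // A letter or underscore, then any number of letters, digits, or underscores.
    private static final Pattern ID = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    // True when the whole token satisfies the approximated ID rule.
    static boolean isId(String token) {
        return ID.matcher(token).matches();
    }

    public static void main(String[] args) {
        for (String token : new String[] {"James", "_jake", "player2", "2fast", "a-b"}) {
            System.out.println(token + " -> " + isId(token));
        }
    }
}
```

Under this approximation a token such as 12 is not an ID, which is why a line like "James 12" splits cleanly into a name and an integer score.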

Each Score will contain a name and a score value, which is constrained to be an integer type. Refer to the Xtext Reference Document in Resources for more details on writing grammar in Xtext.

Using the editor to create DSL input

Let’s take this grammar for a ride before we get down to writing constraints. Right-click on the grammar file gamedsl.xtxt and select “Generate Xtext Artifacts,” as shown below.

Figure 3. Generate Xtext artifacts

Wait for the workflow generator to run and make sure no errors are reported in the console. Remember to set overwrite.pluginresources=false in the generate.properties file, as instructed by the comment in the xtxt file.

Now, select Debug Configurations from the toolbar, as shown below.

Figure 4. Debug configurations

Right-click on Eclipse Application, select New (see below), and click on Debug.

Figure 5. Debugging

Wait for a new instance of Eclipse to pop up and click on File -> New -> Other to create a project. Select “gamedsl Project” from under the Xtext DSL Wizard, as shown below.

Figure 6. Select gamedsl Project

Click on Next and then Finish to create a project. Open the model.dsl file and modify it as shown here:

players James John Jim

As soon as you type the above text into model.dsl the editor will report an error, as shown below.

Figure 7. The editor reports an error

The above error tells you that you’ve not yet provided the scores for each player. What you see in Figure 7 is an editor for your DSL, which was automatically generated as you wrote your grammar.

Modify the model.dsl file as shown below:

players James John Jake                                       
James 12
James 14
Jim 9                                                

Now no errors are reported. In fact, the DSL file contains three undetected errors: it has a duplicate entry for James, has an entry for the unlisted player Jim, and is missing a score for John. Our next step is to write constraints, which will enable the editor to point out these errors visually. The constraints file will also be used when parsing the DSL file.

Writing constraints

Back in the game.dsl project, double-click on the Checks.chk file as shown below.

The first thing we’ll do is write a constraint that the Score should not be duplicated for a player. So modify the Checks.chk file as follows:

Listing 10. Modifying the constraints file

import gamedsl;

extension game::dsl::Extensions;
/*
 * This checks file is used by the parser
 * and by the editor. Add your constraints here!
 */

context Score ERROR "Duplicate Score for " + name :
   allElements().typeSelect(Score).select(e | e.name == name).size == 1;

The context command takes the element name followed by an error message. The expression that follows the colon (:) needs to return true if there are no errors. In the expression we are traversing all the elements in the DSL document, fetching those of type Score, and picking those that have the same name as the name of the current Score element. Finally, we’re making sure only one exists for each name.
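For readers more at home in Java than in the Check language, the same rule can be restated as a plain Java method. This is only an analogy: the Score class below is a hypothetical stand-in for the model element, and OAW does not run code like this.

```java
import java.util.Arrays;
import java.util.List;

class DuplicateScoreCheck {
    // Hypothetical stand-in for the Score element of the model.
    static final class Score {
        final String name;
        final int score;
        Score(String name, int score) { this.name = name; this.score = score; }
    }

    // Mirrors the constraint: true when exactly one Score carries the current name.
    static boolean isUnique(Score current, List<Score> all) {
        int matches = 0;
        for (Score s : all) {
            if (s.name.equals(current.name)) matches++;
        }
        return matches == 1;
    }

    public static void main(String[] args) {
        List<Score> scores = Arrays.asList(
            new Score("James", 12), new Score("James", 14), new Score("Jim", 9));
        // The check runs once per Score, just as the constraint runs in the context of each Score.
        for (Score s : scores) {
            if (!isUnique(s, scores)) {
                System.out.println("Duplicate Score for " + s.name);
            }
        }
    }
}
```

Because the check is evaluated once per Score element, this sketch reports the duplicate for each of the two James entries.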

Now, bring up the second instance of Eclipse and go to the model.dsl file. You will notice the following error reporting:

Figure 9. Another error

Modify model.dsl as shown below:

players James John Jake
James 12
Jim 9                                                     

Now the file contains two undetected errors. Let’s modify the Checks.chk file to address the unlisted-player error next:

Listing 11. Adding a constraint

import gamedsl;

extension game::dsl::Extensions;
/*
 * This check file is used by the parser
 * and by the editor. Add your constraints here!
 */

context Score ERROR "Duplicate Score for " + name :
   allElements().typeSelect(Score).select(e | e.name == name).size == 1;
   
context Score ERROR "Given name " + name + " not in players list " + allElements().typeSelect(PlayersList).names.name:
  allElements().typeSelect(PlayersList).names.name.contains(name);

The new constraint will enable Score to check if the name given is in the players list.

Now bring up the second instance of Eclipse, go to model.dsl, and view the file.

Figure 10. Yet another error!

To get rid of this error, modify the model.dsl file as follows:

players James John Jake
James 12
Jake 9                                                     

In Listing 12 we add one last constraint to Checks.chk, this time in the context of PlayersList, so that it can check for a missing player score.

Listing 12. One last constraint

import gamedsl;

extension game::dsl::Extensions;
/*
 * This check file is used by the parser
 * and by the editor. Add your constraints here!
 */

context Score ERROR "Duplicate Score for " + name :
   allElements().typeSelect(Score).select(e | e.name == name).size == 1;
   
context Score ERROR "Given name " + name + " not in players list " + allElements().typeSelect(PlayersList).names.name:
  allElements().typeSelect(PlayersList).names.name.contains(name);
  
context PlayersList ERROR "Missing score for name(s) in " + names.name :
   allElements().typeSelect(PlayersList).names.name.forAll(e |
     allElements().typeSelect(Score).name.contains(e));

The error reported in the model.dsl is shown in Figure 11.

Figure 11. Missing score error
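The second and third constraints in Listing 12 are, in effect, two set differences between the listed players and the names that have scores. A hedged Java restatement follows, with RosterChecks and its method names invented for illustration; it is an analogy for the Check expressions, not code that OAW executes.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class RosterChecks {
    // Names that have a score line but do not appear in the players list.
    static Set<String> unlisted(List<String> players, List<String> scoredNames) {
        Set<String> result = new LinkedHashSet<>(scoredNames);
        result.removeAll(players);
        return result;
    }

    // Players from the players list that have no score line.
    static Set<String> missing(List<String> players, List<String> scoredNames) {
        Set<String> result = new LinkedHashSet<>(players);
        result.removeAll(scoredNames);
        return result;
    }

    public static void main(String[] args) {
        List<String> players = Arrays.asList("James", "John", "Jake");
        List<String> scoredNames = Arrays.asList("James", "Jim");
        System.out.println("Not in players list: " + unlisted(players, scoredNames));
        System.out.println("Missing score for: " + missing(players, scoredNames));
    }
}
```

A model passes both constraints only when the two sets are empty, which is the case for the corrected model.dsl.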

And here is the model.dsl file after we’ve fixed this last error:

players James John Jake
James 12
John 14
Jake 9

Interpreting the DSL

You’ve now seen how to write a grammar, how to use the OAW editor, and how to write constraints to ensure the editor catches mistakes. Now let’s shift gears and see how to interpret a DSL input file. Take a look at the structure of the model representing your DSL by viewing the file gamedsl.ecore in the src-gen directory of game.dsl. We have to navigate this hierarchy to traverse the document and take actions based on the input DSL.

In the game.dsl project, open the gamedslproject.oaw file as shown in Figure 12.

Figure 12. Open the gamedslproject.oaw file

Modify the gamedslproject.oaw file as shown in Listing 13.

Listing 13. Modified gamedslproject.oaw

<workflow>

  <bean class="game.dsl.MetaModelRegistration" />
  
  <cartridge
    file="game/dsl/parser/Parser.oaw"
    outputSlot="modelSlot"
    modelFile="model.dsl">
    <metaModel id="metamodel" class="org.eclipse.m2t.type.emf.EmfRegistryMetaModel" /> 
  </cartridge>

   <component 
     class="game.dsl.PrintWinner"
     modelSlotHandle="modelSlot" />
</workflow>

Two things are going on here. First, the cartridge element parses the model.dsl file: it indicates which parser to use, the slot in which to store the in-memory model (modelSlot), and the file to parse (model.dsl). Second, the component element names the class to run and passes it the name of that slot through its modelSlotHandle property.

Create a class named PrintWinner in the game.dsl project as shown below:

Figure 13. PrintWinner

Modify the PrintWinner.java file to process the DSL input and print the winner as shown below:

Listing 14. PrintWinner.java

package game.dsl;

import java.util.Collection;
import java.util.HashMap;

import org.eclipse.emf.ecore.EObject;
import org.openarchitectureware.workflow.WorkflowContext;
import org.openarchitectureware.workflow.issues.Issues;
import org.openarchitectureware.workflow.lib.AbstractWorkflowComponent;
import org.openarchitectureware.workflow.monitor.ProgressMonitor;

public class PrintWinner extends AbstractWorkflowComponent 
{
    private String modelSlotHandle;
    
    public void setModelSlotHandle(String theModelSlotHandle)
    {
        modelSlotHandle = theModelSlotHandle;
    }

    public void checkConfiguration(Issues issues) 
    {
        if (modelSlotHandle == null)
        {
            issues.addError(this, "ModelSlot is not set");
        }
    }

    public void invoke(WorkflowContext context, ProgressMonitor monitor, Issues issues) 
    {
        EObject model = (EObject) context.get(modelSlotHandle);

        Collection playerNames = (Collection) model.eGet(model.eClass().getEStructuralFeature("names"));
        
        HashMap playersAndScores = new HashMap();
        
        for(Object name : playerNames)
        {
          EObject playerName = (EObject) name;
          playersAndScores.put(playerName.eGet(playerName.eClass().getEStructuralFeature("name")), 0);
        }

        Collection scores = (Collection) model.eGet(model.eClass().getEStructuralFeature("scores"));
        for(Object score : scores)
        {
          EObject aPlayerAndScore = (EObject) score;
          
          String playerName = (String) aPlayerAndScore.eGet(aPlayerAndScore.eClass().getEStructuralFeature("name"));
          int theScore = (Integer) aPlayerAndScore.eGet(aPlayerAndScore.eClass().getEStructuralFeature("score"));
          
          playersAndScores.put(playerName, theScore);
        }
        
        int max = Integer.MIN_VALUE;
        String winner = "";
        for(Object name : playersAndScores.keySet())
        {
          int score = (Integer)(playersAndScores.get(name));
          if (score > max)
          {
             max = score;
             winner = (String) name;
          }
        }
        
        System.out.println("Winner is " + winner + " with a score of " + max);
    }
}

The workflow calls setModelSlotHandle() with the name of the slot that holds the in-memory model created by the parser; in invoke() you use that name to fetch the model from the WorkflowContext. The checkConfiguration() method is a good place to report any errors or issues you may find before processing the model. You can walk through the model within the invoke method using the API provided by the org.eclipse.emf.ecore.EObject class.

In the invoke method I first obtain the names of the players and store them in a HashMap, using the player name as key and 0 for score value. Then, for each score in the scores collection, I update the HashMap with the score. When the input has been processed, I loop through the HashMap to find the winner and winning score.
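The winner-selection steps do not depend on OAW, so they can be tried in isolation. The sketch below repeats them with typed collections, assuming the names and scores have already been read into arrays; WinnerLogic and its winner method are hypothetical names, and the EMF navigation from Listing 14 is deliberately left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class WinnerLogic {
    // Seeds every listed player with 0, applies the score lines, and returns
    // the entry with the highest score (null when there are no players at all).
    static Map.Entry<String, Integer> winner(String[] players, String[] names, int[] scores) {
        Map<String, Integer> playersAndScores = new LinkedHashMap<>();
        for (String player : players) {
            playersAndScores.put(player, 0);
        }
        for (int i = 0; i < names.length; i++) {
            playersAndScores.put(names[i], scores[i]);
        }
        Map.Entry<String, Integer> best = null;
        for (Map.Entry<String, Integer> entry : playersAndScores.entrySet()) {
            // Strict comparison: on a tie, the player listed first stays the winner.
            if (best == null || entry.getValue() > best.getValue()) {
                best = entry;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String[] players = {"James", "John", "Jake"};
        Map.Entry<String, Integer> w =
            winner(players, new String[] {"James", "John", "Jake"}, new int[] {12, 14, 9});
        System.out.println("Winner is " + w.getKey() + " with a score of " + w.getValue());
    }
}
```

Using a LinkedHashMap keeps the players in listing order, so a tie is resolved predictably, which a plain HashMap does not guarantee.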

Right-click on the gamedslproject.oaw file and select “Run As oAW Workflow.” You will see the output in the console reporting that John is the winner with a score of 14.

If you would like to run the workflow from within your application code, you can do that using the WorkflowRunner, as shown in Listing 15.

Listing 15. WorkflowRunner

String workflowFile = "gamedslproject.oaw";
HashMap properties = new HashMap();
HashMap slotContents = new HashMap();

new WorkflowRunner().run(
      workflowFile, null,
      properties, slotContents);

In conclusion

In this article you’ve seen hands-on examples for creating both internal and external DSLs in Java. The Java syntax does not so easily lend itself to the level of fluency required for a DSL, and so creating internal DSLs using Java is challenging. The Java platform’s tool support for creating external DSLs is pretty strong, however. You’ve walked through the steps of building an external DSL using openArchitectureWare and Xtext. In the final article in this series you’ll have the opportunity to create an internal DSL on the Java platform, but using Groovy rather than Java code.

Dr. Venkat Subramaniam has trained and mentored thousands of software developers in the U.S., Canada, Europe, and Asia. He helps his clients succeed with agile development. He’s author of the book .NET Gotchas (O’Reilly), and co-author of the 2007 Jolt productivity award-winning book Practices of an Agile Developer (Pragmatic Bookshelf). Venkat is a frequently invited speaker at international conferences. His latest book is Programming Groovy: Dynamic Productivity for the Java Developer (Pragmatic Bookshelf).