Learn DSL concepts and where they’re used in real-world programming

If you’ve ever written a makefile or designed a Web page with CSS, you’ve already encountered a DSL, or domain-specific language. DSLs are small, expressive programming languages custom designed for specific tasks. In this four-part series, Venkat Subramaniam introduces the concept of DSLs and eventually shows you how to build them using Java. In this first article, Venkat explains what a DSL is and defines the difference between an external DSL and an internal one. He then points out some DSLs you’ve likely been using for years, perhaps without even realizing it.

If you have been involved in writing or even just using applications, chances are that you’ve already encountered domain-specific languages, or DSLs, even if you didn’t realize it at the time. A keyword input file that feeds data to an application is a DSL. A configuration file is a DSL. A makefile is a DSL used to specify rules and dependencies for building an application. If you’ve written any of these, you’ve already taken your first steps toward creating domain-specific languages.

The word language in the phrase may lead you to expect that a DSL uses syntax to express certain semantics. Unlike a general-purpose language like Java, however, a DSL is fairly limited in scope and capabilities; as the name suggests, DSLs are keenly focused on a certain type of problem or domain, and on expressing a narrow set of solutions within the context of that limited scope. And that’s a good thing: DSLs are simple and concise.

Okay, that’s L; what about D and S? The word domain in DSL refers to “an area or sphere of knowledge, influence, or activity.” (For more information, refer to Domain-Driven Design by Eric Evans.) Focusing on a domain gives you a context, a logical framework within which you can evolve models for an application. The word specific in DSL gives you the bounded context.
It helps you keep things relevant, focused, terse, and expressive.

Simplicity is critical to the success of a DSL. A person familiar with the language’s domain must easily understand it. For example, if you’re creating a DSL that actuaries will use to express business rules in the domain of insurance, you don’t want them to spend a lot of time learning a difficult and complicated language. You want them to focus on expressing the details associated with insurance risks in a way that they can easily understand, discuss, evolve, and maintain. The DSL you create for them must be built on their vocabulary, the terms they use every day to communicate with their peers. You want them to use the syntax you provide, but it should seem to them that they’re merely specifying some discrete rules. And they should be able to do so without getting the impression that they are really programming or even using some kind of language. Creating a good DSL is like cooking a nutritious meal; just as you want kids to eat vegetables without noticing or fussing over them, you want clients to use your DSL without worrying about its syntax.

Conciseness is another part of writing a good DSL, which means choosing syntax that is both terse and expressive. Terseness within reason makes your code easier to read and maintain. Expressiveness helps to promote communication, understanding, and speed. For instance, for someone who understands matrix multiplication, matrixA.multiply(matrixB); is less expressive and concise than matrixA * matrixB. The former involves calling functions and using parentheses, and includes an intimidating semicolon. The latter is already an expression that will be quite familiar.

Why use a DSL?

Recently, I was stuck in an airport with some reservation problems. As a gate agent navigated the airline’s computer system using the graphical menus, my hopes of making the flight diminished. Then an expert agent showed up, and she knew just what commands to use.
She got behind the menus and started interacting with a text-driven interface that was rather terse, but gave her more commands to work with, and ultimately more control of the application. In no time she got me on my way.

A GUI makes an average user productive, but can slow down an expert user or a user very fluent with your application and its domain. You don’t want to make the novice productive and slow down an expert in the process. I am not discouraging the use of GUIs; I am merely pointing out that the GUI is not always the most productive alternative.

The future of DSLs: Martin Fowler argues that applications will eventually come to use several small and limited DSLs instead of one big general-purpose language. Ola Bini argues that “three language layers will emerge in large applications”: a small, stable layer on which the rest of the application is built; a substantial dynamic layer where most functionality lies; and a third domain layer, built using DSLs.

Richard Pawson and Robert Matthews introduced the concept and framework that they called Naked Objects, in which they auto-built a thin object-oriented user interface (OOUI) from a domain model. The OOUI exposed the behavior of domain objects so that users could interact with them directly. This type of user interface is well-suited for applications that organizations use internally. Such a UI is a tad crufty, but its simplicity means you can create it more quickly. And, with proper training, users can access all of the application’s capabilities, because the underlying domain model is exposed.

The goal of DSLs is similar: to provide a highly effective interface that allows users to interact with your application. The interface can be graphical or textual. A DSL is highly expressive, simple, and concise at the same time. This can help the user of an application be more productive.
A DSL is designed to be very intuitive and fluent for a domain expert to use (more about this in the second article in this series). It is designed with the convenience and productivity of users (the domain experts, within the context of the domain) in mind.

External and internal DSLs

A DSL may be classified as external or internal, depending on how it is designed and implemented.

An external or free-standing DSL is designed to be independent of any particular language. As an author of such a DSL, you pretty much decide on the syntax. You are on your own when it comes to defining the grammar and parsing the syntax. You can use any language and tools to implement your DSL; you could use Java and ANTLR, for instance. You have complete flexibility to choose the syntax, and both the pleasure and the pain of developing and implementing the language are all yours. (Speaking from experience, I had the “immense pleasure” of working with lex and yacc to maintain a grammar for a keyword input file, an external DSL, in a project I worked on years ago.)

An internal or embedded DSL, on the other hand, is designed and implemented using a host language. The good news is that you don’t have to worry about grammar, parsers, and tools to do the heavy lifting. However, you are constrained by the host language, and your DSL is influenced by its host’s flexibility, limitations, and idiosyncrasies. The challenge with an internal DSL is to tactfully design the language so that the syntax is within the confines of what the host language allows, yet is as expressive, concise, and fluent as you desire.

The two different kinds of DSLs both have pluses and minuses. An external DSL gives you the liberty to design the syntax of your language in exactly the way you like. You can select the language’s symbols, operators, constructs, and structure as you please to fit your domain. On the downside, you have to define the grammar for your language.
You also have to create a compiler to parse and process the syntax and map it to the semantics you expect. An external DSL gives you a lot of flexibility, but you have to take the time to do the hard work of compiling it.

An internal DSL rides on the syntax of a host language, so you don’t spend any time or effort worrying about compiling or parsing. You do need to spend a significant amount of time and effort designing the syntax of your DSL, however, as you are largely constrained by the host language. You want to choose a host language that is highly flexible and has as few restrictions and idiosyncrasies as possible. To effectively implement an internal DSL, you have to exploit the metaprogramming capabilities of your host language.

Today, external DSLs give you better control than internal DSLs when validating DSL syntax. Because you take the effort to define the grammar for your external DSL, that effort also serves to validate the syntax. This is harder to do with an internal DSL because the code is often processed dynamically. You will have to do extensive error checking and do the validation yourself.

Examples of DSLs in everyday programming

As I noted at the beginning of this discussion, DSLs are very common. Chances are good that you’ve already used quite a few of them as both a programmer and a user. One nearly omnipresent example of a DSL is Cascading Style Sheets (CSS), which allows you to add style to Web pages and documents. Listing 1 is an excerpt from a CSS file. It specifies the style for a hyperlink (the <A> tag), namely how its color should change when you mouse over it.

Listing 1. CSS is a DSL

    A:hover {
        color: #FF0000;
        text-decoration: none;
    }

Another example of an external DSL is the stuff you write in a makefile for the make utility. The domain in this case is build: the source code that you want to compile and build into a library or executable, or the files that you want to process for generating documentation or the like.
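To make the trade-off between external and internal DSLs concrete, here is a minimal Python sketch (Python is used only for illustration; it is not the language of this series). It handles the style rule from Listing 1 in two ways. The function parse_rule treats the rule as an external DSL, using a deliberately tiny grammar that is an assumption of this sketch and covers far less than real CSS. The function style is a hypothetical internal equivalent that lets Python do the parsing. Both function names are invented for this example.

```python
import re

# Assumed grammar for a one-rule subset of CSS (illustration only, not real CSS):
#   rule        := selector "{" declaration* "}"
#   declaration := property ":" value ";"
RULE = re.compile(r"\s*(?P<selector>[^{}]+?)\s*\{(?P<body>[^{}]*)\}\s*")
DECLARATION = re.compile(r"\s*(?P<prop>[a-z-]+)\s*:\s*(?P<value>[^;:]+?)\s*")


def parse_rule(text):
    """External DSL: we own the grammar, so we also own parsing and error reporting."""
    match = RULE.fullmatch(text)
    if match is None:
        raise SyntaxError(f"not a style rule: {text!r}")
    declarations = {}
    for chunk in match.group("body").split(";"):
        if not chunk.strip():
            continue  # skip the empty piece after the final semicolon
        decl = DECLARATION.fullmatch(chunk)
        if decl is None:
            raise SyntaxError(f"bad declaration: {chunk.strip()!r}")
        declarations[decl.group("prop")] = decl.group("value")
    return match.group("selector"), declarations


def style(selector, **declarations):
    """Internal DSL: Python parses the call for us, but property names must be
    valid Python identifiers, so text-decoration is spelled text_decoration."""
    return selector, {name.replace("_", "-"): value
                      for name, value in declarations.items()}


# The same rule, written once in each style.
external = parse_rule("A:hover { color: #FF0000; text-decoration: none; }")
internal = style("A:hover", color="#FF0000", text_decoration="none")
print(external)
print(internal)
```

Both calls produce the same data, but the costs fall in different places. The external version needed a grammar and its own error reporting, and in return it rejects malformed input such as a declaration with a missing colon. The internal version needed no parser, yet the host language forced the spelling text_decoration because a hyphen is not allowed in a Python identifier.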
Listing 2 contains an example makefile.

Listing 2. So is this makefile

    GCC=cxx -I.

    all : clean compile

    compile : myprog

    myprog: MyProg.cc Util.o
    	$(GCC) -o myprog MyProg.cc Util.o

    Util.o : Util.h Util.cc
    	$(GCC) -c Util.cc

    clean :
    	/bin/rm -f myprog Util.o

The makefile expresses a set of dependencies, and the commands (the indented statements) are executed based on the dependencies. For example, Util.cc is compiled into Util.o if Util.o does not exist, or if Util.h or Util.cc is modified after Util.o was created. Once you learn a few rules (and idiosyncrasies), the makefile is a pretty lightweight way to express build dependencies.

The Java counterpart to make is the popular Ant; you can see an example of an Ant build file in Listing 3.

Listing 3. An Ant build file

    <project name="AnExampleProject" default="jarit" basedir=".">
      <property name="src" location="src"/>
      <property name="build" location="build"/>
      <property name="distrib" location="distrib"/>

      <target name="compile"
              description="compile your Java code from src into build">
        <javac srcdir="${src}" destdir="${build}"/>
      </target>

      <target name="jarit" depends="compile" description="jar it up">
        <jar jarfile="${distrib}/AnExampleProject.jar" basedir="${build}"/>
      </target>
    </project>

Both makefiles and Ant build files are external DSLs. In the case of Ant, the XML representation is processed by the ant utility using an XML parser. Ant’s vocabulary contains various terms, such as target and property, that are valid in the domain and context of compiling and bundling code.

In the Ruby community, the equivalent of make and Ant is Rake. Rake is also an example of a DSL; however, it is written using Ruby itself, so it is an internal or embedded DSL. Listing 4 contains an example of a Rake file.

Listing 4. An example Rake file

    ORIGINAL = 'input.dat'
    BACKUP = 'input.dat.bak'

    task :default => BACKUP

    file BACKUP => ORIGINAL do |task|
      cp task.prerequisites[0], task.name
    end

In this simple example, the file input.dat is copied (or backed up) to input.dat.bak only if the timestamp of the dat file is later than that of the bak file, or if the bak file does not exist. Rake skillfully takes advantage of the flexibility of Ruby to provide an elegant syntax for expressing tasks and their dependencies. For instance, in the context of Rake, task is simply a method that takes a hash as a parameter.

Gant is an internal DSL that is similar to Rake; it is written using Groovy, and serves as a wrapper around Ant. If you use Gant, you don’t have to endure XML coding, and can use a lightweight syntax to express your builds. Listing 5 is an example.

Listing 5. A sample Gant file

    // slightly modified version of example from http://gant.codehaus.org/
    includeTargets << gant.targets.Clean
    cleanPattern << [ '**/*~' , '**/*.bak' ]
    cleanDirectory << 'build'

    target (stuff : 'A target to do some stuff') {
      println 'Stuff'
      depends clean
      echo message : 'A default message from Ant'
      otherStuff()
    }

    target (otherStuff : 'A target to do some other stuff') {
      println 'OtherStuff'
      echo message : 'Another message from Ant'
      clean()
    }

    setDefaultTarget stuff

Using DSLs for validation

More examples of internal DSLs can be found in Rails and Grails. Listing 6 is an example of validation in ActiveRecord in Rails.

Listing 6. Validation in Rails

    class Person < ActiveRecord::Base
      validates_presence_of :first_name, :last_name, :ssn
      validates_uniqueness_of :ssn
    end

I find the validation syntax to be very easy to read and understand. It is highly expressive; once you get the hang of it, adding and modifying validation logic is a breeze. Similarly, GORM in Grails supports constraints for validation, as shown in Listing 7.

Listing 7. Validation in Grails

    class State {
      String twoLetterCode

      static constraints = {
        twoLetterCode size: 2..2, blank: false, unique: true
      }
    }

The examples you have seen so far are for programmers (with the possible exception of CSS, which is often used by Web designers as well). But DSLs are not exclusively tailored for that audience; they can be used by anyone who uses an application. For instance, easyb is a behavior-driven testing framework that lets users write stories that express and validate application behavior. Validation in easyb is based on a story DSL, which bridges the gap between developers and business stakeholders. Listing 8 is an example of a story written in version 0.8 of easyb.

easyb’s story DSL takes advantage of Groovy’s flexible syntax and metaprogramming capabilities. The vocabulary terms, like given, when, and the like, are simply methods that accept a String parameter and a closure. When you exercise the story, easyb will execute the closure, if provided, letting it interact with the underlying application and assert its behavior.

Listing 8. A story in easyb

    // transferMoney.story
    scenario 'transfer money', {
      given 'account numbers 123456789 and 123456788'
      when 'transfer $50 from 123456789 to 123456788'
      then 'balance of 123456789 is $50 less'
      and
      then 'balance of 123456788 is $50 more'
      and
      then 'transaction has been logged...'
    }

The code in Listing 8 is executable documentation, and a domain expert or business analyst could write it. Later, a business analyst or a programmer could integrate it with the application by filling in glue code that calls the service layer of the application.

You can execute the story in Listing 8 using the following easyb command:

    java -classpath ... org.disco.easyb.BehaviorRunner transferMoney.story

In which case you will get the following output:

    Running transfer money story (transferMoney.story)
    Scenarios run: 1, Failures: 0, Pending: 1, Time Elapsed: 0.431 sec

    1 behavior run with no failures

easyb processed the story and indicated a status of Pending because the code has not been integrated with your application.

What I like about easyb is its lightweight, expressive syntax, which can be easily understood by programmers, testers, managers, domain experts, and business analysts. The DSL syntax of easyb provides a ubiquitous language for communicating the application’s domain knowledge and requirements.

In conclusion

Once you put on your DSL glasses, you will start finding countless examples of these specialized languages around you. You’ve seen a few examples of them in this article. In the next article in this series, you will learn some of the characteristics of a DSL. That will provide you with a foundation for creating DSLs of your own, which we’ll practice in the latter half of this series.

For now, just try to notice the DSLs all around you. When you do encounter a DSL, explore it to see how it is tailored for its particular domain.

Dr. Venkat Subramaniam has trained and mentored thousands of software developers in the U.S., Canada, Europe, and Asia. He helps his clients succeed with agile development. He’s the author of the book .NET Gotchas (O’Reilly), and coauthor of the 2007 Jolt productivity award-winning book Practices of an Agile Developer (Pragmatic Bookshelf). Venkat is a frequently invited speaker at international conferences. His latest book is Programming Groovy: Dynamic Productivity for the Java Developer (Pragmatic Bookshelf).