Introduction to scripting in Java, Part 2

Find out what else you need to know about scripting in this second half of the JavaWorld excerpt from Dejan Bosanac's Scripting in Java: Languages, Frameworks, and Patterns (Addison Wesley Professional, August 2007)

Excerpt from Scripting in Java: Languages, Frameworks, and Patterns.

By Dejan Bosanac

Published by Addison Wesley Professional

ISBN-10: 0-321-32193-6

ISBN-13: 978-0-321-32193-0

You’ve nailed down the functional characteristics of scripting languages and seen how Python compares to Java for banging out fast code. Now look at the big picture: how does a non-native scripting language interact with the JVM, and how will the strengths and weaknesses of scripting affect the runtime performance, robustness, and maintainability of your Java applications?

Scripting Languages and Virtual Machines

A recent trend in programming language design is the presence of a virtual machine as one of the vital elements of a programming platform. One of the main elements of the Java Runtime Environment (JRE) is the virtual machine, which interprets bytecode and serves as a layer between the application and the operating system. A virtual machine plays the same role in Microsoft’s .NET platform.

Let’s now summarize briefly how the JRE works. Java programs contained in java extension source files are compiled to bytecode (files with a class extension). As I said earlier, the purpose of bytecode is to provide a compact format for intermediate code and support for platform independence. The JVM is a virtual processor, and like all other processors, it interprets code—bytecode in this case. This is a short description of the JRE, but it is needed for our further discussion. You can find a more comprehensive description at the beginning of Chapter 3, “Scripting Languages Inside the JVM.”

Following this, we can say Java is a hybrid compiled-interpreted language. But even with this model, Java cannot be characterized as a scripting language because it lacks all the other features mentioned earlier.

At this point, you are probably asking what this discussion has to do with scripting languages. The point is many modern scripting languages follow the same hybrid concept. Although programs are distributed in script form and are interpreted at runtime, the things going on in the background are pretty much the same.

Let’s look at Python, for example. The Python interpreter consists of a compiler that compiles source code to intermediate bytecode, and the Python Virtual Machine (PVM) that interprets this code. This process happens in the background, leaving the impression that the pure Python source code is interpreted directly. If the Python interpreter has write privileges on the host system, it caches the generated bytecode in files with a pyc extension (the py extension is used for the scripts or source code). If a script has not been modified since its previous execution, the compilation step is skipped and the virtual machine can start interpreting the cached bytecode at once, which can greatly improve the script’s startup speed. If the interpreter has no write privileges and the bytecode cannot be written to files, the compilation is still performed; in this case, the bytecode is kept only in memory.
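The caching rule described above (recompile only when the source has changed) can be sketched in Java. This is an illustration of the idea, not a description of how the Python interpreter is implemented: the class CompiledScriptCache, the method getOrCompile, and the string that stands in for bytecode are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch: recompile a script only when its source timestamp has changed. */
final class CompiledScriptCache {
    /** A cached compilation result plus the source timestamp it was built from. */
    private static final class Entry {
        final long sourceModified;
        final String compiled;

        Entry(long sourceModified, String compiled) {
            this.sourceModified = sourceModified;
            this.compiled = compiled;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private int compileCount = 0;

    /** Returns the cached result if the source is unchanged; otherwise compiles and caches. */
    String getOrCompile(String name, long sourceModified, String source,
                        Function<String, String> compiler) {
        Entry cached = entries.get(name);
        if (cached != null && cached.sourceModified == sourceModified) {
            return cached.compiled; // cache hit: the compilation step is skipped
        }
        compileCount++;
        String compiled = compiler.apply(source);
        entries.put(name, new Entry(sourceModified, compiled));
        return compiled;
    }

    /** Number of times the compiler actually ran. */
    int compileCount() {
        return compileCount;
    }

    public static void main(String[] args) {
        CompiledScriptCache cache = new CompiledScriptCache();
        // Stand-in for a real compiler; it only wraps the source text.
        Function<String, String> fakeCompiler = src -> "bytecode(" + src + ")";

        cache.getOrCompile("hello.py", 100L, "print('hi')", fakeCompiler);
        cache.getOrCompile("hello.py", 100L, "print('hi')", fakeCompiler);  // unchanged
        cache.getOrCompile("hello.py", 200L, "print('bye')", fakeCompiler); // modified
        System.out.println("compilations: " + cache.compileCount()); // prints "compilations: 2"
    }
}
```

The second call reuses the cached result because the timestamp is unchanged; only the first and third calls invoke the compiler.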

Note – Python programs can be distributed in bytecode format, keeping the source code out of the production environment.

From this discussion, we can conclude virtual machines are one of the standard parts of modern scripting languages. So our original dilemma remains. Should we use languages that enforce a certain programming paradigm, and if so, how do we use them? The dynamic and weak typing, closures, complex built-in data structures, and so on, could be implemented in a runtime environment with the virtual machine.

There is nothing to restrict the use of a dynamic (scripting) language on the virtual machines designed for languages such as Java and C#. As long as we implement the compiler appropriate for the target virtual machine’s intermediate bytecode, we will receive all the features of the scripting language in this environment. Doing this, we could benefit from the strengths of both the system-programming approach of Java, and the scripting programming model in our software development process.
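As a small illustration of scripting languages living on the JVM, the standard javax.script API (defined by JSR 223) lets a Java program discover script engines at runtime. Which engines are present depends entirely on the JDK version and the classpath, and recent JDKs no longer bundle a JavaScript engine, so the sketch below only lists what it finds. The class name ScriptEngineLister is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import javax.script.ScriptEngineFactory;
import javax.script.ScriptEngineManager;

/** Hypothetical sketch: list the script engines discoverable on this JVM. */
final class ScriptEngineLister {
    /** Returns "engine name (language name)" for every engine the manager can find. */
    static List<String> availableEngines() {
        List<String> names = new ArrayList<>();
        for (ScriptEngineFactory factory : new ScriptEngineManager().getEngineFactories()) {
            names.add(factory.getEngineName() + " (" + factory.getLanguageName() + ")");
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> engines = availableEngines();
        if (engines.isEmpty()) {
            // The result depends on the runtime; an empty list is a normal outcome.
            System.out.println("No script engines found on this JVM's classpath.");
        } else {
            engines.forEach(System.out::println);
        }
    }
}
```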

We focus on projects that bring scripting languages closer to the Java platform later in this book. Also, we discuss where it’s appropriate to apply the scripting style of development with traditional Java programming. Before we cover these topics, though, let’s take a look at how scripting and system programming compare.

A Comparison of Scripting and System Programming

Every decision made during the language design process is directly related to the programming style used in that language and its usability in the development process.

In this section, I do not intend to imply one style is better than the other. Instead, my objective is to summarize the strengths and weaknesses of both approaches so that we can proceed to Chapter 2, where I discuss how best to incorporate them into the development process.

Runtime Performance

It is clear programs written in system-programming languages have better runtime performance than equivalent scripts in most cases, for a few reasons:

The most obvious reason is the runtime presence of the interpreter in scripting languages. Source code analysis and transformation during runtime introduces additional overhead in terms of program execution.
Another factor influencing runtime performance is typing. Because system-programming languages force strong static typing, machine code created by the compiler is more compact and optimized for the target machine.

The fact that the script could be compiled to intermediate bytecode makes these interpreter performance penalties more acceptable. But the machine code is definitely more optimized than the intermediate code.

We have to take another point of view when talking about runtime performance, however. Many people approach runtime performance by asking which solution is faster. The more important question, which is often neglected, is whether a particular solution is fast enough.

You must take into consideration the trade-offs between the benefits and the runtime performance that each approach provides when you are thinking about applying a certain technology in your project. If the solution brings quality to your development process and still is fast enough, you should consider using it.

A recent development trend supports this point of view. Many experts state you should not analyze performance without comparing it to measurements and goals. This leads to debate concerning whether to perform premature or prudent optimization. The latter approach assumes you have a flexible system, and only after you’ve conducted the performance tests and found the system bottlenecks should you optimize those parts of your code.

Deciding whether scripting is suitable for some tasks in your development process must be driven by the same question. For instance, say you need to load a large amount of data from a file, and developing a system-programming solution to accomplish the task would take twice as long as developing a scripting approach. If both the system-programming and scripting solutions needed 1 second to load the data and the interpreter required an additional 0.1 second to compile the script to the bytecode, you should consider scripting to be a fast enough solution for this task. As we will see in a moment, scripts are much faster to write (because of the higher level of abstraction they introduce), and the end users of your project probably wouldn’t even notice the performance advantage of the system-programming solution that took twice as much time to develop.
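The "fast enough" question can be expressed as a check against a stated goal rather than a race between two implementations. The sketch below uses the figures from the example (1 second to load the data, plus 0.1 second of script compilation); the 2-second goal and the names FastEnoughCheck, elapsedMillis, and fastEnough are assumptions made for illustration.

```java
/** Hypothetical sketch: judge a solution against a performance goal, not against a rival. */
final class FastEnoughCheck {
    /** Runs the task once and returns the elapsed wall-clock time in milliseconds. */
    static long elapsedMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000L;
    }

    /** True if the measured time is within the stated goal. */
    static boolean fastEnough(long measuredMillis, long goalMillis) {
        return measuredMillis <= goalMillis;
    }

    public static void main(String[] args) {
        long goalMillis = 2_000;           // assumed requirement, not taken from the text
        long compiledMillis = 1_000;       // 1 second to load the data
        long scriptedMillis = 1_000 + 100; // the same load plus 0.1 second of compilation

        System.out.println("compiled fast enough: " + fastEnough(compiledMillis, goalMillis));
        System.out.println("scripted fast enough: " + fastEnough(scriptedMillis, goalMillis));

        // In a real project the numbers would come from measurement, for example:
        long measured = elapsedMillis(() -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;
            }
            if (sum < 0) {
                throw new IllegalStateException("unreachable; keeps the loop from being trivially dead");
            }
        });
        System.out.println("measured: " + measured + " ms");
    }
}
```

Under the assumed goal, both solutions pass, so the 10 percent difference in load time would not decide the choice; development time would.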

If we take another point of view, we can conclude the startup cost of executing programs written in dynamic languages could be close to their compiled alternatives. The first important thing to note is the fact that bytecode is usually smaller than its equivalent machine code. Experts who support this point of view stress that processors have increased in speed much faster than disks have. This leads to the thinking that the in-memory operations of the just-in-time compilers (compiling the bytecode to the machine code) are not much more expensive than the operation of loading the large sequence of machine code from the disk into memory.

To summarize, it is clear system-programming languages are faster than scripting languages. But if you don’t need to be restricted by only one programming language, you should ask yourself another question: What is the best tool for this task? If the development speed is more important and the runtime performance of the scripting solution is acceptable, there is your answer.

Development Speed

I already mentioned dynamic languages lead to faster development processes. A few facts support this assertion.

For one, a statement in a system-programming language executes about five machine instructions. However, a statement in a scripting language executes hundreds or even thousands of instructions. Certainly, this increase is partially due to the presence of the interpreter, but more important is the fact that primitive operations in scripting languages have greater functionality. For example, operations for matching certain patterns in text with regular expressions are as easy to perform as multiplying two integers.
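For comparison, here is what pattern matching looks like in Java, where it is a library facility (java.util.regex) rather than a language primitive. Languages such as Perl build the same operation into their syntax, which is the difference the paragraph above describes. The class RegexDemo and the date pattern are hypothetical choices for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: extract ISO-style dates (yyyy-mm-dd) from free text. */
final class RegexDemo {
    // The pattern is compiled once and reused; digits are not range-checked.
    private static final Pattern ISO_DATE =
            Pattern.compile("\\b(\\d{4})-(\\d{2})-(\\d{2})\\b");

    /** Returns every substring of the text that matches the date pattern, in order. */
    static List<String> findDates(String text) {
        List<String> dates = new ArrayList<>();
        Matcher matcher = ISO_DATE.matcher(text);
        while (matcher.find()) {
            dates.add(matcher.group());
        }
        return dates;
    }

    public static void main(String[] args) {
        System.out.println(findDates("Released 2007-08-15, revised 2008-01-02."));
        // prints [2007-08-15, 2008-01-02]
    }
}
```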

These more powerful statements and built-in data structures lead to a higher level of abstraction that language can provide, as well as much shorter code.

Of course, dynamic typing plays an important role here too. The need to define each variable explicitly with its type requires a lot of typing, and this is time consuming from a developer’s perspective. This higher level of abstraction and dynamic typing allows developers to spend more time writing the actual business logic of the application than dealing with the language issues.

Another thing speeding up the scripting development process is the lack of a compile (and linking) phase. Compilation of large programs could be time consuming. Every change in a program written in a system-programming language requires a new compile/link process, which could slow down development a great deal. In scripting, on the other hand, immediately after the code is written or changed, it can be executed (interpreted), leaving more time for the developer to actually write the code.

As you can see, all the things that increase runtime performance, such as compilation and static typing, tend to slow down development and increase the amount of time needed to build the solution. That is why you hear scripting languages are more human oriented than machine oriented (which isn’t the case with system-programming languages).

To emphasize this point further, here is a snippet from David Ascher’s article titled “Dynamic Languages—ready for the next challenges, by design,” which reflects the paradigm of scripting language design:

The driving forces for the creation of each major dynamic language centered on making tasks easier for people, with raw computer performance a secondary concern. As the language implementations have matured, they have enabled programmers to build very efficient software, but that was never their primary focus. Getting the job done fast is typically prioritized above getting the job done so that it runs faster. This approach makes sense when one considers that many programs are run only periodically, and take effectively no time to execute, but can take days, weeks, or months to write. When considering networked applications, where network latency or database accesses tend to be the bottlenecks, the folly of hyper-optimizing the execution time of the wrong parts of the program is even clearer. A notable consequence of this difference in priority is seen in the different types of competition among languages. While system languages compete like CPU manufacturers on performance measured by numeric benchmarks such as LINPACK, dynamic languages compete, less formally, on productivity arguments and, through an indirect measure of productivity, on how “fun” a language is. It is apparently widely believed that fun languages correspond to more productive programmers—a hypothesis that would be interesting to test.

Robustness

Many proponents of the system-programming approach say dynamic typing introduces more bugs in programs because there is no type checking at compile time. From this point of view, it is always good to detect programming errors as soon as possible. This is certainly true, but as we discuss in a moment, static typing introduces some drawbacks, and programs written in dynamically typed languages could be as solid as programs written in purely statically typed environments. This way of thinking leads to the theory that dynamically typed languages are good for building prototypes quickly, but they are not robust enough for industrial-strength systems.

On the other side stand proponents of dynamic typing. From that point of view, type errors are just one source of bugs in an application, and programs free of type-error problems are not guaranteed to be free of bugs. Their attitude is static typing leads to code much longer and much harder to maintain. Also, static typing requires the developer to spend more of his time and energy working around the limitations of that kind of typing.

Another implication we can glean from this is the importance of testing. Because a successful compilation does not guarantee your program will behave correctly, appropriate testing must be done in both environments. Or as best-selling Java author Bruce Eckel wrote in his book Thinking in Java (Prentice Hall):

If it’s not tested, it’s broken.

Because dynamic typing allows you to implement functionality faster, more time remains for testing. Those fine-grained tests could include testing program behavior for type misuse.

Despite all the hype about type checking, type errors are not common in practice, and they are discovered quickly in the development process. Look at the most obvious example. With no types declared for method parameters, you could easily find yourself calling a method with the wrong order of parameters. But these kinds of errors are obvious and are detected immediately the next time the script is executed. It is highly unlikely this kind of error would make it to distribution if it was tested appropriately.

Another extreme point of view says even statically typed languages are not typed. To clarify this statement, look at the following Java code:

List list = new ArrayList();
list.add(new String("Hello"));
list.add(new Integer(77));

Iterator it = list.iterator();
while (it.hasNext()) {
    String item = (String)it.next();
}

This code snippet would be compiled with no errors, but at execution time, it would throw a java.lang.ClassCastException. This is a classic example of a runtime type error. So what is the problem?

The problem is objects lose their type information when they are going through more-generic structures. In Java, all objects in the container are of type java.lang.Object, and they must be converted to the appropriate type (class) as soon as they are released from the container. This is when inappropriate object casting could result in runtime type errors. Because many objects in the application are actually contained in a more-generic structure, this is not an irrelevant issue.
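A self-contained version of the snippet makes the failure observable. The class RawListDemo and its method name are hypothetical; the logic is the same raw-list cast shown above, wrapped so that the exception is caught and reported instead of terminating the program.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch: a raw list compiles cleanly but fails when an element is cast. */
final class RawListDemo {
    /** Returns true if casting the elements of a mixed raw list fails at runtime. */
    @SuppressWarnings({"rawtypes", "unchecked"})
    static boolean castFailsAtRuntime() {
        List list = new ArrayList();   // raw type: elements are seen only as Object
        list.add("Hello");
        list.add(Integer.valueOf(77)); // accepted by the compiler

        try {
            for (Iterator it = list.iterator(); it.hasNext();) {
                String item = (String) it.next(); // the second element is an Integer
            }
            return false;
        } catch (ClassCastException e) {
            return true; // the type error surfaces only at execution time
        }
    }

    public static void main(String[] args) {
        System.out.println("cast failed at runtime: " + castFailsAtRuntime());
        // prints "cast failed at runtime: true"
    }
}
```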

Of course, there is a workaround for this problem in statically typed languages. One solution recently introduced in Java is called generics. With generics, you would write the preceding example as follows:

List<String> list = new ArrayList<String>();
list.add(new String("Hello"));
list.add(new Integer(77)); // compilation error: an Integer is not a String

Iterator<String> it = list.iterator();
while (it.hasNext()) {
    String item = it.next(); // no cast needed
}

This way, you are telling the compiler only String objects can be placed in this container. An attempt to add an Integer object would result in a compilation error. This is a solution to this problem, but like all workarounds, it is not a natural approach.
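A compilable variant of the generic version might look like the sketch below, with the offending line commented out so that the rest can run. The class GenericListDemo and the method totalLength are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: a parameterized list lets the compiler reject wrong element types. */
final class GenericListDemo {
    /** Sums the lengths of the strings; no casts are needed because the element type is known. */
    static int totalLength(List<String> words) {
        int total = 0;
        for (String word : words) {
            total += word.length();
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<String>();
        list.add("Hello");
        list.add("JVM");
        // list.add(Integer.valueOf(77)); // would not compile: Integer is not a String

        System.out.println(totalLength(list)); // prints 8
    }
}
```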

The fact that scripting programs are smaller and more readable by humans makes them more suitable for code review by a development team, which is one more way to ensure your application is correct. Guido van Rossum, the creator of the Python language, supported this view when he was asked in an interview whether he would fly an airplane controlled by software written in Python:

You’ll never get all the bugs out. Making the code easier to read and write, and more transparent to the team of human readers who will review the source code, may be much more valuable than the narrow-focused type checking that some other compiler offers. There have been reported anecdotes about spacecraft or aircraft crashing because of type-related software bugs, where the compilers weren’t enough to save you from the problems.

This discussion is intended just to emphasize one thing: Type errors are just one kind of bug in a program. Early type checking is a good thing but it is certainly not enough, so conducting appropriate quality assurance procedures (including unit testing) is the only way to build stable and robust systems.

Many huge projects written purely in Python prove the fact that modern scripting languages are ready for building large and stable applications.

Maintenance

A few aspects of scripting make programs written in scripting languages easier to maintain.

The first important aspect is the fact that programs written in scripting languages are shorter than their system-programming equivalents, due to the natural integration of complex data types, more powerful statements, and dynamic typing. Simple logic dictates it is easier to debug and add additional features to a shorter program than to a longer one, regardless of what programming language it was written in. Here’s a more descriptive discussion on this topic, taken from the aforementioned Guido van Rossum interview:

This is all very informal, but I heard someone say a good programmer can reasonably maintain about 20,000 lines of code. Whether that is 20,000 lines of assembler, C, or some high-level language doesn’t matter. It’s still 20,000 lines. If your language requires fewer lines to express the same ideas, you can spend more time on stuff that otherwise would go beyond those 20,000 lines.
A 20,000-line Python program would probably be a 100,000-line Java or C++ program. It might be a 200,000-line C program, because C offers you even less structure. Looking for a bug or making a systematic change is much more work in a 100,000-line program than in a 20,000-line program. For smaller scales, it works in the same way. A 500-line program feels much different than a 10,000-line program.

The counterargument to this is the claim that static typing also represents a kind of code documentation. Having every variable, method argument, and return result in a defined type makes code more readable. Although this is a valid claim when it comes to method and property declarations, it certainly is not important to document every temporary variable. Also, in almost every programming language you can find a mechanism and tools used to document your code. For example, Java developers usually use the Javadoc tool to generate HTML documentation from specially formatted comments in source code. This kind of documentation is more comprehensive and could be used both in scripting and in system-programming languages.
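For readers who have not used it, a Javadoc comment is an ordinary block comment that starts with /** and uses tags such as @param and @return. The Temperature class below is a hypothetical example, not taken from the book.

```java
/** Hypothetical example class used only to show the Javadoc comment format. */
final class Temperature {
    /**
     * Converts a temperature from degrees Celsius to degrees Fahrenheit.
     *
     * @param celsius the temperature in degrees Celsius
     * @return the equivalent temperature in degrees Fahrenheit
     */
    static double toFahrenheit(double celsius) {
        return celsius * 9.0 / 5.0 + 32.0;
    }

    public static void main(String[] args) {
        System.out.println(toFahrenheit(100.0)); // prints 212.0
    }
}
```

Running the javadoc tool over this source produces an HTML page for the class, with the parameter and return descriptions taken from the comment.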

Also, almost every dynamically typed language permits explicit type declaration but does not force it. Every scripting developer is free to choose where explicit type declarations should be used and where dynamic typing is sufficient. This could result in both a rapid development environment and readable, documented code.

Extreme Programming

In the past few years, many organizations adopted extreme programming as their software development methodology. The two basic principles of extreme programming are test-driven development (TDD) and refactoring.

You can view the TDD technique as a kind of revolution in the way people create programs. Instead of performing the following:

Write the code.
Test it if appropriate.

The TDD cycle incorporates these steps:

Write the test for certain program functionality.
Write just enough code (the API) for the test to run and fail.
Run the test and watch it fail.
Write the whole functionality.
Run the code and watch all tests pass.
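The cycle above can be sketched without any test framework. In this hypothetical example, WordCounterTest was written first, WordCounter.count started as a stub that returned a wrong value so the test failed, and the implementation shown is the final step. All names are assumptions for illustration; a real project would typically use a framework such as JUnit.

```java
/** The code under test; shown in its final state, after the test had been seen to fail. */
final class WordCounter {
    /** Counts whitespace-separated words; blank text contains zero words. */
    static int count(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }
}

/** The test, written before count() did anything useful. */
final class WordCounterTest {
    static void testCount() {
        check(WordCounter.count("") == 0, "empty text");
        check(WordCounter.count("hello") == 1, "one word");
        check(WordCounter.count("  scripting  in Java ") == 3, "extra whitespace");
    }

    /** Minimal stand-in for a framework assertion. */
    private static void check(boolean condition, String label) {
        if (!condition) {
            throw new AssertionError("failed: " + label);
        }
    }

    public static void main(String[] args) {
        testCount();
        System.out.println("all tests pass");
    }
}
```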

On top of this development cycle, the extreme programming methodology introduces refactoring as a technique for code improvement and maintenance. Refactoring is the technique of restructuring the existing code body without changing its external behavior. The idea of refactoring is to keep the code design clean, avoid code duplication, and improve bad design. These changes should be small because that way, it is likely we will not break the existing functionality.

After code refactoring, we have to run all the tests again to make sure the program is still behaving according to its design.

I already stated tests are one way to improve our programs’ robustness and to prevent type errors in dynamically typed languages. From the refactoring point of view, interpreted languages offer benefits because they skip the compilation process during development. For applications developed in a system-programming language, after every small change (refactoring), you have to compile and then run the tests. Both of these operations could be time consuming on a large code base, so the fact that compilation could be omitted means we can save some time.

Dynamic typing is a real advance in terms of refactoring. Usually, because of laziness or a lack of the big picture, a developer defines a method with as narrow an argument type as he needs at that moment. To reuse that method later, we have to change the argument type to some more general or complex structure. If this type is a concrete type or does not share the same interface as the one we used previously, we are in trouble. Not only do we have to change that method definition, but also the types of all variables passed to that method as the particular argument. In dynamically typed languages, this problem does not exist. All you need to do is change the method to handle this more general type.
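In Java terms, the situation looks roughly like the sketch below. A method first declared with a concrete parameter type (ArrayList) has to be widened to an interface type (Collection) before callers holding other collections can use it. The Joiner class and joinAll method are hypothetical; the "before" signature is shown only as a comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;

/** Hypothetical sketch: widening a parameter type during refactoring. */
final class Joiner {
    // Before refactoring the signature was:
    //     static String joinAll(ArrayList<String> items)
    // Callers holding a Set or any other collection could not use it.

    /** After refactoring: accepts any Collection, so ArrayList callers still compile. */
    static String joinAll(Collection<String> items) {
        return String.join(", ", items);
    }

    public static void main(String[] args) {
        ArrayList<String> list = new ArrayList<>(Arrays.asList("a", "b"));
        LinkedHashSet<String> set = new LinkedHashSet<>(Arrays.asList("x", "y"));

        System.out.println(joinAll(list)); // prints "a, b"
        System.out.println(joinAll(set));  // prints "x, y"
    }
}
```

Widening to a supertype keeps existing callers working; the costly case described above is the one where the new type is unrelated to the old one, so every call site has to change as well.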

We could mitigate these problems in system-programming environments with good refactoring tools, which exist for most IDEs today. Again, the real benefit is speed of development. Because scripting languages enable developers to write code faster, they have more time to do appropriate unit testing and to write stub classes. A higher level of abstraction and a dynamic nature make scripted programs more convenient to change, so we can say they naturally fit the extreme programming methodology.

The Hybrid Approach

As we learned earlier in this chapter, neither system-programming nor scripting languages are ideal tools for all development tasks. System-programming languages have good runtime performance, but developing certain functionality and being able to modify that functionality later takes time. Scripting languages, on the other hand, are the opposite. Their flexible and dynamic nature makes them an excellent development environment, but at the cost of runtime performance.

So the real question is not whether you should use a certain system-programming or scripting language for all your development tasks, but where and how each approach fits into your project. Considering today’s diverse array of programming platforms and the many ways in which you can integrate them, there is no excuse for a programmer to be stuck with only one programming language. Knowing at least two languages could help you have a better perspective of the task at hand and the appropriate tool for that task.

You can find a more illustrative description of this principle in Bill Venners’s article, “The Best Tool for the Job”:

To me, attempting to use one language for every programming task is like attempting to use one tool for every carpentry task. You may really like screwdrivers, and your screwdriver may work great for a job like inserting screws into wood. But what if you’re handed a nail? You could conceivably use the butt of the screwdriver’s handle and pound that nail into the wood. The trouble is, a) you are likely to put an eye out, and b) you won’t be as productive pounding in that nail with a screwdriver as you would with a hammer.
Because learning a new programming language requires so much time and effort, most programmers find it impractical to learn many languages well. But I think most programmers could learn two languages well. If you program primarily in a systems language, find a scripting language that suits you and learn it well enough to use it regularly. I have found that having both a systems and a scripting language in the toolbox is a powerful combination. You can apply the most appropriate tool to the programming job at hand.

So if we agree system-programming and scripting languages should be used together for different tasks in project development, two more questions arise. The first, and the most important one, is what tasks are suitable for a certain tool.

The second question concerns what additional characteristics scripting languages should have to fit these development roles.

Let’s try to answer these two questions by elaborating on the most common roles (and characteristics) scripting languages had in the past. This will give us a clear vision of how we can apply them to the development challenges in Java projects today, which is the topic of later chapters.

A Case for Scripting

To end our discussion of this topic, I quote John K. Ousterhout, the creator of the Tcl scripting language. In one of his articles, he wrote the following words:

In deciding whether to use a scripting language or a system programming language for a particular task, consider the following questions:

Is the application’s main task to connect together pre-existing components?
Will the application manipulate a variety of different kinds of things?
Does the application include a graphical user interface?
Does the application do a lot of string manipulation?
Will the application’s functions evolve rapidly over time?
Does the application need to be extensible?

“Yes” answers to these questions suggest that a scripting language will work well for the application. On the other hand, “yes” answers to the following questions suggest that an application is better suited to a system programming language:

Does the application implement complex algorithms or data structures?
Does the application manipulate large datasets (e.g., all the pixels in an image) so that execution speed is critical?
Are the application’s functions well-defined and changing slowly?

You could translate Ousterhout’s comments as follows: Dynamic languages are well suited for implementing application parts not defined clearly at the time of development, for wiring (gluing) existing components in a loosely coupled manner, and for implementing all those parts that have to be flexible and changeable over time. System languages, on the other hand, are a good fit for implementing complex algorithms and data structures, and for all those components that are well defined and probably won’t be modified extensively in the future.

Conclusion

In this chapter, I explained what scripting languages are and discussed some basic features found in such environments. After that, I compared those features to system-programming languages in some key development areas. Next, I expressed the need for software developers to master at least one representative of both system-programming and scripting languages. And finally, I briefly described suitable tasks for both of these approaches.

Before we proceed to particular technologies that enable usage of scripting languages in Java applications, we focus in more detail on the traditional roles of scripting languages. This is the topic of Chapter 2, and it helps us to better understand scripting and how it can be useful in the overall system infrastructure.

Dejan Bosanac is a professional software developer and technology consultant. He specializes in the integration and interoperability of diverse technologies, especially those related to Java and the Web. He has spent several years developing complex software projects, ranging from highly trafficked Web sites to enterprise applications, and was a member of the JSR 223 Expert Group.
