A unique aspect of Java technology is that programs can run anywhere without recompilation. Here’s how to test your programs so that you can find any runnability problems before your customers do.

The word portability has, as its root, the verb “to port,” which in software terms means to adapt and recompile software to run on a different computing platform. Portability is the ability of a program to be recompiled for another platform. Because creating and maintaining several independent versions of a program is very expensive, portability has long been one of the holy grails of the software industry. Software vendors would like to be able to develop and maintain a single version of their product that can run on many different systems and serve many different market segments.

Previous efforts to enable portability have included a multitude of standards efforts, some interpreters, and a few portable runtime systems. Most have required recompilation; none have achieved mainstream success. In the meantime, economic pressures on software developers and the urgency of delivering a multiple-platform solution in a timely fashion have made the goal more desirable than ever.

Portability and Unix

The Unix programming environment, as codified in the POSIX standards, took a major step toward portability by delivering “source code” portability. Basically, a software product that runs on one POSIX platform can be ported to a new POSIX system by recompiling the source programs — ideally, with no required source code changes. In reality, source code changes, and the consequent burden of multiple versions, are often unavoidable.

Portability and X Window

Other systems — such as the X Window System — address portability issues by providing a layered architecture that isolates system-dependent functionality. The X architecture successfully abstracts the details of the window and display system.
A program using the X Window System can use any X Window display hardware, although that program is still tied to a specific instruction set and operating system architecture.

Portability and Java

Neither the Unix nor the X Window efforts went far enough to provide universal portability. The Java programming environment, however, makes a significant step toward achieving this elusive goal. From the beginning, Java technology was intended to provide a programming environment that supports the write-once, run-anywhere (WORA) concept. Java technology has largely delivered on this promise by ensuring that Java programs will run across all Java-enabled platforms. Achieving this goal has reduced system dependencies to a very large degree.

When faced with a new computer architecture, a Java program does not need to be ported to it, or even recompiled — one delivery format serves as a universal program representation, usable on all Java-enabled computers. This capability has been achieved by a combination of a well-documented binary representation, the class file format, and a class library that makes the underlying platform functionality available through a universal abstraction layer, the Java Core APIs. We therefore prefer to speak of a Java program’s runnability rather than its portability. A Java program does not need to be ported to a new computer — it can simply be run.

A Java program can contain constructs, however, that will deprive it of this property of runnability across all platforms. It is possible to unintentionally write a platform-specific Java program, since not all digital systems can be perfectly abstracted. For example, file-naming conventions in operating systems and display resolutions in hardware devices present different behaviors that may impede runnability. It is also possible to use the Java programming language to write a program that is intentionally tied to a specific platform.
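To make the distinction concrete, here is a minimal sketch of an intentionally platform-tied program. The hard-coded path is our own illustration, not taken from this article’s sidebars:

```java
import java.io.File;

// Deliberately platform-specific: the hard-coded path below follows
// Windows drive-letter syntax and is meaningless on other systems.
public class WindowsOnly {
    static File configFile() {
        return new File("C:\\Program Files\\MyApp\\config.ini");
    }

    public static void main(String[] args) {
        File f = configFile();
        System.out.println("config file: " + f + ", exists: " + f.exists());
    }
}
```

On any non-Windows JRE the path never resolves to a real file, so the program, while runnable in the narrow sense, cannot do its job anywhere but on Windows.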
If the nonportability of a program is intentional, there is little need to discover it by testing. Such programs fall outside the scope of this article.

In this article we discuss aspects of Java programs that need to be carefully examined and tested to identify where runnability problems may occur, and how to test for runnability. We’ll also discuss how to measure the runnability of a Java program, keeping in mind the question: How well does the program deliver on the WORA promise? First, we’ll examine some of the details of runnability; then we’ll discuss how runnability testing can be incorporated into the software quality assurance process.

Differences in Java environments

In one vision of a perfect world, no testing would be necessary, and there would be no decision to make about the target environments for a program. All computer platforms’ Java Runtime Environments (JREs) would be exactly the same, and a program that worked on one computer would work identically on all other computers. But that vision isn’t the one promised by Java technology; it is, in fact, counter to the premise on which Java technology is based.

That premise is that the Java Runtime Environment represents the behavior of all computers with a common abstracted interface. The details of the actual computer, however, must still show through the interface in some places — so Java technology doesn’t pretend to offer exactly the same behavior on all computers. Different computing platforms have different behavior, and Java technology makes it possible for Java programs to adjust to these differences. We’ll discuss how these differences may occur, what effect they may have on the behavior of a Java program, and how to ensure that the program fulfills its intended functionality on various platforms.
The measurements we suggest will also help to delimit and control any expected differences in behavior.

Platform differences

The specification of the Java Runtime Environment (JRE) — that which a program may depend on — is quite complete, but it is not hermetically sealed. Platform differences visible from within Java programs include the size and appearance of GUI elements (as abstracted by the Abstract Windowing Toolkit (AWT)), the syntax of file names, and the details of thread-scheduling behavior.

For example, a program that looks great on a 1024×768 24-bit display may not be usable on a 640×480 8-bit display. A program that looks great at one window size may, if certain crucial buttons aren’t displayed, be unusable at another. A program that depends on a specific font may not work at all on a machine that doesn’t have that font installed.

Similarly, the syntax of a file name varies widely among the various Java platforms. The java.io.File class provides a useful level of abstraction that, if used correctly, can insulate your program from these platform specifics. However, if you write File t = new File("/dev/tty"); you can’t expect your program to work on anything but a Unix machine (or, at least, a POSIX machine). (See Sidebar 1 for the source code for a helper class that makes it easy to use java.io.File in a portable way.)

A more subtle kind of platform variation occurs in thread scheduling. The Java Language Specification [1] provides quite a lot of detail about what may be expected from the thread scheduler of all JRE implementations. Due to the nature of multithreaded programming, however, it’s entirely possible to write a program that will run under one scheduling policy but will either hang or produce invalid results under another policy. (See Sidebar 2 for examples of code that will work under some, but not all, scheduling policies.)

Despite these differences, it’s entirely possible to write Java programs that operate correctly on all Java platforms.
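The file-name problem mentioned above illustrates the general technique: instead of embedding "/" or "\\" in path strings, build paths one component at a time and let java.io.File supply the local separator. The following sketch is our own illustration of the idea, not the helper class from Sidebar 1:

```java
import java.io.File;

// A minimal portable-path helper: joins path components using the
// File(parent, child) constructor, which applies the separator
// conventions of whatever platform the program is running on.
public class PortablePath {
    public static File join(File base, String... components) {
        File f = base;
        for (String c : components) {
            f = new File(f, c);
        }
        return f;
    }

    public static void main(String[] args) {
        File home = new File(System.getProperty("user.home"));
        // Resolves with "/" on Unix, "\" on Windows, and so on.
        System.out.println(join(home, "myapp", "config.properties"));
    }
}
```

A call like join(home, "myapp", "config.properties") produces a path that is syntactically valid on every Java platform, which a literal string such as "myapp/config.properties" cannot guarantee.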
The purpose of runnability testing is to raise a programmer’s confidence that deploying the tested programs to a wide spectrum of computers won’t result in disaster.

Range of platform environments

In order to understand runnability testing, we have to understand the scope of our ambition. Runnability testing asks whether a program can run, without alteration, on a variety of Java platforms. First, we’ll discuss what it means for a program to run; then we’ll discuss what could affect the runnability of Java programs on various platforms.

Delivery mechanism

The delivery mechanism for a Java program can take several different forms, depending on how the program is invoked and used. It may be in the form of an applet that works within an HTML browser; it may be an application that is invoked from the command line; or it may be a servlet that operates within a Web server. It may use the AWT to communicate with the user, or it may be faceless and do all its work by reading and writing files or network streams. These differences are fundamental to the character of the program. A Java application program will never accidentally become a servlet — the program is written for a specific set of delivery mechanisms. Therefore, the delivery mechanism is not a portability consideration.

Platform variations

Independent of the delivery mechanism, a Java program may encounter platform variations. These variations may include implementation details of the JRE, security policies set by the user, and optional packages that may or may not be installed.

Platform bugs

Unfortunately, some JRE implementations are imperfect — in fact, there are bugs in all JRE implementations. We expect that as Java technology matures, these imperfections will be corrected and disappear. But in the meantime, and in order to deliver value to users, programmers must cope with these bugs. This unpleasant fact is a major motivation for runnability testing.
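Coping typically means detecting the runtime at startup and enabling a workaround only where a known platform bug applies. In this sketch, the vendor string and the bug are hypothetical illustrations; only the system property names are real:

```java
// A common coping pattern: inspect the standard system properties
// (java.vendor, java.version) once, and flag runtimes known to need
// special handling. "Acme" and the repaint bug are hypothetical.
public class Workarounds {
    static boolean needsRepaintWorkaround() {
        String vendor = System.getProperty("java.vendor", "");
        String version = System.getProperty("java.version", "");
        // Hypothetical: suppose a vendor's 1.1.x JREs mishandle
        // repaint coalescing; enable the workaround only there.
        return vendor.startsWith("Acme") && version.startsWith("1.1");
    }

    public static void main(String[] args) {
        System.out.println("repaint workaround: " + needsRepaintWorkaround());
    }
}
```

Keeping such checks in one place makes them easy to find and delete when the buggy platform is no longer supported.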
If all JRE implementations followed the specification, and if there were no ambiguities or errors in the specification, only a minimum amount of runnability testing would be necessary (just enough to ensure that the rules had been followed). But since our users don’t have access to that perfect implementation, we have to take the extra step of assuring that our software is useful on the JREs they do have. This fact also poses a challenge in runnability testing: platform bugs aren’t well designed, and can’t be predicted from design principles or (generally) researched in documentation. In the face of platform bugs, programmers can’t expect testing to provide absolute assurance that they’ve achieved runnability. At best, testing can only increase their confidence level.

This is, in fact, the situation for all testing. Except in the most trivial cases, we can never prove the absence of bugs by software testing. The point of testing is to increase our confidence that most of the bugs in a program have been caught and that the intended functionality has been demonstrated under a controlled set of states. In this light, runnability testing is like any other testing — it employs standard testing techniques, only with a different emphasis.

Security policy issues

When running Java programs, an important consideration is the Java security model. One of the advantages of the Java platform is that it gives the user very good safety; a security-restricted Java program runs inside a sandbox and therefore cannot damage the user’s computer or violate the user’s privacy. Any Java program, especially downloadable code like an applet, must expect to run inside a sandbox and must be prepared to be denied access to various system resources, as controlled by the SecurityManager interface. This security model is another dimension of runnability for Java programs.
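In practice, being prepared to be denied access means guarding resource requests and degrading gracefully when the policy says no. A minimal sketch, in which the fallback behavior is our own illustration:

```java
// A sandbox-aware resource request: under a restrictive security
// policy, System.getProperty may throw SecurityException, so the
// program catches it and falls back instead of dying.
public class SandboxAware {
    static String userHomeOrDefault() {
        try {
            return System.getProperty("user.home");
        } catch (SecurityException denied) {
            // The security policy refused access; degrade gracefully.
            return ".";
        }
    }

    public static void main(String[] args) {
        System.out.println("saving preferences under: " + userHomeOrDefault());
    }
}
```

A runnability test for this program would exercise it both with and without the restriction in place, verifying the specified behavior in each case.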
Developers must understand the assumptions their program makes about the security policy, and how the program will cope if the user has chosen to set a different policy. This, again, is not a consideration exclusive to Java programs; most operating systems have some access control mechanisms, so any program may be denied access to protected resources. Java technology differs from operating systems only in that it has a stronger and more detailed security model, so a program is more often exposed to the denial of access.

Extensions and libraries

Another issue for runnable Java programs is the use of various extensions, or class libraries. There is no clear distinction in form between an extension (intended as an expansion of the capabilities of the Java platform) and a class library (intended as code to be incorporated into the user’s program). Indeed, every class library can be regarded as an extension of the platform capabilities. Extensions may be categorized as standard — as specified by the Java Software Division of Sun Microsystems through an industry consensus process — or as proprietary. Standard and proprietary extensions have different runnability implications. In addition, there is an important class of external software that a program may depend on but doesn’t use directly. We’ll describe such software as driver software, because it’s most often used indirectly, via a service request of some kind.

Standard extensions

A standard extension is an interface published by Sun that conforms to certain criteria for industry acceptance and platform independence. Standard extensions are published as public documentation, and typically are provided with a reference implementation and a compatibility test. These publication standards give the developer a certain level of confidence that a standard extension, if available, will be consistent from platform to platform.
This reduces the load on the runnability test — a standard extension can be treated as an optional part of the Core APIs, and can (if present) be trusted to the same degree. Of course, it’s wise to specify what a program should do if a standard extension it uses isn’t available, and to test for that specified behavior. Once it has been determined that the extension is available, however, variations in behavior shouldn’t be a major consideration.

Proprietary extensions

Proprietary extensions are, in fact, library code that can be called from a program. For purposes of runnability testing, proprietary extensions aren’t different from code written specifically for the program — except that the developer may have less knowledge of and less control over the extension code. Hence, the extensions may require additional testing effort. On the other hand, if the extensions have been widely used in a variety of different programs and in different Java environments, the programmer may be able to rely on their runnability and concentrate the testing effort on the program-specific code.

Drivers

Drivers, service providers, and handlers are various kinds of software that aren’t directly invoked by your program but are invoked by the Java Core API code in response to a request from your program. This class of software includes JDBC drivers, cryptography providers, and protocol handlers. Drivers are an interesting source of runnability bugs. Because they aren’t directly referenced by your program, it’s easy to miss them when packaging a program; there is no uniform mechanism for locating and installing them, so packaging becomes an issue. Also, different drivers may intentionally have different behavior, or may have platform dependencies that cause different behavior unintentionally.

Expected variations

The specification of the Java computing environment leaves certain aspects open, giving the implementor some freedom of choice.
A runnable program must expect to encounter these allowed variations and deal gracefully with them. Following is a discussion of some of the variations you can expect.

Speed variations

Java computing environments obviously differ in speed. An important point is that they differ not only in absolute speed but in the relative speeds of different operations. On one system, socket creation may take 400 times as long as a floating-point multiply; on another system, it may take only 100 times as long. This is just one reason to avoid timing loops in Java programs. Other reasons are that a timing loop will obstruct the other threads in your program, including the system threads, and that a very convenient sleep primitive, java.lang.Thread.sleep, is available instead.

These speed variations, along with the variance in network delays that can be encountered, make it problematic to set a default value for constructs like network timeouts. The best solution for a Java program that needs to be runnable across a wide variety of networks is to use a soft timeout: when a timeout expires, pop up a dialog box informing the user of the situation and asking whether the operation should be retried. An alternative is to read timeout values from a property file, so the user can change them if need be.

Display variations

Some Java machines have large screens; some have small ones. Displays come in various resolutions. AWT layout managers and resources can be used to make a program adaptable to various display hardware. There is usually a minimum configuration that can be supported, and this minimum configuration should probably be documented.

The testing process

Testing, which requires supplying input to the program being tested and then examining the results, may be performed at all levels of program development — from unit tests for individual classes or methods (structural testing) to system or usability tests at the outermost layer of the program (functional testing).
Runnability testing is likely to occur at the outer levels of the program, because it is by nature a test of the integrated product — and because runnability problems often don’t show up in an isolated piece of the program. Although runnability testing is fundamentally functionality testing, and can be done entirely from outside the program, it’s also useful to delve into the interior of a program during the test. For example, it may be efficient to take control of an interior component module of a program in order to exercise it more directly than can be done by exercising the integrated product. It may also be useful to insert a probe into the interior of a program, to help diagnose problems that may be masked in the integrated product. When it comes to testing, a big advantage of Java technology is that it is quite easy to write and maintain this kind of into-the-box code. Because Java programs are linked on demand, at runtime, a test probe can be inserted without recompiling or relinking the rest of the program.

Earlier, we discussed some of the reasons a Java program may fall short of the write-once, run-anywhere goal. Below, we explore some ideas about how to efficiently screen for these shortcomings. Your tests for Java program runnability should have the following properties.

Robustness testing

A program is said to be robust if it successfully completes its job even when it encounters unanticipated circumstances, whether those arise from anomalies in the input data or in the execution environment. Note that runnability testing is essentially robustness testing. We make the fundamental assumption that functionality testing has been successfully completed: we know what the program is supposed to do, and we know how to measure what it has done.
The challenge of runnability testing is to ascertain that the tested program fulfills its function on other JRE implementations — in other words, that the program is robust against platform variations. Note that this is slightly different from testing that your program does the same thing on every platform. It is entirely possible that the program specification implies that your program, in order to be correct, should behave differently on different platforms. A simple example is:

    class WhatOS {
        public static void main(String[] args) {
            String osName = System.getProperty("os.name");
            System.out.println(osName);
        }
    }

The above program is runnable in any Java environment, but clearly it produces different results on different platforms. A more subtle example is window displays and layouts, the details of which are likely to be platform dependent even though the intention and functionality are the same.

One approach to testing for platform-dependent features is to use a golden file test, judging a result to be good if it is identical to a sample result that has been identified as acceptable (“golden”). The utility and portability of such an identity test can be increased by writing a more sophisticated comparison function to compare the actual program output with the reference output. Before going too far down this path, however, think about how the comparison relates to the program specification. Is the comparison an accurate and efficient reflection of what the program should do for the user? The purpose of testing, after all, is to serve as an advocate and representative for the users of the program, and measurements made in testing should reflect users’ needs.

Regression testing

In regression testing we test for degradation of the program’s correctness or other quality attributes due to some modification of the software product. In general, during regression testing, the platform is held constant while the program undergoes slight variations.
This is the opposite of runnability testing, where the platform is subject to slight variation while the program is held constant. Otherwise, runnability testing and regression testing are similar. Both kinds of testing present the problems of repeating and reproducing tests accurately, and of dealing with slight variations in the test results. In both kinds of testing, a judgment must be made for each variation: Is this an allowed difference, or is it a bug?

Unlike runnability testing, regression testing doesn’t necessarily focus on one special area. Regression testing is carried out because programmers have modified the code and aren’t certain of the impact of their modifications. Additionally, during regression testing, special attention is paid to the areas of the code that can be predicted to be buggy — either because they’ve been buggy in the past, or because they’re new. In contrast, runnability testing concentrates on the areas of the program that may reflect the runnability bugs described in the section “Differences in Java environments,” above. Our prediction of bugs is based on general knowledge, rather than on specific knowledge of the program under test.

Repeatable tests

A fundamental characteristic of a runnability test is that it will be repeated. This is true, of course, for any test — a test case that can’t be repeated is a test case that can’t help you find or fix a bug. A runnability test is intended to be repeatable on each system to which you are porting or (for run-anywhere Java programs) on each system you’re running. Random test data lacks repeatability. It may seem tempting to supply a sequence of random inputs to a function, or to click on random buttons in a user interface, but in fact your time is better spent designing a list of inputs that reflects the users’ requirements.
Such a list provides a measure of confidence in the features that have been tested for runnability, and a measure of confidence that you have tested the same features on all platforms.

Runnable tests

In order for a test case to be useful on a variety of platforms, runnability tests must themselves be portable. However, because they can be adapted to the specific set of platforms you’re using, they don’t have to achieve the level of runnability you’re testing for. It isn’t unreasonable, in order to spare users the pain, to subject test engineers to porting efforts. There are two requirements for a runnable test: it must be possible to run it on a variety of platforms, and its results must be valid on all of them.

Running tests often involves some kind of test harness or administrative framework, which imposes requirements of its own. Specifically, the connection between the test and the test framework may be difficult to set up when testing applets; with an applet test framework, you can’t count on being able to save any test results to a file. Even when doing manual testing, there may be installation difficulties in getting the tested code installed and running on the test platforms. Packaging the code, setting the classpath, and invoking the Java virtual machine (JVM) are platform-dependent activities and may need to be done in a different way on each test platform. Testing an applet may also depend on the HTML tag used to embed the applet, which differs somewhat across different versions of HTML. Do you use

    <applet code="Foo.class" codebase="../.."></applet>

or

    <applet code="Foo.class" archive="foo.jar"></applet>

or

    <object></object>

or the Java Plug-in (Activator) syntax?

The other side of test runnability is the correctness criteria, which we mentioned in the first section. In order to write a runnable test, your evaluation of the result of the test must be sufficiently high-level to apply to different platforms.
For example, a bitmap dump of the window painted by a Java program isn’t likely to be of much use as a golden file in runnability testing, unless you have a human do the comparison. Any comparison or evaluation function you use in runnability testing must examine not the implementation details of the output, but its adequacy according to the program specifications.

Robust tests

Recall that runnability testing has some similarities to regression testing. In fact, runnability testing will often be performed along with regression testing. Once you have a runnability test set up, it makes good sense to repeat at least part of the testing for new releases. And certainly, if you do find and correct bugs as a consequence of runnability testing, you’ll want to repeat the tests to verify that the bugs really are fixed. This means that runnability tests must also be robust against program changes. Some of this robustness is gained by the same abstraction that is necessary for test runnability: a test case that measures program compliance with the specification will remain valid as long as the specification doesn’t change.

The other requirement for robustness concerns the test input data. Programmers should try to organize and structure their tests so that the tests will be easy to adapt to changes in the program. Good test documentation is needed to specify each test’s dependencies on other test cases and on program features. The robustness of a test case improves when the test case is cohesive and focused on a single feature of the program; a large “smorgasbord” test case is likely to break with every detailed program change.

Test automation

Automation is, essentially, investing in machinery rather than labor; or, investing in machinery to get more value from your labor. Automation can be applied during all phases of the testing process, from test generation to test execution and test analysis.
It can provide real advantages, especially in highly repetitive situations. Test automation can save time and money, and is especially applicable to runnability testing — as long as your automation tools provide the necessary runnability and robustness. The kinds of machinery available to help with runnability testing are:

Test generation — Test generation tools assist the test engineer in the creation of individual tests, or test suites. This is a broad category, ranging from requirements-analysis tools to test program generators, with various degrees of automation. If we regard testing as the combination of activation (providing input data and executing the code under test) and evaluation, these tools may assist in either or both of these tasks.

Test framework — A test framework provides a control environment for running a test, including any necessary setup and some kind of result accumulation. It is a tool for managing and using a test suite.

Capture/replay — A capture/replay tool is like a tape recorder for GUI actions (mouse clicks, typing, window events). It records a sequence of GUI interactions with a program, in the form of a test script that can be replayed to test the program with a simulated user. Sophisticated capture/replay tools allow script editing and the insertion of correctness predicates into the script.

Coverage measurement — A coverage measure is an indication of how much testing has been accomplished; often, this is a measure of how much of the program has been exercised in the test cycle. This may be as simple as a measure of how many of the methods in the program have been invoked, or as sophisticated as a measure of how many of the possible branch or execution paths in a program have been taken.

Beta testing

The real test of any testing strategy is ultimately how well the product works in real-life applications. Runnability testing, too, can benefit from real-life testing.
This is achieved during a phase of the testing process that is often referred to as beta testing. Beta testing is useful because it brings a program into contact with real customers in real circumstances. The most direct way to get information on the suitability of a program to its users’ requirements is to allow those users to use the program. No description, however accurate, can substitute for the experience of trying to use the program. No test laboratory, however extensive, can simulate all the variation of the real world.

Since beta testing requires the participation of multiple customers with varying needs and situations, it requires well-thought-out coordination. The beta sites should be selected to cover multiple scenarios that will give the best coverage of the functionality of the software under test. Specifically, for runnability testing it’s best to select beta sites that cover a wide variety of platform implementations and execution environments. Other standard beta criteria also need to be applied. These include the establishment of strategic partnerships with selected beta sites, agreement on active use of the product, and the establishment of a formal and systematic communications channel between the development and support team and the beta sites. Typically, this may include the assignment of a product team member to one or more beta sites and the scheduling of regular review meetings between this member and the sites, in order to harvest the results and communicate them to the program team.

Beta testing doesn’t fit well with all marketing strategies. A company that relies on technical innovation and surprise for its competitive edge may have a problem with customer testing, as it gives the competition a longer lead time in which to counter or imitate.
This, as well as pride in craftsmanship, has at times given beta testing an unsavory reputation in engineering circles.

Runnability test planning

What’s the best strategy for runnability testing? How can the concepts outlined above be put into practice? The goal of runnability testing, as with most other testing approaches, is to protect your customers from bugs. With a finite test budget, and with a limited amount of time for testing, it is not possible to prevent all bugs. Given these constraints, the goal of runnability test planning is to enable the test engineer to do as much good as possible within the time allowed. A runnability test plan must tell programmers where to focus their testing efforts, how much testing to do, and how many test platforms are required.

Where to focus?

The goal of a runnability test is to discover ways in which the program unintentionally depends on features of the JRE that may vary from one implementation to another. Therefore, runnability testing should focus on program features that make use of parts of the Java Core API that are known to expose platform differences, like file name syntax and screen layouts. Two kinds of platform variance must be kept in mind during test planning.

The first kind of platform variance is allowed variance. You can test for runnability on any platform by making sure that your program works across a wide spectrum of allowed implementations. For example, you can test the file operations of your program on Unix and on MacOS. This combination covers a wide spectrum of file name syntaxes, so this testing gives you good confidence that your program will be runnable on other systems.

The other kind of platform variance is the result of a bug in the implementation of the JRE. Ideally, there would be no bugs in JRE implementations, so your program wouldn’t have to cope with any. This isn’t the case, however, and generalized testing for runnability in the presence of buggy platforms isn’t all that effective.
Instead, programmers must test on the specific platforms on which they plan to deploy. One can generalize, to some degree: two implementations derived from a common code base share some bugs, so testing on one implementation offers some confidence in the runnability of its cousins. We expect that as Java technology matures, these sources of bugs will become less prevalent.

At a minimum, a test plan should include platform variance tests that focus on the specific areas of the program that might depend on implementation specifics. These test cases should be developed and run on implementations that vary as much as possible. The test plan should also include broad functionality tests, run on your most important deployment platforms.

When to stop?

How much runnability testing is required? For anything but the most trivial program, no test can prove the code correct; it's always possible for a bug to slip through. There is, however, a common relationship between the amount of testing performed and the number of remaining bugs. In most cases the curve looks something like the figure below, with a knee corresponding to a moderate testing effort. A test effort below the knee won't catch even the obvious bugs; a test effort far beyond the knee will catch more bugs, but at a much higher cost per bug. The appropriate test effort depends largely on the cost of a bug. For safety-critical programs, the cost of a bug is very high, so a large test effort is justified. For a throwaway personal program, the cost of a bug may be very low, so minimal testing is appropriate.

How do you know where you are on this curve? How do you measure the confidence gained by testing?

Coverage measurement

The most concrete measure of a testing effort is a coverage measurement.
For example, you can apply a path coverage measurement restricted to the paths that use the java.io.File class, to check that your runnability testing has covered every use of the File class. Note that if you have high confidence in the repeatability of your tests, you can gather coverage measurements on your test development platform alone; if your tests are robust, the coverage is likely to be the same on all platforms. This is a judgment call, because until you have coverage information from different platforms, you have no measure of the repeatability of your tests. You have to balance the cost of the coverage measurement against the benefit of the increased confidence in the tests.

Feature analysis

Another measure of the adequacy of your tests is a feature analysis, or scenario test. This is tied more closely to the program specification and the users' requirements than to the program code. The measure of test adequacy here is: How well does the test mimic the user's experience? Are the program's features exercised? Does the test include typical usage scenarios? One way to answer these questions is to establish a functionality traceability matrix that lists all the relevant user requirements and records the availability and successful runnability of the corresponding tests on the various platforms.

These two criteria for test adequacy are not exclusive; they can, and should, be used to judge the same test suite. A test suite that achieves only 10 percent code coverage has probably not reached the sweet spot in the curve shown in the figure above; a test suite that does not reflect the major features of a product is probably not adequate for commercial release. As products mature, the goal for the code coverage metric generally approaches 80 percent, and the traceability matrix goal approaches 100 percent.

It is possible to get much deeper into the study of test adequacy than can be covered in this article.
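The traceability-matrix bookkeeping described above need not be elaborate; it can be as simple as a nested map from requirement to platform to result. The following is a minimal sketch, not a prescribed tool; the requirement and platform names are made up for illustration, and the code assumes a modern JDK.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal traceability-matrix sketch: for each user requirement,
// record whether its test ran successfully on each platform.
public class TraceabilityMatrix {
    private final Map<String, Map<String, Boolean>> matrix = new LinkedHashMap<>();

    public void record(String requirement, String platform, boolean passed) {
        matrix.computeIfAbsent(requirement, r -> new LinkedHashMap<>())
              .put(platform, passed);
    }

    // Fraction of (requirement, platform) cells whose test passed.
    public double passRate() {
        long total = 0, passed = 0;
        for (Map<String, Boolean> row : matrix.values()) {
            for (boolean p : row.values()) {
                total++;
                if (p) passed++;
            }
        }
        return total == 0 ? 0.0 : (double) passed / total;
    }

    public static void main(String[] args) {
        TraceabilityMatrix m = new TraceabilityMatrix();
        m.record("save-file", "Solaris", true);
        m.record("save-file", "Windows", true);
        m.record("print-report", "Solaris", true);
        m.record("print-report", "Windows", false); // a runnability gap to investigate
        System.out.println(m.passRate()); // prints 0.75: 3 of 4 cells passed
    }
}
```

A cell that passes on one platform and fails on another is exactly the kind of runnability gap this bookkeeping is meant to surface.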
Going deeper, for example, the number of bugs found per unit of testing time can be plotted and fitted to a curve in an attempt to predict the benefit of future test efforts. Another technique from the software engineering literature is bug seeding: known bugs are introduced into a program without informing the test team, and the proportion of those bugs found in testing is then tracked. We doubt that this latter technique is useful for runnability bugs, because runnability bugs are so highly correlated that the underlying statistical assumption of independently discovered bugs doesn't hold. Even if a full-scale defect analysis isn't feasible in your testing effort, it's still quite useful to decide, before you start, how you'll know you're done. At the least, set a simple coverage goal so that you can track and manage the effectiveness of your testing effort.

How many platforms?

It would be nice to test on every platform our customers are going to use, but unless we're operating in an extremely constrained environment, that isn't possible. Our customers will therefore use our Java program on platforms (or at least on platform configurations) that we haven't been able to test.

How can programmers limit the consequent risk? By choosing a representative set of platforms for runnability testing. This is similar to the problem of selecting test data. A good strategy is to use a mixture of the most common platforms, which ensures the program works where most customers will run it, and the most disparate platforms, which measures runnability across implementation extremes. By ensuring that the testing platforms span the range of implementation choices in the parts of the JRE that your program uses, you gain confidence that your program will operate correctly on platforms between those extremes.
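Spanning those extremes is easier when the program avoids platform-specific constructs in the first place. The sketch below contrasts a hard-coded path string with portable path construction using only the standard java.io.File API; the directory and file names are hypothetical.

```java
import java.io.File;

// Sketch: building a file path portably versus hard-coding one syntax.
public class PortablePaths {
    public static void main(String[] args) {
        // Non-runnable habit: this string only parses as intended on
        // platforms where '/' happens to be the separator.
        String hardCoded = "conf/app/settings.txt";

        // Runnable alternative: compose the path one component at a time
        // and let java.io.File supply the platform's separator.
        File portable = new File(new File("conf", "app"), "settings.txt");

        System.out.println(hardCoded);
        System.out.println(portable.getPath()); // conf\app\settings.txt on Windows,
                                                // conf/app/settings.txt on Unix
        System.out.println(portable.getName()); // settings.txt on every platform
    }
}
```

A program written this way still deserves testing on disparate platforms, but it starts from a much smaller set of file-name assumptions.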
For example, if your program uses file names, it's worthwhile to test it on both Macintosh and Windows machines, which have very different file-name syntaxes, to ensure that the program has used the File class in a runnable way.

Limits of testing

Testing cannot eliminate all risk, and runnability testing cannot ensure that your Java program will run in all Java environments. On the one hand, testing cannot catch all bugs even in a single environment; on the other, it is not feasible to test in all Java environments. The benefit of testing is to reduce the risk.

Some areas of runnability are particularly difficult to test. In particular, it is effectively impossible to test a program exhaustively for correct use of the multithreading capabilities of the Java programming language. Multithreading bugs are typically exposed only by a particular order of execution of the threads involved, and that order depends on the thread scheduler, the load on the machine, the time taken by various I/O operations, and other factors that are neither measurable nor controllable in a test. It is therefore very difficult to trigger a multithreading bug, and even harder to reproduce one. In the area of multithread behavior especially, tools that can analyze and diagnose program behavior can significantly improve your testing productivity.

Finally, late testing is the third most expensive way to catch a bug. (The most expensive is to lose a sale; the second most expensive is to field a support call.) The earlier you catch a bug, the cheaper it is to fix, and the less widespread it is likely to be.

Alternative quality assurance strategies

Testing is only part of a complete quality assurance strategy. The other, cheaper opportunities to catch bugs occur earlier in the software development process. The best way to catch bugs is to prevent them. The Java programming language encourages good interface and class documentation, and good encapsulation.
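The multithreading difficulty noted above is easy to make concrete. In the sketch below, the unsynchronized counter may or may not lose updates depending on how the scheduler interleaves the two threads, so a test can pass thousands of times while the bug still ships; the synchronized counter is correct by construction. The class and loop counts are illustrative only.

```java
// Sketch of a schedule-dependent bug that testing may never expose.
public class RacyCounter {
    static int unsafeCount = 0;
    static int safeCount = 0;
    static final Object lock = new Object();

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) {
                unsafeCount++;            // read-modify-write: not atomic, updates can be lost
                synchronized (lock) {
                    safeCount++;          // serialized: always totals 200,000
                }
            }
        };
        Thread t1 = new Thread(work), t2 = new Thread(work);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println("safe   = " + safeCount);   // always 200000
        System.out.println("unsafe = " + unsafeCount); // at most 200000; less if updates were lost
    }
}
```

Whether unsafeCount comes up short on any given run depends on scheduling, machine load, and JRE implementation, which is precisely why such bugs resist both triggering and reproduction in a test lab.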
Why not use these documentation and encapsulation facilities to aid design and code reviews, and prevent bugs before they are coded? On the topic of runnability especially, the 100% Pure Java Cookbook (see Resources), a compendium of runnability hints and tips, makes useful background reading for a runnability review.

Once a program is written, it can be statically checked. The Java compiler, with its strict type checking, does a great deal of this checking, and the class verifier does more. In the area of runnability, the SunTest group at Sun builds and distributes a program that reads Java class files and informs the user of potential runnability problems; see the "Java Testing Tools from Sun" section in Resources.

Conclusion

Java technology brings the dream of ultimate portability, which we have dubbed runnability, within our grasp. To keep that goal from slipping through our fingers, we need to do some testing specifically for runnability. This article has provided recommendations and ideas for how to perform runnability testing, so that you may confidently deliver your Java programs with the claim that they will run anywhere.

Roger Hayes received the B.S. degree in computer science from Portland State University in 1982, and the M.S. (1986) and Ph.D. (1989) degrees in computer science from the University of Arizona. Since 1989, he has worked at Sun Microsystems, in platform software and in automated testing methods. He is currently a member of the SunTest group, working on software testing automation for Java programs. He is the author of the 100% Pure Java Cookbook, as well as several patents, scholarly papers on formal testing methods, and articles on Java testing.

Manoochehr Ghiassi is currently an associate professor and chair of the Department of Operations and Management Information Systems at Santa Clara University. He received a B.S. from Tehran University, Iran, and an M.S.
in economics from Southern Illinois University at Carbondale, Illinois. He also received an M.S. in computer science and a Ph.D. in industrial engineering, both from the University of Illinois at Urbana-Champaign. He has been a consultant to Sun Microsystems, National Semiconductor, and Mentor Graphics. His current research interests include software testing, software engineering, and computer simulation. He is a member of IEEE and ACM.