by Corey Klaasmeyer

Java grid computing

Mar 28, 2007

An introduction to Java-based frameworks for grid and cluster computing

What is a grid? It’s more difficult to answer this question today than it was in 2002, because commercialization of the technology has produced many products and implementations that are labeled grids but do not really fit the definition. To give some historical background, grid computing emerged from academia with a formal definition in the late 1990s. In 1998, Ian Foster and Carl Kesselman defined computational grids in The Grid: Blueprint for a New Computing Infrastructure:

“A computational grid is a hardware and software infrastructure that provides dependable, consistent, pervasive and inexpensive access to high-end computational capabilities.”

Since then, the definition has been generalized to include the sharing of other resources, such as data. In contrast, cluster computing evolved, rather than emerged, as a low-cost way of solving computationally demanding problems by using large numbers of commodity CPUs together with commodity networking technology.

Although grids and clusters may both satisfy high-performance or high-throughput requirements by enabling distributed computing, grids solve the more complicated problem of providing computing resources to groups that span organizational boundaries, where resources are spread among the groups. In fact, a grid may marshal numerous clusters from different organizations into a logical set of computational resources available to a group of authorized users. By definition, grid services must operate in a more complex environment, where resources must be shared and secured according to policies that may differ from organization to organization.

To clarify this point further, the next section provides some purpose and context for grids, as opposed to clusters, and introduces Ian Foster’s three criteria for grids, which must be met before something can be classified as a grid.

Grids and clusters: Purpose and context

As computing power increased and prices dropped over the last part of the 20th century, it became clear that if large numbers of low-cost computers could be used in concert, they could provide supercomputing power at a much lower cost than purpose-built, high-performance supercomputers. This isn’t entirely true anymore; my co-worker just purchased a consumer computer that, according to the government’s definition, qualifies as a supercomputer. Apple makes this computer, and it costs less than $3,000, including delivery.

As always in engineering, optimizing one variable, in this case cost, means accepting a loss in some other aspect of the system. Compared with a supercomputer, a cluster pays for its low cost in communication overhead and RAM size. Because processes must communicate with one another over the network rather than through hardware on the motherboard, communication is much slower. Also, the amount of high-speed RAM available to a process is limited by the memory installed on each host in the cluster. Despite these constraints, clusters have proven invaluable in high-performance computing for solving problems that can easily be broken into many smaller tasks and distributed to workers. Ideal problems require little communication between workers, and the workers’ results can be combined or processed in some way after all of the tasks have completed.

Grids can certainly solve these sorts of problems, but they might have a supercomputer available for tasks that cannot be broken up so easily. The grid would provide a way to match this supercomputer with your problem, reserve it, authenticate your task and authorize its use of the supercomputer. It would execute the task and provide a way to monitor progress on that supercomputer. When the supercomputer completes your task, it would send the results to you. This supercomputer might even be in a different hemisphere and owned by a different institution. Finally, the grid might even debit your account for using this service.

By way of contrast, a cluster might provide some of these services. It might even be a cluster of supercomputers, but the cluster would probably belong entirely to your institution, and it probably wouldn’t bill you. In addition, your institution probably would have a consistent policy and method of authenticating your credentials and authorizing your use of the cluster. More important, the cluster would probably exercise complete and centralized control over its resources.

Grids initially emerged to solve resource-sharing problems across academic and research institutions, where funding for researching a broad topic might be distributed across a variety of institutions that employed researchers focusing on particular aspects of that topic. Often, researchers need to share experimental data generated by expensive sensors. For example, the particle colliders that provide experimental data for high-energy particle physics are extremely complex and expensive beasts. Modern colliders generate massive amounts of data that must be managed efficiently and replicated to move the data closer to computational resources. When complete, the Large Hadron Collider (LHC) at the European Organization for Nuclear Research (CERN) will generate petabytes of data every year. Sharing this experimental data among many geographically distributed research organizations and researchers requires sophisticated resource-sharing technology that can expose those resources in an open, standard, and secure way. These requirements far exceed those of a cluster intended to provide high-performance computing for a single institution.

Ian Foster, widely regarded as the father of grid computing, provides three criteria that must be met in order for something to be categorized as a grid:

  1. Coordinates resources that are not subject to centralized control;
  2. Uses standard, open, general-purpose protocols and interfaces; and
  3. Delivers nontrivial qualities of service.

Of these three, in my mind, the first most clearly defines a grid: it imposes a software requirement for sharing resources across organizational boundaries. The second makes more sense in the context of the grid, rather than a grid. If you envision the grid as something akin to the Internet, pervasive yet providing access to abstract computing resources, the need for open, standard protocols and interfaces becomes clear. Finally, nontrivial qualities of service (QOS) distinguish a grid from a cluster. Although clusters typically guarantee some level of QOS, those guarantees are trivial compared with delivering QOS over resources that are not under centralized control. Refer to “What Is the Grid?” for a complete explanation of these criteria and of grid definitions as they have evolved over the last decade.

Java grids and clusters

Many grid and cluster frameworks can host services implemented in Java, and numerous frameworks are implemented entirely in Java. Although the Globus Toolkit 4 (GT4) grid service featured in this article is implemented in Java, the framework itself is implemented in both Java and C.

Because GT4 is the reference implementation of the Open Grid Services Architecture (OGSA), built on the Web services standards that replaced the earlier Open Grid Services Infrastructure (OGSI), any tutorial on grid technology should start with a Globus grid service implementation. The next section describes the parallel algorithm that our example grid service will implement.

A Globus grid service algorithm

Algorithms can be made parallel by splitting processing into discrete pieces and distributing the work among many workers. Some algorithms lend themselves to this type of parallelization better than others: it’s easier to make an algorithm parallel if the results from one piece of work do not have to be fed into another piece of work. For example, a search for all the primes in the first 1,000 integers can be split into 10 distinct sets of numbers: 1-100, 101-200, and so on. Given 10 computers, each of these 10 sets of 100 could be sent to a different computer, and the time required to find all of the primes would fall roughly in proportion to the number of processors. The communication overhead of transmitting the 168 primes found among those 1,000 integers is negligible. More important, each worker can process its piece of work in isolation from the other workers. In other words, the code for finding primes in 101-1,000 would not need any output from the processing of the first 100 in order to do its work.
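The split described above can be sketched in Java on a single machine, with threads from an ExecutorService standing in for the 10 networked computers. This is a minimal illustration under that assumption, not grid code; the class PrimeRangeSplit and its methods are hypothetical names. Each task scans only its own range, and the partial results are merged only after all tasks have finished.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PrimeRangeSplit {

  // Trial division; adequate for numbers as small as 1..1,000
  static boolean isPrime(int n) {
    if (n < 2) {
      return false;
    }
    for (int d = 2; d * d <= n; d++) {
      if (n % d == 0) {
        return false;
      }
    }
    return true;
  }

  // One worker's job: scan its own range without needing any other worker's output
  static List<Integer> primesInRange(int from, int to) {
    List<Integer> primes = new ArrayList<>();
    for (int n = from; n <= to; n++) {
      if (isPrime(n)) {
        primes.add(n);
      }
    }
    return primes;
  }

  // Split 1..limit into `pieces` ranges, run them concurrently, then merge the results in order
  static List<Integer> findPrimes(int limit, int pieces) {
    ExecutorService pool = Executors.newFixedThreadPool(pieces);
    try {
      int size = limit / pieces;
      List<Future<List<Integer>>> futures = new ArrayList<>();
      for (int i = 0; i < pieces; i++) {
        final int from = i * size + 1;
        final int to = (i == pieces - 1) ? limit : (i + 1) * size; // last range takes any remainder
        Callable<List<Integer>> task = () -> primesInRange(from, to);
        futures.add(pool.submit(task));
      }
      List<Integer> all = new ArrayList<>();
      for (Future<List<Integer>> future : futures) {
        try {
          all.addAll(future.get()); // combine the work product after each task completes
        } catch (InterruptedException | ExecutionException e) {
          throw new IllegalStateException("a worker failed", e);
        }
      }
      return all;
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    List<Integer> primes = findPrimes(1000, 10);
    System.out.println(primes.size() + " primes found; largest is " + primes.get(primes.size() - 1));
  }
}
```

Running main prints "168 primes found; largest is 997". On a real cluster or grid, each range would be shipped to a remote worker rather than a local thread, but the split-then-merge structure stays the same.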

In this article, we implement a grid service for testing special primes called Mersenne primes. Mersenne primes are primes of the form 2^p-1, where the exponent p must itself be prime. Mersenne primes interest mathematicians because the largest known primes are typically Mersenne primes, and because every Mersenne prime 2^p-1 yields an even perfect number, 2^(p-1)(2^p-1).
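Euclid showed that if 2^p-1 is prime, then 2^(p-1)(2^p-1) is a perfect number, one that equals the sum of its proper divisors. The following sketch, with hypothetical class and method names, checks this for the first four Mersenne primes using plain long arithmetic, which is only adequate for such small values.

```java
public class MersennePerfect {

  // Sum of the proper divisors of n (n > 1) by trial division; fine for small n
  static long properDivisorSum(long n) {
    long sum = 1; // 1 divides every n > 1
    for (long d = 2; d * d <= n; d++) {
      if (n % d == 0) {
        sum += d;
        long partner = n / d;
        if (partner != d) {
          sum += partner; // count the paired divisor, but not a square root twice
        }
      }
    }
    return sum;
  }

  // Euclid's construction 2^(p-1) * (2^p - 1); the result is perfect whenever 2^p - 1 is prime
  static long perfectFromExponent(int p) {
    long mersenne = (1L << p) - 1;
    return (1L << (p - 1)) * mersenne;
  }

  public static void main(String[] args) {
    int[] exponents = {2, 3, 5, 7}; // 2^p - 1 = 3, 7, 31, 127 are all prime
    for (int p : exponents) {
      long candidate = perfectFromExponent(p);
      boolean perfect = properDivisorSum(candidate) == candidate;
      System.out.println("p = " + p + ": " + candidate + " perfect? " + perfect);
    }
  }
}
```

For p = 2, 3, 5 and 7 it prints the perfect numbers 6, 28, 496 and 8128, each flagged true. A composite exponent such as p = 4 gives 8 * 15 = 120, whose proper divisors sum to 240, so 120 is not perfect.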

To test for a Mersenne prime, our service will apply the Lucas-Lehmer test using Java’s BigInteger class. We must use BigInteger to search for new Mersenne primes, because the candidates of interest today far exceed the range of a long. The BigInteger class implements integers of arbitrary size. The Lucas-Lehmer test provides an extremely simple and efficient way to test whether 2^p-1 is prime. The following pseudo-code implements this test:

  mersenne = 2 ^ exponent - 1
  lucas = 4
  for (i = 3; i <= exponent; i++)
    lucas = (lucas ^ 2 - 2) % mersenne
  if (lucas == 0) then 2 ^ exponent - 1 is a Mersenne prime
  // the test assumes exponent > 2; 2 ^ 2 - 1 = 3 is prime and must be handled separately
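To see the pseudo-code in action, here is a small trace of the same recurrence for exponents small enough to fit in a long. It assumes 3 <= p <= 31, so that squaring a residue cannot overflow, and it uses Math.floorMod so the remainder stays non-negative. The class name LucasLehmerTrace is hypothetical; the grid service itself uses BigInteger.

```java
import java.util.ArrayList;
import java.util.List;

public class LucasLehmerTrace {

  // Residues of the Lucas-Lehmer sequence modulo 2^p - 1, starting from 4.
  // long arithmetic limits this sketch to 3 <= p <= 31 (each square stays below 2^62).
  static List<Long> residues(int p) {
    if (p < 3 || p > 31) {
      throw new IllegalArgumentException("sketch supports 3 <= p <= 31");
    }
    long mersenne = (1L << p) - 1;
    long lucas = 4;
    List<Long> trace = new ArrayList<>();
    trace.add(lucas);
    for (int i = 3; i <= p; i++) {
      // floorMod keeps the residue non-negative even if lucas * lucas - 2 < 0
      lucas = Math.floorMod(lucas * lucas - 2, mersenne);
      trace.add(lucas);
    }
    return trace;
  }

  // 2^p - 1 is prime exactly when the final residue is zero
  static boolean isMersennePrime(int p) {
    List<Long> trace = residues(p);
    return trace.get(trace.size() - 1) == 0;
  }

  public static void main(String[] args) {
    System.out.println(residues(5));         // prints [4, 14, 8, 0]
    System.out.println(isMersennePrime(11)); // prints false: 2047 = 23 * 89
    System.out.println(isMersennePrime(13)); // prints true: 8191 is prime
  }
}
```

For p = 5 the residues modulo 31 are 4, 14, 8 and 0, so 31 is a Mersenne prime. For p = 11 the final residue is nonzero, which correctly rejects 2047.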

As of the writing of this article, the 44th and largest-known Mersenne prime was found by the Great Internet Mersenne Prime Search (GIMPS). The Electronic Frontier Foundation (EFF) is offering a $100,000 prize for the first Mersenne prime found with more than 10 million digits.
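Checking a candidate against the 10-million-digit threshold does not require computing 2^p-1 at all. Because a power of two is never a power of ten, 2^p-1 has the same number of decimal digits as 2^p, namely floor(p log10 2) + 1. The sketch below, with a hypothetical class name, applies that formula with double-precision logarithms, which is accurate enough here unless p log10 2 happens to fall extremely close to an integer.

```java
public class MersenneDigits {

  // Number of decimal digits in 2^p - 1 (same as in 2^p, since 2^p is never a power of ten)
  static long decimalDigits(int p) {
    return (long) Math.floor(p * Math.log10(2)) + 1;
  }

  public static void main(String[] args) {
    // exponent of the 44th known Mersenne prime, 2^32,582,657 - 1
    System.out.println(decimalDigits(32_582_657)); // prints 9808358
  }
}
```

For p = 32,582,657 it prints 9808358, just short of the 10 million digits the prize requires.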

A Globus grid service implementation

To implement our grid service, you will need to download the GT4 Web Services-Core and the Globus Service Build Tools. As mentioned earlier, the Globus Toolkit is implemented in both Java and C. This core download includes just the Java implementation and a simple stand-alone container suitable for testing services.

The example grid service is pared down to its bare essentials; the goal here is to sketch the process of implementing a Globus grid service as simply as possible.

Define the Web service interface using WSDL

GT4 implements grid services as Web services. Specifically, GT4 Web services conform to the Web Services Resource Framework (WSRF) specification, which enhances vanilla Web services with “statefulness.” Because our service does not maintain any state, we won’t worry about the complexities associated with a stateful implementation. For more detailed information about GT4 grid services and how they are related to the WSRF specification, refer to Borja Sotomayor’s Globus Toolkit tutorial.

Without statefulness, the WSDL (Web Services Description Language) is much simpler. The following WSDL describes our stateless grid service:

<?xml version="1.0" encoding="UTF-8"?>
<definitions name="MersennePrimeService"
  targetNamespace="http://www.javaworld.com/namespaces/MersennePrimeService_instance"
  xmlns="http://schemas.xmlsoap.org/wsdl/"
  xmlns:tns="http://www.javaworld.com/namespaces/MersennePrimeService_instance"
  xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/"
  xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <types>
    <xsd:schema
      targetNamespace="http://www.javaworld.com/namespaces/MersennePrimeService_instance"
      xmlns:tns="http://www.javaworld.com/namespaces/MersennePrimeService_instance"
      xmlns:xsd="http://www.w3.org/2001/XMLSchema">

      <!-- Input parameters -->
      <xsd:element name="test" type="xsd:int"/>

      <!-- Output parameters -->
      <xsd:element name="testResponse" type="xsd:boolean"/>

    </xsd:schema>
  </types>

  <!-- Messages -->
  <message name="TestInputMessage">
    <part name="parameters" element="tns:test"/>
  </message>
  <message name="TestOutputMessage">
    <part name="parameters" element="tns:testResponse"/>
  </message>

  <!-- Port type -->
  <portType name="MersennePrimePortType">
    <operation name="test">
      <input message="tns:TestInputMessage"/>
      <output message="tns:TestOutputMessage"/>
    </operation>
  </portType>

</definitions>

You might recognize the above WSDL as a relatively simple, standard Web service definition of an interface method with its input and output messages. The only parts you need to understand are the request and response types declared in the schema. The element with attribute name="test" defines the request type as a primitive int. The element with attribute name="testResponse" defines the response type as a primitive boolean. To match the interface defined in this WSDL file, our grid service must implement a method with this signature: public boolean test(int).

Implement the grid service in Java

Our grid service method, which matches the interface defined in the WSDL above, implements the Mersenne prime test pseudo-code. It takes an exponent p as its argument and tests whether 2^p-1 is prime. If 2^p-1 passes the Lucas-Lehmer test, you have found a Mersenne prime. In the extremely unlikely case that it is longer than 10,000,000 digits, go directly to EFF and collect your $100,000 prize.

Note that this article’s Mersenne prime search implementation is relatively inefficient. First, the method should search a range of exponents, rather than a single exponent, so that the high cost of a remote Web service invocation is amortized over the search of many exponents. Second, BigInteger’s multiplication is much slower than the Fast Fourier Transform-based multiplication that dedicated Mersenne prime search programs use to multiply very large numbers quickly.
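The first improvement might look like the following sketch: a hypothetical search(start, end) method that tests every exponent in a range during a single invocation. It is not part of the service defined by the WSDL above; exposing it remotely would require changing the WSDL to accept a range and return a list. The sketch also skips composite exponents, since 2^p-1 can be prime only when p is prime, and it treats p = 2 as a special case because the Lucas-Lehmer recurrence assumes p > 2.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

public class MersenneRangeSearch {

  private static final BigInteger TWO = BigInteger.valueOf(2);
  private static final BigInteger FOUR = BigInteger.valueOf(4);

  // Trial division, used only to skip composite exponents
  static boolean isPrime(int n) {
    if (n < 2) {
      return false;
    }
    for (int d = 2; (long) d * d <= n; d++) {
      if (n % d == 0) {
        return false;
      }
    }
    return true;
  }

  // Lucas-Lehmer test for 2^p - 1, with p = 2 handled separately
  static boolean isMersennePrime(int p) {
    if (p < 2) {
      return false;
    }
    if (p == 2) {
      return true; // 2^2 - 1 = 3
    }
    BigInteger mersenne = BigInteger.ONE.shiftLeft(p).subtract(BigInteger.ONE);
    BigInteger lucas = FOUR;
    for (int i = 3; i <= p; i++) {
      lucas = lucas.multiply(lucas).subtract(TWO).mod(mersenne);
    }
    return lucas.signum() == 0;
  }

  // Hypothetical range operation: one call covers many exponents
  static List<Integer> search(int start, int end) {
    List<Integer> found = new ArrayList<>();
    for (int p = Math.max(start, 2); p <= end; p++) {
      // 2^p - 1 can only be prime when p itself is prime
      if (isPrime(p) && isMersennePrime(p)) {
        found.add(p);
      }
    }
    return found;
  }

  public static void main(String[] args) {
    System.out.println(search(2, 130));
  }
}
```

Here search(2, 130) returns [2, 3, 5, 7, 13, 17, 19, 31, 61, 89, 107, 127], the exponents of the first 12 Mersenne primes, in one call instead of 129 separate remote invocations.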

The following code implements the Globus grid service:

package prime.impl;

import java.math.BigInteger;

import org.globus.wsrf.Resource;

public class MersennePrimeService implements Resource {

  private static final BigInteger TWO = BigInteger.valueOf(2);
  private static final BigInteger FOUR = BigInteger.valueOf(4);

  // Returns true if 2^exponent - 1 is a Mersenne prime
  public boolean test(int exponent) {
    // 2^0 - 1 = 0 and 2^1 - 1 = 1 are not prime
    if (exponent < 2) {
      return false;
    }
    // the Lucas-Lehmer recurrence assumes exponent > 2; 2^2 - 1 = 3 is prime
    if (exponent == 2) {
      return true;
    }

    BigInteger mersenne = TWO.pow(exponent).subtract(BigInteger.ONE);
    BigInteger lucas = FOUR;

    // perform the Lucas-Lehmer test: lucas = (lucas^2 - 2) mod mersenne, exponent - 2 times
    for (int i = 3; i <= exponent; i++) {
      lucas = lucas.multiply(lucas).subtract(TWO).mod(mersenne);
    }

    // if zero, this is a Mersenne prime
    return lucas.signum() == 0;
  }

}

Notice that the service class implements an interface named Resource. This empty interface tags the service as a resource to be managed by the Globus Framework. Generally speaking, the entire purpose of the grid is to share resources in sophisticated ways. In other words, this service is a computational resource that will be shared on a Globus grid.

Write the Web service deployment descriptor and JNDI configuration

The deployment descriptor, a file with a .wsdd extension, defines a service element. The element’s name attribute defines the service name, which determines the service URI; its className parameter names the class implementing the service; and its wsdlFile child element references the WSDL file that defines the service interface. The container reads this file, loads the service, and begins listening for requests on the path MersennePrimeService relative to the base path. By default, the standalone container we use to host this example uses http://localhost:8080/wsrf/services as its base path, so clients can access the service at the combined URI http://localhost:8080/wsrf/services/MersennePrimeService.
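Combining a container’s base path with a service name can be sketched with java.net.URI. The serviceUri helper below is hypothetical and not a Globus API; it only illustrates one subtlety of the composition: URI.resolve replaces the last path segment of the base unless the base ends with a slash, so resolving MersennePrimeService directly against http://localhost:8080/wsrf/services would yield http://localhost:8080/wsrf/MersennePrimeService.

```java
import java.net.URI;

public class ServiceUris {

  // Hypothetical helper: appends a service name to a container's base path.
  // A trailing slash is added first, because URI.resolve otherwise drops the last segment.
  static URI serviceUri(String basePath, String serviceName) {
    String base = basePath.endsWith("/") ? basePath : basePath + "/";
    return URI.create(base).resolve(serviceName);
  }

  public static void main(String[] args) {
    System.out.println(serviceUri("http://localhost:8080/wsrf/services", "MersennePrimeService"));
  }
}
```

Running main prints http://localhost:8080/wsrf/services/MersennePrimeService.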

The following XML instance defines the service:

<?xml version="1.0" encoding="UTF-8"?>
<deployment name="defaultServerConfig"
  xmlns="http://xml.apache.org/axis/wsdd/"
  xmlns:java="http://xml.apache.org/axis/wsdd/providers/java"
  xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <service name="MersennePrimeService" provider="Handler" use="literal"
    style="document">
    <parameter name="className" value="prime.impl.MersennePrimeService"/>
    <wsdlFile>share/schema/prime/MersennePrime_service.wsdl</wsdlFile>
    <parameter name="allowedMethods" value="*"/>
    <parameter name="handlerClass"
      value="org.globus.axis.providers.RPCProvider"/>
    <parameter name="scope" value="Application"/>
    <parameter name="providers" value="GetRPProvider"/>
    <parameter name="loadOnStartup" value="true"/>
  </service>

</deployment>

The following XML instance populates the JNDI (Java Naming and Directory Interface) context with a Globus factory class used to instantiate the service:

<?xml version="1.0" encoding="UTF-8"?>
<jndiConfig xmlns="http://wsrf.globus.org/jndi/config">

<service name="MersennePrimeService">
  <resource name="home" type="org.globus.wsrf.impl.ServiceResourceHome">
    <resourceParams>
      <parameter>
        <name>factory</name>
        <value>org.globus.wsrf.jndi.BeanFactory</value>
      </parameter>
    </resourceParams>
  </resource>
</service>
</jndiConfig>

The service name in this JNDI configuration, MersennePrimeService, must match the service name declared in the WSDD deployment descriptor.

Create the Globus Archive (GAR) file

With the WSDL, the Java grid service implementation, and the WSDD and JNDI deployment files in place, we have fully defined and configured the grid service. However, we must still compile the implementation, and generate and compile the Web service stub source. To do this, we use the build.xml Ant script in the Globus Service Build Tools package you downloaded earlier. This package contains several scripts, including a shell script for Unix and a Python script for Windows; essentially, these scripts pass properties to the Ant build.

Instead of using these scripts, however, we will use a build.properties file that defines the build’s properties. This file is included in the downloaded source’s root directory; however, you must expand the Globus Service Build Tools package and copy the build.xml from its root directory to the service’s root directory. A packaged Globus grid service has a .gar extension. You create the GAR file by invoking Ant from the service’s root directory: /service> ant

But first, you need to set a couple of environment variables. If you are running on Windows, execute the following commands, replacing the paths with the appropriate locations on your file system:

  set ANT_HOME=c:\apache-ant-1.7.0
  set GLOBUS_LOCATION=c:\ws-core-4.0.4
  set PATH=%ANT_HOME%\bin;%GLOBUS_LOCATION%\bin;%PATH%

On Unix, execute these commands:

  export ANT_HOME=/apache-ant-1.7.0
  export GLOBUS_LOCATION=/ws-core-4.0.4
  export PATH=$ANT_HOME/bin:$GLOBUS_LOCATION/bin:$PATH

If everything goes well, you should see a file in the service’s root directory called mersenne.gar. When you deploy this gar file in the next step, it will expand into a standard directory visible to the standalone grid service container.

Deploy the service

Now deploy the GAR file created in the last step by executing the following command:

 /service> globus-deploy-gar mersenne.gar

Congratulations! You have deployed a Globus grid service. To test some primes, you need to implement a client to query the service and send it an exponent to test.

Implement the client

The client code needed to invoke the grid service looks more complicated than it really is, because the Web service infrastructure is invasive. (In fact, we’ve had to do quite a bit of work up to this point, and most of it we don’t really care about. Rather, it’s work that the container, or the framework, cares about.) The client application takes two parameters: a service URI, and an exponent. The client passes the exponent to the grid service, gets the response, and prints whether it passes the primality test:

package prime.impl;

import java.rmi.RemoteException;

import javax.xml.rpc.ServiceException;

import org.apache.axis.message.addressing.Address;
import org.apache.axis.message.addressing.EndpointReferenceType;
import org.apache.axis.types.URI.MalformedURIException;

import com.javaworld.www.namespaces.MersennePrimeService_instance.MersennePrimePortType;
import com.javaworld.www.namespaces.MersennePrimeService_instance.service.MersennePrimeServiceAddressingLocator;

public class MersennePrimeClient {

  public static void main(String[] args) {
    MersennePrimeServiceAddressingLocator locator = new MersennePrimeServiceAddressingLocator();

    int status = 0;
    if (args.length != 2 || args[0] == null || args[1] == null) {
      System.out.println("You must enter the service URI and an exponent to test for primality.");
      status = 1;
    } else {
      String serviceURI = args[0];
      try {
        int exponent = Integer.parseInt(args[1]);

        // point an endpoint reference at the service URI and obtain the port
        EndpointReferenceType endpoint = new EndpointReferenceType();
        endpoint.setAddress(new Address(serviceURI));
        MersennePrimePortType mersennePrimePortTypePort = locator
            .getMersennePrimePortTypePort(endpoint);

        // remote call: the grid service runs the Lucas-Lehmer test
        if (mersennePrimePortTypePort.test(exponent)) {
          System.out.println("2^" + exponent + "-1 is a Mersenne prime!");
        } else {
          System.out.println("2^" + exponent + "-1 is not a Mersenne prime.");
        }
      } catch (NumberFormatException e) {
        System.err.println("Error parsing exponent [" + args[1] + "].");
        status = 1;
      } catch (MalformedURIException e) {
        System.err.println("Error parsing [" + serviceURI + "].");
        e.printStackTrace();
        status = 1;
      } catch (RemoteException e) {
        System.err.println("Could not make a remote connection to the Grid Service at URI ["
            + serviceURI + "].");
        e.printStackTrace();
        status = 1;
      } catch (ServiceException e) {
        System.err.println("Could not obtain a port for the Grid Service at URI ["
            + serviceURI + "].");
        e.printStackTrace();
        status = 1;
      }
    }
    System.exit(status);
  }
}

Source the following script to create a CLASSPATH environment variable containing all of the Globus dependencies: source $GLOBUS_LOCATION/etc/globus-devel-env.sh

Or, on Windows: %GLOBUS_LOCATION%\etc\globus-devel-env.bat

Now you can compile the client using javac, placing its class file in build/classes: /service> javac -d build/classes -classpath ./build/stubs/classes/:$CLASSPATH -sourcepath src src/prime/impl/MersennePrimeClient.java

Next, start the standalone container without transport security (-nosec). Once the container has started, it prints a list of URIs for all of its hosted services. You should see the MersennePrimeService somewhere in this list:

 /service> globus-start-container -nosec

[1]: http://192.168.10.6:8080/wsrf/services/AdminService
[2]: http://192.168.10.6:8080/wsrf/services/AuthzCalloutTestService
[3]: http://192.168.10.6:8080/wsrf/services/ContainerRegistryEntryService
[4]: http://192.168.10.6:8080/wsrf/services/ContainerRegistryService
[5]: http://192.168.10.6:8080/wsrf/services/CounterService
[6]: http://192.168.10.6:8080/wsrf/services/ManagementService
[7]: http://192.168.10.6:8080/wsrf/services/MersennePrimeService
[8]: http://192.168.10.6:8080/wsrf/services/NotificationConsumerFactoryService
[9]: http://192.168.10.6:8080/wsrf/services/NotificationConsumerService
[10]: http://192.168.10.6:8080/wsrf/services/NotificationTestService
[11]: http://192.168.10.6:8080/wsrf/services/PersistenceTestSubscriptionManager
[12]: http://192.168.10.6:8080/wsrf/services/SampleAuthzService
[13]: http://192.168.10.6:8080/wsrf/services/SecureCounterService
[14]: http://192.168.10.6:8080/wsrf/services/SecurityTestService
[15]: http://192.168.10.6:8080/wsrf/services/ShutdownService
[16]: http://192.168.10.6:8080/wsrf/services/SubscriptionManagerService
[17]: http://192.168.10.6:8080/wsrf/services/TestAuthzService
[18]: http://192.168.10.6:8080/wsrf/services/TestRPCService
[19]: http://192.168.10.6:8080/wsrf/services/TestService
[20]: http://192.168.10.6:8080/wsrf/services/TestServiceRequest
[21]: http://192.168.10.6:8080/wsrf/services/TestServiceWrongWSDL
[22]: http://192.168.10.6:8080/wsrf/services/Version
[23]: http://192.168.10.6:8080/wsrf/services/WidgetNotificationService
[24]: http://192.168.10.6:8080/wsrf/services/WidgetService
[25]: http://192.168.10.6:8080/wsrf/services/gsi/AuthenticationService

Invoke the client application with a classpath that includes build/classes and the CLASSPATH environment variable, which the globus-devel-env.sh/.bat script populated with all of the necessary Globus dependencies. You pass two arguments to the application: the first specifies the service URI, and the second specifies the exponent to test. For example, to test exponent 3, that is, the Mersenne number 2^3-1 = 7:

 /service> java -cp build/classes:$CLASSPATH prime.impl.MersennePrimeClient \
   http://192.168.10.6:8080/wsrf/services/MersennePrimeService 3
  2^3-1 is a Mersenne prime!

If you test 4, the client returns a different message:

 /service> java -cp build/classes:$CLASSPATH prime.impl.MersennePrimeClient \
   http://192.168.10.6:8080/wsrf/services/MersennePrimeService 4
  2^4-1 is not a Mersenne prime.

Conclusion

After reading this article, you should be able to cut through some of the marketing hype when someone tells you that their product is a grid or is grid-enabled. Grids coordinate decentralized resources. Grids communicate with open protocols and interfaces. Grids deliver nontrivial qualities of service.

You might have figured out by now that you may not need or want grid technologies. True grids suffer from having to cope with an extremely complex distributed environment. More commonly, a computationally demanding problem requires aggregated computational resources to arrive at a solution faster, or an application must process so many transactions per second that the load must be distributed.

If you don’t need to share these resources or outsource the processing to some other organization, a cluster will probably meet your needs more simply and easily.

Corey Klaasmeyer received a dual-degree in physics and English from Colorado College. He is currently part of a team building a predictive-modeling platform at Valen Technologies. He began using Java in 1995 and a short time later founded the Denver Java Users Group (DJUG). Since that time he has written software for applications serving automotive, banking, telecommunications, insurance and financial domains.