Peter Wayner
Contributing Writer

Java in the cloud: Google, Aptana, and Stax

Apr 21, 2009 | 21 mins

Google App Engine, Aptana Cloud, and Stax for EC2 make it easy to spin up and scale a simple Java servlet container, but they are still a far cry from the full Java EE stack

Just as the megastars in Hollywood seem to find each other and fall in love, it was inevitable that two of the greatest buzzwords ever hatched, “Java” and “cloud,” would meet and begin to breed. Now that a number of companies have launched Java clouds, or begun weaving Java into their hosted development platforms, the race is on to remake the Java infrastructure in the cloud image.

There is some irony in this turn of events because the Java infrastructure has done better than most piles of code in solving the difficult problem of getting multiple processors and multiple machines working together. Java EE (Enterprise Edition) offers a very sophisticated set of mechanisms that pass messages between machines (Java Message Service or javax.jms.*) and handle database access (Java Persistence and Java Transaction). Then there’s the Enterprise Java Bean, a sophisticated tool for managing persistence on a cluster, an abstraction that’s so powerful and so dangerous that it has driven as many programmers mad as it has helped.


A number of companies have repackaged the JVM (Java Virtual Machine) and turned it into a hosted service. To see how this is working out, I set up accounts at three different providers offering Java services on their cloud, built a few test applications, and bombed them with some HTTP requests.

All of them are very new. Google’s App Engine just expanded to include Java and is now giving select programmers an “early look.” Stax is in beta. Aptana Cloud carries neither label but is still adding new features. Surprisingly, Sun was not ready to let me test anything in its cloud but is expected to launch in a few months. (See the sidebar, “Sun Cloud looks beyond Java,” for a description of what Sun is planning.)

The most surprising element about all of these new clouds is how little they offer compared to the promise of the Java EE stack. At the core, they provide a simple servlet container, one that’s stripped down and not much different from Tomcat because it is often just Tomcat. The tools do a better job of delivering a revolutionary way of purchasing computer time than they do of creating the next generation of Java flexibility.

 

Java clouds at a glance

Google App Engine

Pros:
  • Automatically scales up and down without asking permission
  • Free trial layer for small applications
  • A big set of tools integrated with Eclipse and GWT

Cons:
  • You must store data in the way Google wants you to store data
  • Some class files are absent
  • Every step is bound by strict quotas

Bottom line: Great for lightweight shells that wrap Java code around big tables

Aptana Cloud

Pros:
  • Deep integration with Aptana’s nice set of Eclipse plug-ins
  • All of the familiar buttons are available including root
  • All of the familiar databases are waiting for you on their regular ports

Cons:
  • Pricing options are narrow (you get only four choices)
  • No clusters yet

Bottom line: All of your server controls are now available from Eclipse

Stax for EC2

Pros:
  • Web-based tool lets you choose your development platform
  • A wide range of clustering options available
  • A good collection of startup frameworks
  • All of Amazon’s service offerings are in the same datacenter

Cons:
  • You need to manually switch to bigger clusters when the hordes arrive
  • Limited access to resources under the hood

Bottom line: A simple Web-based tool makes it easy to whip up a Tomcat server

But this may be because the creators have a slightly different goal than the creators of the original J2EE. They’re not trying to create a wonderful cloud of objects that float from machine to machine, nibbling on a few cycles here, chomping on a large block of memory there. They’re really just tackling the headaches of deploying a server, a process that can be maddening in many IT shops. They want to make it easy for a project to turn into a public Web site and then grow adequately if thousands or millions decide they want to tune in. The goal is to make all of this happen as automatically as possible without all of the headaches of approving purchase orders, reserving rack space, waiting for deliveries, and other time-consuming problems.

Some of the simplicity must also be because this is all very new. I wouldn’t be surprised if the companies begin integrating all of the more sophisticated layers in their next generation. They’re starting with Tomcat for now, and it shouldn’t be too hard to catch up with Java EE.


There is a great deal of variety in the approaches and the levels of abstraction. Google’s App Engine caters to a thinner, more widely parallel set of applications that can scale automatically. Aptana’s tool, on the other hand, is a nice IDE that integrates deployment and purchasing into Eclipse. Stax offers something that lies roughly in between both tools.

Google App Engine

Google’s shining new Java wing to the App Engine should be very familiar to anyone who’s spent time with the first generation based on Python because many parts of the architecture are unchanged. You write a thin layer of logic that juggles the requests and then you rely upon the back end to synchronize everything. The Java applications use the same database, image processing engine, and mailer in pretty much the same way as their Python-based cousins.

While the new Java tool will be very familiar to Python programmers who used the original engine, many of the ideas will be a bit strange and new to Java programmers. The database is not MySQL, Oracle, or even Derby, the embedded database bundled with the JDK. It’s a proprietary data store with a query language called GQL that resembles a small subset of SQL. You can’t use JDBC (Java Database Connectivity) to link up with it; you need to use Google’s own proprietary layer.

This is just the beginning of the changes. You can’t just open up a socket and suck down a Web page; you have to use the URL fetching code. If you want to keep a cache of commonly used information, you should store your objects with the Memcache implementation that Google offers. Google’s code will keep everything consistent so that the Memcache on all machines will offer the same thing when the synchronization is finished.

There are also a number of restrictions on the classes you can use. Google’s version of the JVM isn’t fully stocked. You won’t be able to spin off a thread to handle processing and you can’t write to the disk — ever. You have to use Google’s data store if you want to save the data.

All of these changes have some advantages. Google’s data store is stripped down and optimized for working with many machines at once. The Memcache service saves calls to the data store, which is worth watching because the meter is clicking whenever your servlet is churning. The image processing tool handles some of the work with native code, another advantage.
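The caching advice above follows the familiar cache-aside pattern: look in the cache first and only pay for a data store call on a miss. The following is a minimal sketch in plain Java, not Google's Memcache API. The `SimpleCache` interface, the `InMemoryCache` class, and the `slowStore` function are hypothetical stand-ins so the example runs anywhere.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for a shared cache service; NOT Google's Memcache API.
interface SimpleCache {
    Object get(String key);
    void put(String key, Object value);
}

// In-memory implementation used only so the sketch runs on a laptop.
final class InMemoryCache implements SimpleCache {
    private final Map<String, Object> entries = new HashMap<>();
    public Object get(String key) { return entries.get(key); }
    public void put(String key, Object value) { entries.put(key, value); }
}

// Cache-aside: check the cache first, fall back to the slow store on a miss.
final class CachedLookup {
    private final SimpleCache cache;
    private final Function<String, String> slowStore; // assumed data store query
    private int storeCalls = 0;

    CachedLookup(SimpleCache cache, Function<String, String> slowStore) {
        this.cache = cache;
        this.slowStore = slowStore;
    }

    String fetch(String key) {
        Object hit = cache.get(key);
        if (hit != null) {
            return (String) hit;       // served from the cache, no store call
        }
        storeCalls++;                  // this is the call the meter would bill
        String value = slowStore.apply(key);
        cache.put(key, value);         // remember it for the next request
        return value;
    }

    int storeCalls() { return storeCalls; }

    public static void main(String[] args) {
        CachedLookup lookup =
            new CachedLookup(new InMemoryCache(), k -> "value-for-" + k);
        lookup.fetch("greeting");
        lookup.fetch("greeting");
        System.out.println("store calls: " + lookup.storeCalls());
    }
}
```

The second `fetch` never touches the store, which is the whole point when every store call is metered. A real deployment would also need an expiry policy, which this sketch leaves out.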

For all of these reasons, I think App Engine will be most attractive to projects that need to give people shared access to several big tables filled with data. It’s not really for all Java programmers, but for people who are familiar with Java and like to use it to write some glue code to wrap around a big table. You can’t do too much to the data on the way in or the way out because there are limits on the amount of time that each request can spin the meter.

I think these restrictions will mean that there will be relatively few applications that just pick up and migrate to the App Engine. All of the data access will need to be rewritten and some of the common tricks that use flat files will need to be re-engineered. Moving your application out of the App Engine will probably be a bit easier, but it will require changing your mindset because the App Engine used to handle some of the scaling and synchronization issues for you. It may be technically possible to run the App Engine debugging environment on your own server, but the Terms of Service say Google is giving you a license for the “sole purpose of enabling you to use and enjoy the benefit of the Service as provided by Google, in the manner permitted by the Terms.”

Google is well aware of this issue and is trying to address it as it encourages people to use the system. IBM is even offering tips on how to migrate App Engine code to its platform. It’s just a matter of getting the JDO (Java Data Objects) calls to talk with IBM’s DB2 instead of Google’s back end. I’m guessing that IBM hopes to grab customers who build the first rev in App Engine and then decide that some threading or slow cron jobs are absolutely necessary.

I built several JSPs that deliberately sucked down a large amount of computation time, and it was pretty easy to push them hard enough to reach the limit. I don’t think most Web 2.0 applications will run up against Google’s CPU and memory limits, which are liberal from the standpoint of the typical Web application, but they could be a problem for anyone who wants to do much heavy processing. The image toolkit, for instance, will only work with images smaller than 1MB, which is a bit tight for serious photographers. Even my pocket camera can turn out images that take up 4MB.

These limits might squeeze an application in unexpected ways. Cron jobs are just URLs that are called at set times. That’s a nice abstraction but it’s definitely a bad fit for some of the massive reports that corporations generate every evening. It’s more for housekeeping than any kind of asynchronous heavy lifting.
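One way to live inside a per-request time limit is to do a slice of the work on each scheduled call and remember where to resume. The sketch below shows that idea in plain Java under stated assumptions. The `BudgetedJob` class, its injectable clock, and the one-second budget are all illustrative and are not part of any App Engine API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

// Runs as many work items as fit in a fixed time budget, then reports where
// the next scheduled call should resume. All names here are illustrative.
final class BudgetedJob {
    private final LongSupplier clockMillis; // injectable so tests can fake time
    private final long budgetMillis;

    BudgetedJob(LongSupplier clockMillis, long budgetMillis) {
        this.clockMillis = clockMillis;
        this.budgetMillis = budgetMillis;
    }

    // Returns the index of the first unprocessed item, or items.size() when done.
    int runFrom(int start, List<Runnable> items) {
        long deadline = clockMillis.getAsLong() + budgetMillis;
        int next = start;
        while (next < items.size() && clockMillis.getAsLong() < deadline) {
            items.get(next).run();
            next++;
        }
        return next;
    }

    public static void main(String[] args) {
        List<Runnable> items = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            final int n = i;
            items.add(() -> System.out.println("processed item " + n));
        }
        BudgetedJob job = new BudgetedJob(System::currentTimeMillis, 1000);
        int next = job.runFrom(0, items);
        System.out.println("resume at index " + next);
    }
}
```

The caller has to persist the returned index between invocations. The budget is only checked between items, so a single slow item can still overrun it.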

There are real advantages to what may at first seem like a straitjacket to any programmer who grew up opening sockets on a whim and writing to the file system whenever it felt good. The explicit limitations help architects create better applications that run more smoothly because they prevent overreaching the limits of the system. Many of the early adopters of Java EE found themselves pulling out their hair when one of the automatic tools would take forever to deliver the magic that the API documentation promised. Making the limitations of the architecture apparent by writing a tightly limited API is more of a gift than a curse.

If there are no joins in the data store, then it will be easier to generate massive reports because the database table will be denormalized from its inception. If the jobs can’t run that long, the architect can make living documents that let the user drill down to generate the necessary information on demand. That can be much more efficient than spending the entire night pre-computing something that won’t be read by many people.
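To make the denormalization point concrete, here is a small sketch of a report that needs no join because each row already carries the field it would otherwise look up. The `OrderRow` and `RegionReport` classes and their fields are hypothetical examples, not Google's data store API.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Denormalized row: the customer's region is copied onto every order,
// so a report never has to join against a separate customer table.
final class OrderRow {
    final String orderId;
    final String customerRegion; // duplicated from the customer record
    final long totalCents;

    OrderRow(String orderId, String customerRegion, long totalCents) {
        this.orderId = orderId;
        this.customerRegion = customerRegion;
        this.totalCents = totalCents;
    }
}

final class RegionReport {
    // A single pass over the rows produces the totals for each region.
    static Map<String, Long> totalsByRegion(List<OrderRow> rows) {
        Map<String, Long> totals = new LinkedHashMap<>();
        for (OrderRow row : rows) {
            totals.merge(row.customerRegion, row.totalCents, Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<OrderRow> rows = Arrays.asList(
            new OrderRow("o1", "east", 500),
            new OrderRow("o2", "west", 250),
            new OrderRow("o3", "east", 125));
        System.out.println(totalsByRegion(rows));
    }
}
```

The cost of this layout is that a change to a customer's region must be copied to every one of that customer's orders.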

It’s worth noting that Google has done a nice job of integrating the system with Eclipse. There is a wide variety of tools, and they do more than just upload WAR (Web archive) files to the App Engine. The standard application shell is integrated with Google Web Toolkit, the mind-bending tool that converts your Java code into JavaScript that runs on the client. The dashboard is simple but responsive. The spikes I generated in my jobs started showing up within seconds.

All of this adds up to a compelling tool for serious experimentation, the kind of monkeying around done in the hope that it will quickly turn into something that’s worth launching a hundred servers for. The App Engine will scale up quickly and then stop on a dime as it automatically follows the ebbs and flows of fortune’s fickle whim.

Aptana Cloud

Aptana made its name by creating a nice set of plug-ins that sit on top of Eclipse and make it simpler to develop Java, JavaScript, PHP, Ruby, and Python applications. Aptana Studio is a nice solution for many developers who want to work with all of the dominant Web programming languages, especially those building AJAX applications. Now the company is expanding this set of plug-ins in a partnership with Joyent hosting to produce Aptana Cloud.

Aptana Cloud, like the Studio, is a set of Eclipse plug-ins that smooth the deployment process to Joyent’s collection of servers. In one tab of Eclipse you edit your code, and in another you control how it’s deployed to the server running Tomcat, MySQL, and PostgreSQL. You can also build out Web sites with Rails, Jaxer, and PHP. Python is said to be coming.

The “My Cloud” tab is pretty much a fancy front end to the standard Linux server. In one tab, you can turn the server daemons (Tomcat, MySQL, PostgreSQL, and Apache) on or off. If you want to add more computational resources, you can switch to another tab where the options let you choose one of four settings for disk space and RAM. The basic introduction setting includes 256MB of RAM and 5GB of disk space, billed at $0.027 per hour, a price that works out to $20 per month. If you want more, you can move a little lever that goes up to 2GB of RAM and 25GB of disk space for $0.359 per hour, or about $267 per month.
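The monthly figures quoted above can be reproduced from the hourly rates. This short sketch assumes a 31-day month (744 hours), which is an assumption on my part that happens to match both quoted numbers. The `HostingCost` class is purely illustrative.

```java
// Converts an always-on hourly rate into an approximate monthly bill.
final class HostingCost {
    // Assumption: a 31-day month, which reproduces the figures quoted in the text.
    static final int HOURS_PER_MONTH = 31 * 24;

    // Rounds to the nearest whole dollar.
    static long monthlyDollars(double hourlyRate) {
        return Math.round(hourlyRate * HOURS_PER_MONTH);
    }

    public static void main(String[] args) {
        System.out.println("entry tier: $" + monthlyDollars(0.027) + " per month");
        System.out.println("top tier:   $" + monthlyDollars(0.359) + " per month");
    }
}
```

With these inputs the entry tier comes to about $20 and the top tier to about $267, matching the article's numbers.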

The service is just a pretty face on many of the standard VPS (virtual private server) tools out there, something that will be comforting and familiar to anyone with hard-won experience wrestling with the standard offerings. Access to the database is available on port 3306. Secure FTP and Subversion are also ready and running. If you want root, it’s yours with a click. All of the log files are nicely presented in yet another tab. The system load and memory consumption appear in a dashboard-like tab. You never need to leave Eclipse/Aptana Studio.

While you’re technically running on a single virtual machine, the servers have eight CPUs and they’re set up to allow bursts of computation that can consume over 95 percent of those eight processors. This is more of a nice feature that smooths bumps during occasional busy periods than a way to get eight CPUs on the cheap.

Aptana Cloud is less revolutionary and more evolutionary. You can use all of the experience you have with the traditional tools again here. The buttons map pretty cleanly to the tasks that used to require Emacs in the shell and just simplify the process. If you need to poke around under the covers, or set up some other workflow, the opportunity is here.

Stax on Amazon EC2

While the other two solutions come built as plug-ins to Eclipse, Stax Networks offers a complete set of Web tools for creating and managing the projects. Everything starts with the Web. Then you download the source to your local computer for editing by invoking a command-line tool that handles downloads and redeployment. Then it’s back to the Web for all of the management. I’m guessing you could switch from a single machine to a big, five-server cluster with just a few clicks at one of those computers they park in a hotel lobby to let you check in for your flight.

A wide variety of starting points is available from basic servlets to Apache Struts or Apache Wicket, all running on Tomcat. Stax also offers JRuby and Jython running on top of the same Java foundation. All can talk to MySQL databases running in the same cluster.

For now, you download the code, build your application on your machine, and redeploy code with a command-line tool. This flexibility lets you use Ant, Eclipse, or any other Java tool to build the application. I wouldn’t be surprised if Stax eventually integrates some of the wiki-like features from other Web sites into its tool and makes it possible to do much of the building in your browser.

The Stax cloud, by the way, is just a subcloud of Amazon’s EC2. Or maybe it’s a virtual cloud. In any case, Stax Networks rents from Amazon and then turns around and rents to you. This means that you might want to use some of Amazon’s other services, like S3, because they don’t require a long trip across the Internet. They’re all in the same server farm.

The Stax software doesn’t offer up as much low-level access as Aptana, but it offers a greater variety of computational resources. On one hand, you can’t just push a button to get root access to your server. There’s no Secure FTP or other configurations available. On the other hand, you can move your application to a cluster with up to five servers with just a click of the button.

The simplicity of the Stax software made it easier to get moving. I was able to start up a simple Wicket application and move it to a cluster of two machines in just a few minutes. As I pounded on it, I could watch the graphs show how the CPU and memory spiked as the HTTP requests piled up. It was pretty simple to push the servers to the edge.

Stax offers the full JVM, but there are some minor hiccups. You can pretty much do anything you would normally do with Tomcat, such as start up your own threads, but you’ve got to be aware that some of the resources, like the file system, might be a bit hidden. The applications can write directly to a sandboxed corner of the file system, but this data is not synchronized between machines. If you’re only going to use one server, you’ll be fine. The simplest way to handle the synchronized data is to write to Amazon’s S3 because it’s in the same datacenter. I suppose you might also handle the synchronization yourself by setting up JMS because when it comes down to it, Stax is just Tomcat under the hood.

The Stax cloud is still officially in a free beta. The JVMs all run in 256MB of memory, but this will be another dial that you can choose in the future. The prices haven’t been set, but it only makes sense that it will be a bit more expensive than Amazon’s cloud unless Stax Networks is able to swing some bulk pricing.

Openness or scalability

Which solution should you choose? Much depends upon the nature of your application. If your data falls neatly into columns, and not much computation needs to take place when you save or recall it, Google’s App Engine is a nice choice. Google offers a free tier of service that makes it great for prototyping solutions that can turn into full-fledged applications without any deployment hassles.

Google’s solution changes the failure modes of running an operation. If your application finds wild success on the Internet, you don’t need to frantically try to purchase new servers that won’t arrive until the fad is a distant memory. But you’ll need to bump up the daily quota on your account because Google will only keep your code running as long as you authorize the spending.

There are hefty tradeoffs to choosing Google’s easy chair. If you want to use all of the standard APIs, write to disk, log into a shell account, or just enjoy the freedom to move your application to another provider without rewriting it, you’ll need to look elsewhere. Taking advantage of Google’s scaling prowess means writing to its tightly restricted APIs.

Both Aptana and Stax offer more standard solutions that can easily be duplicated because they’re just Tomcat and a database under the hood. There’s much less lock-in with their tools because you can pretty much take your WAR file to any other server farm. You’ll have to handle all of the deployment issues yourself, but it’s feasible.

Aptana might be more useful to someone writing applications that will run on one server. It’s a great tool for prototyping new systems and getting them on the Internet quickly.

Stax offers more room to scale because it deploys the application to multiple servers and load balancers with just one click. I think it offers a nice mixture of the scalability of Google with the openness of Aptana.

It’s worth noting that some applications aren’t well supported by any of these choices. These three are not great solutions for jobs that require bursts of heavy computation like, for instance, geologists prospecting for oil with big numerical processing simulations that churn through terabytes of data. Even though these applications are often highly parallel, they aren’t great matches for any of these services. Stax is probably the best choice because it lets you click on a button to launch your application on five computers, but it’s still intended for Web servers and five is only five. The ideal solution for these heavy computational jobs would let you start up thousands of machines for just an hour.

Java cloud futures

These clouds may grow to take on roles like this in the future, but for now I think they’re still working through many of the accounting challenges of starting up and shutting down machines so quickly. These companies can’t be certain of the best ways to plan ahead or to price their services. Some commerce sites say that people like to shop online during their lunch break. Video sites must be pounded during prime time. Balancing these loads must be a real challenge for these back ends.

The pricing is a bit hard to compare. Stax is still in beta and it hasn’t announced prices. Aptana has just four settings that tie everything together: the amount of disk space you get is tied to the amount of memory you allocate. The price of one hour of server time ranges from $0.027 to $0.359, and you pay it whether your machine is doing something or not.

Google, on the other hand, breaks up the bill and charges for bandwidth, stored data, e-mail, and CPU time. The price of $0.10 for the CPU seems more expensive because the cost is computed per thread. I was easily able to handle four or five requests at the same time with the cheapest setting on Aptana Cloud. But this doesn’t mean that Aptana’s cloud is wildly cheaper, because Google’s cloud doesn’t bill you if there are no requests coming in. It’s very difficult to compare these offerings, and I’m sure that different applications will run up very different bills on each service.
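A rough break-even calculation shows why the two billing models are hard to compare. The sketch below assumes the $0.10 figure is a rate per CPU hour and assumes a 744-hour month. It also ignores Google's separate charges for bandwidth, storage, and e-mail, so treat it as an illustration of the reasoning rather than a pricing tool. The `BillingComparison` class is hypothetical.

```java
// Compares an always-on flat hourly rate with a metered per-CPU-hour rate.
final class BillingComparison {
    // Flat pricing: you pay for every hour in the month, busy or idle.
    static double flatMonthly(double hourlyRate, int hoursInMonth) {
        return hourlyRate * hoursInMonth;
    }

    // Metered pricing: you pay only for the CPU hours actually consumed.
    static double meteredMonthly(double ratePerCpuHour, double cpuHoursUsed) {
        return ratePerCpuHour * cpuHoursUsed;
    }

    // CPU hours per month at which the metered bill equals the flat bill.
    static double breakEvenCpuHours(double flatHourly, int hoursInMonth,
                                    double ratePerCpuHour) {
        return flatMonthly(flatHourly, hoursInMonth) / ratePerCpuHour;
    }

    public static void main(String[] args) {
        int hours = 31 * 24; // assumed 31-day month
        double breakEven = breakEvenCpuHours(0.027, hours, 0.10);
        System.out.printf("break-even: %.1f CPU hours per month%n", breakEven);
    }
}
```

Under these assumptions, an application that burns fewer CPU hours than the break-even point is cheaper on the metered plan, and a steadily busy one is cheaper on the flat plan.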

These vendors’ approaches are bound to evolve in the future as the companies try to figure out the right price and the real cost of offering these services. It may turn out, for instance, that the clouds need so many extra servers at the peak times that they need to charge enough to cover their costs during the down time. Or maybe there will be enough users at different times to spread out the demand. While I think Aptana’s prices are pretty much similar to the standard prices for VPS on shared servers, Google’s choices are less solid because they depend upon predictions of what people will do.

No one is certain how this model will evolve, but the future of cloud systems like these depends heavily on the decisions that everyone makes. A company with its own dedicated servers in its own datacenter can be a lone wolf, but anyone who chooses a cloud must adapt to working together. Although all of these systems erect firewalls that separate the applications from one another, there will be secondary effects. If you end up on a cluster with a company that buys a Super Bowl ad, everyone in the neighborhood is affected for a few minutes. If demand for a certain service isn’t high enough to pay the rent, the host is going to need to boost rates.

These are just some of the questions that emerge as the words “Java” and “cloud” are brought together. While the new offerings are ostensibly built for Java, they open the door to many other languages because some of the more popular ones today run on top of Java. Some Ruby programmers, for instance, switch over to JRuby when they feel that Java’s threading model offers better support. Jython is another favorite way to run Python with all of the robustness that can be borrowed from the JVM.

Java itself is evolving into something more than just a heavily typed, punctuation-happy language. Groovy looks and behaves more like a scripting language, but links together standard Java code and runs in the JVM. While Groovy applications will run in generic Java EE containers, Google explicitly offers support for the scripting language. The word “Java” may be the biggest part of these cloud offerings, but it opens the door to much more.

Peter Wayner

Peter Wayner is a contributing writer to InfoWorld. He has written extensively about programming languages (including Java, JavaScript, SQL, WebAssembly, and experimental languages), databases (SQL and NoSQL), cloud computing, cloud-native computing, artificial intelligence, open-source software, prompt engineering, programming habits (both good and bad), and countless other topics of keen interest to software developers. Peter also has written for mainstream publications including The New York Times and Wired, and he is the author of more than 20 books, mainly on technology. His work on mimic functions, a camouflaging technique for encoding data so that it takes on the statistical characteristics of other information (an example of steganography), was the basis of his book, Disappearing Cryptography. Peter’s book Free for All covered the cultural, legal, political, and technical roots of the open-source movement. His book Translucent Databases offered practical techniques for scrambling data so that it is inscrutable but still available to make important decisions. This included some of the first homomorphic encryption. In his book Digital Cash, Peter illustrates how techniques like a blockchain can be used to establish an efficient digital economy. And in Policing Online Games, Peter lays out the philosophical and mathematical foundations for building a strong, safe, and cheater-free virtual world.
