The REST architectural style in the Semantic Web

From an IT manager's perspective, REST might just be another way of moving information, but without all the tools associated with SOAP. In this final article in the REST for Java developers series, Brian Sletten takes on that myth, and several others about REST, by looking at its role in emerging architectures such as the Semantic Web. The move to REST, he says, is not a move away from SOAP, but an embracing of the Web in its entirety, both inside and outside of the enterprise.

So far in the REST for Java developers series you've learned hands-on about the RESTful way of doing things. If you're reading this article, it's safe to assume you're interested, even excited, about applying REST in your Java-based development. But what about other developers and managers in your shop? If you want to start building RESTful interfaces at work, you need to be able to explain why REST is the basis of future information systems, and why it's worth adopting now.

In this last article in the series, I discuss REST concepts as the foundation for dynamic, scalable, resource-oriented architectures. I'll demonstrate how REST's goals and vision align with larger plans for the Web itself. I'll also explain how concepts like HATEOAS, resource description and discovery, and linked data are already at work in projects such as the Linking Open Data project, and how they apply to enterprise data integration. Finally, I'll discuss how this overall approach can strengthen your security profile, and leave you with some ideas for improving the longevity of your RESTful interfaces.

Information resources

As you know from the first article in this series, Roy Fielding defined REST to describe the properties that emerge from the deliberate application of certain architectural constraints. The REST architectural style is built on the idea of a layered, stateless client-server interaction.
RESTful interactions occur through uniform interfaces with support for caching and optional extension by code-on-demand, as shown in Figure 1.

Figure 1. REST's constraints combine to provide a flexible, scalable architecture.

As a further reminder of what REST is, here's what it is not:

- A means for invoking arbitrary behavior through URLs
- A drop-in replacement for SOAP
- Easy
- A toy

REST, then, gives you a relatively simple means of interacting with arbitrary addressable resources. Those resources can signify your business data, concepts, back-end services, and policies — anything, really. The logical connection gives you the ability to pass around references rather than data, allowing heterogeneous systems, applications, and users to consume the data differently when and where they need to. Late-binding content negotiation is a powerful concept that you see in any real RESTful system.

An environment like NetKernel takes late-binding content negotiation a step further with the idea of an executable infrastructure to support the transformation of resources. Recall that NetKernel serves up information resources in various flavors called aspects. One of its most useful features is the ability to convert between these forms declaratively. You have an XML document as a DOM aspect, but what if you want it as a node set to send off for XQuery processing? NetKernel will do this on the fly via a process called transreption (transforming representation).

REST for Java developers

A four-part series authored by Brian Sletten:

- Part 1: The REST architectural style
- Part 2: The Restlet API
- Part 3: NetKernel
- Part 4: The future is RESTful

This notion, applicable to REST systems in general, frees you from the need to spend years defining common schemas that are always out of date and that suffer from what I call the Tyranny of the Common Form.
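Transreption can be thought of as a lookup in a registry of converters, keyed by the representation you have and the representation you want. Here is a minimal, hypothetical Java sketch of that idea; NetKernel's real API is different, and the class and method names below are invented purely for illustration:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of the transreption idea: a registry of converters
// keyed by source and target representation types. Illustrates the
// declarative "ask for the form you want" pattern, not NetKernel's API.
public class TransreptorRegistry {
    private final Map<String, Function<Object, Object>> converters = new HashMap<>();

    public void register(Class<?> from, Class<?> to, Function<Object, Object> fn) {
        converters.put(from.getName() + "->" + to.getName(), fn);
    }

    @SuppressWarnings("unchecked")
    public <T> T transrept(Object source, Class<T> target) {
        if (target.isInstance(source)) {
            return (T) source; // already in the requested form
        }
        Function<Object, Object> fn =
            converters.get(source.getClass().getName() + "->" + target.getName());
        if (fn == null) {
            throw new IllegalArgumentException("No transreptor for " + target);
        }
        return (T) fn.apply(source);
    }
}
```

The caller never says how to convert, only what form it wants; the infrastructure finds (or chains) the appropriate converters.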
The social costs of getting everyone to agree to the definition of these central models always outweigh the technical costs of implementing them. The process of normalizing to a common form often forces modelers to drop edge-case information. So, not only are these efforts expensive, but they are also often unsuccessful and sometimes even lossy.

Being able to convert data from an XML representation to a rendered representation may seem like magic, but it's not. You experience it every day on the Web. The trick is to take the next step and think about your information more generally. The fact that the same interaction style gives you late-binding, content-negotiated freedom to build flexible systems that scale and can be migrated without breaking clients is icing on the cake. REST is not of interest because it is simpler and easier than SOAP: REST is of interest because it solves real business and technical problems that have plagued the IT industry for years.

RESTful URLs

A URL to a resource such as http://server/order/1234 provides a name for some information — in this case, an order. We have had order IDs for as long as we have had order-processing systems. What is different about the Web, and what is different about REST, is that this identifier is unique, global, and resolvable. Do you want to look at what was in this order? Well, ask for it! This kind of name not only allows us to distinguish one order from another; it is also (as you'll see in the next section) the interface to that resource.

RESTful URLs do not have to refer just to raw data in databases, however. You can also create new URLs to refer to higher, layered processing and business-specific concepts. How about open orders? The biggest orders of the past year? Every order from a particular customer? Orders that failed because your company could not meet the service-level agreement for particular customers?
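Asking for a named resource in a particular representation is just a matter of what you put in the Accept header. Here is a minimal Java client sketch; the http://server/order/1234 URL and the server behind it are assumptions carried over from the example above:

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Standard HTTP content negotiation: the same name, asked for in
// different representations via the Accept header.
public class OrderClient {
    public static String fetch(String url, String acceptType) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Accept", acceptType); // ask for the form you want
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

The same call with "application/xml" or "text/html" resolves the same name to different representations, which is exactly what lets heterogeneous consumers share one URL.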
These concepts are all tremendously useful to a business, and being able to name and ask for this information directly is a powerful, efficient, and liberating process. Learning the URL schemes does not need to be onerous. By following some general guidance and understanding the term HATEOAS, you can make it easy to browse for information. And by applying a reasonable metadata standard to the description of your resources, you can make the discovery process easier over even very large URL spaces.

HATEOAS

One of the least-understood parts of REST and Fielding's thesis is the thinking behind the phrase hypermedia as the engine of application state (HATEOAS). The breadth and depth of this misunderstanding provoked Fielding to issue a recent series of justified but cranky blog entries reminding people about this point. HATEOAS can be a fairly nuanced concept, but its direct consequences are pretty straightforward.

Consider the Web again. The dominant means of experiencing it is through a browser. You type in a URL to a site, and the browser issues a request via a specific protocol (usually HTTP) for the resource. At that point, the resource representation (usually an HTML document) is transferred back to your application along with metadata that indicates to the browser how to interpret it. Embedded in the result are links to one or more other pages, resources, images, and so on. Based on the content type (text/html), the browser understands how to parse it. It can find these embedded references and allow you to invoke them transparently by clicking through. Each new result is requested, interpreted, and rendered. As you probably well know, this simple process can engage you for hours!

The point of this thought experiment is to remember what HATEOAS means in practice. In short:

- REST is not equivalent to HTTP, even though HTTP is the dominant application protocol used. REST works with other schemes as well.
- The uniform interface is the interface. Because of its constraints, REST does not have the same need as other approaches for service-description languages. The URL for a resource is how you interact with it, using a semantically constrained set of verbs. Approaches like the Web Application Description Language (WADL) are unnecessary and confuse the issue. REST is not about arbitrary behavior; it is about manipulating information resources.
- Resources — even subresources — should be linkable. Do not return just blobs of data or collections of "dead" data. Collections of links are perfectly reasonable results that your client should understand.
- You can indicate processing hints to the client by tagging resources with appropriate content types. A client knows how to parse HTML, render an image, or play a sound file. Nonbrowser clients or sufficiently available plug-in technologies allow the whole system to be extended. Content negotiation allows the same resource to be requested in different formats as needed. You can define your own content types using the MIME approach to indicate how clients should interpret your data if standard types don't apply.

Resource description and discovery

It is easy to imagine wanting to describe resources beyond simple MIME types, however. The line between the Web and the Semantic Web becomes blurry (and rightly so) when you start moving into metadata descriptions. There is a tremendous amount more to the Semantic Web than metadata, but that is how people usually come to it.

What kind of metadata would you want for resources? What wouldn't you like to know? Who created it? When was it created? For what purpose is it allowed? Intended? What is it about conceptually? Where can you find more information about it? Is it associated with a particular geographical location?
SOAP-based Web services have Universal Description Discovery & Integration (UDDI) to describe and query metadata about them, but that is a broken technology that's painful to employ (and few organizations actually do). The W3C's Resource Description Framework (RDF) is lighter weight and reusable, and it supports the open world assumption — that is, anyone can say anything about anything. There is never a finished state, so Semantic Web applications must always accept new facts. Many existing metadata and reasoning systems would lock the universe of discourse down with the closed world assumption: anything not specified is not valid or true. This assumption makes it easier to build reasoners for these systems, but it is not useful in a vibrant, global, and constantly evolving environment like the Web.

RDF provides the substrate on which most other Semantic Web technologies are built. In general, you express relationships about URI-addressable resources (a perfect fit for describing RESTful service metadata!). The key point to remember is that what makes all of this different is the Web, not the semantics. There have been knowledge systems for years, but being able to express arbitrary relationships among linked information resources in an extensible way is the real game changer.

There are several standard RDF vocabularies, but you are also free to write your own to describe whatever topic you would like. Chances are you will want to employ more powerful modeling constructs from languages like SKOS and OWL, but they all use RDF under the hood. RDF vocabularies can also be mixed and matched when and how you like. OWL can be used to equate terms and relationships, so once again we are not subjected to the Tyranny of the Common Form. It may seem like this would be chaos, but in practice it is quite organic, fluid, and manageable. RDF graphs can be serialized to a variety of formats, but the N3 notation is pretty reasonable.
Dublin Core is a set of terms created by a bunch of librarians who wanted to standardize how online resources were described. You can reuse their work by describing your RESTful services with their terms. There is no need to create a new concept of "creator" or "title." This code shows a series of three relationships (triples) expressed using the Dublin Core vocabulary:

@prefix dc: <http://purl.org/dc/elements/1.1/> .

<http://server/order> dc:creator <http://purl.org/people/briansletten> .
<http://server/order> dc:title "Acme Company Order RESTful system" .
<http://server/order> dc:created "2009-03-26T14:22Z" .

Well-designed vocabularies are almost as easy to read as prose. You can imagine reading the statements left to right:

- This service was created by Brian Sletten.
- This service's title is the Acme Company Order RESTful system.
- This service was created on March 26, 2009.

The difference is that they are just as easy for software to read and interpret. Software doesn't understand it as you do, but it is at least able to process, compare, and relate the terms based on the formalisms of the vocabulary and models.

With the freedom to pick and choose vocabularies or create your own — and the ability to address RESTful services, data, business concepts, and so on — you should begin to see how this approach satisfies many of the goals of the Web service stack. You can have a fabric of content, described in whatever terms are useful by anyone in the organization. There is no need to spend huge amounts of time up front selecting terms. People will gravitate toward what is used and what makes sense to them, and there are ways to equate different terms. The process is reasonably lightweight and flexible in the face of inevitable change.

To query RDF data sources, you would probably use the SPARQL query language. Even a casual introduction to SPARQL is beyond this article's scope, but if you are familiar with SQL, it will not seem that strange to you.
Assuming you had a metadata storage system (such as the Oracle Spatial/RDF engine, Sesame, Mulgara, OpenLink Virtuoso, or the Talis Platform), you could query for all resources created by a specific user:

PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?service
WHERE {
  ?service dc:creator <http://purl.org/people/briansletten> .
}

Or you could query for who created a specific resource:

PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?creator
WHERE {
  <http://server/order> dc:creator ?creator .
}

By mixing and matching different vocabularies and querying fairly arbitrary graph patterns, you will be able both to describe and to find a wide range of resources.

Linked data

The notion of a fabric of resources that are individually described, queried, and resolved may seem unmanageable, or like science fiction. For organizations that are used to large, manual, centralized efforts to standardize on everything, it may seem anarchic to allow resources to grow organically and be described by anyone. The same people would probably not believe the Web possible in the first place if there were not already ample proof of its success.

To underscore that organic development of distributed data and services is possible, you need look no further than the Linking Open Data project. Begun approximately a year ago, it has become the poster child of Web and Semantic Web architecture sexiness and feasibility. A small group of loosely affiliated professionals around the world has successfully described and linked billions of resources through billions of relationships, at minimal cost. This does not undervalue their efforts; they simply required nothing of the large, centralized data-model planning most organizations go through to deal with much less complicated models. Figure 2 shows a representation of the datasets involved as of March 2009. Each individual collection of data existed previously on the Web, often for quite some time.
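To make the two queries above concrete, here is a deliberately naive in-memory sketch of triple-pattern matching, with null standing in for a SPARQL variable. A real deployment would use one of the stores named above through its own API; this only shows the shape of the matching:

```java
import java.util.ArrayList;
import java.util.List;

// A naive in-memory triple store. null acts as a wildcard in match(),
// playing the role of a SPARQL variable such as ?service or ?creator.
public class TripleStore {
    public record Triple(String subject, String predicate, String object) {}

    private final List<Triple> triples = new ArrayList<>();

    public void add(String s, String p, String o) {
        triples.add(new Triple(s, p, o));
    }

    public List<Triple> match(String s, String p, String o) {
        List<Triple> result = new ArrayList<>();
        for (Triple t : triples) {
            if ((s == null || t.subject().equals(s))
                    && (p == null || t.predicate().equals(p))
                    && (o == null || t.object().equals(o))) {
                result.add(t);
            }
        }
        return result;
    }
}
```

match(null, "dc:creator", "...briansletten") corresponds to the first query (find the services); match("http://server/order", "dc:creator", null) corresponds to the second (find the creator).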
Like anything on the Web, you have very little knowledge about what is going on behind the scenes. You are simply able to ask for the content and subcontent by navigating RESTful links. What is different is that these silos of useful data are now connected to one another. The terms in one set are mapped into the terms of another via RDF and other Semantic Web technologies. This makes it possible to take facts about a resource in one set and mix them in with information from another. Clearly you have to have at least some trust in the sources, as well as in those making the connections, but the technology allows you to decide whom to trust and when. The bigger hurdle is allowing the fluid connectivity in the first place.

Figure 2. The Linking Open Data project data sets as of March 2009

The data sets that have been integrated come from a wide set of domains, including metadata about music, Wikipedia entries, geographic locations, and census information. The kind of dynamic system that is possible with this web of linked data, accessible through RESTful services, is highlighted by the Flickr Wrapper Web service developed at the Freie Universität Berlin. This site finds a reference to a concept that it resolves via DBpedia (metadata extracted automatically from Wikipedia) through a RESTful interface. It analyzes the content that is returned for alternate terms by which the resource might be known, as well as geographic location references. This information is then used to parameterize a Flickr query for images tagged with those words and constrained to a particular geographic region. With precious little code, it becomes easy to find high-quality, relevant pictures of Los Angeles (see Figure 3) or Yokohama, Japan.

Figure 3. Pictures of Los Angeles extracted from Flickr based on DBpedia metadata
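The "parameterize a query" step is just URL construction: terms extracted from one RESTful source become query parameters against another. A small sketch of that step, with an invented endpoint and parameter names (the real Flickr and DBpedia APIs differ):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Illustrative only: endpoint and parameter names are placeholders.
// Terms pulled from one RESTful source parameterize a query on another.
public class TagQueryBuilder {
    public static String build(String endpoint, List<String> tags, String lat, String lon) {
        String joined = URLEncoder.encode(String.join(",", tags), StandardCharsets.UTF_8);
        return endpoint + "?tags=" + joined + "&lat=" + lat + "&lon=" + lon;
    }
}
```

Given the alternate labels and coordinates found for a concept, the resulting URL is itself a RESTful name for "pictures tagged with these terms, near this place."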
You can also start to query Twitter or CrunchBase for information about people or companies and convert it to RDF on the fly for use in this model. Give it a try. Go query and explore the Linked Data cloud directly — try a text search for information about JavaWorld, for instance.

Securing services

As organizations move from linking public data to linking private data, security quickly becomes a serious issue. Not only does REST enable technical freedom and new business functionality; it can also make things more secure in the process. A frequent early reaction to REST is nervousness about leaving cookie-crumb trails to all of your sensitive information. In a world of hackers, do you really want to give them a path to follow to get to the data? Somehow, it seems much safer in a database, locked up and inaccessible to the world. The problem is that this same safety prevents the data from being as useful as it could be.

The trick is to realize that with REST, giving something a name and asking for it are two separate activities. There are things in this world that you know exist, but that does not mean you have access to them. You may somehow figure out how to contact a celebrity, but that does not mean you will get to talk to her. You may know that there are codes to nuclear weapons arsenals, but (thankfully) you do not get to see them. The distinction between knowing about something's existence and getting to it applies to RESTful URLs too.

Consider what alternate strategies induce in your systems. If you want to orchestrate a handful of (non-RESTful) Web services, you must pass actual data between them. A first service might query a database and return some results. A second service might sort and filter the results. A third service initiates a new business process based on the data, and so on. The model usually followed is to lock down these services with some manner of authentication and authorization system ("Bob can call this service").
The data itself is often left unprotected as it is transferred from service to service. Sure, you can lock the transaction down to protected channels with SSL, but the data sits in an unprotected state within each service. If those services have no need to access sensitive information, they should not be given it. Imagine if the data were medical results moving between a physician's office, an insurance company, and your place of employment. Suddenly these patterns seem a lot more relevant.

Cryptography fans will quickly insist that the point of standards like XML Encryption is to allow sensitive information to be shared through potentially untrusted intermediaries. This is true, but it's exactly the kind of thinking that has made WS-*-based systems heavy, cumbersome, and expensive. First, this model introduces the problem of key management, a complex issue that is rarely given the thought needed to be successful and secure. Second, it usually locks you into a specific format.

Think instead of a model in which the results of queries are not the data itself, but references to data through RESTful URLs. Now, only those steps in an orchestration that need access to the information will get it. A huge part of minimizing the risk of internal fraud and the expense of external audits is keeping access to sensitive information very narrowly scoped. What you would like to be able to say is "Bob is allowed to call this service with this data in this context." The military has always had this model (with concepts like "eyes only" and "need to know"), but the corporate world is still learning it. Barings Bank, one of the United Kingdom's oldest investment banks, went under because Nick Leeson was able to make trades against markets that he should not have been given permission to use. Effective security is not just about granting access to services; it is about doing so within a context for particular data sets.
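The pass-references-not-data model can be sketched in a few lines: every orchestration step may hold a resource URL, but only principals authorized for that resource may dereference it. All names and the ACL shape here are illustrative assumptions, not a particular product's API:

```java
import java.util.Map;
import java.util.Set;

// Sketch of "Bob is allowed to call this service with this data in this
// context": steps pass a resource URL around freely; only authorized
// principals may resolve the reference into the sensitive payload.
public class ReferenceGuard {
    private final Map<String, Set<String>> acl; // resource URL -> allowed principals
    private final Map<String, String> store;    // resource URL -> sensitive payload

    public ReferenceGuard(Map<String, Set<String>> acl, Map<String, String> store) {
        this.acl = acl;
        this.store = store;
    }

    public String dereference(String principal, String resourceUrl) {
        Set<String> allowed = acl.get(resourceUrl);
        if (allowed == null || !allowed.contains(principal)) {
            throw new SecurityException(principal + " may not read " + resourceUrl);
        }
        return store.get(resourceUrl);
    }
}
```

A sorting or routing step can operate on the reference without ever being granted the payload; only the step that genuinely needs the medical result, say, passes the check.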
RESTful APIs make this level of sophistication much easier by adopting global, logical, resolvable names for data. These names are safer to pass around and easier to protect than the data itself.

URI curation

Anyone who has been around the Web for a while is likely to protest that URLs are fragile. Links break. This is true, but deliberate choice of URLs based on logical names and relationships that will endure can go a long way toward solving this problem. By following some specific guidelines, you can create longer-lived URLs that will not break as easily. Information to avoid within URLs includes:

- Specific technologies (servlet, .php, etc.): Nobody cares what you used to produce it. If you change what you use, you break the links.
- Specific formats (if you can avoid them): Use MIME types and content negotiation instead. Links are more flexible and reusable if the XML and HTML versions of a resource are the same instead of being differentiated by .xml and .html suffixes.
- Frequently changing structures (such as organizational charts): Your org chart will change; cool URIs do not.
- Items in "draft" or "beta" status: By definition, something going through a review draft will come out of it. Use metadata to indicate this status, rather than encoding it in the resource definition.

Think about the logical names of things, and your URLs will be much more resilient to change. That being said, the criticism remains. Machine names change, data gets moved between servers, collections and data sources are merged. Unpredictable (but not unexpected) forces can make your URLs more brittle than they need to be. To deal with this, organizations like the Online Computer Library Center (OCLC) have advocated URL curation services for 15 years. The OCLC runs the site http://purl.org to introduce the notion of deliberate URL curation into the mix. PURL stands for Persistent URL. You can define URIs within domains that are resolvable to a PURL resolution server.
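Mechanically, this resolution is ordinary HTTP redirection: the curated name answers with a 3xx status and a Location header pointing at the resource's current home. A minimal Java sketch of a client inspecting that redirect, with all hostnames illustrative:

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Follows the persistent-URL indirection by hand: disable automatic
// redirect handling and read the Location header ourselves.
public class PurlResolver {
    public static String resolve(String purl) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(purl).openConnection();
        conn.setInstanceFollowRedirects(false); // inspect the redirect ourselves
        conn.setRequestMethod("GET");
        int status = conn.getResponseCode();
        if (status >= 300 && status < 400) {
            return conn.getHeaderField("Location"); // the current location
        }
        return purl; // no indirection; the name resolves directly
    }
}
```

In everyday use a client never needs this machinery, because HTTP libraries follow redirects automatically; that transparency is exactly why updating the PURL definition leaves clients unaffected.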
There is a level of indirection to find where a resource is currently located. It is like the Domain Name Service (DNS) for terms and concepts. Figure 4 shows a concept PURL being referenced and redirected to its current location.

Figure 4. A concept PURL being referenced and redirected to its current location

Should a resource's location change, its PURL definition can be updated. Clients should remain unaffected. While the OCLC offers the PURL service for free, you may not want to have a dependency on an external site. Fortunately, the software is available (and was recently rewritten to use NetKernel as its engine!) for download, so you can run your own local versions. (See the article Resources for a download link.)

In conclusion

Resource-oriented architectures provide the flexibility of the Web, scalable implementations, cacheable partial results, architectural migration strategies, and increased security profiles. The move to REST is not a move away from SOAP per se, but an embracing of the Web in its entirety. It is no longer acceptable or amusing that it is easier for people to find content on the Web than in their own organizations' systems. It is no longer acceptable that data needs software to be written (over and over again) to convert it from one form to another. Being able to register services that do this and find them dynamically should allow you to build highly flexible systems that find information in one form and convert it to alternate representations on demand. In an information-driven environment, entirely new business capabilities will emerge while the cost of data integration plummets. New forms of business intelligence, data mining, data reuse, and collaborative knowledge sharing will emerge.

REST is not the only valid form of integration: SOAP has its place for distributed transactions among asynchronously linked process partners. Messaging is also making a comeback with emerging trends like AMQP and XMPP.
These are all valid architectural forms that deserve to be highlighted in their own right. But REST has a special place by virtue of its relationship with the Web. You will be well served by knowing what it has to offer, and its limitations, as you look to build next-generation systems.

Brian Sletten is a Senior Platform Engineer at Riot Games, an independent Los Angeles-based game developer and publisher. He has a background as a system architect, developer, mentor, and trainer, with experience in the online game, defense, finance, and commercial domains. Brian has a B.S. in computer science from the College of William and Mary and is in the process of moving to Los Angeles, California.