The nature of SOA data requires a native XML data management server

The basic tenet of a service-oriented architecture (SOA) is to provide loose-coupling for different applications. It is thus imperative that data is produced by and for these applications, and that this data is stored and handled optimally. Given the pervasive nature of each application in an SOA, the way this data is stored is typically location-dependent and specific to the application.

An SOA repository is a mechanism that handles the persistence of distributed SOA data. It is a complex and sophisticated enterprise-grade technology that not only handles persistence and caching, but also enables lifecycle management, security, discovery, and transformation of distributed data from diverse service-oriented applications such as silo applications, Web portals, business processes, and mobile applications.

SOA data is basically transient and streaming in nature. It thus necessitates a native XML data storage that aggregates the data relevant to a specific service, regardless of the applications used, rather than assigning the data to the individual applications that make up that service. Otherwise, data becomes difficult to access and cost-prohibitive to store and replicate. SOA data is typically stored in relational databases and filesystems, but these are not entirely capable of handling SOA data. Elliotte Harold, in his article “Managing XML Data: Native XML Databases,” (IBM developerWorks, June 2005) clearly addresses the need for and benefits of a native XML database. In his words, “When your only tool is a hammer, everything looks like a nail. When your only tool is a relational database, everything looks like a table. Reality, however, is more complicated than that. Data often isn’t tabular and can benefit from a tool that more closely fits its natural structure.
When that data is XML, the appropriate tool for managing it might well be a native XML database.”

Being fundamentally XML, SOA data cannot be easily modeled in relational databases. The inflexibility of relational database schemas does not lend itself well to the ever-evolving nature of schemas in an SOA, and more so when trading partners collaborate across enterprises. Filesystems also do not provide advanced querying and management capabilities, which is a typical need in an SOA. For these compelling reasons, we strongly believe that data created as XML should be persisted, managed, and treated as XML.

Consider the complex and ever-evolving list of Web services standards. They include a number of OASIS initiatives such as Web Services Business Process Execution Language (WSBPEL), Web Services Security, Web Services Distributed Management (WSDM), ebXML Collaboration Protocol Profile and Agreement (CPPA), and Web Services Policy Framework (WS-Policy), as well as numerous World Wide Web Consortium initiatives, and REST-based XML artifacts. Wading through this exhaustive alphabet soup of standards, one realizes that at their core, these standards are basically represented by XML Schemas such as the WS-Policy XML Schema, the Collaboration Protocol Profile (CPP) XML Schema, and the Collaboration Protocol Agreement (CPA) XML Schema, further strengthening the case that if SOA data is created in XML, it should be persisted, managed, and treated as XML. Consider Figure 1’s WS-Policy Schema ws-policy.xsd file associated with the WS-Policy Framework initiative, which standardizes how policies are to be communicated between service consumers and providers.

As shown in Figure 1, data and metadata associated with WS-Policy can be represented in XML and, hence, stored and managed with ease in an XML persistence mechanism.
Similarly, CPP, CPA, and Web Services Security details can also be natively stored and managed in an XML persistence mechanism.

Mid-tier caching in an SOA

SOAs need persistence mechanisms to persist information such as the state of a business step in an application, the state of a long-running business process in execution, Web services management, monitoring information, lists of available Web services, and more. Often, much of this information is frequently requested and accessed, thus making the case for caching in the middle tier, which also alleviates the performance bottleneck that can be caused by multiple requests to the same information store. With SOA data and metadata being XML, we propose a simple, yet effective, mid-tier caching architecture that includes an XML database as a mid-tier cache along with a number of XQuery-powered services. An SOA repository can enable increased performance, reliability, functionality, and usability of SOA artifacts through an effective mid-tier caching architecture powered by a number of important services as follows:

Policy-based caching service: For increased performance and quality of service (QoS)

A policy-based caching service can enable the setup of XQuery-based policies to cache result sets of low-performing services. These policies can also be constructed to include the time-to-live before the cache is refreshed. Policies based on time-of-day requests can determine if the data in the cache is valid for this request or if the originating source must be used. Also, policies based on service availability ensure that if the service is not available, results are obtained from the cache. A cache can be refreshed based on time and other configurable parameters by letting policies trigger the XML persistence mechanism.
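
The time-to-live and service-availability policies described above can be pictured as a small decision function. What follows is only a minimal sketch in standard XQuery 1.0, not the policy language of any particular product: the `cache-entry` element, its `cachedAt` and `ttl` attributes, the `local:` function names, and the three outcome labels are all hypothetical, and the Boolean `$service-available` argument stands in for whatever availability check a real repository would perform.

```xquery
xquery version "1.0";

(: Hypothetical cache entry: records when a result set was cached
   and how long it stays valid (an ISO 8601 duration). :)
declare variable $entry :=
  <cache-entry service="getPODetail"
               cachedAt="2005-06-01T09:00:00Z"
               ttl="PT15M">
    <result>cached purchase order detail</result>
  </cache-entry>;

(: True while the entry is still inside its time-to-live window. :)
declare function local:is-fresh($e as element(cache-entry),
                                $now as xs:dateTime) as xs:boolean {
  $now lt xs:dateTime($e/@cachedAt) + xs:dayTimeDuration($e/@ttl)
};

(: Decide where a request should be answered from:
   a fresh entry is served from the cache; a stale entry is still
   served when the service is down (availability policy); otherwise
   the originating service is used and the cache can be refreshed. :)
declare function local:decide($e as element(cache-entry),
                              $now as xs:dateTime,
                              $service-available as xs:boolean) as xs:string {
  if (local:is-fresh($e, $now)) then "cache"
  else if (not($service-available)) then "stale-cache"
  else "origin"
};

<decisions>
  <request at="09:10" service-up="true">{
    local:decide($entry, xs:dateTime("2005-06-01T09:10:00Z"), true())
  }</request>
  <request at="09:30" service-up="true">{
    local:decide($entry, xs:dateTime("2005-06-01T09:30:00Z"), true())
  }</request>
  <request at="09:30" service-up="false">{
    local:decide($entry, xs:dateTime("2005-06-01T09:30:00Z"), false())
  }</request>
</decisions>
```

With the sample entry above, the 09:10 request falls inside the 15-minute window and is answered with `cache`; the 09:30 request is answered with `origin` while the service is up and with `stale-cache` while it is down. A time-of-day policy could be added in the same style as one more condition on `$now`.
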
The design can also include dynamic just-in-time trace logging for service calls made by the XML persistence mechanism.

Data repurposing service: For richer functionality and improved performance

A data repurposing service can enable additional filtering and search criteria on content returned from a given service. Additionally, XQuery can be used to drive transformations for repurposing the content and provide analytics and reporting on returned content. XQuery can also deliver portions of result sets and create a final result set based on aggregation of content from multiple services.

Data abstraction service: For easier deployment and maintenance

A data abstraction service can eliminate the need for Web services to be aware of individual datasources. Figure 2 shows a better use of Web services by eliminating the need to develop separate clients and Web services for each operation. Datasource management for disparate datasources such as JDBC, HTTP, WSDL, and filesystems can be enabled using this service.

In addition, since services can run on any system, an SOA repository can be used to enable the federation of services in an SOA. It can also be used to alleviate performance issues for central-tier processes that abstract remote services by collocating the data as close to the data processing as possible. As a persistence layer at the central tier, an SOA repository can be used to store transactional information for many purposes, including analysis and integrity management issues, such as logging. Handling abstracted and composite data elements at the central tier enables a centralized repository for SOA data.

Exciting new SOA technologies such as enterprise service buses and orchestration engines can employ an SOA repository for state management, workflow persistence, and message persistence.
An SOA repository can also provide the persistence backbone in SOA registries, whether they are UDDI (Universal Description, Discovery, and Integration) or ebXML registries, to enable the discovery, publishing, and subscription of services.

The need for complex and sophisticated XML data management for an SOA

As already discussed, Web services and SOAs create huge amounts of complex and sophisticated new data in the form of data-rich XML messages exchanged between applications, which must be stored so they can be effectively audited and analyzed. When we look at the various technologies that enable and empower an SOA, it is apparent that an SOA’s key characteristics and benefits form the basis for many vendor offerings in this space. As shown in Figure 3, the following functionalities form the core SOA and Web services infrastructure:

- Web services management
- Web services monitoring
- SOA governance
- Web services security
- SOA persistence and caching
- SOA discovery, publishing, and subscription

We can map the use of XML data management as an enabling and/or empowering technology at various points in this infrastructure in the following ways:

- SOA metadata persistence
- SOA discovery, publishing, and subscription
- Persistence of Web services management data
- Acceleration caching
- Service aggregation
- Web services policy caching and management

XML data management in an SOA can also enable and/or empower:

- Persistence of monitoring, logging/auditing
- Persistence of security capabilities
- SOA governance
- SOA OLAP data and metadata transformations and persistence
- Trading partner profile and agreement persistence
- Message persistence
- State management
- Schema versioning

Native XML data management server overview

An XML data management server (XDMS) is much more than a data store for XML data. An XDMS is a sophisticated system that must be designed with flexibility, scalability, and performance in mind. The reality is that most XML data management servers do not measure up to these exacting demands.
Typically with an XDMS, no prior knowledge of the XML document’s structure is necessary. Any valid XML document, such as Web Services Description Language (WSDL), CPPA, XML Schema, or Extensible Stylesheet Language Transformations (XSLT), can be inserted at will, and the native XML data management server will automatically create the required internal structures to accommodate such storage.

In addition, XML data management servers support transactions, indexing, schema or DTD validation (some support schema versioning), extended connectivity, user- and group-based security, plus backup/restore and server mirroring. An XDMS solution must also be able to store non-XML data (such as binary data), thus providing a solution for storing any other content you may require.

The native XML interface for SOA repository operations is XQuery. To tap into the full potential of XML databases, XQuery is the way to create, manipulate, examine, and manage XML data. XQuery also provides a standard way to unify disparate datasources and make them all appear to be a single server.

Introducing XQuery

XQuery is a functional language; as such, expressions are composed and combined to create arbitrarily complex queries over one or more sets of XML data. XQuery offers both strongly-typed mechanisms using XML Schema and DTD, and weakly-typed mechanisms for handling raw XML data.

The XQuery data model

The XQuery data model is more extensive than the standard XML data model of XML Infoset and Post-Schema Validation Infoset (PSVI). XQuery is defined in terms of operations on the data model, but it does not restrict how documents and instances in the data model are constructed. The data model consists of the XML data being queried, any intermediate values, and the final query results.
It supports intermediate expressions that can result in values that are not XML (for example, a list of integers or strings), XML fragments, and both typed and untyped data.

XQuery and XML Schema have the same type concept for XML data. XQuery provides built-in types based on XML Schema and support for user-defined Schema types. XQuery also supports additional data types outside the existing XML Schema data types. The components of the XQuery data model and type system are as follows:

Items and sequences

An item is a single node or atomic value (singleton), which is equivalent to a sequence of length one. A sequence is made up of a series of items. Sequences can be empty, but cannot contain other sequences. Every value in the data model is a sequence of zero or more items.

Items in an XQuery data model can be:

- Typed: Receive their type annotations from XML Schema or DTD documents.
- Untyped: Employ untyped semantics to coerce string values into their desired typed values.

Atomic values

Atomic values are singletons with an atomic type derived from the XQuery type xdt:anyAtomicType.

Nodes

As in XML, XQuery has seven types of nodes. Each node has a unique identity and an inherent ordering in the document (document order).

Types of nodes

Types of nodes that build an XML tree in the data model are:

- Document: Represents the entire XML document, including basic information (base URI, children, unparsed entities, document URI).
- Element: Represents elements within a document, including basic information (base URI, node name, parent, type, children, attributes, namespaces).
- Attribute: Represents attributes within a document, including basic information (node name, string value, parent, and type).
- Text: Represents XML character content within a document, including basic information (content, parent).
- Namespace: Represents namespaces within a document, including basic information (prefix, URI, parent).
  Namespace nodes are used to map namespace prefixes to URIs.
- Processing instruction: Represents processing instructions within a document, including basic information (target, content, base URI, parent). Contains instructions for applications in documents and starts with a target to identify the application where the instruction is directed.
- Comment: Represents comments within a document, including basic information (content, parent).

XQuery expressions have a static type and a dynamic type. The static type pertains to the expression and is applied at compile-time. The dynamic type pertains to the value that results from the expression and is applied at runtime.

Static typing versus dynamic typing:

- Static: Set of type-inference rules that match the query to the document.
- Dynamic: Set of value-inference rules that govern how the query is processed.

XQuery syntax

Several types of XQuery expressions can be used in a query:

- Primary expressions: Basic primitives, which include literals, variables, function calls, and parenthesized expressions.
- Path expressions: Used to pattern-match arbitrary nodes according to their name and type by navigating through the hierarchical structure and locating nodes.
- Direct and computed constructors: Used to create nodes and provide structure for the XQuery result. These expressions are capable of composing arbitrary results into new XML documents.
- FLWOR expressions: The clauses for and let bind variables to values and use these values to evaluate items associated with expressions.
  For clauses iterate over the items in a sequence and can be nested; let clauses bind a variable to an entire intermediate result and do not iterate.
  - Where clauses contain one or more predicates used to filter through a set of values and limit the values to only those that meet the required criteria.
  - Order-by clauses sort values in a result stream.
  - Return clauses use the values to build the results.
- Functions and operators: Includes:
  - Arithmetic operators
  - Comparison operators
  - Node sequence operators
  - Logical operators
  - Built-in functions (accessor, numeric, string, Boolean, date, time, duration, anyURI, QName, node, sequence, aggregate, context)
  - User-defined functions
- Conditional expressions: If-then-else statements used with Boolean conditions.
- Logical expressions: The expressions and and or.
- Expressions on SequenceType: Describe an XQuery value when referring to a type in an XQuery expression.

Furthermore, XQuery implementations can be extended. More sophisticated implementations include support for such datasources as filesystems, HTTP, Web services, Java Database Connectivity (JDBC), Java Message Service, and more.

In Figure 4, XQuery provides the glue for a native XML data management server used as a flexible and standards-based persistence mechanism. We recommend the use of a native XML data management server, as illustrated in Figure 4, as a best practice approach for persistence within an SOA. A native XML data management server can be used to enable the federation of services in an SOA, as services are location independent by nature. David S. Linthicum in a blog titled “The Importance of Persistence within a SOA,” has called out federation of services, performance issues, storage, and management of transactional data and centralized metadata as key aspects of information-oriented integration using services. Federating services can also alleviate performance issues by enabling the collocation of data closer to the composites, where the actual data processing occurs.
Having a native XML data management server at the central tier also allows architects to store transactional information for analysis and logging.

SOA powered by XQuery examples

The power of XQuery in an SOA can be realized by many sophisticated queries, examples of which follow below. These examples demonstrate how a specific implementation of XQuery can be coupled with a specific native XML data management server as the technology for seamlessly interacting with SOA artifacts.

Example 1. Use XQuery to insert operations into a WSDL

    ...
    declare namespace wsdl = "http://schemas.xmlsoap.org/wsdl/";
    (: "insert ... after" is an update extension of the specific implementation, not part of XQuery 1.0 :)
    insert
      <wsdl:operation name="testInsertOperation" parameterOrder="id">
        <wsdl:input message="impl:testInsertOperationRequest" name="testInsertOperationRequest"/>
        <wsdl:output message="impl:testInsertOperationResponse" name="testInsertOperationResponse"/>
      </wsdl:operation>
    after doc("xxx:///SOARepository/webservices/PurchaseOrderWS.xml")
      /wsdl:definitions/wsdl:portType/wsdl:operation[@name="getPODetail"]
    ...

Example 2. Use XQuery to check the availability of a service and update the status in the registry

    ...
    declare namespace wsdl = "http://schemas.xmlsoap.org/wsdl/";
    declare namespace wsdlsoap = "http://schemas.xmlsoap.org/wsdl/soap/";
    for $i in collection('xxx:///SOARepository/webservices')/wsdl:definitions
    (: keep only the services whose WSDL can no longer be retrieved :)
    where not(fn:doc-available(fn:concat($i/wsdl:service/wsdl:port/wsdlsoap:address/@location, '?wsdl')))
    return
      <Service>
        <TimeStamp>{fn:current-date()}</TimeStamp>
        <Status>down</Status>
      </Service>
    ...

Example 3. Use XQuery to measure the reliability of a SOAP message

    ...
    declare namespace wsdl = "http://schemas.xmlsoap.org/wsdl/";
    declare namespace msg = "http://schemas.xyz.com/msg";
    declare namespace SOAP = "http://schemas.xmlsoap.org/soap/envelope/";
    <ExpiredMessage>
    {
      (: return the envelope if its time-to-live has already passed :)
      for $i in doc('file:///c:/SOARepository/ReliableMessaging/RM14.xml')/SOAP:Envelope
      where $i/SOAP:Header/ReliableMessage/TimeToLive < current-dateTime()
      return $i
    }
    </ExpiredMessage>
    ...

Example 4. Use XQuery to retrieve all TokenTypes used in a Policy document

    ...
    declare namespace wsp = "http://schemas.xmlsoap.org/ws/2004/09/policy/";
    declare namespace wsse = "http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd";
    <Tokens>
    {
      for $policy in collection('xxx:///SOARepository/WSPolicy')/wsp:Policy
      return $policy/wsp:ExactlyOne/wsp:All/wsse:SecurityToken/wsse:TokenType
    }
    </Tokens>
    ...

Conclusion

In this article, we have presented a native XML data management server as the most logical approach for storing SOA data, as this data is basically XML. Some may argue that such data and metadata can just as well be stored and managed in a relational database management system. However, data and metadata born as XML must then be transformed into a relational representation, leading to a sizeable overhead in mapping and management. As the amount of SOA data and metadata increases in an enterprise and across trading partner boundaries, this complexity becomes even more of an issue.

Furthermore, since this data is frequently accessed and consumed, a native XML data management server powered by XQuery can be used to provide a standards-based mid-tier cache to reduce performance overhead and increase scalability and reliability in an SOA.

We would like to thank Miko Matsumura, vice president of marketing at Infravio, former Java Evangelist at Sun Microsystems, and co-creator of SOA Blueprints; Sacha Schlegel, an expert on ebXML and software engineer with Cyclone Commerce; and Frank Cohen, director of solutions engineering with Raining Data Corporation for technically reviewing this article.

Ash Parikh is the director of technology and development for the Enterprise Applications Group at Raining Data Corporation.
He is a named expert in the field of SOA and distributed computing and has presented and authored abstracts for OASIS Symposium 2005, Delphi BPX Summit 2004, Delphi Enterprise On-Demand 2004, JavaOne 2004, JavaOne 2003, BEA e-World 2002, and JavaOne 2002. Parikh has more than 15 years of IT experience and is an active member on a number of Java Specification Requests in the Java Community Process and in OASIS technical committees. He is also the president of the Bay Area Chapter of the Worldwide Institute of Software Architects. Parikh is the collaborating author of Oracle9iAS Building J2EE Applications (Osborne Press, November 2002), and has also authored several technical articles in journals such as JavaWorld, XML-Journal, Java Pro, Web Services Journal, ADTmag, Softwaremag.com, and Java Skyline.

Robert Smik is a lead architect/team lead with the Enterprise Applications Group at Raining Data Corporation. He has been involved in application and software design and development for more than 15 years. His experience includes design and development of highly complex database systems, architecting multitier Web environments, and architecting and developing various connectivity solutions, products, and smart cards, in addition to SOA and data aggregation tools. Smik is an active member of HL7 and CDISC. He has also co-authored articles in XML-Journal and JavaWorld.

Premal Parikh is a lead architect/team lead with the Enterprise Applications Group at Raining Data Corporation. He has more than 10 years of experience in the software industry, which includes design and architecting products, along with prototyping, analysis, project modeling, and development of portals for the B2B marketplace. Parikh is also an active member on a number of OASIS Web services standards technical committees.