Integrate a native XML operational data store into your enterprise application

With Web services now becoming a viable technology, enterprises are beginning to see real return on investment from this technology. Web services represent a less invasive, less costly, and loosely coupled approach to integrating heterogeneous, distributed applications and processes. Web services and service-oriented architectures (SOAs) address the integration of business processes and applications; therefore, it becomes necessary to understand how Web services interact with the data layer.

At the heart of any software architecture resides information, or data as we know it, that can span multiple locations, applications, and usage scenarios. If this data can be represented in a pervasive and standards-based manner, it becomes that much easier to enable the flow of this information throughout an enterprise and also across trading partners.

We believe XML databases provide a schema-agnostic and ubiquitous representation of information and can enable enterprise information integration (EII). By the end of this article, you will have enough information to deduce that XML databases combined with Web services enable the flow of information across loosely coupled applications, resulting in a more responsive architecture and compelling return on investment. Before we move on to specific examples, you will need to understand what a native XML database (NXD) is all about.

What is a native XML database?

Native XML databases are designed especially for storing XML documents. Like other databases, their basic supported features include transactions, security, multiuser access, programmatic APIs, query languages, and so on.
NXDs differ from other databases in that their internal data models are based on XML only.

NXDs are most useful for storing document-centric data because they preserve document order, processing instructions, comments, CDATA sections, and entity usage, while XML-enabled databases do not. Furthermore, NXDs support XML query languages, allowing you to pose requests like, “Get me all documents in which the third paragraph after the start of the section contains a bold word.” Such queries are clearly difficult to ask in a language like SQL.

In addition, NXDs prove useful for storing documents with a natural format of XML, regardless of what those documents contain. For example, consider XML messages in an SOA environment. Although these documents are probably data-centric, their natural format as messages is XML. Thus, when they are stored in a message queue, it makes more sense to use a message queue built on a native XML database than a non-XML database. The NXD offers XML-specific capabilities, such as XML query languages, and usually retrieves whole messages faster.
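
To make the "third paragraph contains a bold word" request concrete, here is a minimal sketch that expresses it as an XPath expression and evaluates it with the JDK's built-in javax.xml.xpath API. It is an illustration only, not an NXD query: the element names section, para, and b are assumptions about the document vocabulary, and a real NXD would run a comparable XQuery across a whole collection instead of a single in-memory string.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

final class BoldInThirdParagraph {
    // Hypothetical vocabulary: <section> holds <para> children, bold text is marked with <b>.
    // para[3] selects the third <para> child of a section; [.//b] keeps it only if it contains a <b>.
    static final String QUERY = "boolean(//section/para[3][.//b])";

    /** Returns true if some section's third paragraph contains a bold element. */
    static boolean matches(String xml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return (Boolean) xpath.evaluate(
                QUERY, new InputSource(new StringReader(xml)), XPathConstants.BOOLEAN);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("could not evaluate query", e);
        }
    }

    public static void main(String[] args) {
        String doc = "<doc><section><para>one</para><para>two</para>"
                   + "<para>three <b>bold</b></para></section></doc>";
        System.out.println(matches(doc));
    }
}
```

The same condition in SQL would require the document order and nesting to be modeled explicitly in tables, which is the difficulty the paragraph above points to.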
Another example of this usage is a Web or enterprise data cache.

Uses for a native XML database include:

- Enabling EII
- Providing a unified master data-access layer across the enterprise
- Validating, persisting, querying, and repurposing XML
- Becoming XML-standards compliant
- Aggregating content from a variety of systems (Java Database Connectivity (JDBC), HTTP, filesystem, Web services)
- Serving as an enterprise data cache and operational datastore to improve data-access response times and relieve burden on backend systems
- Supporting an enterprise data bus solution

Native XML database features may include:

- Internal identity management systems and integration with external identity management systems like LDAP (Lightweight Directory Access Protocol)
- Seamless, schema-independent persistence/caching of Java Web service messages, XML, SOAP, and WSDL (Web Services Description Language)
- Built-in support for security standards such as Security Assertion Markup Language (SAML), Web Services Security (WSS), and XML Encryption
- Built-in support for workflow management based on Business Process Execution Language (BPEL)
- Built-in support for ebXML Registry functionality
- Seamless persistence of unstructured content
- Robust and lightning-fast querying by an engine with “pure-play” XQuery implementation, where, instead of mapping XQuery to another query language, it is used directly against a database designed from the ground up for XQuery
- Interactive and intuitive graphical environment
- Intelligent tools to repurpose the data (using XSLT (Extensible Stylesheet Language Transformations) and XQuery)
- Seamless integration with external JDBC sources with the ability to read, query, insert, update, and delete, all within an XA-compliant transaction
- Seamless integration with HTTP and SOAP sources
- Transactions with all available datasources using XQuery
- Standards compliance by enforcing schema validation and data aggregation mapped to a required schema
- A single source of identity management and authorization
- Backup, restore, and replicate capabilities

Figure 1 illustrates an NXD server’s components.

Two primary uses of an XML database are to enable EII or provide a midtier operational datastore (ODS) platform.

Enterprise information integration

XML databases enable EII by providing a platform for querying across heterogeneous datasources, resulting in one 360-degree view of all common entities spread across enterprise systems or services. EII provides huge benefits to business users. For example, imagine a doctor-patient encounter and a system where a doctor can enter a patient chart number, name, or other form of identification and obtain information on that patient’s history of illnesses, allergies, medications (current and past), X-rays, past surgeries, and doctor summary reports, all in one screen, irrespective of the originating datasources.

Operational datastore

A midtier ODS can provide the necessary infrastructure for managing enterprise data and bringing it closer to the consuming business application, while simultaneously reducing the burden on backend systems of record. XML databases are an ideal technology to serve as an ODS because of their ability to maintain schemas and to bind heterogeneous datasources. Furthermore, XML databases’ support for XA-compliant transactions makes them an ideal ODS and EII technology that enables both read and write capabilities across heterogeneous systems.

With this information in mind, we move on to a specific use-case scenario and look at how the ODS ties these concepts together.

Scenario

Consider the high-level architecture of a hospital information system (HIS). As is the case with any scalable, connected, and secure information system, an HIS brings information and functionality from distributed systems to diverse users in real time. Typically, the actors in this use case are doctors, nurses, lab technicians, IT resources, third-party vendors, and probably patients.
An HIS architecture contains the following components:

- A federation of datasources
  - Databases
  - Registries
- An integration of systems
  - Packaged applications
  - Custom applications
  - Mainframes
- The delivery of functionality
  - Portals
  - Medical digital assistants (MDAs)

Figure 2. The hospital information system’s high-level architecture.

In Figure 2, the federated datasource could be an XML database that queries backend systems via their Web service interfaces. To better understand the overall system architecture, consider a typical hospital infrastructure model as depicted in Figure 3.

Figure 3 depicts an overall set of systems a hospital may be using. These systems could potentially be provided by a single vendor, but in most realistic cases, they are provided by disparate vendors, each with a totally different set of APIs and user interaction interfaces. Thus, for these systems to live together and exchange information, the hospital IT department should create a set of Web services for each system, exposing the important data and functionality of the respective system.

Java Web services: What problems can they solve?

Presuming the hospital IT department leverages Web services, many traditional problems can be addressed:

- Connecting traditionally separate and autonomous software systems
- Enabling the construction of distributed systems
- Creating dynamic, collaborative applications
- Allowing diverse and redundant systems to be addressed through a common, coherent set of interfaces
- Protecting existing IT investments without inhibiting the deployment of new capabilities
- Bringing information technology investments more in line with business strategies

Java Web services need persistence and query

As mentioned earlier, Java Web services create huge amounts of new data, specifically the exchange of data-rich XML messages.
These messages contain important information that many organizations will want and need to store, access, query, audit, analyze, and repurpose.

It is nearly impossible to persist all of these messages in a relational database because of the inflexible data model it imposes. You must know what type of data the message will contain and set up relational tables to store it. Additionally, you will have to write code that knows, for every message type, how to take the incoming message, shred it, and populate the tables. XML databases are particularly useful for handling new message types or evolving message structures. Storing message content in a native XML database reduces development time and cost by at least 50 percent by eliminating the need to define object-to-relational mapping.

Extracting, transforming, and working with XML content stored in a native XML database is also relatively simple. Every aspect of data management is done using XQuery, which is a powerful language specified by the W3C (World Wide Web Consortium) and designed specifically for working with XML. Furthermore, being able to seamlessly communicate with internally managed data, HTTP, filesystem (URI), and Web services (based on WSDL references), all from within an XQuery statement, makes for a powerful and easy-to-use solution. Such functionality enables an enterprise design to have a single point of access (the NXD) and provides the ability to aggregate, transform, and repurpose the data via the same API and query language (XQuery).

An NXD can also serve as an operational data cache. Using this approach, specific content that most likely will not change often, or that never changes once created, can be cached in the NXD as either XML or other formats required by the client systems.
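
As a rough illustration of the caching idea, the sketch below keeps fetched XML alongside the time it was stored and goes back to the originating source only when an entry is older than a configured lifetime. It is not an NXD API: the class ExpiringXmlCache, its method names, and the single maximum-age value are hypothetical, and the clock is passed in as a parameter purely to keep the example deterministic.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

final class ExpiringXmlCache {
    // One cached payload plus the moment it was stored.
    private static final class Entry {
        final String xml;
        final long storedAtMillis;

        Entry(String xml, long storedAtMillis) {
            this.xml = xml;
            this.storedAtMillis = storedAtMillis;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final long maxAgeMillis;

    ExpiringXmlCache(long maxAgeMillis) {
        this.maxAgeMillis = maxAgeMillis;
    }

    /**
     * Returns the cached XML for key while it is younger than maxAgeMillis;
     * otherwise calls the originating source, stores the result, and returns it.
     */
    String get(String key, long nowMillis, Supplier<String> source) {
        Entry cached = entries.get(key);
        if (cached != null && nowMillis - cached.storedAtMillis < maxAgeMillis) {
            return cached.xml;
        }
        String fresh = source.get();
        entries.put(key, new Entry(fresh, nowMillis));
        return fresh;
    }

    public static void main(String[] args) {
        ExpiringXmlCache cache = new ExpiringXmlCache(60_000);
        // Hypothetical source standing in for a Web service call.
        Supplier<String> allergies = () -> "<allergies><allergy name='penicillin'/></allergies>";
        System.out.println(cache.get("chart-42/allergies", System.currentTimeMillis(), allergies));
    }
}
```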
In addition, each datasource can be configured with a time-to-live setting; when a request is made, that configuration is evaluated by the NXD engine and results are either returned directly from the internal cache or fetched from the originating source (if the cache is deemed expired).

As we dive a little deeper and discuss specific use-case scenarios as depicted in Figure 4, we will build a stronger case for how XML databases can augment Web services.

Figure 4 depicts four specific example use cases and also introduces three external sources a hospital must interact with on a day-to-day basis: the Food and Drug Administration (FDA), the Centers for Disease Control and Prevention (CDC), and insurance providers. In all three cases, the hospital must communicate using Web services and XML-based standards as follows:

- CDC and FDA: The CDA (Clinical Document Architecture) standard from Health Level Seven (HL7, an ANSI-accredited standards-developing organization).
- Insurance providers: An XML-based standard as defined by the ACORD (Association for Cooperative Operations Research and Development) standards body. The ACORD XML for P&C (property and casualty) standard addresses the industry’s real-time requirements. It defines P&C transactions that include both request and response messages for accounting, claims, personal lines, commercial lines, specialty lines, and surety transactions.

The data is complex, and a typical WSDL describing each datasource contains numerous specific methods, each returning a part of the overall content. To make all content available in a single view, the client application must make multiple Web service calls to each datasource. Returned data must be aggregated in such a way that it can be properly interpreted and used by the client.

Scenario details

Doctor-patient encounter

A doctor has a computer in the patient’s visiting room. During a patient encounter, the doctor launches a browser and logs on to the hospital’s internal portal.
Based on the user’s role in the system (in this case, a doctor), the user is presented with a list of available functions, such as the Patient Record Viewer. A Patient Viewer screen consists of the following tabs:

- Patient Details: Primarily patient information such as name, address, and birthday
- Patient Medical History: A list of all known illnesses or patient encounters, with the ability to pull up details on each illness (and order new lab work) including:
  - Doctor reports and notes
  - X-ray and other types of imaging
  - Medications
- Patient Medication: A list of all known medications this patient has been prescribed over time, with the ability to prescribe new medication if needed
- Patient Allergies: A list of all known patient allergies, with the ability to make a new entry
- Medical Report: A list of available doctor observations and visit reports, with the ability to create a new report following a given visit

To provide a single view of a patient, the system must communicate and exchange information with the following hospital systems: reporting and decision support, patient care management, imaging, ambulatory, laboratory, health information management, pharmacy, and possibly the FDA and CDC depending on the patient’s diagnosis.

A doctor may be using a PDA or some other wireless device during a patient visit or hospital round to obtain similar information. In this case, the system must be able to detect the device type and provide it with a response/content applicable to that device. This would require content transformation, such as XML to HTML, SVG (Scalable Vector Graphics), CSV (comma-separated values), or another format.

To aggregate and present the above content (using a typical Java API for XML-based Remote Procedure Call (JAX-RPC) interface), the client application must make at least 10 to 25 Web service calls. In addition, more code is required for converting returned content to presentation and to make updates if needed.
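
The device-specific transformation step mentioned above can be sketched with the JDK's built-in XSLT support (javax.xml.transform). Everything specific here is an assumption made for illustration: the allergies/allergy element names, the two inline stylesheets, and the "pda" and "browser" device labels are hypothetical and do not come from any hospital system or NXD product.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class DeviceTransform {
    private static final String XSL_OPEN =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>";

    // Hypothetical stylesheet: one "name,severity" line per allergy.
    static final String TO_CSV = XSL_OPEN
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/allergies'>"
        + "<xsl:for-each select='allergy'>"
        + "<xsl:value-of select='@name'/><xsl:text>,</xsl:text>"
        + "<xsl:value-of select='@severity'/><xsl:text>&#10;</xsl:text>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Hypothetical stylesheet: an HTML list for a desktop browser.
    static final String TO_HTML = XSL_OPEN
        + "<xsl:output method='html' indent='no'/>"
        + "<xsl:template match='/allergies'>"
        + "<ul><xsl:for-each select='allergy'>"
        + "<li><xsl:value-of select='@name'/> (<xsl:value-of select='@severity'/>)</li>"
        + "</xsl:for-each></ul>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Picks a stylesheet by device type and applies it to the XML content. */
    static String render(String xml, String deviceType) {
        String stylesheet = "pda".equals(deviceType) ? TO_CSV : TO_HTML;
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheet)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<allergies><allergy name='penicillin' severity='high'/>"
                   + "<allergy name='latex' severity='low'/></allergies>";
        System.out.print(render(xml, "pda"));
        System.out.println(render(xml, "browser"));
    }
}
```

In a client built this way, each new device type means another stylesheet plus the code to select and apply it, which is part of the extra presentation code the scenario refers to.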
The following example returns just the patient’s name, contact, and other primary information:

    // Assumption: XMLType and opts (an Options instance) come from Apache Axis.
    import javax.xml.namespace.QName;
    import javax.xml.rpc.Call;
    import javax.xml.rpc.ParameterMode;
    import javax.xml.rpc.Service;
    import javax.xml.rpc.ServiceFactory;
    import org.apache.axis.encoding.XMLType;

    Service service = ServiceFactory.newInstance().createService(null);
    Call call = service.createCall();
    call.setTargetEndpointAddress(opts.getURL());
    call.setOperationName(new QName("urn:cominfo", "getPatientInfo"));
    call.addParameter("sessionID", XMLType.XSD_STRING, ParameterMode.IN);
    call.addParameter("chartNum", XMLType.XSD_STRING, ParameterMode.IN);
    call.setReturnType(XMLType.XSD_STRING);
    // Pass credentials through the standard JAX-RPC call properties.
    if (opts.getUser() != null)
        call.setProperty(Call.USERNAME_PROPERTY, opts.getUser());
    if (opts.getPassword() != null)
        call.setProperty(Call.PASSWORD_PROPERTY, opts.getPassword());
    // args[0] is the session ID, args[1] the chart number.
    String res = (String) call.invoke(new Object[] {args[0], args[1]});

Multiple calls are required, and results may need to be aggregated into a single XML message, then parsed or transformed as needed.

Billing

A financial services user, either via a browser or a desktop application, must obtain a list of all services rendered to a given patient for a certain period of time. This, once again, requires access to most available systems of record within the hospital infrastructure and another 10 to 15 separate Web service calls.

Using the returned and aggregated information, a financial services user selects which services require billing at this time. Once selected, the system needs to prepare an invoice and electronically submit it to the appropriate insurance provider using the ACORD standard over a Web service provided by that insurance carrier. This requires code to convert the content describing the performed services to the ACORD-based XML standard, then another Web service call to send the invoice.

With an NXD in place, a single XQuery statement referencing all WSDL-based sources can aggregate the content and transform it into an ACORD-compliant XML document using either a referenced XSLT (transformation) or XQuery alone.
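
For comparison, the client-side aggregation that the RPC approach requires might look roughly like the following sketch, which merges the XML strings returned by separate calls into one document using the JDK's DOM API. The class and method names are hypothetical, and the wrapper element names are simply whatever keys the caller supplies; a real client would also need error handling for each individual call and the subsequent conversion to the target message format.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class ResponseAggregator {
    /**
     * Wraps each system's XML response in an element named after the system
     * and collects all of them under a single <root> element.
     */
    static Document aggregate(Map<String, String> responsesBySystem) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document merged = builder.newDocument();
            Element root = merged.createElement("root");
            merged.appendChild(root);
            for (Map.Entry<String, String> response : responsesBySystem.entrySet()) {
                Element wrapper = merged.createElement(response.getKey());
                Document part = builder.parse(new InputSource(new StringReader(response.getValue())));
                // importNode copies the parsed fragment into the merged document.
                wrapper.appendChild(merged.importNode(part.getDocumentElement(), true));
                root.appendChild(wrapper);
            }
            return merged;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not aggregate responses", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical payloads standing in for the results of two Web service calls.
        Map<String, String> responses = new LinkedHashMap<>();
        responses.put("Pharmacy", "<meds><med>aspirin</med></meds>");
        responses.put("Laboratory", "<labs><lab>CBC</lab></labs>");
        Document merged = aggregate(responses);
        System.out.println(merged.getDocumentElement().getChildNodes().getLength());
    }
}
```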
Ambulatory services

Ambulatory services personnel receive an emergency call about a patient having a possible heart attack. Armed with a wireless device, they arrive at the patient’s residence and perform required tests to diagnose the problem. They conclude that the patient must be admitted to the hospital as soon as possible and requires an immediate injection of penicillin.

Using a wireless device, the ambulatory services personnel log in to the hospital’s portal and request information on the patient’s history of illnesses, allergies, and past and current medications to determine if there are any issues with the medication about to be administered. This information must be accurate and immediately available.

With an NXD in place, once again, a single XQuery statement is sent. The server evaluates the cache settings of every source involved and aggregates required content from both the locally persisted cache and the originating content provider. Results are transformed for the client device requesting the content and sent back.

Emergency services

A patient arrives in a hospital’s emergency department with acute respiratory difficulty. The attending doctor suspects botulism. This physician usually calls a toll-free number to get a duty officer. Instead, she sends an HL7 standards-based CDA document (describing a botulism case report) to the Department of Public Health (via a Web service). It is sent directly to the duty officer. When it is received by the botulism system, the duty officer reviews it, and a message is sent to the quarantine station (regional CDC) to release the antitoxins to the appropriate hospital pharmacy.

The hospital’s laboratory receives a request for a test to confirm the botulism. It sends notification to the local health authority that such a test is requested. A sample is drawn at the hospital and shipped to the state lab for testing.
Usually, the antitoxins are released based on the reporting systems and any other supporting tests performed (EMG, neurological studies).

The CDA botulism case report is sent to the Department of Public Health with initial ED (emergency department) findings. When the lab results for the test confirm the case, the toxin assay results are sent to the Department of Public Health, which uses that data plus the CDA to create the National Notifiable Diseases Surveillance System (NNDSS) message to be sent to the CDC.

Using an NXD, all messages to the CDC and FDA are created by the server using a predefined transformation or a server-stored XQuery. An XQuery may also contain appropriate logic for routing messages based on the events, type of data, etc.

RPC vs. XQuery

A typical RPC call may resemble the following:

    Service service = ServiceFactory.newInstance().createService(null);
    Call call = service.createCall();
    call.setTargetEndpointAddress(opts.getURL());
    call.setOperationName(new QName("urn:cominfo", "getPatientInfo"));
    call.addParameter("sessionID", XMLType.XSD_STRING, ParameterMode.IN);
    call.addParameter("chartNum", XMLType.XSD_STRING, ParameterMode.IN);
    call.setReturnType(XMLType.XSD_STRING);
    // Pass credentials through the standard JAX-RPC call properties.
    if (opts.getUser() != null)
        call.setProperty(Call.USERNAME_PROPERTY, opts.getUser());
    if (opts.getPassword() != null)
        call.setProperty(Call.PASSWORD_PROPERTY, opts.getPassword());
    String res = (String) call.invoke(new Object[] {args[0], args[1]});

The code above shows only the actual call and response code.
Using this approach, pages and pages of code must be written to properly aggregate and transform the content.

Using XQuery, however, the code, which could tap all available example sources, is reduced to:

    <root>
    {
      import service namespace PatientReg = 'PatientReg.wsdl' name PR;
      import service namespace Reporting = 'Reporting.wsdl' name REP;
      import service namespace Pharmacy = 'Pharmacy.wsdl' name RX;
      import service namespace HealthInfoMngmnt = 'HIM.wsdl' name HIM;
      import service namespace Laboratory = 'Lab.wsdl' name LAB;
      import service namespace ITforPhys = 'ITforPhys.wsdl' name ITP;
      import service namespace Ambulatory = 'Ambulatory.wsdl' name AMB;
      import service namespace ITMngmnt = 'ITMngmnt.wsdl' name ITM;
      import service namespace GeneralFinance = 'GeneralFinance.wsdl' name GF;
      import service namespace Imaging = 'Imaging.wsdl' name IMG;
      import service namespace PatientCareMngmnt = 'PatientCareMngmnt.wsdl' name PCM;

      for $PCM in collection(PCM:GetPatientDetail(UserID, chartNum)),
          $RX in collection(RX:GetPatientMeds(UserID, chartNum)),
          $HIM in collection(HIM:GetPatientProcedures(UserID, chartNum)),
          $LAB in collection(LAB:GetPatientLabWork(UserID, chartNum)),
          $AMB in collection(AMB:GetPatientAmbServices(UserID, chartNum)),
          $IMG in collection(IMG:GetPatientXrays(UserID, chartNum))
      return (
        <PatientCareMngmnt> { $PCM } </PatientCareMngmnt>,
        <Pharmacy> { $RX } </Pharmacy>,
        <HealthInfoMngmnt> { $HIM } </HealthInfoMngmnt>,
        <Laboratory> { $LAB } </Laboratory>,
        <Ambulatory> { $AMB } </Ambulatory>,
        <Imaging> { $IMG } </Imaging>
      )
    }
    </root>

To reference a transformation, either an XSL transformation can be used or XQuery alone.
An example of an XSLT reference in XQuery is as follows:

    declare outbound-transformation 'tig:///HL7DEMO/XSLT/sectional.xsl';
    <root>
    {
      for $b in collection('tig:///HL7/CDAR2')
      let $documenturi := document-uri( $b )
      where $b/levelone[contains(clinical_document_header/patient/person/person_name/nm/GIV/@V,'a')
                        and contains(clinical_document_header/patient/person/person_name/nm/FAM/@V,'a')]
      order by $b/levelone/clinical_document_header/patient/person/person_name/nm/GIV/@V
      return
        for $cd in $b/levelone/clinical_document_header
        return
          <clinical_document_header>
          {
            $cd/*:*,
            attribute docNameRef { $documenturi },
            $cd/*:*
          }
          </clinical_document_header>
    }
    </root>

Using XQuery is simple and powerful. The above examples are just the tip of the iceberg as far as XQuery’s capabilities are concerned.

Conclusion

Web services are not without their limitations as a technology. Combining Web services with native XML databases and XQuery can extend the power of Web services inter- and intra-enterprise. The result is a powerful computing platform for the enterprise, enabling integration of applications, processes, and data in an efficient, standards-based, and low-cost manner.

Robert Smik is a senior systems architect at Raining Data Corporation. He has been involved in application and software design and development for more than 15 years. His experience includes design and development of highly complex database systems, architecting multitier Web environments, architecting and developing various connectivity solutions, products, and smart cards, in addition to SOA and data aggregation tools. Smik is an active member of HL7 and CDISC.

Ash Parikh is the director of technology and development of the Enterprise Applications Group at Raining Data Corporation. He is a named expert in the field of distributed computing and has presented and authored abstracts for Delphi BPX Summit 2004, Delphi Enterprise On-Demand 2004, JavaOne 2004, JavaOne 2003, BEA e-World 2002, and JavaOne 2002.
Parikh has more than 15 years of IT experience and is an active member on a number of JSRs in the Java Community Process and in the Technical Committees of OASIS. He is also the president of the Bay Area Chapter of the Worldwide Institute of Software Architects. Parikh is the collaborating author of Oracle9iAS Building J2EE Applications (Osborne Press, November 2002), and has also authored several technical articles in journals such as XML-Journal, Java Pro, Web Services Journal, ADTmag, Softwaremag.com, and Java Skyline.

Ajay Ramachandran is vice president and general manager of Raining Data’s Enterprise Applications Group. His expertise includes management consulting, information technology, biotechnology, product management, sales, and marketing. His career includes cofounding and executive-level positions at several Internet services and software companies. Ramachandran has degrees in molecular cellular biology (genetic engineering) and organizational communications.