by Jedidiah Yueh

How database virtualization works

Jul 11, 2013

InfoWorld's New Tech Forum offers technologists a venue to explain how their innovative tech works in detail. In this first post, Delphix CEO Jed Yueh dives into database virtualization.

Welcome to InfoWorld’s New Tech Forum. We are using this space to give voice to technologists and thought leaders who are advancing the state of the art in business technology, either in developing core intellectual property for a startup or in building new offerings for an established company challenging the status quo.

New Tech Forum will put the technologies themselves at the forefront. We are asking a select group of people to tell us in their own words why they are developing specific technologies and how those technologies work, at a level of detail that will deliver high value to technically literate readers.

As the name suggests, New Tech Forum is meant to be interactive. We encourage readers to ask questions and offer opinions in the comments section or to email us with more substantial responses, which we will consider for publication.

Our inaugural post comes to us from Jed Yueh, president and CEO of Delphix, who describes in detail the nature and inner workings of his company’s unique database virtualization technology. — Paul Venezia

Delphix’s database virtualization technology is designed to accelerate enterprise application projects for businesses by providing fast, flexible access to virtual data.

Enterprise applications constantly evolve to meet changing business demands, triggering expensive projects that overrun budgets and fall behind schedule. Data is the lifeblood of applications and must be pumped across project environments.

Applications are also constantly on the move, migrating to new data centers, private clouds, public clouds, hybrid clouds, flash storage, and open platforms, which adds complexity and fragmentation and makes data management harder still. With data growing constantly inside applications, managing data through project lifecycles gets harder every day.

Today’s businesses deploy small armies of DBAs, storage admins, backup admins, and systems admins to create and move database copies across redundant hardware environments — dispiriting, rock-breaking work. IT teams struggle to deliver data using storage, backup, and replication products designed without application projects in mind, yielding an operational Frankenstein that leaves development teams waiting for data environments and drains businesses of millions of dollars each year in lost productivity.

The Delphix solution

Delphix’s Agile Data Platform is an integrated set of technologies that can bridge the data divide, enabling organizations to deliver the right data to the right team at the right time.

Delphix nondisruptively collects data from enterprise applications, automatically versions all changes, and delivers fast, flexible access to virtual datasets for application development, ERP implementations, database upgrades, data center migration, application consolidation, BI and data warehouses, data protection, and a range of innovation projects.

Delphix virtualizes databases by sharing data blocks across all environments instead of making and moving full, physical copies. On average, Delphix consolidates 20 physical database copies into the space of a single copy and reduces the cost of the next incremental copy by more than 100 times, changing the economics of application projects.
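The consolidation figure is easier to reason about with a little arithmetic. The sketch below compares storing every environment as a full physical copy against storing one shared, compressed base image plus each copy's changed blocks. The function names and every input value (a 1,000 GB database, 20 copies, 3x compression, 2% of blocks changed per copy) are hypothetical illustrations, not Delphix measurements.

```python
def physical_footprint(db_size_gb: float, copies: int) -> float:
    """Storage needed when every environment gets its own full physical copy."""
    return db_size_gb * copies


def virtual_footprint(db_size_gb: float, copies: int,
                      compression_ratio: float, change_fraction: float) -> float:
    """Storage needed when all copies share one compressed base image.

    Each copy adds only the blocks it has changed, also stored compressed.
    """
    base = db_size_gb / compression_ratio
    changes = copies * change_fraction * db_size_gb / compression_ratio
    return base + changes


if __name__ == "__main__":
    # Hypothetical inputs chosen only to illustrate the shape of the math.
    phys = physical_footprint(1000, 20)
    virt = virtual_footprint(1000, 20, compression_ratio=3.0, change_fraction=0.02)
    print(f"physical: {phys:,.0f} GB, virtual: {virt:,.0f} GB")
    # physical: 20,000 GB, virtual: 467 GB
```

With these made-up inputs, 20 virtual copies fit in less space than one uncompressed physical copy. The result is dominated by how much each environment actually changes, so the change fraction is the parameter to watch.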

Businesses today have to trade off quality, speed, and cost. Delphix is designed to both shrink the cost of data access and improve quality and speed, eliminating that trade-off.

By versioning changes to data, Delphix provides project teams with self-service data control — fast, simple data refresh, rollback, integration, and branching. These features can dramatically reduce the time required to complete application development projects.

How database virtualization works

Delphix connects nondisruptively to databases, the ubiquitous repositories for enterprise data, and loads a compressed copy of the data into the Delphix Engine, reducing it to roughly one-third of its original size on average. Inside the engine, the Delphix file system (DxFS) compresses data blocks within database files and filters out empty or temporary blocks, minimizing the data footprint.
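A rough sketch of the ingest step described above follows. It assumes fixed-size 8 KB, block-aligned files and uses Python's standard zlib module as a stand-in for whatever compression DxFS actually uses. It detects only empty (all-zero) blocks; recognizing temporary blocks would require reading database metadata, which is out of scope here. The names ingest and read_block and the block size are assumptions for illustration.

```python
import zlib

BLOCK_SIZE = 8192  # assumption: 8 KB database blocks; input is block-aligned


def ingest(data: bytes, block_size: int = BLOCK_SIZE) -> dict[int, bytes]:
    """Split a database file into blocks, drop empty ones, compress the rest.

    Returns {block number: compressed bytes}. Absent block numbers are
    implicitly all-zero and occupy no storage.
    """
    stored: dict[int, bytes] = {}
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        if not any(block):  # every byte is zero: an empty, never-written block
            continue
        stored[offset // block_size] = zlib.compress(block)
    return stored


def read_block(stored: dict[int, bytes], n: int, block_size: int = BLOCK_SIZE) -> bytes:
    """Return block n as the database would see it, uncompressed."""
    if n in stored:
        return zlib.decompress(stored[n])
    return bytes(block_size)  # a filtered-out empty block reads back as zeros


if __name__ == "__main__":
    # Hypothetical file: two blocks of repetitive row data, then two empty blocks.
    file_bytes = (b"row data " * 2000)[:2 * BLOCK_SIZE] + bytes(2 * BLOCK_SIZE)
    stored = ingest(file_bytes)
    raw = len(file_bytes)
    kept = sum(len(b) for b in stored.values())
    print(f"{len(stored)} of {raw // BLOCK_SIZE} blocks stored, {kept} of {raw} bytes")
```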

After the initial data seeding, Delphix maintains synchronization by collecting changes and tracking all versions for as long as required (weeks or months). From any point in time, Delphix can open one or more virtual databases (VDBs) that can be used for development and other lifecycle environments.
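Keeping versions for as long as required amounts to a retention window over a time-ordered list of versions. The sketch below is a minimal, hypothetical model (the VersionTimeline class, its method names, and the string version IDs are assumptions, not Delphix's API). It records versions in time order, prunes anything older than the retention window, and finds the version in effect at a requested point in time with a binary search.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timedelta


class VersionTimeline:
    """Time-ordered versions of one source database, kept for a retention window."""

    def __init__(self, retention: timedelta):
        self.retention = retention
        self.times: list[datetime] = []
        self.versions: list[str] = []

    def record(self, when: datetime, version_id: str) -> None:
        # Versions must arrive in chronological order, as a sync process would produce them.
        if self.times and when < self.times[-1]:
            raise ValueError("versions must be recorded in time order")
        self.times.append(when)
        self.versions.append(version_id)
        # Drop versions that have aged out of the retention window.
        cutoff = when - self.retention
        expired = bisect_left(self.times, cutoff)
        del self.times[:expired]
        del self.versions[:expired]

    def version_at(self, when: datetime) -> str:
        """Return the latest retained version recorded at or before `when`."""
        i = bisect_right(self.times, when)
        if i == 0:
            raise LookupError("requested time is outside the retained history")
        return self.versions[i - 1]


if __name__ == "__main__":
    start = datetime(2013, 7, 1)
    timeline = VersionTimeline(retention=timedelta(days=14))
    for day in range(30):  # hypothetical: one version per day for 30 days
        timeline.record(start + timedelta(days=day), f"v{day}")
    print(timeline.version_at(start + timedelta(days=20, hours=6)))  # v20
```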

For an average application, businesses maintain more than seven lifecycle environments for development, testing, QA, integration, training, pilots, operational reporting, production support, user acceptance, system validation, and sandboxes — not to mention redundant systems for backup, DR, and archiving logs. Instead of making and moving data copies over and over again, DxFS provides a virtualized view of databases by sharing the underlying data blocks across all environments and storing changes as new, unique blocks.
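The sharing scheme described above is essentially copy-on-write. The sketch below models it with reference-counted blocks: cloning a virtual database copies only a map of block pointers, and a write stores the changed block as a new block rather than modifying the shared one. BlockStore, VirtualDB, and their methods are hypothetical names; this is not DxFS's actual on-disk design, which the article does not describe at this level.

```python
import itertools


class BlockStore:
    """Physical blocks with reference counts; a block is freed when no map uses it."""

    def __init__(self) -> None:
        self._blocks: dict[int, bytes] = {}
        self._refs: dict[int, int] = {}
        self._ids = itertools.count()

    def put(self, data: bytes) -> int:
        block_id = next(self._ids)
        self._blocks[block_id] = data
        self._refs[block_id] = 1
        return block_id

    def get(self, block_id: int) -> bytes:
        return self._blocks[block_id]

    def incref(self, block_id: int) -> None:
        self._refs[block_id] += 1

    def decref(self, block_id: int) -> None:
        self._refs[block_id] -= 1
        if self._refs[block_id] == 0:
            del self._refs[block_id]
            del self._blocks[block_id]

    def physical_blocks(self) -> int:
        return len(self._blocks)


class VirtualDB:
    """A database image: a map from logical block number to physical block ID."""

    def __init__(self, store: BlockStore, block_map: dict[int, int]) -> None:
        self.store = store
        self.block_map = block_map

    @classmethod
    def from_source(cls, store: BlockStore, blocks: list[bytes]) -> "VirtualDB":
        return cls(store, {n: store.put(b) for n, b in enumerate(blocks)})

    def clone(self) -> "VirtualDB":
        # Copy pointers only; every physical block is now shared one more time.
        for block_id in self.block_map.values():
            self.store.incref(block_id)
        return VirtualDB(self.store, dict(self.block_map))

    def read(self, n: int) -> bytes:
        return self.store.get(self.block_map[n])

    def write(self, n: int, data: bytes) -> None:
        # Copy-on-write: store the change as a new block, never overwrite a shared one.
        old = self.block_map.get(n)
        self.block_map[n] = self.store.put(data)
        if old is not None:
            self.store.decref(old)


if __name__ == "__main__":
    store = BlockStore()
    source = VirtualDB.from_source(store, [b"a", b"b", b"c", b"d"])
    dev, qa = source.clone(), source.clone()  # two hypothetical lifecycle environments
    dev.write(2, b"C")
    print(store.physical_blocks())  # 5: four shared blocks plus dev's one changed block
    print(qa.read(2), dev.read(2))  # b'c' b'C'
```

For simplicity the sketch always allocates a new block on write; a real file system could overwrite in place when a block has only one reference.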

VDBs look and perform like normal, physical copies (users can add/drop tables, make schema changes, and run reports against the data), but include powerful features designed to accelerate application projects, such as virtual branches and fast data rollback or refresh.

Virtualizing databases fundamentally changes application testing and quality. To minimize cost and complexity, many organizations test their applications using stale data that may be days or months old, or use data subsets (nonrepresentative datasets) that can miss a range of potential errors. With fast, automatic refresh and virtual databases that provide full, representative datasets, Delphix can dramatically improve the fidelity of QA and test environments.

Data version control

Developers have long used source code version control to track changes and work in parallel streams. Application projects that run on databases must pair release versions with corresponding datasets: databases with the correct schemas and tables to enable application software to function properly. Databases, however, have traditionally been complex, slow, and hard to set up and maintain, which often forces application teams to settle for stale, partial, or shared data environments.

The Delphix Engine includes a second key technology component: the DataVisor, which provides efficient data synchronization (even across the WAN), full transactional consistency, integrated log shipping, and continuous versioning. With the DataVisor, a Delphix Engine can maintain synchronization with multiple source applications in near real time and automatically record and version all changes.

With a simple time slider, Delphix can quickly deliver a virtual version of a database from any point in time, down to the second or to a specific transaction boundary. Instead of teams waiting weeks for data delivery before they can run a test in QA, Delphix can reduce overall cycle times from weeks to hours, enabling faster testing and error detection.
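Point-in-time delivery can be modeled as a retained snapshot plus replay of the logged transactions that committed after it. A minimal sketch follows, assuming each logged transaction carries a monotonically increasing ID and a commit timestamp, and representing database state as a simple key-value dict. Txn and state_at are illustrative names, not part of any Delphix or database API. Because each transaction is applied whole or skipped whole, the result always lands on a transaction boundary.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Txn:
    txn_id: int                 # assumed monotonically increasing commit sequence number
    committed_at: datetime
    changes: dict[str, str]     # key -> new value; stands in for changed rows or blocks


def state_at(snapshot: dict[str, str], snapshot_txn: int, log: list[Txn], *,
             until_txn: Optional[int] = None,
             until_time: Optional[datetime] = None) -> dict[str, str]:
    """Rebuild state as of a transaction ID or a timestamp, whichever is given."""
    if (until_txn is None) == (until_time is None):
        raise ValueError("pass exactly one of until_txn or until_time")
    state = dict(snapshot)  # never mutate the retained snapshot
    for txn in log:  # log is assumed sorted by txn_id
        if txn.txn_id <= snapshot_txn:
            continue  # already reflected in the snapshot
        if until_txn is not None and txn.txn_id > until_txn:
            break
        if until_time is not None and txn.committed_at > until_time:
            break
        state.update(txn.changes)  # apply the whole transaction or none of it
    return state


if __name__ == "__main__":
    base = {"balance:42": "100"}  # hypothetical snapshot taken after txn 100
    log = [
        Txn(101, datetime(2013, 7, 11, 9, 0, 0), {"balance:42": "80"}),
        Txn(102, datetime(2013, 7, 11, 9, 0, 5), {"balance:42": "130", "audit:7": "deposit"}),
    ]
    print(state_at(base, 100, log, until_txn=101))  # {'balance:42': '80'}
    print(state_at(base, 100, log, until_time=datetime(2013, 7, 11, 9, 0, 4)))  # {'balance:42': '80'}
```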

Continuous versioning solves additional key challenges for enterprise applications. Many applications, like SAP, require data federation across multiple databases for data consistency; with the DataVisor, Delphix can deliver multiple databases at the same point in time in a few clicks and a few minutes.
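Delivering several databases at the same point in time requires choosing a timestamp that every source can still reach. The sketch below intersects each source's retained time range and then plans one virtual database per source, all pinned to the same timestamp. Function names and source names are hypothetical, and it assumes all sources share a comparable clock; real cross-database consistency may also depend on how the application coordinates its writes, which this sketch does not model.

```python
from datetime import datetime


def common_window(windows: dict[str, tuple[datetime, datetime]]) -> tuple[datetime, datetime]:
    """Intersect each source's (earliest, latest) recoverable time range."""
    start = max(earliest for earliest, _ in windows.values())
    end = min(latest for _, latest in windows.values())
    if start > end:
        raise ValueError("sources share no common point in time")
    return start, end


def federated_requests(windows: dict[str, tuple[datetime, datetime]],
                       at: datetime) -> list[tuple[str, datetime]]:
    """Plan one VDB per source, all pinned to the same timestamp."""
    start, end = common_window(windows)
    if not start <= at <= end:
        raise ValueError(f"{at} is outside the common window {start} .. {end}")
    return [(source, at) for source in sorted(windows)]


if __name__ == "__main__":
    # Hypothetical retained ranges for two federated databases.
    windows = {
        "erp": (datetime(2013, 6, 1), datetime(2013, 7, 11, 12)),
        "crm": (datetime(2013, 6, 15), datetime(2013, 7, 11, 11)),
    }
    print(common_window(windows))
    print(federated_requests(windows, datetime(2013, 7, 1)))
```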

Fine-grained version control allows developers to reset a database to perform multiple comparative tests (A/B tests), create a library of retained versions that coincide with software releases, and quickly roll back to a previous stage during complex data conversion and mapping cycles.
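The reset-and-compare workflow can be sketched as named bookmarks plus a reset operation. In this hypothetical model (VersionedDataset, bookmark, reset, and apply_discount are illustrative names), each bookmark is a deep copy of a small dict for simplicity; in a block-sharing system like the one the article describes, a bookmark would instead record block pointers so that retaining many versions stays cheap.

```python
import copy


class VersionedDataset:
    """A dataset with named bookmarks that it can be reset to at any time."""

    def __init__(self, data: dict[str, float]):
        self.data = data
        self._bookmarks: dict[str, dict[str, float]] = {}

    def bookmark(self, name: str) -> None:
        # Retain the current state under a name, e.g. a software release tag.
        self._bookmarks[name] = copy.deepcopy(self.data)

    def reset(self, name: str) -> None:
        # Roll back to a bookmark; deep-copy so the bookmark itself stays pristine.
        self.data = copy.deepcopy(self._bookmarks[name])

    def bookmarks(self) -> list[str]:
        return sorted(self._bookmarks)


def apply_discount(rows: dict[str, float], pct: float) -> float:
    """Hypothetical test workload: mutate prices in place and return their total."""
    for key in rows:
        rows[key] = round(rows[key] * (1 - pct), 2)
    return sum(rows.values())


if __name__ == "__main__":
    ds = VersionedDataset({"sku1": 10.0, "sku2": 20.0})
    ds.bookmark("release-2.3")
    results = {}
    for variant, pct in [("A", 0.10), ("B", 0.25)]:
        ds.reset("release-2.3")  # each variant starts from identical data
        results[variant] = apply_discount(ds.data, pct)
    print(results)
```

Resetting before each variant is what makes the A/B comparison fair: without it, variant B would run against data already modified by variant A.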

Data changes constantly in databases, making it impractical to know ahead of time when a specific version will be required. With the DataVisor, Delphix automatically records all versions. If a DBA drops a table, it can be recovered in minutes from the most recent version available, minimizing downtime, data loss, and productivity or revenue impact.

Performance and load

DxFS not only virtualizes databases by sharing data blocks in storage; it also shares data blocks in memory, enabling highly cost-effective performance for consolidated application workloads. On average, customers run 20 VDBs per Delphix Engine, but some environments run up to 400 VDBs. Elasticity allows application teams to add environments as needed during project lifecycles, providing flexibility to adjust to changes in requirements.
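Sharing blocks in memory follows naturally from sharing them on disk: if the cache is keyed by physical block rather than by (VDB, logical block), every VDB that reads a shared block hits the same cached copy. The sketch below is a hypothetical LRU cache built on collections.OrderedDict; the class name, capacity, and the load_block callback are assumptions for illustration, not Delphix internals.

```python
from collections import OrderedDict
from typing import Callable


class SharedBlockCache:
    """LRU cache keyed by physical block ID, shared by every VDB on one engine."""

    def __init__(self, capacity: int, load_block: Callable[[int], bytes]):
        self.capacity = capacity
        self.load_block = load_block  # reads a physical block from storage
        self._cache: OrderedDict[int, bytes] = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, block_id: int) -> bytes:
        if block_id in self._cache:
            self._cache.move_to_end(block_id)  # mark as most recently used
            self.hits += 1
            return self._cache[block_id]
        self.misses += 1
        data = self.load_block(block_id)
        self._cache[block_id] = data
        if len(self._cache) > self.capacity:
            self._cache.popitem(last=False)  # evict the least recently used block
        return data


if __name__ == "__main__":
    cache = SharedBlockCache(capacity=200, load_block=lambda b: f"block-{b}".encode())
    # 20 hypothetical VDBs: all map logical blocks 0-99 to shared physical
    # blocks 0-99, and each has one private changed block (1000 + i).
    vdb_maps = [{n: n for n in range(100)} | {100: 1000 + i} for i in range(20)]
    for block_map in vdb_maps:
        for logical in range(101):
            cache.get(block_map[logical])
    print(cache.misses, cache.hits)  # 120 1900
```

A cache keyed per VDB would have missed on all 2,020 reads in this example and held 20 duplicate copies of each shared block.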

For production or source applications, Delphix can actually reduce load while maintaining synchronization. Delphix connects to source databases through standard APIs, so it does not sit in the production data path or require installation of any agents. VDB workloads affect only the Delphix Engine and have zero impact on source applications, allowing customers to offload reporting and batch workloads such as BI extracts.

After the initial seeding, Delphix maintains synchronization by collecting incremental changes forever, eliminating all future full data loads from source environments. This includes repeated full backups, one of the largest loads on application systems and networks. In addition, the DataVisor applies changes to create “forever full” versions in the Delphix Engine, so versions can be made immediately available.
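The incremental-forever approach can be modeled as building each new full version from the previous one plus only the blocks that changed. In the hypothetical sketch below, a version is a map from block number to a block reference, so producing a new full version copies references, not data, and only changed blocks cross the network after the initial seeding. Function names and the block-reference strings are illustrative assumptions.

```python
def apply_incremental(previous_full: dict[int, str], changed: dict[int, str]) -> dict[int, str]:
    """Produce the next complete ("forever full") version from changed blocks only."""
    new_full = dict(previous_full)  # copies block references, not block contents
    new_full.update(changed)        # changed blocks replace their old references
    return new_full


def sync(initial: dict[int, str], change_sets: list[dict[int, str]]):
    """Seed once, then apply incremental change sets; count blocks transferred."""
    versions = [dict(initial)]
    transferred = len(initial)  # the only full load from the source
    for changed in change_sets:
        versions.append(apply_incremental(versions[-1], changed))
        transferred += len(changed)
    return versions, transferred


if __name__ == "__main__":
    initial = {n: f"b{n}@v0" for n in range(1000)}  # hypothetical 1,000-block source
    change_sets = [{7: "b7@v1", 42: "b42@v1"}, {7: "b7@v2"}]
    versions, transferred = sync(initial, change_sets)
    full_backups = len(initial) * len(versions)  # cost of shipping a full copy every time
    print(transferred, full_backups)  # 1003 3000
    print(versions[2][7], versions[2][42], versions[1][7])  # b7@v2 b42@v1 b7@v1
```

Every version remains complete and immediately usable, while the source only ever ships its changes after the first load.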

Deployment

Enterprise applications are notorious for long, complex, and risky implementation cycles. Delphix deploys in minutes, can quickly synchronize data from a source, and delivers virtual data environments in a few clicks from that point forward. Since most enterprises take more than a week on average to deliver a single data environment, Delphix can accelerate application projects already in flight and begin to save time immediately.

The Delphix Engine comes packaged as a virtual appliance that can be installed anywhere a business needs: on physical infrastructure, in remote data centers, in private or public clouds, or in outsourced environments. With the DataVisor, Delphix can efficiently maintain synchronization across all locations, providing flexibility for today’s application portfolios and a future-proof data delivery solution.

Delphix currently supports Oracle and Microsoft SQL Server databases and data warehouses on a range of operating systems (Linux, Windows, HP-UX, AIX, Solaris) and can run on any storage (EMC, NetApp, HDS, flash arrays, cloud storage, and so on). Published Web services drive both a Web management interface and the command-line interface, enabling seamless integration with existing automation workflows.

Most software vendors and IT organizations focus on the live, production application environment. Delphix focuses on enabling future states, delivering data across time and location to accelerate application lifecycles. While applications will continue to evolve and their locations will continue to change, data must be preserved across all current and future changes. Delphix’s Agile Data Platform is designed to fill that need for any new project an organization can imagine.

New Tech Forum provides a means to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all enquiries to newtechforum@infoworld.com.

This article, “How database virtualization works,” was originally published at InfoWorld.com.