Eric Knorr
Contributing writer

Delphix CEO: Why database virtualization matters

analysis
Sep 26, 2011 | 17 mins

Creating a copy of a core database is typically a painful job. The CEO of Delphix, a database virtualization startup, claims it doesn't have to be that way.

Certain gaps threaten to derail the notion of a supremely automated IT infrastructure. One of these is a convenient, secure way to leverage the highest-value information owned by a company: typically, its customer and product databases.

In most companies, databases get copied more often than you may realize. Developers need database copies for dev and test, and with internal app dev operations of any size, that may mean several copies. Plus, you need database copies for offline, processing-intensive applications such as business intelligence to avoid slowing the production database to a crawl.

But standing up database copies and managing them is highly resource-intensive — something Jedidiah Yueh, founder and CEO of Delphix, recognized years ago. Delphix, which launched just last September at Demo Fall 2010, provides a database virtualization platform that enables customers to create virtual copies that require a fraction of the storage space and human labor that full, conventional copies demand.

I interviewed Yueh last week when he visited InfoWorld’s offices. I began by asking him what prompted him to start Delphix, which — in its scant 12 months of existence — already claims to have 30 Fortune 500 customers.

Eric Knorr: You have a unique product, so maybe you could begin by telling us how you came up with the idea.

Jedidiah Yueh: Let me give you just a little bit about my background, because it relates to why we started the company. In 1999, I founded a company called Avamar. We pioneered data deduplication. That was really about the backup and recovery industry. The idea at Avamar was strip out all the redundancy in tape backups and pack it down onto the disk.

The challenge that we saw in the deduplication space was that databases would overrun our footprints. Databases are like brain surgery versus general-purpose surgery. You have to have very customized technology sets to attack the challenge in relational databases. So we knew we had a problem.

When I started talking to customers, to try and understand the challenges of the database world, it really wasn’t data protection that was driving all the redundancy and IT complexity. It was just the nature of maintaining applications sitting on a production database.

For every production database, businesses will on average create 9 to 10 development, testing, QA, and staging environments for all kinds of different projects, whether it’s to customize your application or to upgrade your database in order to do break/fix analysis, or to drive data before you go into a data warehouse, or to drive data into a BI deployment.

All of those needs of a business — and they’re myriad — require the creation of a duplicate environment because you can’t load everything onto production. Production is keeping your business afloat.

So there were really two main challenges that were revealed in those discussions with customers. One was all those redundant copies of production database environments, 9 to 10 on average. The second was the amount of IT complexity involved in dealing with these copies of data.

Knorr: So you’re not talking about scaling for a production environment; you’re talking about offline copies that you want to keep coherent with the production environment, right?

Yueh: Exactly. There’s a “more data” problem in that daily lifecycle maintenance, a 10-times expansion that people don’t realize they’ve been burdened with for a long time. And when you look at the workflows around creating these copies and refreshing them for the four or five projects that are constantly running in parallel through these enterprises, you see that there is a huge problem in the opex component: huge IT complexity.

For example, with VMware, you can stand up a virtual machine in a couple minutes. But if you go into an enterprise and ask, “How long does it take you to refresh a database or stand up a new database for any development environment?” usually that will take, say, four teams over four weeks.

So what we did as a company is we virtualized the data tier for databases. And a database really has two pieces. You’ve got the server component (where the executables run) and the data files themselves. We virtualize the data files themselves so that you can consolidate these 10 copies into about half the space of one, while they all appear like 10 full copies. You can refresh or provision new data environments in as little as three clicks through the self-service model.
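One way to picture how 10 copies can look like 10 full databases while occupying far less space is block sharing with copy-on-write. The Python sketch below is only an illustration under that assumption; the interview does not describe Delphix's internal mechanism, and `BlockStore`, `VirtualCopy`, and `provision_demo` are hypothetical names invented for this example.

```python
class BlockStore:
    """Shared pool of data blocks that every virtual copy points into."""

    def __init__(self):
        self.blocks = {}   # block id -> bytes
        self.next_id = 0

    def put(self, data: bytes) -> int:
        """Store a new block and return its id."""
        block_id = self.next_id
        self.blocks[block_id] = data
        self.next_id += 1
        return block_id


class VirtualCopy:
    """A database copy that holds only a map of pointers into the store."""

    def __init__(self, store, block_map):
        self.store = store
        self.block_map = dict(block_map)  # logical block number -> block id

    def read(self, n: int) -> bytes:
        return self.store.blocks[self.block_map[n]]

    def write(self, n: int, data: bytes) -> None:
        # Copy-on-write: shared blocks are never modified in place.
        # The changed block is stored once and only this copy points at it.
        self.block_map[n] = self.store.put(data)

    def clone(self) -> "VirtualCopy":
        # A clone copies the pointer map, not the data blocks.
        return VirtualCopy(self.store, self.block_map)


def provision_demo():
    """Build a 100-block source, clone it 10 times, and change one block."""
    store = BlockStore()
    source = VirtualCopy(store, {i: store.put(b"row-%d" % i) for i in range(100)})
    copies = [source.clone() for _ in range(10)]
    copies[0].write(5, b"changed")
    return store, source, copies
```

In this toy model, ten clones of a 100-block source plus one changed block occupy 101 stored blocks rather than 1,100, and each clone still reads as a complete copy.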

It’s kind of like extending virtualization. The virtualization journey began at the CPU tier. But then businesses, as they roll out these private clouds, ask themselves: “How do I deal with where the real spending complexity is in my environment? My Oracle database is sitting on HP-UX, Solaris, or AIX, because those things aren’t moving any time soon.” So they only get partway on the virtualization journey because of the legacy environments and all the data — the mission-critical data in relational databases.

With our product, you drop our virtual client onto VMware or onto some x86 server, we connect to the legacy environment, and suddenly we can create n number of virtual copies of these databases that you can use for all these purposes, but they’re all consolidated and they’re all automated through software.

Knorr: So could you use them for other production purposes as well?

Yueh: Yeah, absolutely. Boeing Credit Union, for example … runs its ATM transaction reports from Delphix instead of from production. So what used to be a production workload has now been moved onto a virtual workload. In some ways, that use case is scale-out for the production environment because we now have a shadow environment that’s staying in lockstep that you can give different kinds of workloads to.

Knorr: Right. You wouldn’t want to do something heavy duty on the same copy and just bring everything to a crawl.

Yueh: Exactly. Nobody can get their money out of the ATM. It would be one of those disaster cases.

There are a couple of big industry trends. The growth of applications and the dependence on applications that drive efficiency for businesses is on a northbound track. It’s kind of inescapable and unavoidable. Businesses are grappling with the fact that they have to bend around social media marketing or embrace self-service and private cloud deployment. All of those projects eventually have to touch the data in your ERP, CRM, or Salesforce systems, which are all still on relational databases in most of the big companies in the world. So we just help complete part of that journey by virtualizing the data tier.

Knorr: Give me another customer example to illustrate what Delphix is good for.

Yueh: Procter & Gamble is a great example. They run SAP, and they create up to 34 copies of their SAP environment to support all of the programs, projects, initiatives that they have running.

The total environment is about 600TB of storage for SAP; 400TB of that is the lifecycle “more data” component that we consolidate and virtualize. With our product, they can shrink 400TB to about 40TB. You usually get 10:1 to 20:1 consolidation ratios with our product. What’s more important is not just the hardware savings. It’s the fact that their IT teams now don’t have to spend that four weeks across four teams to create these environments or to refresh these environments.
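The arithmetic behind those figures can be checked with a few lines of Python. This is a back-of-the-envelope sketch, not a sizing tool: the function names are hypothetical, and it assumes the 200TB that is not part of the copy footprint stays unchanged, which the interview does not state.

```python
def consolidated_tb(copies_tb: float, ratio: float) -> float:
    """Storage needed for the copies after consolidation at ratio:1."""
    return copies_tb / ratio


def total_after_tb(total_tb: float, copies_tb: float, ratio: float) -> float:
    """Whole-environment footprint, assuming the non-copy share is unchanged."""
    return (total_tb - copies_tb) + consolidated_tb(copies_tb, ratio)


# Figures quoted in the interview: 600TB total, 400TB of it in copies.
for ratio in (10, 20):
    print(ratio, consolidated_tb(400, ratio), total_after_tb(600, 400, ratio))
```

At 10:1 the 400TB of copies shrinks to 40TB, which matches the number quoted above; at 20:1 it would be 20TB.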

Knorr: Well, that’s just it, keeping that coherent is the other side of it that gets out of control.

Yueh: Yeah. You get something like a 20-times productivity savings as well as about a 10- to 20-times hardware cost savings, so you get two benefits. And it’s really delivering those kinds of benefits [especially when] you’ve got all these legacy database systems on old platforms.

Another good example is StubHub, which bought the product. If you’re creating all these copies of your StubHub databases for development and testing and QA, you have a lot of copies that are no longer managed by your production database team that are floating around where all kinds of people can access data.

You can see the risks. Once you’ve moved off these separate copies, well, who’s in control of the user and privilege management? And who’s in control of creating additional copies or locking down these data sets? Nobody can see where all these copies are or when they should get retired. You just have this risk surface area floating around.

We virtualize all of it into one location. We do full audit logging, so you can run a single report and see every privileged user — access, refresh, creation — from one single place. There’s an element of it that’s about data control.

Knorr: You must have to get pretty deep into the eccentricities of all these different database structures to be able to virtualize them this way.

Yueh: Yes and no. Relational databases all have pretty similar structures in their files and in how they handle logs, so about 80 percent of the product is heterogeneous across all the databases, and then for each of the databases we do have to build some custom components.

A lot of that customization is the automation around standing up a database and dealing with the eccentricities and the little triggers that you need to do to change memory settings and this setting and that setting, so we do all that automation too. We started with Oracle support. Microsoft SQL betas should begin at the end of this year or beginning of next year, and we’ll end up picking up all the major relational databases.

Knorr: Right, but the point is you do need to specifically support them.

Yueh: Yes. You need to serially support all of them.

Knorr: What are some of the differences in virtualizing an Oracle database versus a MySQL database?

Yueh: For one thing, a lot of high-end enterprise features require some degree of customization to support. In our last release of Delphix Enterprise, we started supporting Oracle [X/EX] data, Oracle RAC, Oracle Data Guard, the stand-by kind of database copies, so we can connect to all of those and create virtual copies of all of those. In the Microsoft SQL world, for instance, you won’t present data files via the same protocols. You’ll use iSCSI or you might use Fibre Channel. You probably won’t use NFS, like you would in the Oracle world. Some differences are protocol-specific and some are feature-specific. Those really become a lot of the key differentiators as you roll out into the market to support all those features.

Knorr: I guess because you’re talking about the data layer here, there is no additional licensing cost?

Yueh: Not for what we create, but Oracle will still charge you the same for virtual or physical. What we really do is take out hardware infrastructure costs and IT complexity. The software charge is still what Oracle charges everybody, so that’s left alone.

Knorr: Right, and the customers need to make these duplicates anyway. They’d be paying that additional licensing fee anyway.

Yueh: They already have these servers sitting around that they’re paying for. We just provision to them quickly and efficiently.

Knorr: Although use of your product might encourage them to do this more often, I suppose.

Yueh: Yeah. It’s kind of like the VMware agility angle. VMware enables you to roll out more projects, create more virtual machines, more small isolated sandboxes for developers. The same thing happens with Delphix, so instead of having monolithic databases with 5 or 10 developers sharing them, now each developer gets their own virtual sandbox and they can work at their own pace. It’s just basically part of that overall trend of taking really inflexible, rigid, hard-to-provision systems and turning them into software that happens faster, smaller, and in a more agile fashion.

Knorr: How exactly does that play out in the real world?

Yueh: One of the things that you’ll want to do is to instantly create a virtual copy of your production environment. Another thing you might want to do, because your production environment keeps changing … is refresh the data. In Delphix we allow you to refresh data in place. We’ve created this virtual copy, but we can swap it out with fresh data from production in just a couple of clicks.

Or let’s say you’re rolling out your BI environment or rolling out a new set of features for SAP. A lot of times when you’re testing the code changes you’re making, you’ll change the makeup of the data in the data files, in the tables, as well as maybe make changes to schema. Then you find that — well no, that really didn’t work. I need to try something different.

Another thing that happens is not just the need to refresh, but also the need to roll back, to get data at an older point in time, because otherwise you just can’t move forward in development. You’ve already changed the makeup with your new code, so our product allows you to roll back that data.

Another thing that we help you do is create virtual copies from virtual copies. Sometimes, if I’m in QA, I don’t want a fresh copy from production. I need the developer’s copy because I’m testing the developer’s code and it goes with the schema changes in the database. This virtual copy from virtual copy is another feature that nobody would think about except at our tier and with this kind of software. I can create five virtual copies of your environment, and we can enable five different testers to do whatever they want to it.
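The operations described in this answer (refresh in place, rollback to an older point in time, and a virtual copy made from another virtual copy) can be sketched as a small Python model. It is a simplified illustration only: `VirtualDatabase` and its methods are hypothetical names, and nothing here reflects how Delphix actually implements these features.

```python
class VirtualDatabase:
    """Toy model of a virtual copy with snapshots, rollback, and refresh."""

    def __init__(self, blocks, parent=None):
        self.blocks = dict(blocks)   # logical block number -> bytes
        self.parent = parent         # the copy this one was cloned from
        self.snapshots = []          # list of (label, saved block map)

    def snapshot(self, label: str) -> None:
        """Record the current state under a label."""
        self.snapshots.append((label, dict(self.blocks)))

    def write(self, n: int, data: bytes) -> None:
        self.blocks[n] = data

    def rollback(self, label: str) -> None:
        """Return to the most recent snapshot with this label."""
        for name, saved in reversed(self.snapshots):
            if name == label:
                self.blocks = dict(saved)
                return
        raise KeyError(label)

    def refresh(self, source: "VirtualDatabase") -> None:
        """Swap in the source's current data, keeping the same copy."""
        self.blocks = dict(source.blocks)

    def clone(self) -> "VirtualDatabase":
        """Create a virtual copy of this copy, changes included."""
        return VirtualDatabase(self.blocks, parent=self)


production = VirtualDatabase({0: b"v1"})
dev = production.clone()
dev.snapshot("before-test")
dev.write(0, b"dev-change")   # developer's code changes the data
qa = dev.clone()              # QA tests against the developer's copy
dev.rollback("before-test")   # the experiment failed, so go back
production.write(0, b"v2")    # production keeps changing
dev.refresh(production)       # pull fresh data into the same dev copy
```

After these steps the QA copy still holds the developer's change, while the dev copy has been rolled back and then refreshed with the latest production data.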

Knorr: It sounds similar to snapshots at a backup level, as if you’re doing continuous versioning of these various environments, except you may have all kinds of permutations going at once. It sounds pretty complex to manage.

Yueh: One of the core principles that we believe in at Delphix is delivering enterprise-class products with consumer-grade interfaces. You can literally, with our product, provision a 10TB database and refresh it in as little as three clicks in our Web interface.

Knorr: Well, it’s interesting when you consider this in the context of the devops trend. Devops doesn’t really address the database layer.

Yueh: When you roll out devops, they all want to get to faster builds of the databases in those environments. Eventually you hit — oh, all my data is stuck in these Oracle databases. How do I get agility around that?

Knorr: I mean would you solve that in two phases? In other words, ops would say: OK, we’re going to virtualize these data sets, we’ll set it up for them so that they’re continually refreshed or whatever, and then dev can come in and self-provision from there.

Yueh: Yeah, you can absolutely set it up that way, although most of our customers still control it from the IT ops database team. But some of them, like Staples, go all the way to self-service, in which case the developers create the environments they want. The product gives you user roles and privileges to enable either approach: central control or full self-service.

Knorr: Do you see things trending more in that self-service direction on the dev side?

Yueh: I think self-service is really the huge overriding trend … enabling developers, business analysts, data analysts to have self-service access. But I think the first step for a lot of organizations in rolling out a product like this is to still allow IT to control it for the first phase — until IT feels comfortable and knows how to manage it. Eventually, they’ll just enable more and more services from us, but I see it kind of as a couple-step journey.

Knorr: Does that drive your product development, that long-term vision?

Yueh: Absolutely. Because we’re very focused on creating self-service interfaces that are just specifically for developers or specifically for data analysts. And you have to think about — what are the actions that each of these individuals need in their world? Because they don’t all need the same things. The developers really need the rollback feature because they’ll mess up their data environments with their QA tests or their unit testing.

Knorr: That’s their job.

Yueh: That is their job. On the business analyst side, they don’t want to roll back data, they just want fresh data to run their job. So they just want to be able to say — I want fresh data on this schedule and I want to keep it for this amount of time, and I’ve got a bigger priority than everybody else.

Knorr: What are some of the things that customers are asking you for?

Yueh: They ask us all the time for support for all the databases. The banks especially want Sybase, DB2, and Microsoft SQL support, so one of our main priorities as a business is to get heterogeneous fast.

We’re learning quite a bit from our customers on how they’re using the product today. It was a demo less than a year ago, when we came out of stealth mode, but we’ve picked up 30 Fortune 500 accounts since then. We didn’t expect that we’d have so many big-name accounts right out of the gate.

Knorr: I would imagine they would want you to go deeper into the legacy stuff because that’s where their biggest headache is.

Yueh: That’s where, pound-for-pound, they spend the most money, they have the most rigidity, and where the applications mean the most to the business. The interesting thing is if you look at the market, Oracle has 300,000 database customers. It’s an incredibly big market. They do $30 billion in middleware and databases, and Larry Ellison said that FY11 was their fastest growth in database software revenues in a decade. While there’s a lot of buzz around NoSQL and Hadoop and all that stuff, the growth of just the base relational database market is still way meatier than the growth in everything else. It’s a monster component of the market.

Knorr: Do you plan to stay within this domain? Is this your area of specialization? Is there an exit strategy for you in the future?

Yueh: It’s a no-exit exit strategy. We would love to build the biggest independent software company that we could. That is really the goal of the company. If you think about the really big waves that have hit IT over the decades, clearly the x86 server market and the PC market were monster opportunities. I think the second-biggest opportunity might have been the relational database market, which created Oracle and big chunks of Microsoft’s and IBM’s revenue.

Nobody has really found a way to take out the big cost and complexity in the database space, so that’s really what our software is designed to do. I would argue that the data trapped in relational databases might be the most important asset to virtualize in the enterprise data center.

This article, “Delphix CEO: Why database virtualization matters,” originally appeared at InfoWorld.com.

Eric Knorr is a freelance writer, editor, and content strategist. Previously he was the Editor in Chief of Foundry’s enterprise websites: CIO, Computerworld, CSO, InfoWorld, and Network World. A technology journalist since the start of the PC era, he has developed content to serve the needs of IT professionals since the turn of the 21st century. He is the former Editor of PC World magazine, the creator of the best-selling The PC Bible, a founding editor of CNET, and the author of hundreds of articles to inform and support IT leaders and those who build, evaluate, and sustain technology for business. Eric has received Neal, ASBPE, and Computer Press Awards for journalistic excellence. He graduated from the University of Wisconsin, Madison with a BA in English.
