HP Vertica played a major role, as did an org structure that centralized analytics and lowered barriers between teams

You may have heard how statistical wizard Nate Silver predicted the electoral votes for each state in the 2012 presidential election, showing that raw data crunching of polls is much more reliable than traditional punditry. What you probably haven’t heard is how the Obama campaign built a 100-strong analytics staff to churn through dozens of terabytes of data with a combination of the HP Vertica MPP (massively parallel processing) analytic database and predictive models with R and Stata to gain a competitive edge.

Credit for the big data approach goes to Obama campaign manager Jim Messina, who decided to dive headfirst into an analytics-driven campaign. Messina commented, “We were going to demand data on everything, we were going to measure everything…we were going to put an analytics team inside of us to study us the entire time to make sure we were being smart about things.” To ensure everything was measured, staff were evaluated on whether they entered data. The mantra became: “If you didn’t enter the data, you didn’t do the work.”

Boots on the ground

Of the 100 analytics staffers, 50 worked in a dedicated analytics department, 20 analysts were spread throughout the campaign’s various headquarters, and another 30 were in the field interpreting the data.

Chris Wegrzyn, director of data architecture for the Democratic National Committee, described the challenges, opportunities, and path to build the analytics-driven campaign. Wegrzyn noted that the key measurements centered on the data itself, modeling, and experimentation. The core data contained the facts about the electorate and the campaign operation.
Modeling was used to understand the electorate at the individual voter level. Finally, evaluating the results of experiments helped the campaign learn how its actions actually influenced people. Of course, the key performance indicator for the campaign was the number of people who planned to vote for Obama, divided by the number who planned to vote overall. The campaign understood there were three levers to maximize that number: registration, persuasion, and turnout. They had to encourage their target audience of voters to register, persuade the undecided to vote for Obama, then do all they could to ensure that Obama voters would show up to vote on Election Day.

Marshaling the troops

To appreciate the challenges, it’s important to understand how the campaign was organized into different teams. The field team was the personal face of the campaign: the people on the ground organizing volunteers, handling registrations, encouraging turnout, and so on. The digital team was responsible for online presence, email campaigns, online fundraising, social media, and more. The communications and media teams were responsible for Obama’s personal messaging with interviews, ad buying, and so on. Finance focused on the overall campaign fundraising strategy.

In the past, all these departments had used sophisticated analytic technologies, but each had implemented its analytic approach independently. The 2012 campaign changed all that. The right people and mandates were important to make a unified analytics environment a reality. Executive buy-in from campaign manager Messina was essential; without that authority, any ambitious initiative might have been sidestepped or dropped altogether.
In addition, the core team had strong analytic experience from previous campaigns, along with highly talented analytic staff hired at well below the market rate.

The campaign set out to create, as Wegrzyn described it, “an analyst-driven organization by providing an environment for smart people to freely pursue their ideas.” The emphasis was on accommodating smart analysts, rather than hard-core engineers. A SQL-based environment was deemed friendly enough for analyst needs, rather than, say, an environment that required knowledge of Java or statistical analytics. In addition, the platform needed enough horsepower to enable analysis “at the speed of thought.” But the organizational objective may have been the most important factor of all: lowering the barriers between disparate data sets, as well as between analysts, so everyone could work together effectively. In a nutshell, the campaign sought a friction-free analytic environment.

Campaign engines

With these goals in mind, the team considered a number of approaches. They realized that while Hadoop was an important complementary technology, it required highly technical skills and was not designed for the real-time queries the team needed. They also realized that a large analytic appliance, which they had used in previous campaigns, would not scale out sufficiently. Ultimately, the team settled on HP Vertica. It was SQL-based, affordable, and scalable, as well as a strong performer in proof-of-concept tests. On the statistical analytics side, the team used R and Stata.

A cornerstone of the environment was its ability to grow. The environment was built with a feedback loop that became increasingly powerful the more it was used and tweaked.
While the initial raw data was modest in big data terms (around 10 terabytes), the analysts generated dozens more terabytes beyond that through aggregation and experimentation.

Analytics in action

Two important initiatives during the campaign illustrate the power of the environment to gain greater efficiencies: AirWolf and Media Optimizer. AirWolf was built to integrate the field and digital teams’ efforts. A common problem in prior campaigns was that the field team’s actions, such as recording a person’s particular interest in voting issues, could not be easily followed up by the digital team (for example, with email correspondence). With AirWolf, when a voter was contacted by the field team in a door-to-door campaign, that voter’s particular interests were recorded and fed back to Vertica. Then the digital team ran email blasts from the local organizer to voters, each corresponding to a voter’s favorite campaign issues. This greatly enhanced the ability to pinpoint messaging and made it more feasible to sway voters.

The intent of Media Optimizer was to enable much more targeted ad purchases. Prior to Media Optimizer, TV ad buys were based on broad demographics, which was both costly and inefficient. With Media Optimizer in place, the campaign could use statistical analysis to identify the target voters in the DNC database. Next, the voter data was enriched, both with demographics data from TV ratings as well as advertisement pricing data. Finally, the results were fed back into Vertica and reanalyzed for further tuning.

With the overall picture combining likely voters for Obama, the shows they watch, and the prices of the ads, along with the analysis feedback loop, it was much easier to determine the most efficient ad buys. One result was that the Obama campaign purchased twice as many cable TV advertisements as the Romney campaign, many during niche programs, aimed at the precise demographic slices the Obama campaign was trying to reach.
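To make the idea behind Media Optimizer concrete, here is a minimal Python sketch of how ad slots might be ranked once target-voter estimates, ratings, and prices sit side by side. It is an illustration only, not the campaign's actual method: the `AdSlot` fields, the `pick_slots` helper, the greedy selection rule, and all of the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AdSlot:
    """Hypothetical record joining ratings, pricing, and voter-model data."""
    show: str
    price: float         # cost of one spot, in dollars (made-up figures)
    viewers: int         # estimated audience size from ratings data
    target_share: float  # estimated fraction of viewers who are target voters

    @property
    def cost_per_target(self) -> float:
        """Dollars spent per target voter reached by one spot."""
        targets = self.viewers * self.target_share
        return float("inf") if targets == 0 else self.price / targets


def pick_slots(slots, budget):
    """Greedily buy the cheapest-per-target slots that still fit the budget."""
    chosen = []
    remaining = budget
    for slot in sorted(slots, key=lambda s: s.cost_per_target):
        if slot.price <= remaining:
            chosen.append(slot)
            remaining -= slot.price
    return chosen


# Illustrative inputs only; none of these values come from the campaign.
slots = [
    AdSlot("prime-time drama", 50_000, 2_000_000, 0.05),
    AdSlot("late-night rerun", 4_000, 100_000, 0.20),
    AdSlot("niche cable show", 6_000, 150_000, 0.25),
]
plan = pick_slots(slots, budget=12_000)
for slot in plan:
    print(f"{slot.show}: ${slot.cost_per_target:.2f} per target voter")
```

Under these invented numbers, the two inexpensive niche slots beat the prime-time spot on cost per target voter, which mirrors the article's point that pricing and audience data together favor many small, precise buys over a few broad ones.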
Campaign lessons

All the analytic solutions shared a number of attributes: They were a combined effort of both analysts and engineers. They were time-sensitive, implemented in weeks rather than months. They were built around an unconstrained, yet centralized environment with Vertica.

The analyst-driven organization empowered the team to achieve a number of key objectives. First, all the data from the disparate teams was brought together within Vertica, enabling a 360-degree view of the data. Second, analysts could answer nearly any question quickly and easily, no matter where the data originally came from. Finally, the platform was continually improved thanks to its built-in feedback loop.

With the success of this initiative, a unified big data analytics environment is sure to take its place as a standard requirement for campaigns to come.

This article, “The real story of how big data analytics helped Obama win,” was originally published at InfoWorld.com. Read more of Andrew Lampitt’s Think Big Data blog, and keep up on the latest developments in big data at InfoWorld.com.