by Ben Werther

Platfora CEO: How we deliver sharper analytics faster

Aug 14, 2013

CEO Ben Werther tells how Platfora's Hadoop-based solution upends old-fashioned, inflexible business intelligence processes

Big data is quickly becoming an important part of large-scale business operations. But it’s never been very quick itself, due to limitations on how that data is stored, manipulated, and retrieved.

Platfora has developed a solution that gives business analysts self-service access to big data, rather than requiring IT to maintain fixed-purpose reporting and analytics that may fail to deliver the information businesses need to make timely, effective decisions.

In this edition of the New Tech Forum, Ben Werther, CEO of Platfora, gives us a look at how big data can be used for agile business intelligence, without the traditional hangups and sluggish performance. — Paul Venezia

Bringing big data into focus with a better lens

Big data analytics today tends to suffer from an inherent contradiction: To gain competitive advantage, many companies are jumping on big data technologies, which enable them to process raw data in new ways — yielding sharper and much more timely business intelligence. Yet the traditional processes for extracting business intelligence from big data and sharing it throughout the organization are anything but fast.

Without question, the Apache Hadoop open source project has helped to advance big data analytics. Hadoop is massively scalable and provides a framework for distributed processing of massive data sets across clusters of computers using cost-effective commodity hardware. Hadoop’s flexible “schema on read” approach enables companies to define schema after data has been stored, instead of being constrained by the traditional database “schema on write” model. But Hadoop has limitations that must be overcome if businesses want to take full advantage of their raw data in all forms.
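
The schema-on-read idea can be sketched in a few lines: the raw bytes are stored as-is, and a schema is imposed only at the moment a query reads them. (The data and field names below are illustrative, not a Hadoop or Platfora format.)

```python
import json

# Raw, schema-less records as they might land in storage: one JSON line each.
raw_lines = [
    '{"ts": "2013-08-14T10:00:00", "user": "alice", "bytes": 512}',
    '{"ts": "2013-08-14T10:05:00", "user": "bob", "bytes": 2048}',
]

def read_with_schema(lines, fields):
    """Apply a schema at read time: project only the fields a query needs."""
    for line in lines:
        record = json.loads(line)
        yield tuple(record.get(f) for f in fields)

# Two different "schemas" over the same stored bytes -- no rewrite required.
traffic_view = list(read_with_schema(raw_lines, ["user", "bytes"]))
time_view = list(read_with_schema(raw_lines, ["ts"]))
```

Under schema-on-write, changing either view would mean reloading the data into a new table layout; here, only the read path changes.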

MapReduce was the original programming model used to process these large data sets in Hadoop. It required companies to hire MapReduce experts or train in-house IT staff to pull data out of Hadoop and into a legacy data warehouse. This approach is time-consuming and resource-intensive, and it does not provide the subsecond response times required in production environments. Early adopters have also used Apache Hive and derivative technologies to connect to Hadoop by translating SQL-like queries into MapReduce, but the process is still slow and requires experts. Additionally, these necessary steps toward making Hadoop work for the organization often place significant burdens on IT teams.
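
The MapReduce model itself can be sketched as three phases. This is a single-process toy for illustration, not Hadoop's distributed implementation, which runs map and reduce tasks across a cluster.

```python
from collections import defaultdict

def map_phase(records):
    # Map: emit a (word, 1) pair for every word in every input line.
    for line in records:
        for word in line.split():
            yield (word, 1)

def shuffle(pairs):
    # Shuffle: group all emitted values by key, as the framework
    # does between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: combine the grouped values -- here, sum the counts per word.
    return {word: sum(counts) for word, counts in groups.items()}

counts = reduce_phase(shuffle(map_phase(["big data", "big deal"])))
# counts == {"big": 2, "data": 1, "deal": 1}
```

Hive automates the tedious part by compiling a SQL-like query into chains of jobs shaped like this, but each chain still runs as batch MapReduce, which is why interactive response times remain out of reach.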

The inflexibility and latency of big data analytics are particularly frustrating for business analysts, who are under pressure to deliver timely and actionable business intelligence to the organization. Not only do they typically have little or no control over the analytics process, but most don't even realize how much valuable insight is being overlooked due to technology constraints. As the volume of semistructured and, increasingly, multistructured data — Web logs, mobile application server logs, tweets, Facebook Likes, audio files, emails, and more — continues to balloon, the situation will only worsen, yielding more frustration on all sides.

Rethinking the status quo of data analytics

The mantra of data analytics has been the same for decades: Don’t build a data warehouse until you know the questions you want to ask of your data. Data warehouses store precomputed answers intended to respond to questions relatively quickly — the limitation being that only predetermined questions can be answered.

If the questions need to change, it’s impossible for business analysts to go back to the raw data to get answers to new questions or explore data beyond predefined parameters. Adding new data sets to a data warehouse also presents a challenge, as does making changes to an existing data set, such as adjusting the level of granularity (for example, from days to hours). Seemingly minor alterations like these can take weeks if not months to execute.
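
The granularity problem can be made concrete with a toy sketch: a warehouse-style rollup stores only the answer to one predetermined question, while raw events can be re-aggregated at any granularity on demand. (The data and helper names here are illustrative.)

```python
from collections import Counter

# Raw events keep full timestamps; a warehouse rollup would keep
# only one precomputed summary of them.
raw_events = [
    ("2013-08-14T09:15", 1),
    ("2013-08-14T09:40", 1),
    ("2013-08-14T17:05", 1),
]

def rollup(events, granularity):
    """Aggregate event counts at an arbitrary time granularity.

    granularity is the number of leading timestamp characters to keep:
    10 -> day ('YYYY-MM-DD'), 13 -> hour ('YYYY-MM-DDTHH').
    """
    totals = Counter()
    for ts, n in events:
        totals[ts[:granularity]] += n
    return dict(totals)

daily = rollup(raw_events, 10)   # the predetermined question
hourly = rollup(raw_events, 13)  # a new question, answerable only from raw data
```

If only `daily` had been loaded into the warehouse, the hourly question could not be answered without going back to the source and rebuilding — the weeks-to-months change described above.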

Today’s enterprises require a more flexible approach to performing big data analytics because:

  • The variety and quantity of data are growing massively.
  • Analysts can’t know in advance what questions they’ll need to ask of their data as the market, customers, and competitors change.
  • To answer the full range of unanticipated questions, self-service access must be provided to all of an enterprise’s raw data.
  • To stay competitive, businesses need to use their data in more ways than ever before.

Moreover, business analysts need to be empowered to manipulate data so that it can be shared with other people in the organization. In short, they must play a direct role in fostering collaboration around business intelligence. After all, business intelligence provides value to the company only if it can be used for business decision-making — and only if those decisions are made at the right time and by the right people in the organization.

An approach to enabling fast, unbounded BI

Platfora has rethought the status quo of big data analytics. It begins with the Hadoop Data Reservoir (HDR), our vision for Hadoop as a single, central repository where all enterprise data can reside. The HDR serves as both the storage and the source of data for what we refer to as self-service analytics. It provides processing for data preparation and advanced analytics, ultimately eliminating data silos and reducing costs.

The integrated platform we developed to support a new era of self-service analytics helps to remove the obstacles to business intelligence described earlier by enabling an “interest-driven pipeline” of data controlled by the end-user. The end-user — typically a business analyst — can access raw data directly from Hadoop, which is then transformed into interactive, in-memory business intelligence. There is no need for a data warehouse or separate ETL (extract, transform, load) software, eliminating the headaches described above.

Platfora’s integrated platform includes three key elements:

  • Data sets: Data sets form the foundational layer of the platform and are representations of raw data in Hadoop. They can be brought into Hadoop very easily and are cataloged and searchable in the Platfora Data Catalog. IT teams can maintain standard and commonly used data sets, and end-users can import their own data sets to join into the data model.
  • Lenses: Platfora automatically transforms massive data sets into highly responsive, in-memory Lenses, which function as on-the-fly, personal data marts pulled out of Hadoop and refined and updated to the specific needs of the user. Platfora’s high-performance query engine scales out horizontally across multiple physical or virtual servers by distributing these columnar-compressed aggregates of data. Essentially, Platfora drives the Hadoop cluster to generate Lenses modeled to answer the questions end-users are asking. If the data needed to answer a question is not in the Lens, the end-user can create a new one without intervention from IT, and Lenses are automatically refreshed when new data is added to Hadoop.
  • Vizboards: The front end of Platfora’s integrated platform is a Web-based business intelligence application called Vizboards. Through Vizboards, users can analyze data from Lenses and organize relevant findings in a format that will be easily understood by stakeholders. Vizboards, which are based entirely on HTML5, function as Web-based canvases where business analysts can easily explore data and build visualizations, gaining insights and collaborating with others.
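
The pre-aggregation idea behind a Lens can be sketched, purely hypothetically, as a single scan over raw records that keeps only the dimensions and measure an analyst has asked about; interactive queries then hit the small in-memory result instead of the full data set. (This is a conceptual illustration only, not Platfora's actual engine or API; all names and data are made up.)

```python
from collections import defaultdict

# Illustrative raw records, standing in for data scanned out of Hadoop.
raw_records = [
    {"country": "US", "device": "mobile", "clicks": 3},
    {"country": "US", "device": "web", "clicks": 5},
    {"country": "DE", "device": "mobile", "clicks": 2},
]

def build_lens(records, dimensions, measure):
    """One pass over raw data: aggregate a measure by the chosen dimensions."""
    lens = defaultdict(int)
    for r in records:
        key = tuple(r[d] for d in dimensions)
        lens[key] += r[measure]
    return dict(lens)

# Built once from raw data; subsequent queries never touch raw_records.
clicks_by_country = build_lens(raw_records, ["country"], "clicks")
us_clicks = clicks_by_country[("US",)]
```

Asking a question the aggregate cannot answer — say, clicks by device — means building a new aggregate from raw data, which is the case where an end-user would generate a new Lens.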

A new era of self-service analytics

Platfora’s solution is designed to help businesses experience the increased dimensionality of their raw data. End-users can easily explore a broad spectrum of data or conduct fine-grained analysis, without having to start an entirely new analytics process. Additionally, organizations no longer have to worry about deciding up front what questions they will want to ask of their data later. Platfora also helps to support timely and efficient collaboration around business intelligence, which is important for businesses that want to exploit big data insights quickly for competitive advantage.

A core goal in designing the Platfora solution is to empower business analysts in enterprises to explore big data in ways not previously possible — or never even considered before. Our vision for self-service analytics is also driven by the ideal that business users should have access to all of the data they need, without limitation, and be able to explore and combine internal and external data sets to answer questions that have not been possible or practical in the past. At the same time, data administrators retain the flexibility to set access controls to ensure data remains secure, as appropriate.

When business analysts are able to flag and share insights and easily “contribute back” to the organization, the enterprise can achieve greater agility. Just as important, when IT is no longer burdened by big data demands — and these responsibilities become simple, highly routine processes — teams are free to focus on adding value to the business in other ways.

New Tech Forum provides a means to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all enquiries to newtechforum@infoworld.com.

This article, “Platfora CEO: How we deliver sharper analytics faster,” was originally published at InfoWorld.com.