by John Haddad

How to build a big data supply chain

analysis
Apr 16, 2014 | 8 mins

To get the most from big data, you must marshal new infrastructure and develop new collaborative processes. John Haddad of Informatica provides salient examples

The bigger big data gets, the more challenging it becomes to manage and analyze to deliver actionable business insight. That’s a little ironic, given that the main promise of big data is the ability to make better business decisions based on compute-intensive analysis of massive data sets. The solution is to create a supply chain that identifies business goals from the start — and deploy the agile infrastructure necessary to make good on those objectives.

In this week’s New Tech Forum, Informatica senior director of product marketing John Haddad details four common use cases that help illustrate how a properly constructed big data architecture can deliver results in the real world. — Paul Venezia

For decades, IT has relied on conventional business intelligence and data warehousing, with well-defined requirements and pre-defined reports.

In the new world of big data analytics, discovery is part of the process, so objectives shift as new insights emerge. This requires an infrastructure and process that can quickly and seamlessly go from data exploration to business insight to actionable information.

To swiftly transform data into business value, a big data architecture should be seen as a supply chain that can manage and process the volume, variety, and velocity of data. To get started, every company needs a big data process. That process is divided into three steps:

1. Identify business goals
No one should deploy big data without an overall vision for what will be gained. The foundation for developing these goals is your data science and analytics team working closely with subject matter experts. Data scientists, analysts, and developers must collaborate to prioritize business goals, generate insights, and validate hypotheses and analytic models.

2. Make big data insights operational
It’s imperative that the data science team works in conjunction with the devops team. Both groups should ensure that insights and goals are operational, with repeatable processes and methods, and communicate actionable information to stakeholders, customers, and partners.

3. Build a big data pipeline
The data management and analytics systems architecture must facilitate collaboration and eliminate manual steps. The big data supply chain consists of four key operations necessary for turning raw data into actionable information. These include:

  • Acquire and store: Access all types of data from any platform at any latency through adapters to operational and legacy systems, social media, and machine data, with the ability to collect and store data in batch, real-time, and near-real-time modes.
  • Refine and enrich: Integrate, cleanse, and prepare data for analysis, while collecting both technical and operational metadata to tag and enrich data sets, making them easier to find and reuse.
  • Explore and curate: Browse data and visualize and discover patterns, trends, and insights with potential business impact; curate and govern those data sets that hold the most business value.
  • Distribute and manage: Transform and distribute actionable information to end-users through mobile devices, enterprise applications, and other means. Manage and support service-level agreements with a flexible deployment architecture.
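The four supply-chain operations above can be sketched as a toy pipeline. This is purely illustrative — every record, field name, and rule below is invented for the example, and in production each stage would be a distributed system rather than a Python function:

```python
# Toy sketch of the four big data supply-chain operations.
# All data, field names, and rules are illustrative, not from the article.

def acquire(sources):
    """Acquire and store: collect raw records from several source feeds."""
    raw = []
    for records in sources.values():
        raw.extend(records)
    return raw

def refine(raw):
    """Refine and enrich: cleanse records and attach simple metadata tags."""
    refined = []
    for rec in raw:
        if rec.get("customer_id") is None:  # drop unusable records
            continue
        refined.append(dict(rec, source_tag=rec.get("channel", "unknown")))
    return refined

def explore(refined):
    """Explore and curate: keep only the records with business value."""
    return [r for r in refined if r.get("amount", 0) > 0]

def distribute(curated):
    """Distribute and manage: shape actionable output for consumers."""
    return {r["customer_id"]: r["amount"] for r in curated}

sources = {
    "web":   [{"customer_id": 1, "amount": 40, "channel": "web"}],
    "store": [{"customer_id": 2, "amount": 0, "channel": "store"},
              {"customer_id": None, "amount": 9}],
}
result = distribute(explore(refine(acquire(sources))))
```

Note how the stages compose: each one narrows the data — three raw records come in, one actionable figure goes out — which mirrors the supply-chain framing of raw material becoming finished product.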

Once the process is established, the big data reference architecture can support these four common big data use case patterns, which enable actionable business intelligence: data warehouse optimization, 360-degree customer analytics, real-time operational intelligence, and managed data lakes.

Data warehouse optimization
As data volumes grow, companies spend more and more on the data warehouse environment. The problem arises when capacity in the environment is consumed too quickly, which ultimately forces organizations into costly upgrades in storage and processing power.

One way to cope with high-volume data growth is to deploy Hadoop, which presents an inexpensive solution for storing and processing data at scale. Instead of staging raw data that comes from the source systems into the warehouse, simply store original source data in Hadoop. From there, you can prepare and pre-process the data before moving the results (a much smaller set of data) back into the data warehouse for business intelligence and analytical reporting. Hadoop does not replace the traditional data warehouse, but it provides an excellent, complementary solution.
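The offload pattern can be shown in miniature. In this sketch the raw store and the warehouse are simulated with in-memory structures; in a real deployment the raw events would land in HDFS, the aggregation would be a MapReduce or Spark job, and only the small summary would be bulk-loaded into the warehouse:

```python
from collections import defaultdict

# Illustrative stand-in for raw clickstream events landed in the
# low-cost store (HDFS in a real Hadoop deployment).
raw_store = [
    {"page": "/home", "user": "a"},
    {"page": "/home", "user": "b"},
    {"page": "/cart", "user": "a"},
]

def preprocess(events):
    """Aggregate raw events into a much smaller summary for the warehouse."""
    counts = defaultdict(int)
    for e in events:
        counts[e["page"]] += 1
    return dict(counts)

# Only the summary — not the raw data — moves into the warehouse.
warehouse_table = preprocess(raw_store)
```

The raw events stay in cheap storage for future reprocessing; the warehouse receives only the aggregated result, which is what keeps its capacity growth in check.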

360-degree customer analytics
Most companies want to understand their customers better to increase loyalty and retention — and upsell products or services. To do so, you need to develop a 360-degree view of the customer.

CRM software has long claimed to do this. Today, however, new types of data about individuals abound via social, mobile, and e-commerce channels — as well as customer service records, telematics, sensor data, and clickstream data based on Web interactions. A true 360-degree view now means you must be able to access new data types along with traditional ones, combine them, transform them, and analyze everything to discover new insights about customers and prospects.
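The combine-and-transform step might look like the following sketch, which joins a traditional CRM record with newer clickstream and social data on a shared customer id. All field names and records are hypothetical:

```python
# Hypothetical sources keyed on a shared customer id.
crm = {101: {"name": "Ada", "segment": "gold"}}
clickstream = [{"customer_id": 101, "page": "/pricing"},
               {"customer_id": 101, "page": "/docs"}]
social = [{"customer_id": 101, "mentions": 3}]

def build_360(crm, clickstream, social):
    """Fold each channel's records into a single per-customer profile."""
    view = {cid: dict(rec) for cid, rec in crm.items()}
    for e in clickstream:
        view[e["customer_id"]].setdefault("pages", []).append(e["page"])
    for s in social:
        view[s["customer_id"]]["mentions"] = s["mentions"]
    return view

profile = build_360(crm, clickstream, social)[101]
```

The hard part in practice is not the join itself but resolving identity across channels — the sketch assumes a clean shared key, which real customer data rarely provides out of the box.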

This greater level of understanding, combined with big data algorithms for predictive analysis, enables organizations to predict customer behavior more accurately and provide meaningful recommendations. Knowing your customers better, including what they are saying and doing, enables you to deliver more value to them.

Real-time operational intelligence
Real-time operational intelligence is the ability to monitor and (optimally) respond to events in real time. One instance in sales or marketing is known as “marketing to the moment”: via mobile device, a sales associate could be provided with information about a customer as soon as he or she walks into the store, including that customer’s recent experiences on the store’s e-commerce site.

Another example where real-time operational intelligence is especially important is in fraud detection. With more types of data — whether generated through online behavior, social interactions, or transactions — you can start to identify patterns that would have remained obscure before. When such data is collected in real time, companies can use predictive analytics to flag fraudulent events with greater certainty and avoid false positives.
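To make the fraud example concrete, here is a deliberately simple rule-based flag that combines signals from several data types. A real system would run a trained predictive model over streaming events; the thresholds, fields, and scoring here are invented:

```python
# Illustrative rule-based fraud flag. A production system would replace
# these hand-written rules with a trained predictive model.
def fraud_score(txn, recent_locations):
    """Score a transaction by counting suspicious signals across data types."""
    score = 0
    if txn["amount"] > 5000:                     # unusually large amount
        score += 1
    if txn["location"] not in recent_locations:  # unfamiliar location
        score += 1
    if txn["minutes_since_last"] < 2:            # rapid-fire transactions
        score += 1
    return score

txn = {"amount": 9000, "location": "Oslo", "minutes_since_last": 1}
flagged = fraud_score(txn, recent_locations={"Boston", "NYC"}) >= 2
```

Requiring two or more signals before flagging is one simple way to trade recall for fewer false positives — the same balance the article notes predictive analytics must strike.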

Yet another example would be predictive maintenance. Cars now have more software embedded in them than in the past. Through sensor devices, manufacturers can collect information and predict the mean time to failure, as well as more easily inform customers when they should bring a car in for a service visit.
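The core of the predictive maintenance example — estimating mean time to failure from collected sensor data — can be sketched in a few lines. The failure intervals below are invented, and a real estimate would model the failure distribution rather than take a simple mean:

```python
# Invented example: hours of operation recorded at each observed failure.
failure_hours = [1200, 1350, 1100, 1250]

# Mean time to failure as a simple average of observed intervals.
mttf = sum(failure_hours) / len(failure_hours)

# Schedule service well before the expected failure point
# (the 0.8 safety margin is an illustrative choice).
service_at = 0.8 * mttf
```

With the data above, service would be recommended at 980 hours of operation, roughly 20 percent before the 1,225-hour average failure point — which is what lets the manufacturer tell customers when to bring the car in.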

Similarly, for aircraft, companies typically prefer to perform on-wing repairs, which are less costly than sending the aircraft to the service facility. By collecting data that better indicates when minor service is needed, companies can preempt major repairs and reduce maintenance costs.

Managed data lake
The more data you have, the better you can develop a 360-degree view and operate in real time. But this can also be a double-edged sword. Data is cumulative, and huge volumes are created when new data types are added.

Older companies in particular have large quantities of data on legacy systems, as well as mobile and social data that can potentially be used to extract business value. In many cases, you aren’t sure what you want to do with the data just yet, but you know there’s potential — and you don’t want to lose that potential by throwing the data away. Instead, you want to store it cost-effectively, so you can access it to discover new insights and trends.

This massive repository is called a data lake — and must be properly managed or you end up with a swamp. Managed data lakes enable you to store all types of data over the years for processing and analysis at petabyte scale. But even that is not enough. The data must also be easy to search, cleanse, and govern while observing whatever privacy policies may be in place.
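What keeps a lake from becoming a swamp is a metadata catalog: every data set is registered with tags so it can be found and governed later. A minimal sketch, with invented names, paths, and tags:

```python
# Minimal data-lake catalog sketch. Data set names, paths, and tags
# are illustrative; a real catalog would also track lineage and ownership.
catalog = {}

def register(name, path, tags):
    """Register a data set in the catalog with searchable metadata tags."""
    catalog[name] = {"path": path, "tags": set(tags)}

def search(tag):
    """Find data sets carrying a given tag."""
    return [name for name, meta in catalog.items() if tag in meta["tags"]]

register("clickstream_2014", "/lake/raw/clicks", ["web", "raw"])
register("crm_export", "/lake/refined/crm", ["customer", "refined"])
matches = search("customer")
```

The same metadata that makes data sets findable also gives governance something to attach to — retention rules and privacy policies can be applied per tag rather than per file.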

In addition, you must ensure the data is highly reliable and available. You need to make it easy for information consumers to prepare and analyze it and make it useful. The final step is to operationalize the insights that you discover in the data lake to create new products and services, improve customer service, and sharpen decision-making.

The business benefits of well-managed big data go beyond these four use cases, of course, but they provide good illustrations of how you can make big data work for your business. By becoming more data-driven, you can shorten the path to achieving your business goals.

New Tech Forum provides a means to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all enquiries to newtechforum@infoworld.com.

This article, “How to build a big data supply chain,” was originally published at InfoWorld.com. For the latest business technology news, follow InfoWorld.com on Twitter.