By Yves de Montcheuil

Securing big data, a cross-technology challenge

Opinion
Nov 10, 2015 · 3 mins

The data lake holds real appeal for hackers. Organizations need to secure their big data

As companies continue to embrace big data, more and more sensitive and regulated data is being collected and stored. And of course, all of this data becomes a high-value target for hackers for several reasons:

  • With so many data records from various origins brought together in the data lake, the reward for breaking into it becomes very high: there is no longer any need to break into numerous individual systems to collect the same data.
  • The data lake contains not only raw data, but also enriched and reconciled data, which carry much more potential for malicious users to gain insight into the secrets of individuals or corporations.
  • Security technologies for big data in general, and Hadoop in particular, have yet to match their counterparts on traditional systems (ERPs, databases, etc.), making it relatively easier for hackers to penetrate the data lake.

Securing big data requires a combination of specialized and very traditional security technologies, but it is also a matter of processes and policies. One of the top risks is the creation of a new silo for application identity: separating big data security from the rest soon creates a divergence between systems. The consequences are often runaway admin privileges, as well as constraints on an organization's ability to meet compliance requirements or to mitigate risks.

As, like Vs, go in threes

It is key to integrate identity and access management across the full IT infrastructure — traditional systems and big data alike. Organizations must look into incorporating big data into what is often referred to as the 3 As of security — Authentication, Authorization, and Accounting:

  • Authentication is the way of identifying a user, for example through a user name and password combination.
  • Authorization is the process of enforcing policies that determine what activities, resources and services a user is permitted to perform or use.
  • Accounting measures the resources a user consumes and identifies deviations from typical behavior — often a sign that a user’s access has been compromised (stolen credentials, Trojan horse program, etc.).
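The three As above can be sketched in a few lines of code. This is a minimal illustration with assumed in-memory stores and a made-up anomaly threshold; a real deployment would back these with a directory service, a policy engine, and a durable audit log.

```python
import hashlib
import hmac

# Illustrative in-memory stores (assumptions for this sketch).
USERS = {"alice": hashlib.sha256(b"s3cret").hexdigest()}  # authentication store
POLICIES = {"alice": {"read:sales", "write:sales"}}       # authorization policies
USAGE_LOG = []                                            # accounting records

def authenticate(user, password):
    """Authentication: verify the user's identity via name/password."""
    expected = USERS.get(user)
    supplied = hashlib.sha256(password.encode()).hexdigest()
    # Constant-time comparison avoids leaking information via timing.
    return expected is not None and hmac.compare_digest(expected, supplied)

def authorize(user, action):
    """Authorization: enforce the policy governing what the user may do."""
    return action in POLICIES.get(user, set())

def account(user, action, records_read):
    """Accounting: record resource consumption and flag deviations."""
    USAGE_LOG.append((user, action, records_read))
    baseline = 1000  # assumed per-request norm for this sketch
    return records_read > 10 * baseline  # True = anomalous, worth alerting

# Usage: a request passes all three gates.
if authenticate("alice", "s3cret") and authorize("alice", "read:sales"):
    suspicious = account("alice", "read:sales", 50_000)  # well above baseline
```

The point of the sketch is the layering: a request must clear identity, then policy, and even a legitimate, authorized user leaves an accounting trail that can reveal compromised credentials.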

NoSQL, Hadoop, HDFS & co.

All of this becomes more complicated because many technologies coexist in the data lake. Historically, Hadoop was simple: an HDFS storage layer with MapReduce processes running over it. Then Spark became the most popular processing engine. Kudu was introduced as an alternative to HDFS. More NoSQL databases were deployed on top of Hadoop, or alongside it.

Security standards for Hadoop also started to emerge — Kerberos for example. But newer big data technologies don’t all support the (sometimes self-proclaimed) “standard”.
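To make this concrete: enabling Kerberos on a Hadoop cluster starts with switching the authentication mode in `core-site.xml`. This is only a minimal fragment; per-service principal and keytab settings still need to be layered on top of it.

```xml
<!-- core-site.xml: switch Hadoop from "simple" (trusted) auth to Kerberos -->
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
<property>
  <name>hadoop.security.authorization</name>
  <value>true</value>
</property>
```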

Add to this the fact that many organizations already have security frameworks enforcing the 3 A's in their traditional systems. The last thing they want or need is for big data to require a complete overhaul of their security technologies and policies!

There aren’t many options: Big data must integrate with existing technology frameworks. Or, another way to view this is that technology frameworks must evolve to integrate with big data technologies. And it’s not going to be a simple endeavor to keep up!

Yves de Montcheuil is a recognized authority on digital business trends and information management. A marketing executive with a track record at several successful IT vendors, he is also a strategic advisor for digital companies and runs the International Commission of Tech In France. Yves is a strong presenter, author, blogger, and social media enthusiast.

You can follow Yves on Twitter: @ydemontcheuil, or contact him via LinkedIn.

The opinions expressed in this blog are those of Yves de Montcheuil and do not necessarily represent those of IDG Communications, Inc., its parent, subsidiary or affiliated companies.
