simon_phipps
Columnist

Nude photos, phone records, NSA data offer essential lessons for admins

analysis
Sep 5, 2014 | 7 mins

Whether via Apple's iCloud, the DEA, or the NSA, data is leaking everywhere -- can anyone avoid exposure?

As you’ve heard many times by now, someone with no life or ethics appears to have hacked into numerous celebrity accounts on Apple’s iCloud service and copied private photographs wholesale. At least a few of those photographs are intimate and revealing. As if that juvenile intrusion on adult privacy wasn’t enough, they’ve then posted them in the Internet’s frat houses for the world’s sexually frustrated imbeciles to ogle.

This case raises questions about the very act of putting data online. There may be primary benefits in doing so, but as technology decision makers, we need to raise questions about secondary costs. Let’s consider additional data points.

We also learned this week that the DEA has been using phone call data going back decades — stored by AT&T for any call in which it participated, not just for its customers — as a covert source in the agency’s investigations. Unlike the NSA data, this is not merely material relating to foreigners — this is everyone’s data, going back as far as 1987. It can be accessed by officials by filling out a form — called an “administrative subpoena” but not involving any judicial review. As the New York Times says:

The Hemisphere Project, a partnership between federal and local drug officials and AT&T that has not previously been reported, involves an extremely close association between the government and the telecommunications giant.

Was this usage what the developers or executives had in mind a quarter of a century ago as they started logging the data? Or has it been stored “just in case” because it existed and seemed valuable and over time has found more and more users? There must be an enormous database — the epitome of big data — and it’s probably used for multiple purposes.

As to that NSA data, a great deal of confusion about “surveillance” seems to be floating around. In the United Kingdom, questions are being asked about all the data-gathering by the British equivalent of the NSA, GCHQ. In response, Secretary of State Theresa May has said that “there is no programme of mass surveillance and there is no surveillance state” and has labeled claims that GCHQ engages in unlawful hacking as “nonsense.” Yet clearly, a lot of data is being gathered.

GCHQ, the NSA, and probably every other intelligence agency worth the name are actively gathering data from the Internet. Everything on the Internet is transient, with different decay periods, so gathering information is a constant process. They believe everything that can be gathered without illegal action is fair game, so they gather anything and everything they can, storing it just in case.

They are without doubt capturing and recording any and all email, instant messages, Web pages, social media traffic, and so on. Recent disclosures reveal that the NSA collects “nearly everything a user does on the Internet,” then offers analysts tools to search that data. The NSA has a variety of explanations for why it’s all legally gathered.

Some is public on open websites. Some is “public” in the sense that it is passing through unsecured intermediaries that anyone could theoretically observe. Some is private, but can be gathered because a broad interpretation of certain legal doctrines (“sent abroad” for example, when the service provider is in a different country to the originator) allows them to treat it as public. Intelligence agencies are thus slurping up enormous quantities of data in a wide range of protocols and contexts, far more than could ever be appropriate for any investigation.

Why are they doing this? Because otherwise the data would be lost by the time they knew they needed it. They are not actually looking at most of it, at least not straight away. All they are doing is making transient data persist — they are caching. They are not breaking any rules by doing so (according to their own legal outlooks). They are simply engaged in blanket data gathering to the limits of the legality they understand for their acts. The result is truly enormous data lakes.

To study the data is a different matter, in their view. According to the NSA’s legal advisers, “wiretapping” or “hacking” starts at the point a human being actually analyzes or interprets the data. The NSA’s XKeyscore tool provides such a capability for fishing in data lakes. The NSA claims that access to the lake is limited, but disclosures suggest it is limited by rule and the threat of audit and not actually by any technical means. As a consequence, agents have to consciously ignore out-of-scope results from tools like XKeyscore.

Using “metadata” is considered OK as it is simply the “public” aspect of the contents of the data lake. Metadata helps target the fishing more accurately, but it can also be used to “triangulate” and determine facts directly. It’s an open question whether using varied metadata to triangulate on private facts is surveillance. The British secretary of state is probably speaking the truth according to her chosen frame of definition (in the same sense as Bill Clinton’s statement “I did not have sexual relations with that woman” was true). Certainly, a well-considered system of rules makes her statements precisely true.

The distinction the intelligence agencies make is a useful one for the IT profession to examine, because the truth is you can’t dissociate data gathering from data usage. To make a proper risk assessment and calculate the return on the investment of filling a lake with big data, you need to account for all the costs, not just the ones associated with your primary goal. As author Quinn Norton is credited with saying, “In the end, all data is either deleted or public.”

The celebrities in the first story were probably thrilled their photos were being backed up and didn’t consider the possibility that Apple’s security mechanisms (or their own password choices) would one day lead to an anonymous pervert posting their privates across the Web. The AT&T executives and engineers who first started collecting the network traffic information probably never considered that their work would end up in the hands of the DEA, being casually used to catch kids with pot. And the NSA wants us not to ask too many questions about what is going on with all the data it slurps off the Internet for future use. The DEA and NSA weren’t even going to tell us it was happening, despite their assurances that there is no illegal activity planned.

All this teaches us a design lesson. If we accumulate data, it is going to get used in the end. If we store that data in a place with public access, it will eventually become public. If we consider only the immediate application, we could be exposing ourselves to great risk in the long term. Storing data can have costs beyond the storage medium — costs associated with assisting legal investigations or satisfying discovery requests of litigants. Before we accumulate data indefinitely, we should always make an allowance for future abuses of the data, just in case.
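For admins who want to act on that lesson, one option is to give every stored record a stated purpose and a retention window, and to discard anything that has outlived its window or never had a purpose at all. The following is only a minimal sketch of that idea in Python. The `Record` type, the `RETENTION` table, its purposes and durations, and the `purge_expired` function are all hypothetical names invented for illustration, not part of any real system mentioned above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical record type: every stored item carries the purpose it was
# collected for and the time it was collected.
@dataclass(frozen=True)
class Record:
    record_id: str
    collected_at: datetime
    purpose: str

# Hypothetical retention windows, keyed by purpose. The durations are
# illustrative assumptions, not recommendations.
RETENTION = {
    "billing": timedelta(days=365),
    "debug_logs": timedelta(days=30),
}

def purge_expired(records, now):
    """Return only the records that are still inside their retention window.

    Records whose purpose has no declared window are dropped as well, so
    "just in case" data never accumulates by default.
    """
    kept = []
    for record in records:
        window = RETENTION.get(record.purpose)
        if window is not None and now - record.collected_at <= window:
            kept.append(record)
    return kept
```

Run on a schedule, a routine like this makes deletion the default and forces anyone who wants to keep data longer to name a purpose and a time limit up front.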

This article, “Nude photos, phone records, NSA data offer essential lessons for admins,” was originally published at InfoWorld.com. Read more of the Open Sources blog and follow the latest developments in open source at InfoWorld.com. For the latest business technology news, follow InfoWorld.com on Twitter.

Simon Phipps is a well-known and respected leader in the free software community, having been involved at a strategic level in some of the world's leading technology companies and open source communities. He worked with open standards in the 1980s and on the first commercial collaborative conferencing software in the 1990s, helped introduce both Java and XML at IBM, and, as head of open source at Sun Microsystems, opened their whole software portfolio, including Java. Today he's managing director of Meshed Insights Ltd, president of the Open Source Initiative, and a director of the Open Rights Group and the Document Foundation. All opinions expressed are his own.
