Wanting to Know Everything

by Kieran Healy on May 11, 2006

The NSA has assembled a gigantic database of telephone calls in the United States, with the help of all of the major telecommunications providers (except Qwest). The database is not of voice recordings, but of calls made. It constitutes data on a huge network of ties between people who call each other. In recent years, sociology and related fields have seen a lot of development in dynamic modeling of social networks, and in fast algorithms for analyzing large, sparse graphs. Entities with this kind of structure include things like the Internet, or AOL’s instant messenger network, and the universe of telephone calls within the United States. Some of the papers in this edited volume, published by the National Academy of Sciences, give a sense of what people are doing. (The volume was co-edited by my colleague Ron Breiger.) For instance, you can read about Data Mining on Large Graphs, Identifying International Networks, the Key Player Problem, and the use of MTML models to study adversarial networks. I think it’s fair to say that techniques of this sort are of significant interest to the intelligence community.

Social scientists, in the normal course of things, are severely limited in the amount of good data they can collect on networks of this sort. The Internet Movie Database has proved a very useful source of data for developing theory and methods in this area because it’s comprehensive and publicly available. Other researchers have set out to collect very large datasets describing some network structure together with the attributes of the people in it. A recent paper by Gueorgi Kossinets and Duncan Watts, for example, analyzes all the emails sent over the course of a year by 43,000 students, faculty and staff at a large private university. But the traffic analyzed in that paper is just a drop in the ocean of the real flow of communication that travels by voice and email every day.

Social network analysts—in fact, any social scientist who works with quantitative data—often dream of ideal datasets. The kind of thing we would collect if money, time and ethics did not constrain us. When we daydream like this, our thoughts tend toward harmless megalomania: maximally comprehensive data on the whole population of interest, in real-time, with vast computing power to analyze it, and no constraints on updating or extending it. And a pony, too. At the limit, something like Borges’ map is what we want, a perfect, one-to-one scale representation of the world.

Scientists and spies are not so different. The intelligence community’s drive to find the truth, to uncover the real structure of things, is similar to what motivates natural or social scientists. For that reason, I can easily understand why the people at the NSA would have been drawn to build a database like the one they have assembled. The little megalomaniac that lives inside any data-collecting scientist (“More detail! More variables! More coverage!”) thrills at the thought of what you could do with a database like that. Think of the possibilities! What’s frightening is that the NSA is much less constrained than the rest of us by money, or resources, or, it seems, the law. To them, Borges’ map must seem less like a daydream and more like a design challenge. In Kossinets and Watts’ study, the population of just one university generated more than 14 million emails. That gives you a sense of how enormous the NSA’s database of call records must be. In the social sciences, Institutional Review Boards set rules about what you can do to people when you’re researching them. Social scientists often grumble about IRBs and their stupid regulations, but they exist for a good reason. To be blunt, scientists are happy to do just about anything in the pursuit of better knowledge, unless there are rules that say otherwise. The same is true of the government, and the people it employs to spy on our behalf. They only want to find things out, too. But just as in science, that’s not the only value that matters.

{ 45 comments }

1

Yentz Mahogany 05.11.06 at 1:31 pm

The way I have justified to myself research in social science is that the more I know about it, the more I can warn others about its potential dangers. I can just imagine a spook looking at a graph that looks something like that on the ‘adversarial’ link, pointing at node number 8, and saying ‘take him out’. Who would have thought that silly little dots and lines representing nodes and connectives could possibly be used as a threat to liberty when put in the hands of spooks?

This view is a bit too X-Filesy, though. There are potential positive benefits. This sort of modelling can also be used by the public and reporters in order to make use of governmental transparency to see who pulls whose strings within the state, to identify and shine the light on the behind-the-scenes wire-pullers who routinely corrupt already shaky democratic institutions. It already has, to a rudimentary extent, with sites like theyrule.net.

2

Jonathan 05.11.06 at 1:35 pm

Does the IMDB provide automated access to their data? I had an idea about the average running length of Hollywood films I wanted to test in graduate school using their data, but it seemed unmineable.

3

tom brandt 05.11.06 at 1:44 pm

The link to the USA Today article is broken.

4

Kieran Healy 05.11.06 at 1:50 pm

fixed, thanks.

5

Tyrone Slothrop 05.11.06 at 1:52 pm

I understand the thought, but the NSA surely is constrained by money and resources. Compare, e.g., NSA salaries to what their recruits can make elsewhere.

6

boo boo kitty fuck 05.11.06 at 2:04 pm

This new layer of bureaucratic control defends the layer below it (which defends the layer below it (which defends the layer below it (which defends the layer below it (which defends the layer below it (which … (Founding Fathers.))))

7

Kieran Healy 05.11.06 at 2:23 pm

let’s say “much less constrained” then.

8

M. Townes 05.11.06 at 2:25 pm

The “Key Player Problem” and “MTML” links go to the same page – intentionally?

From what I understand, the NSA is considered an attractive, even sexy job by many computer science grads, despite the pay differential. Likely it has to do with the kinds of “thrills” KH mentions in the post.

9

bi 05.11.06 at 2:29 pm

I agree with Yentz Mahogany. And for those who aren’t too obsessed with the White House, such analysis can help people identify relationships between shady organizations — like, say, scammers — which isn’t a bad thing.

This sort of traffic analysis can be foiled by a Mixmaster-like scheme, though.

10

SamChevre 05.11.06 at 2:32 pm

As far as “not constrained by law”–so far as I know, making the database the NSA did is legal as far as anyone knows. (That’s Mark Kleiman’s opinion, and he is not a fan of the Administration, and it fits my memory as well that you don’t need a warrant to get who called who when data–I’m waiting for Orin Kerr’s analysis).

Should the law be changed? Probably. As in many other areas, the real scandal is what is legal.

11

SamChevre 05.11.06 at 2:34 pm

OK, Orin Kerr’s analysis is up. His analysis–probably constitutional, but not legal.

12

Steve LaBonne 05.11.06 at 2:38 pm

If they themselves thought it was legal, they wouldn’t have denied Qwest’s request that they take it to the FISA court (which is a notorious rubber-stamp, which reinforces the point.)

13

Kieran Healy 05.11.06 at 3:10 pm

“The ‘Key Player Problem’ and ‘MTML’ links go to the same page – intentionally?”

No, fixed now. Your self-correcting blogosphere at work.

14

abb1 05.11.06 at 3:14 pm

“There are potential positive benefits.”

Of course there are potential positive benefits. Virtually all elements of a police state have potential positive benefits, no question about that.

15

Seth Finkelstein 05.11.06 at 3:15 pm

So, are you saying social scientists should go work for the NSA, as a dream job? :-)
Even considering it’s government pay, it’s probably better than social scientist academia pay.

Maybe someone should try to get them to outsource the data, under some sort of secrecy agreement? :-)

16

Barry 05.11.06 at 3:32 pm

“I understand the thought, but the NSA surely is constrained by money and resources. Compare, e.g., NSA salaries to what their recruits can make elsewhere.”

Posted by Tyrone Slothrop

Aside from the sexiness aspect, the NSA is supposed to have a very, very large black budget. That’d allow for lots of highly-paid contractors.

17

jet 05.11.06 at 3:59 pm

This right to privacy thingy is kind of new and I’m not surprised it isn’t much of a hurdle to the .gov. Without a privacy amendment to the Constitution, the NSA isn’t really doing anything illegal.

I’m just surprised that the NSA charter allows them to run data analysis on US citizens.

18

Drm 05.11.06 at 4:53 pm

NSA probably could outsource the analysis for academic research by aliasing all of the numbers appropriately. An interesting game would be to see if you could identify yourself in the dataset.
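For readers wondering what "aliasing" might look like in practice, here is a minimal sketch. It assumes call records are simple (caller, callee) pairs and replaces each number with a keyed hash. The function names, the key, and the toy phone numbers are all hypothetical and chosen for illustration; nothing here describes what the NSA or any carrier actually does.

```python
import hashlib
import hmac


def alias_number(number, secret_key):
    """Map a phone number to a stable pseudonym using a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(secret_key, number.encode("utf-8"), hashlib.sha256).hexdigest()
    # Truncate for readability; the same number always yields the same alias.
    return digest[:12]


def alias_calls(calls, secret_key):
    """Replace both endpoints of every (caller, callee) record with aliases."""
    return [
        (alias_number(caller, secret_key), alias_number(callee, secret_key))
        for caller, callee in calls
    ]


# Toy records with made-up numbers (assumption: one tuple per call).
calls = [("555-0100", "555-0199"), ("555-0199", "555-0142")]
aliased = alias_calls(calls, secret_key=b"kept-by-the-data-holder")
for caller, callee in aliased:
    print(caller, "->", callee)
```

Note that the aliased data keeps exactly the same pattern of who called whom. That is what makes it useful for network research, and it is also why the "identify yourself" game is plausible: anyone who knows their own calling pattern can look for a node with the matching shape.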

19

perianwyr 05.11.06 at 5:06 pm

If you want to meet lots of hot military women, the NSA is your place. No, seriously.

20

Simstim 05.11.06 at 5:11 pm

Hmmm… the National Academy of Science… NAS… seems rather like… NSA!

21

Ginger Yellow 05.11.06 at 5:31 pm

“Without a privacy amendment to the Constitution, the NSA isn’t really doing anything illegal.”

Jet, I suggest you look at this post.

22

SqueakyRat 05.11.06 at 5:59 pm

“I’m just surprised that the NSA charter allows them to run data analysis on US citizens.”

Gee, Jet, do you suppose maybe it doesn’t? Or would that just be too much of a surprise?

23

Jon H 05.11.06 at 6:01 pm

So, like, if everyone started calling the FBI every week to ask if Bob is there, would that mess up any data mining pattern-searching they’re doing?

24

jet 05.11.06 at 6:07 pm

Ginger Yellow,
So apparently Article II of the Patriot Act allows the President to do whatever he wants as long as he’s chasing terrorists. Or at least that’s the argument he’s going to make. The Patriot Act really is as scary as I thought it was going to be.

25

Ginger Yellow 05.11.06 at 6:21 pm

We know he’s going to argue that (assuming you really mean Article II of the constitution rather than the Patriot Act). He always does. But it doesn’t mean he can get away with it if it gets to the Supreme Court. Bush and Gonzales argue a lot of things, but they don’t like putting them to the legal test.

26

Tom T. 05.11.06 at 6:29 pm

Regarding the job appeal, I’ve heard that the NSA is the largest employer of mathematicians in the United States. Dunno about social scientists, though.

27

jet 05.11.06 at 10:41 pm

So if law enforcement doesn’t need a warrant to get this information, how big a blow is it to civil liberties if the NSA gains access to this information? Doesn’t the FBI already have systems like this in place like Echelon?

The NSA doesn’t really bother me, but could anyone imagine the horror if the DEA put together a system like this? Or even worse, the MPAA?

28

Kieran Healy 05.11.06 at 10:55 pm

“Or even worse, the MPAA?”

“Our efforts are focused on links to Al Pacino and his known affiliates …”

29

bi 05.12.06 at 12:38 am

abb1: read the rest, dang it.

And it seems that for some people “police state” means “any state where the police can actually bring criminals to justice effectively”. So if that’s bad, then what’s good? A state where criminals routinely go scot-free?

30

mitchell porter 05.12.06 at 1:00 am

“If you want to meet lots of hot military women, the NSA is your place.”

I guess the trick is to use the right combination of keywords, so their bots will flag you for personal attention. Let’s see… Tsirelson duality! Aerovores! Reflective decision theory! That should be sufficient.

31

Doctor Memory 05.12.06 at 1:53 am

Samchevre, Jet: please, please stop. You most certainly do need a warrant to get records of incoming and outgoing calls from a telephone. It’s called a pen register (the name dates back to the days of the telegraph), and for an actual policeman to get one on a phone or group of phones during the course of an investigation is a small mountain of paperwork.

32

Todd Larason 05.12.06 at 2:52 am

The IMDB (having grown out of the old Usenet movie database project) continues to make huge amounts of data available; I haven’t read their licensing terms closely recently, but at a glance they seem to be “don’t use our data to compete with us”, basically.

See http://www.imdb.com/interfaces for pointers to both the data and the licensing information. You’d want at least running-times.list.gz; what else you’d want would depend on what exactly you were studying.

33

abb1 05.12.06 at 3:19 am

Bi, I don’t disagree with you in general.

On the axis going from total anarchy to total police state there must be a point where you feel most comfortable and this point is different for different people and for each person under different circumstances.

For example, I would like them to register all firearms and have a database of that.

National ID card? Maybe.

A GPS tracking device in every car? In every skull? There is a point where this will make you uncomfortable even if it provides great benefits for all sorts of things.

34

Harald Korneliussen 05.12.06 at 3:32 am

Using such a data set, wouldn’t it be feasible to find out for instance who are the most effective sources of unconvenient information?

I imagine a popular website editor suddenly getting troubles at work, being accused of crimes (with good evidence turning up), while the hysterical site that the NSA knows has little or negative impact gets left alone.

Or it could be used the other way around: find out who the influential people on your side are, and make sure they have everything they need.

35

Harald Korneliussen 05.12.06 at 5:00 am

Inconvenient, sorry.

I assume some of you, as social scientists or perhaps computer scientists, know what can be extracted from such a huge graph, and what can not. From what I’ve seen of the news coverage, they know and care nothing about the data mining potential.

That people understand this is more important than discussing potential legal strategies. There’s always some way that government can find a bizarre legal strategy to justify its actions, as long as they have the public support – their history shows this. But I don’t think they will have that public support once people understand that with that database, the president can probably uncover half of all the marital infidelities in the US, if he wants to.

36

Evan Morris 05.12.06 at 6:30 am

What no one seems to be asking is this: leaving aside the ethical issues, of what value is this database in preventing terrorism? I would guess not much. In fact, none. Maybe I’m dumb, but I can’t see why this database would be necessary in efforts to prevent terror, by which I mean there is nothing that having this database allows that a simpler exercise would not allow, in respect of combatting terror. It just seems like frenetic make-work to me, the NSA trying to pretend it’s doing something useful when in fact it is just masturbating.

37

Harald Korneliussen 05.12.06 at 7:15 am

Evan Morris: on the contrary, it can be very useful to catch terrorists. If you have one known terrorist, you can find all the people who have spoken to him, all who have spoken to them again, and so on until the tenth degree. It should then be quite possible to notice a strongly connected “ring” of people talking to the terrorist and each other, and possibly with pointers to other “rings”, which could be terrorist cells.

The problem is that by the time you get to the tenth degree, you have likely covered the entire population of the USA. And the system can’t just be used to discover terrorist cells who use the phone network, it can also be used to discover all sorts of other groups. Plus, it can identify the central people in these groups. If Bush wants to find out who were the really important people behind the latest immigration law protests, can he use this database for that? You bet. He can identify crucial links, important people who perhaps don’t even know themselves that they are important. He can identify all sorts of decision makers and leaders, on all levels, as long as they use the phone network as a primary means of communication.
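The expansion described above can be sketched as a breadth-first search outward from one known number. This is a toy illustration only: it assumes the call records are (caller, callee) pairs treated as undirected ties, and the function name and the sample data are hypothetical.

```python
from collections import deque


def within_k_hops(calls, seed, k):
    """Return {person: hop distance} for everyone within k hops of seed."""
    # Build an undirected adjacency map from (caller, callee) pairs.
    neighbors = {}
    for a, b in calls:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        person = queue.popleft()
        if dist[person] == k:
            continue  # do not expand beyond the k-th degree
        for other in neighbors.get(person, ()):
            if other not in dist:
                dist[other] = dist[person] + 1
                queue.append(other)
    return dist


# Made-up call records: A is the starting point.
calls = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("B", "D")]
print(within_k_hops(calls, "A", 2))   # people within two degrees of A
print(len(within_k_hops(calls, "A", 10)), "of 5 people reached at ten degrees")
```

Even in this five-person example, raising the hop limit quickly sweeps in everyone connected to the seed, which is the coverage problem in miniature.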

But you may be right, in that terrorist cells probably aren’t so stupid they use the regular phone network for communicating. It’s honest people who do that. And Bush hasn’t bothered with warrants. That is another sign that it may well be honest people he wants the NSA to tap in on.

Switch to encrypted net connections, and impeach him, before it’s too late.

38

Steve LaBonne 05.12.06 at 7:19 am

Not much in the so-called Patriot Act really had anything to do with counterterrorism, either. It was a wish-list of new powers for which the snoops had been pushing for years; 9/11 just provided the political cover to get them passed. The bottom line is, snoops just like to snoop. It’s what they know how to do and what justifies their salaries. They’re always looking for new places to stick their noses. Of course they don’t know what to do with the information they gather; the failure to prevent the eminently preventable 9/11 attacks shows that.

39

SamChevre 05.12.06 at 8:13 am

Dr Memory,

I corrected myself–please see post #11. (The warrant requirements for a pen register are lower than for a wiretap–I misremembered and thought they were non-existent.)

SamChevre

40

Evan Morris 05.12.06 at 8:50 am

Harald: If you have a ‘known terrorist’ then you can just monitor the known terrorist’s communications, and then expand your monitoring as needed. The law already provides for this. That’s my point. You don’t have to monitor the whole database, there’s a simpler solution. So monitoring the whole database smacks of pretense. “Look, we’re doing something important, we’re monitoring everyone’s phones.” Well, great, but so what? You could achieve the same effect in respect of terrorists with less effort.

41

Ginger Yellow 05.12.06 at 9:16 am

Evan, you’re right that they could (and should) operate by tracking (after warrants with probable cause) suspects’ communications and expand as necessary, but if you’re an espionage agency then you’re not going to pass up any opportunity to get more, and more convenient, data.

42

eweininger 05.12.06 at 9:33 am

If the legal geniuses amongst us would please hurry up and explicate a theory of the Unitary Professoriate, I could finally get this damn IRB off my back. Unencumbrance beckons!

43

steve duncan 05.12.06 at 10:21 am

Discussions of this type of intrusion inevitably bring out the “I don’t have anything to hide” crowd. Ceding the 4th Amendment comes so easy to many people. Ask them if they’d like glass walls on their houses. Nothing to hide, right? How about mailing a list to every household of every stop all their cars made the previous month. Dad gets mom’s list and vice-versa. We’ll just tag every car with a monitor and let everyone know all its travels. Nothing to hide, right? How about a high powered microphone by the watercooler at work. Nobody ever gossips or says anything untoward about the boss. Nothing to hide, right? Let’s see, we’ll publish on the internet everything you view on pay cable, a monthly summary accessible to all. Nothing to hide, right? Where do we stop? We don’t, not until each of our daily lives is akin to the amoeba under the microscope, every undulation of its protoplasm noted and recorded. Happy monitoring!!

44

jet 05.12.06 at 1:51 pm

If you really care about your privacy, then donate your time (or money) giving power back to the people.

45

goatchowder 05.13.06 at 4:11 am

I don’t buy the “nothing to hide” horseshit.

Who has had an affair? Who has masturbated? Who has had oral sex in a state where it is illegal? Who has visited a prostitute? Who has smoked a joint? Who has driven home with a BAC > 0.05? Who has had a racy phone conversation with a lover? Who has flirted with a married person? Who has said nasty stuff about their boss? Who has made a racial or ethnic slur? Who has looked at porn? Who has run a stop sign? Who has cheated on their income tax? Who has exceeded the speed limit?

Bring me a hot cup of Kafka.
