“Able Danger” and data mining

by Henry Farrell on August 27, 2005

Laura Rozen on revelations that Able Danger contractors lost their jobs after fingering Condoleezza Rice and William Perry as part of a web of relationships between China and US defence/security types.

bq. Able Danger’s data mining results seemed more all over the board, a kind of tinfoil hat producing adventure better left to freepsters and google?

Not necessarily so. There’s a lot of confusion about what data mining can and cannot do. Both its proponents (who want to get funding for it), and its opponents (who want to conjure up images of Big Brother) have an interest in hyping up its capabilities. The fact that Able Danger or other data mining programs may throw up false positives doesn’t mean that data mining isn’t potentially useful. The _most_ that data mining can do (and should be expected to do) is sometimes to highlight interesting and non-obvious relationships that might otherwise have escaped people’s attentions. In the words of Mary DeRosa’s CSIS report on data mining and counter-terrorism (the best thing I’ve read on the topic), data mining may provide a set of ‘power tools’ for law enforcement and intelligence, which may suggest interesting further lines of investigation. Inevitably, however, it’s going to provide a lot of entirely spurious leads (indeed, if it doesn’t provide some dead-ends, its filters are probably set too narrowly). Thus, it shouldn’t be treated as providing smoking gun evidence one way or the other – all that it does is to analyse sets of relationships in a network of actors, and highlight some relationships that might otherwise have been non-obvious.

So the important question isn’t whether Able Danger and related programs came up with some network connections that seemed on the face of it to be ridiculous (although in the unlikely event that the Able Danger people portrayed Rice as some class of a Manchurian candidate it would obviously be a serious problem). In order to figure out the underlying merits and defects of Able Danger, we’d need to have a lot more information than seems to be publicly available at the moment. How good was Able Danger _overall_ at filtering out the wheat from the chaff? What was the overall ratio of false positives to genuine positives? Was the data mining exercise that spat out Atta’s name (assuming that the Able Danger people are telling the truth) one of a whole bunch of data mining exercises, most of which came up with garbage? Did the specific exercise that came up with Atta’s name highlight him as playing a central role in the network, or at least a role that merited further investigation, or did it have him on the periphery of the network? At the moment, we simply don’t know enough to evaluate – instead, we seem to be in a wilderness of mirrors, with conflicting leaks from pro- and anti-Able Danger types, all with their own agendas. The quick take as best as I can make out – if Able Danger singled out Atta as one of a small group of individuals who merited substantial further investigation, then the Pentagon has a problem. If Atta’s name was one of hundreds or thousands, the rest of whom were mostly false positives, or if the network analysis didn’t highlight Atta as someone who merited further investigation, then the Pentagon’s decision to close down the program is far more easily defensible _ex post_.
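To make the arithmetic behind that last contrast concrete, here is a minimal sketch in Python. Every number in it is made up for illustration; nothing is publicly known about how many names Able Danger actually flagged, and the function names (`flag_precision`, `false_to_true_ratio`) are hypothetical helpers, not anything taken from the program itself.

```python
def flag_precision(true_positives: int, false_positives: int) -> float:
    """Share of flagged names that turn out to be genuine leads."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("no names were flagged")
    return true_positives / flagged


def false_to_true_ratio(true_positives: int, false_positives: int) -> float:
    """How many dead ends an investigator chases per genuine lead."""
    if true_positives == 0:
        return float("inf")
    return false_positives / true_positives


# Hypothetical scenario A: a short list of 20 names, 4 of them genuine.
short_list = flag_precision(true_positives=4, false_positives=16)

# Hypothetical scenario B: the same 4 genuine names buried among 2,000.
long_list = flag_precision(true_positives=4, false_positives=1996)

print(f"short list: {short_list:.1%} of flags are genuine")
print(f"long list:  {long_list:.1%} of flags are genuine")
print(f"dead ends per genuine lead (long list): "
      f"{false_to_true_ratio(4, 1996):.0f}")
```

The same four genuine names are worth following up in the first scenario and are very hard to pick out in the second, which is why the overall ratio matters more than whether any single name appeared in the output.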

{ 12 comments }

1 Martin 08.27.05 at 5:47 pm: “The quick take as best as I can make out – if Able Danger singled out Atta as one of a small group of individuals who merited substantial further investigation, then the Pentagon has a problem. If Atta’s name was one of hundreds or thousands, the rest of whom were mostly false positives, or if the network analysis didn’t highlight Atta as someone who merited further investigation, then the Pentagon’s decision to close down the program is far more easily defensible ex post.”

I find it interesting that after discussing data mining from, in effect, an engineering perspective focusing on its usefulness and practical limitations, in your close you switch to a bureaucratic/political perspective (not, as Jerry Seinfeld would say, that there is anything wrong with that). Call me naive but the logic of your earlier discussion would seem to lead to a conclusion along the lines of:

“If Able Danger singled out Atta as one of a small group … the Pentagon has an opportunity for valuable progress against terror, even if it was sadly not taken advantage of. If Atta was one of many false positives, it looks like the Pentagon regrettably is stuck with the difficult, marginally effective, and often morally questionable intelligence methods of the past.”

(I realize more or less all social phenomena can and should be described in both engineering/instrumentalist and bureaucratic/political terms. I guess it’s the unannounced switch from the middle to the end of the post that struck me.)
2 Cranky Observer 08.27.05 at 7:30 pm: > although in the unlikely event that the Able
> Danger people portrayed Rice as some class of a
> Manchurian candidate it would obviously be a
> serious problem).

I am no fan of conspiracy theories, and I think the big one (the Bush family, the Carlyle Group) here is right out in the open. But if you start out down a path using a statistically-based tool; define your model; enter your data, and get results; and then discard certain results because you “know” they can’t be true – well, I am not sure why you are spending my tax dollars to do that in the first place.

Cranky
3 Luc 08.27.05 at 9:23 pm: Being a bit paranoid about these things, I’d argue against viewing data mining and associated technologies, as used by intelligence and law enforcement, as simple power tools with a knob with which you can turn up or down the false positives.

A large part of the job that these tools take over is risk assessment. Whether it is credit card transactions, creation/checking of a no fly list, or a credential check for a government job.

And you can’t depend on the few humans still involved to override/further investigate these automated assessments, either because there are too many instances to check, as in the credit card transactions, or because they would take a big personal risk to go against the system, as in the no fly list or the credential check.

Something like this is noted in the referenced article by DeRosa, but without many clues on how to resolve it.

So my opinion would be that

The most that data mining can do (and should be expected to do) is sometimes to highlight interesting and non-obvious relationships that might otherwise have escaped people’s attentions.

is a bit of a misunderestimation of what is being done with data mining and associated tools.
4 RedWolf 08.28.05 at 6:24 am: Scholars should comment on their areas of expertise; they should stay as far away from commenting on their expertiseless areas as they humanly can even if data mining tempts them mightily.
5 Tad Brennan 08.28.05 at 7:52 am: The quick take that Kevin Drum takes–

The moral of Able Danger wrt Atta has nothing to do with “data mining” at all.

The most credible accounts of how AD fingered Atta have to do with information from specific individuals, not publicly available data-bases. No amount of data-mining could have provided his name to Able Danger at that time: it was just old-fashioned human intelligence resources.

I don’t know if Drum is right about this, but it is worth keeping the possibility in mind: so far as the pros and cons of data-mining go, Atta may be a complete red herring.
6 Henry 08.28.05 at 9:44 am: luc – I used to be very worried about the implications of data mining and associated technologies for privacy, human rights etc until I actually started doing a bit of research on them. They simply aren’t very useful at the moment for security-related applications. The worrying part of them is their slapdash application (e.g. in some of the applications for passenger data that people fantasize about, and indeed have tried to implement sloppily). The much-ballyhooed Total Information Awareness initiative was, as best as I can tell, a bit of a joke – an attempt by DARPA to cobble together previously existing blue sky programs into something that could rustle up a bit of funding, which went horribly, horribly wrong for them when privacy advocates got hold of it. I do think there are issues here, but they’re long term issues.

Redwolf – sorry to say that you’re shit out of luck here – the politics surrounding data mining, data retention etc _are_ a major part of my academic expertise. Doesn’t mean I’m necessarily right, of course, but I do have some idea of that whereof I speak.

Martin – the framing of the argument (which perhaps doesn’t come across clearly enough) is that Laura Rozen doesn’t quite get the engineering aspects of this; accordingly, she’s somewhat misinterpreting the politics. I think that Kevin’s take on this is the most convincing to me.

Cranky – point taken, but what is interesting about this kind of exercise isn’t that you can connect a, to b, to c, which yer common or garden David Horowitz can do. It’s that sometimes non-obvious connections can emerge. But obviously, if you want to figure out whether these connections are meaningful, you need to (a) apply some sort of reasoning (as many connections will be either spurious or irrelevant) and (b) do further research/investigation of the traditional variety, as appropriate.
7 Cranky Observer 08.28.05 at 12:01 pm: Henry,
As I posted over at Drum’s, the thing to think about is this: British counterintelligence was consistently punked throughout the 30s, 40s, and 50s (and some fairly vital nuclear weapon secrets lost in the process) by consistently rejecting suspects/hypotheses because no matter what their preliminary investigations came up with everyone knew that people like Philby couldn’t be double agents. Too well-bred and all that. Except of course some of them were.

So the miners build a statistical analysis model (something I greatly distrust) to find bad guys, and out pops the names of a few Undersecretaries in critical positions. What do they do? Discard those names, of course, because “everyone knows” that “those people” couldn’t be suspects. Right.

And I would be reasonably certain that if we went back and ran various federal agencies’ and investment banks’ securities fraud models against the WorldCom data from 1995-2000 that Ebbers’ name would pop out as needing further investigation. For all we know it did at the time. Do you think anyone investigated as a result? As another commentor at Drum’s said, only pee-ons are targets for data mining.

But hey, I just do database, statistical modeling, and computer security work for a living. What do I know?

Cranky
8 Sebastian Holsclaw 08.29.05 at 3:20 am: “As another commentor at Drum’s said, only pee-ons are targets for data mining.”

Even if that were totally true, that doesn’t make data-mining useless. It would be just another in a long line of useful law enforcement tools that doesn’t cut as harshly against the privileged. See prosecutors and the now Senator Kennedy. See jury trials and the now free OJ Simpson.
9 Henry 08.29.05 at 9:17 am: Cranky – the point that I’m trying to make is that evidence of connections in a network, by its very nature, is typically going to be pretty weak evidence of any underlying causal relationship of complicity in a plot etc. If you take this stuff as smoking gun evidence without backup, it can lead to paranoid thinking of the David Horowitz Discover the Network variety (or its leftwing equivalents). The evidence against Philby etc was, as I understand it, in principle rather stronger.
10 Peter 08.29.05 at 9:38 am: I thought that Kevin brought up an important issue on the AbleDangerHerring: that they (now) claim that they identified Atta, as Atta, months before he started using the Atta name.
And I think the other commenters are correct: we are ignoring our own generation’s Philbys because, well, we elected them and they wouldn’t do something like that, would they?
Regrettably, this is going to be 100% Grade AA political stew before it is all done. And we’ll probably end up proving Confessions of an Economic Hit Man correct in all details before this is finally laid to rest.
11 king-fu 08.29.05 at 2:01 pm: > pretty weak evidence of any underlying causal relationship of complicity in a plot

Then, why use the phrase “Manchurian Candidate” Henry? That seems like your own straw-man of causality, doesn’t it? What Condi did before 200 is well known, at least on the face of it. And, that is that she was smack dab in the middle of Afghanistan for years. I’m not at all shocked to see her linked with Chinese power centers.

It’s like someone recently pointed out: why would you [tin-foil-hatters] assume Osama’s working for the Bush family? Why would it not be the other way around?
12 Gary Farber 08.29.05 at 7:37 pm: For whatever little it’s worth, I had these observations about the present situation regarding Able Danger.

Comments on this entry are closed.