A Word from the Nerds

by Kieran Healy on December 22, 2005

John “Hannibal” Stokes at Ars Technica has some interesting speculation on what the new technology behind the NSA wiretap abuse scandal might be. Because he knows a lot about computers, he’s also in a position to explain to the likes of Richard Posner one of the (several) things that’s wrong with computer-automated mass surveillance:

Just imagine, for a moment, that 0.1% of all the calls that go through this system score hits. Now let’s suppose the system processes 2 million calls a day. That’s still 2,000 calls a day that the feds will want to eavesdrop on—a very high number, and still much higher than any courts could possibly oversee. Furthermore, only a miniscule fraction of the overall total of 2 million calls per day on only a few days of each month will contain any information of genuine interest to the feds…

… Here’s where the real problem with this scheme lies: the odds that a particular terrorist’s phone call will rate enough hits to sound an alarm are not primarily dependent on factors that we have control over, like the amount of processing power and brain power that we throw at the task, but on factors that we have no control over, like how good that terrorist is at hiding the content of his communication from the feds. …

As the TSA, with its strip-searching of people’s elderly grandparents, abundantly proves every holiday season, blunt instruments and scorched earth tactics are of dubious value in catching genuine bad actors. … All you need to beat such surveillance tools is patience and know-how. This is true for face recognition, it’s true for biometrics, it’s true for RFID, and it’s true for every other high-volume automated technique for catching bad guys. …

Targeted human intelligence has always been and will always be the best way to sort the sharks from the guppies … Government money invested in much less intrusive and much less defense contractor-friendly programs like training more Arabists and developing more “human assets” in the field will be orders of magnitude more effective than mass surveillance could ever be. … any engineer or computer scientist worth his or her salt will tell you that an intelligent, targeted, low-tech approach beats a brute-force high-tech approach every time.

There is no high-tech substitute for human intelligence gathering. … In the end, brute force security techniques are not only corrosive to democratic values but they’re also bad for national security. They waste massive resources that could be spent more effectively elsewhere, and they give governments and countries a false sense of security that a savvy enemy can exploit to devastating effect.

In short: don’t be seduced by technology. Computers are extremely powerful tools, but this isn’t the movies. Think of the last time you had to deal with the confluence of state bureaucracy and computer-based record-keeping—at the DMV, say, or at tax time, or at the local University’s Registrar’s office. Did it strike you as a ruthlessly efficient, accurate, and purpose-driven system?
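For the arithmetic-minded, Stokes’s headline numbers are easy to reproduce; here’s a minimal sketch (the 0.1% hit rate and 2 million calls a day are his illustrative figures, not real data):

```python
# Back-of-the-envelope version of Stokes's figures (hypothetical values).
daily_calls = 2_000_000   # calls processed per day
hit_rate = 0.001          # 0.1% of calls score hits

flagged_per_day = daily_calls * hit_rate
print(flagged_per_day)    # ~2000 calls a day needing human follow-up
```

Two thousand calls a day is the volume a human (or a court) would have to review even before asking whether any of the flagged calls matter.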

{ 32 comments }

1

M. Gordon 12.22.05 at 2:27 pm

I don’t know if I entirely agree with the premise of this comment (although I agree with the sentiment). A general computer-scientist approach to these problems (which is what is being brought to bear here) tends to say things like, “Well, I can trivially game the system by the following algorithms…” Computer scientists then have a tendency to think that this has been reduced to an already solved problem, and is therefore fallible, and useless.

But, in the real world, you have to do more than just download a rootkit to circumvent physical security measures, automated or not. “All you need to beat such surveillance tools is patience and know-how.” Well, yes, you need patience, which means time, and you need to throw people at the problem, and you need people who are dispensable to test for weaknesses, and you need to acquire the know-how, all of which require resources and money. So, yes, with enough time and money, you can circumvent the security. This is different from other security systems how?

2

soru 12.22.05 at 2:31 pm

All you need to beat such surveillance tools is patience and know-how.

Attributes possessed by a rather large proportion of Hollywood ninjas, but a rather small proportion of potential terrorists.

soru

3

Catherine Liu 12.22.05 at 2:33 pm

I think that we are trying to differentiate between “mass surveillance” and “human intelligence” and their different capacities to accurately target potential threats to US security.

The first, mass surveillance, also has a proven chilling psychological effect on political dissent, so that may in the end be the real political goal of those who would use the blunt automated method, with intelligence gathering itself only a secondary aim.

4

Sven 12.22.05 at 2:35 pm

What are the odds that there’s a big, lucrative contract for a crony company involved in this? That seems to be the pattern. The Lincoln Group was handed $100 million plus for what amounted to little more than make-work.

5

Cranky Observer 12.22.05 at 2:50 pm

Here’s a different take on the technology. I tend to agree with this author that the goal is traffic analysis and social network recognition, not voice or word scanning per se.

Cranky

6

eudoxis 12.22.05 at 3:17 pm

Specificity/sensitivity issues are universal and perennial.
I’m not getting a sense of anything new here or elsewhere.

7

Barry 12.22.05 at 3:23 pm

“Well, yes, you need patience, which means time, and you need to throw people at the problem, and you need people who are dispensible to test for weaknesses, and you need to acquire the know-how, all of which require resources and money. So, yes, with enough time and money, you can circumvent the security. This is different from other security systems how?”

Posted by M. Gordon ·

The ways that security systems differ from each other are, quite obviously, (1) how many resources need to be used, (2) at what risk, and (3) what the side effects of the system are on everybody else.

8

M. Gordon 12.22.05 at 3:37 pm

The ways that security systems differ from each other are, quite obviously, (1) how many resources need to be used, (2) at what risk, and (3) what the side effects of the system are on everybody else.

Uh, yes. And the author does not quantify either (1) or (2) (except figuratively, in a way that got Posner roundly trounced in the item a few slots up from this one), and only tangentially addresses (3). Traditional security measures, including the ones listed by the author (human intel), have impacts on all of these as well, but the author does not assess how these differ, and implicitly suggests that the problems he cites are unique to the types of intel gathering that are under discussion and absent from the traditional ones.

9

lambert strether 12.22.05 at 3:52 pm

Cranky observer–

Yes. It’s a means to an end, though; we’ll discuss social networking (more precisely, modelling social networks) in a subsequent post on the network architecture of treason.

Note that a system of this scale and scope can be built with $1000 off-the-shelf rack servers. There’s no reason to think it’s Echelon; rather, given the malAdministration’s propensity to bypass existing institutional structures, it’s probably something entirely new.

10

jw 12.22.05 at 4:03 pm

As the best current speech recognition systems are 98% accurate when trained to a specific voice, it’s clear that any voice call monitoring system will experience an unacceptably large number of false positives. The rate would be too high even if it were an order of magnitude smaller.

If we generously assume that 1 call in a million is about a terrorist incident, that will be 1 genuine call out of roughly 20,000 flagged by the system. The chances of finding the right call out of 20,000 aren’t high, especially as you’ve trained your security personnel, with thousands of false positives, to ignore the system’s results. Then there are the inevitable costs in civil liberties.
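A minimal sketch of this arithmetic, taking the 98% figure and the 1-in-a-million prior as illustrative assumptions rather than measured values:

```python
# Sketch of the false-positive arithmetic above (illustrative assumptions).
calls = 1_000_000
genuine = 1                      # assume 1 call in a million is of real interest
false_positive_rate = 0.02       # "98% accurate" => ~2% of innocent calls flagged

false_positives = (calls - genuine) * false_positive_rate
flagged = false_positives + genuine    # assume the genuine call is also caught

p_genuine = genuine / flagged    # chance a given flagged call is the real one
print(round(flagged))            # ~20001 calls flagged
print(p_genuine)                 # ~5e-05, i.e. about 1 in 20,000
```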

You might be able to construct social networks with this technology, but it’s a very costly (in terms of deployment, maintenance, and civil liberties) and ineffective system for identifying individual terrorists.

11

lambert strether 12.22.05 at 4:06 pm

jw writes: it’s clear that any voice call monitoring system will experience an unacceptably large number of false positives.

Interesting. And that’s why we posit that the system Bush has built is not about voice, but IP addresses. The “wiretap” talking point is a useful distraction because it suggests voice, but as the Patriot Act is drafted, the term can apply to Internet communication as well.

12

Andrew 12.22.05 at 4:38 pm

These are examples of motivated reasoning at its finest.

If I understand the warrantless wiretapping program (WWP) correctly, it combines elements of human intelligence and communications intelligence. The CIA or other parties gather names, telephone numbers, etc. from overseas collections of computers or documents. Those names and numbers that are domestic to the United States are then fed into the WWP, and the outgoing int’l communications are monitored.

So the selection is not random, and the size of the selection is unlikely to be so large as the above posts have indicated.

With respect to the ability of a computer to narrow down that selection to actual terrorists, this will likely involve the use of data gathered from human beings, e.g. information from a recent al Qaeda capture about suggested “code phrasings” to be used in communications. This extra data will further shrink the probability of computer error.

In short, it is far from clear that the use of the WWP—moral and legal implications aside—would not be a useful component of counter-terrorism.

Those eager to condemn it as such, in the absence of anything approaching sufficient information to do so, are allowing their (rightful) condemnation of the program to guide their judgment of its efficacy.

Or perhaps they fear that if the program is indeed efficacious, it is less pernicious than they believe?

13

M. Gordon 12.22.05 at 4:43 pm

As the best current speech recognition systems are 98% accurate when trained to a specific voice

Actually, I wouldn’t be so quick. The NSA has some scary-smart people working there. I recall reading (but cannot find a reference now) that some years ago the NSA released an open-source speech recognition engine that was way ahead of what the private sector was doing at the time, and it really gave people pause to wonder: if they were releasing this software to the public, what capabilities did they have that they weren’t talking about? That may just be an apocryphal story lodged in the back of my crufty memories, though, so, you know, don’t take it too seriously.

14

Russell L. Carter 12.22.05 at 5:32 pm

“Traditional security measures, including the ones listed by the author (human intel) have impacts on all of these as well, but the author does not assess how these differ, and implicitly suggests that the problems he cites are unique to the types of intel gathering that is under discussion, and absent from the traditional ones.”

The problem here is not that the technology is fallible, at whatever rate; it’s that the technology itself is being substituted for the arm’s-length judgement provided by, e.g., the FISA court. Implied in the above discussion is that the number of approved “taps” will likely be much larger, maybe vastly larger, and there is absolutely no oversight of our new technological judges. And no oversight of the programmers/shift supervisors who drive the technology. The naive view expressed by Posner is that there is some way to impartially direct this stuff. But there isn’t. There are always going to be easter eggs and the woman in the red dress.

15

a 12.22.05 at 5:36 pm

Bletchley Park.

16

Semanticleo 12.22.05 at 5:58 pm

Echelon, TIA, whatever issues arise over technology: can anyone tell a layman like myself what rapid-fire technology whizbangs would create a real-time burden for FISA, when the Feds can report to the court 72 hours after the fact?

17

nick s 12.22.05 at 6:08 pm

Judge Posner seems to think that it would be okay if FedBots searched his house, as long as they tidied up afterwards. (Given that the makers of Roomba are also military contractors, this isn’t too far-fetched.)

I think we have a decent sense of the technology being used by the NSA: multi-terabyte solid-state caches combined with Digital Signal Processing. Sigint vacuum.

18

Martin Bento 12.22.05 at 6:22 pm

I agree that it is a mistake to assume that the NSA has nothing better than the current crop of consumer goods. Their budget is huge and they vacuum up a huge swathe of the top CS and math talent, even more so lately as the dot-com Xanadu has crumbled.

In any case, an investigation like this intrinsically has high redundancy. If a potential terrorist is foolish enough to use words like “anthrax” and “White House” in conversation about plans involving such, he and his compatriots are likely to do so repeatedly. Such redundancy enables a lot of error correction. (I’m assuming the NSA would be using fuzzy logic, and therefore prioritizing matches by probability. If I’m smart enough to think of that, the NSA is.)

However, the fact is that any terrorists foolish enough to do that are not terrorists we have to worry much about. Those who will do that are people with unpopular political opinions, particularly of the liberal variety, as liberals tend to want to believe in the system and are emotionally resistant to “conspiracy theories”. If you want to know the actual purpose of this system, look at what such a system could realistically achieve.

As for traffic analysis, is a warrant even required for that? I would be surprised if so, and it’s possible to imagine FISA being reluctant to hand one over.

That said, I do think Lambert at Corrente is right. This is aimed primarily at the Internet, not voice traffic. Now is the time to stop trusting the system and recognize that the Internet needs enforceable privacy.

19

jw 12.22.05 at 6:53 pm

If the target is the Internet, why is this news? The NSA has been watching the Internet outside the US for years, while the FBI has been watching it inside the US with Carnivore and its derivatives. Every ISP and every major corporate network deploys network sniffers to watch their traffic. Internet traffic hasn’t been private for over a decade.

While the NSA is the largest employer of mathematicians, I doubt their technological lead in speech recognition is significant. Sure, they can do better than any one university, but they can’t do better than all of them cooperating together. Historically, intelligence has had a technical lead only in fields where few engineers were working, and that lead disappears rapidly when that fact changes.

The history of cryptography shows how quickly their technological lead can evaporate; after WW2, intelligence services had a monopoly on modern cryptology. Yet it didn’t take much effort for IBM to invent much of it themselves in their Lucifer cipher, and in the 1970s Diffie/Hellman, with no support compared to the NSA, helped to bootstrap the academic field of cryptography and re-invented public key cryptography only about a decade after British intelligence had.

20

jet 12.23.05 at 12:23 am

It would be ridiculously easy to circumvent any of this auto-bot security.

The sword has always evolved faster than the shield, and this techno-crap is only useful after you’ve identified a target. Sifting the masses is just a gross denial of civil rights. What’s going on here is profiling of every person tapped, and anyone serious about not getting caught will have no problem hiding their proxies behind nondescript IPs that look like normal everyday traffic, and hiding their encrypted communications inside core files, executables, media, whatever.

But as long as this injustice is reserved for communications where at least one party is not a US citizen, I’m not sure I’m that bothered. Of course I’ll be changing my mind when another Clinton is in office and not only uses the FBI and IRS against Republicans, but now will be using the NSA.

21

Lurker 12.23.05 at 3:00 am

I don’t recall clicking this link on CT.

Perhaps just one more scaremonger. But if even barely true, it is really sad.

Straightforward and announced govt. monitoring and tracking is a pain in the proverbial, but a fair number of people across the world, something like the folks living in that tiny bit of land mass between Amman and Tokyo, are used to it, and have evolved systems and habits to deal with it. But privatised and outsourced monitoring is just taking all this ‘Capitalism is God’ to absurd heights.

22

Martin Bento 12.23.05 at 3:52 am

jw, good point about the status of Internet monitoring now. Perhaps Lambert is off-base. What I suspect, though, is that we’re talking about correlating all that Internet data with other info that might normally need a warrant – phone conversations, economic transactions, etc. That’s why we’re hearing both about new technology (not needed for officially extending Echelon) and lack of warrants (not needed for sniffing and analysing Internet traffic). The notion that Diffie and Hellman caught up to British intelligence after a decade doesn’t reassure me much – a decade is a long time in computer science, and most areas of it progress faster than cryptography.

Isn’t it interesting that jet comes out and admits that he doesn’t mind this under a Republican, but will under a Democrat. A clear advocacy of rule of Republican rather than rule of law.

23

laocoon 12.23.05 at 8:16 am

Folks,

The folks who do the work being discussed are quite aware of all the issues raised by Ars Technica, as well as many more, at a much higher level of sophistication than anyone has discussed on that blog.

The NSA has the finest statisticians in the world, and they know quite well that Bayes’ Theorem shows that P[A|B] = 0.999 and P[-A|-B] = 0.999 are perfectly compatible with P[B|A] = 0.001, if P[B] = 10^-6.

And the trivial math above is just the simplest possible example (and the clear statement of the tediously verbose dolphins/fish example). Of course, this is also why face recognition is a failure, where A = “face classified as belonging to a terrorist” and B = “face’s owner really is a terrorist”.

Similar errors abound:
A=”Is a muslim”
B=”Is a terrorist” (OK, P[B|A] = 0.8, but you get the point)

And so on.
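These figures can be checked mechanically; a minimal sketch of the Bayes’ Theorem calculation above, using the stated example values:

```python
# Check of the Bayes' Theorem example above, using the stated values:
# A = "classified as a terrorist", B = "really is a terrorist".
sensitivity = 0.999   # P[A|B]
specificity = 0.999   # P[-A|-B]
prior = 1e-6          # P[B]

p_a = sensitivity * prior + (1 - specificity) * (1 - prior)   # P[A], total flagged
posterior = sensitivity * prior / p_a                          # P[B|A]
print(posterior)      # ~0.000998: a flagged individual is almost certainly innocent
```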

In short, just because most media rabble-rousers, administration apologists, blog writers, and blog readers are not mathematically sophisticated is no reason to assume that professional designers of intelligence algorithms are similarly unsophisticated.

24

Tom T. 12.23.05 at 8:20 am

Kieran’s point is an excellent one, although some of the examples don’t fit. TSA’s strip-searching of grandmothers doesn’t belong in this post because it has nothing to do with computer automation; indeed, it reflects a rejection of high-tech profiling and recognition systems for political reasons (worthy ones, in my view) having to do with fears of discrimination. Also, what makes going to the DMV typically such a hellish experience is not its computerized aspects but rather the (nominally) human clerks who hate you. Dealing with the DMV online or by mail is much less stressful than face-to-face.

More generally, it’s worth noting that one should be skeptical of human intelligence as well. It’s also going to produce a lot of false positives, wasted motion, and bad policy initiatives (see, e.g., Ahmed Chalabi).

25

Russell L. Carter 12.23.05 at 9:51 am

“In short, just because most media rabble-rousers, admistration apologists, blog writers, and blog readers are not mathematically sophisticated is no reason to assume that professional designers of intelligence algorithms are similarly unsophisticated.”

“Hero in error” Chalabi has a PhD in Mathematics from the UoC.

tom t., what does that have to do with computer automation?

26

fred lapides 12.23.05 at 10:57 am

1. If it is a matter only of wiretaps, then the FBI does that – with, we are told, court orders. It goes beyond that.
2. On phone taps: the mafia types long ago knew enough not to trust phone calls because they were likely to be tapped. Wouldn’t terrorists too?
3. If all this tech stuff is worthless, why then the huge uproar when it was discovered that the Echelon system was in place world-wide? See a search engine for Echelon.
4. Why was TISA put in place if the NSA’s tech stuff is of so little usefulness? See: Nixon.
5. If you believe the NSA and its tech stuff are unable to generate useful results, see http://www.nsa.gov/publications/publi00044.cfm

27

laocoon 12.23.05 at 11:00 am

R.L.C.

A PhD in Mathematics is just the start of the required background. I used to give out PhD’s to folks, so I do know a little about what it takes to get one. Another essential element is familiarity with classified techniques of analysis, classified counter-techniques of thwarting analysis, classified counter-counter-techniques … and a lot of empirical experience doing classified analysis of real (aka classified) data. Abstract analysis is nice, but nothing can match seeing the real data.

So, again, this is the kind of discussion one gives to new analysts on their first few days on the job – and they don’t get down to real learning until much later.

We love to be democratic and give everyone a fair hearing, which is wonderful. It’s great that we don’t blindly accept whatever so-called experts claim. But sometimes there are real experts who truly do have amazing expertise. And in this case, they can’t speak, or even identify themselves.

28

Martin Bento 12.23.05 at 2:16 pm

Laocoon, I agree with your defense of the NSA’s competence. They are the elite nerds of the intelligence world. But depending on what you think their objectives are, what you think they would be willing to do to achieve those objectives, how they or their work can be used by others (in this case, higher-ups in the Administration), and your value-laden judgements of all these things, such competence may not be reassuring. The Republican party has quite publicly equated strong opposition to them with support for terrorism. I think it is important to take such rhetoric seriously. I also think the human lust for power is such that it will always be abused in the long run (this last contention puts me at odds with many other liberals), and therefore the only solution is decentralization of power.

29

jw 12.23.05 at 3:09 pm

I haven’t argued that the NSA isn’t competent; after all, some of my best students have gone to work there. However, it’s still just one institution and it has limits on what it can achieve. I’m sure they’re a few years ahead in certain specific areas relevant to their mission. However, even if they are a few years ahead in speech recognition, it doesn’t change my analysis above as we aren’t advancing fast enough to change that 98% above to 98.5% in a few years, much less to the 99.999% required to be useful.

The false positive problem is a major one in all areas of computer security–biometrics, intrusion detection, and code analysis for some examples–and the solution is almost certainly going to be using the techniques in certain limited ways, not in coming up with solutions that are 99.999% accurate. Even humans aren’t that accurate at voice recognition under the conditions given above. For example, an effective use of face recognition is to verify that the person in front of a scanner is the person who he claims to be, a 1:1 matching problem instead of the N:M matching problem of identifying faces in a crowd.

30

jw 12.23.05 at 3:16 pm

As a postscript, let me point out that I haven’t claimed that the NSA is using any particular technique. I simply analyzed the system mentioned in the article above. I agree that the actual system used may be something completely different, because we don’t have enough information to know what they’re doing.

However, I do want to point out that the fact that NSA analysts are smart doesn’t really support that conclusion. After all, their goals are established by the administration. I know a number of brilliant physicists who worked on SDI in the 1980s even though they knew it wouldn’t work because it funded them to do interesting physics.

31

Russell L. Carter 12.23.05 at 10:43 pm

“And in this case, they can’t speak, or even identify themselves.”

Why that would be the point, precisely.

Think closely about the trust system this requires. A whole separate society, partitioned off from the banal outer world, cloistered, arrogant (as laocoon comes off, lecturing a maths/engineering professional), and with internal self-correcting mechanisms necessarily opaque to us, the hoi polloi.

Why, laocoon would have, even in his eminence, magnanimously deferred to Teller and von Neumann, true? We should have trusted them, and not asked questions.

And laocoon, I am highly curious how the NSA prevents easter eggs. Or does it just punish them? How does secrecy interact with each management mode?

Meta alert: this even touches on the current raging notion of ID vs evolution: the ID crowd claims (Fuller, especially) that the scientific cabal prescribes a single “truth”, and the hoi polloi must accept it.

I think this is rancid bs. The internalized, communitized, socialized, what-have-you notion of truth in the scientific community is out in the literature. Increasingly it is on the arXiv or in CiteSeer, with near-instantaneous access to PDFs, or sometimes just PostScript files, of the original papers.

Here we have the NSA prescribing a single “truth”, and we have the likes of the eminent laocoon defending the practice because we dirty ordinary citizens don’t have the mental acumen to correctly judge the necessarily opaque processes which protect us from the Iranian and NK nukes in a suitcase doomsday scenario. Oh wait…

32

golambek 12.26.05 at 9:45 pm

Laocoon,

I follow your claim that NSA analysts are aware of the problems that are raised on this thread, particularly the problem of false positives. I share your conviction that they are aware of them.

I don’t see, however, where you’ve said anything about how it is possible to grapple with these problems, or what features a system that does so is likely to have.

Absent that, you haven’t done much to advance the discussion.

Comments on this entry are closed.