Remember all the concerns about GMail reading people’s emails with the goal of displaying targeted ads? I was among those expressing reservations back when the service was first introduced. I continue to believe that it is important to be generally conscious about how much of our email and other activities are stored and potentially analyzed by Google and other service providers. Nonetheless, it’s also interesting to pause on occasion to see the level of sophistication – or lack thereof – that some of these services have reached nowadays.
Sometimes I am surprised by how well the ads on the sidebar match the content of my messages. For example, from very little text, GMail seems to be able to tell if a conversation is conducted in another language and serves up ads consistent with the language of the correspondence (here I’m referring to some experiences with Hungarian). It may be that it’s tracking the route the email took. None of the email address domains end in .hu so no clues there.
Today, however, I was reminded that there is still considerable room for improvement in the system. I am in the midst of corresponding with some friends about an evening outing consisting of drinks, dinner and possibly dancing. There is no information in the messages about the location of all this (even at the city-level) so it’s hard for the ads to be targeted in that way. Our email addresses either end in gmail.com or educational institutions scattered across the country so even if GMail analyzed that information, it wouldn’t help in this case. We also haven’t mentioned any restaurant names to provide clues.
There is one piece of specific information that has come up, however: “I’m flexible (except the usual Thai food allergy problem).”
Given this note, it was curious to see a link to “Thai Restaurant Iowa”. The word “allergy” is right next to “Thai food” in the above sentence. So what are the chances that information about Thai food restaurants is going to be of interest?
{ 23 comments }
Aaron_M 10.13.06 at 8:45 am
I am already thinking about Thai food…I guess the ad is working just fine. Although I think I would save money by going directly to Thailand instead of Iowa.
Doormat 10.13.06 at 9:30 am
My experiences have been pretty similar to Eszter’s in this regard. Sometimes it’s uncanny; like when a friend was admitted to hospital with a heart problem, and every ad was for heart medicine: amazing given most of the emails were about other stuff, occasionally mentioning this mutual friend’s medical problems.
Sometimes it’s very general though. I’m a mathematician, so lots of my emails are to colleagues about very technical subjects: Googlemail generally realises we’re talking about maths, but almost always points me to undergrad websites, or, a favourite of mine, to the Delhi Times, which seems to run a weekly “finds in mathematics” column!
Does anyone have any idea how Google carries out such analysis of email, or is it a Google trade secret?
tps12 10.13.06 at 9:51 am
How can you be allergic to a cuisine? My ex’s mom claims to be allergic to “curry,” but really she just doesn’t like Indian food.
sanbikinoraion 10.13.06 at 10:00 am
GMail is generally good but I think that there’s a problem with its text service that wilfully disregards such phrases as “not interested in cialis” and “i hate ryanair” and serves you up adverts related to the thing you’re dissing.
bi 10.13.06 at 10:16 am
How does it tell it’s Hungarian? I guess there aren’t that many languages whose orthographies contain letters like ő or ű anyway…
Scott Martens 10.13.06 at 10:17 am
For example, from very little text, GMail seems to be able to tell if a conversation is conducted in another language and serves up ads consistent with the language of the correspondence (here I’m referring to some experiences with Hungarian).
That’s really easy – easy enough it might be in the undergrad texts now. But for the rest: look, I’ll bet Google is just using basic statistics. Some kind of word and n-gram counting, maybe with variable length segments of text or a collocation dictionary. There are probably a few bells and whistles added on, but I’ll bet at the core, it doesn’t do anything more complicated than what Salton outlined in ’89 in Automatic text processing.
It’s not like some program reads your messages and actually understands them, it’s just that understanding is often incredibly easy to fake with a bit of number crunching.
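To make the n-gram counting idea concrete, here is a minimal sketch of language guessing by character trigram counts. Nothing here is Google's actual method, which nobody in this thread knows; the two tiny sample texts, the function names, and the choice of cosine similarity are all assumptions made for illustration. The Hungarian sample is deliberately typed without accents.

```python
from collections import Counter
from math import sqrt

# Toy training text, invented for this sketch. A real system would use a
# far larger corpus per language.
SAMPLES = {
    "hu": "szia hogy vagy ma este vacsora es tanc koszonom szepen a meghivast",
    "en": "hello how are you tonight dinner and dancing thank you very much for the invitation",
}


def char_ngrams(text, n=3):
    """Count overlapping character n-grams after lowercasing and collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = sqrt(sum(c * c for c in a.values())) * sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


# One trigram profile per language, built once from the sample text.
PROFILES = {lang: char_ngrams(text) for lang, text in SAMPLES.items()}


def guess_language(message):
    """Return the language whose trigram profile is most similar to the message."""
    counts = char_ngrams(message)
    return max(PROFILES, key=lambda lang: cosine(counts, PROFILES[lang]))


print(guess_language("koszonom a vacsorat"))
print(guess_language("thank you for dinner"))
```

No step in this sketch involves knowing what any word means. The guess rests entirely on which three-letter sequences turn up, which is why it can work on very short messages and without accented letters.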
J. Ellenberg 10.13.06 at 10:41 am
You’re failing to consider the possibility that the ad is for a hypoallergenic Thai restaurant.
soubzriquet 10.13.06 at 11:22 am
Scott is probably correct in 6. The thing to remember is, they have a *very* big corpus to work with.
moriarty 10.13.06 at 11:38 am
I don’t understand the question. Even if in this case the reference to Thai restaurants was negative, in most cases it will be positive. That’s the way it’s supposed to work. They wouldn’t expect every ad placement to be spot on. You don’t need to win ’em all.
Ben 10.13.06 at 11:44 am
I have my students email essays to my gmail address. Often there’s hardly any text (‘here’s my essay’), but I still get ads for online essay banks…
etat 10.13.06 at 12:20 pm
Is there an unstated theme here about having Google ‘listen in’ on your email conversation? You know how it is; you’re having a conversation on your mobile phone about something, when a passerby interrupts to give you some relevant bit of information. We’re all thrilled by those chance encounters, right? So we’re even happier to expect it of our friend Google.
Brock 10.13.06 at 1:43 pm
Interesting trick that was pointed out to me: if you want to kill the ads in an entire Gmail thread, all you have to do is use the word “funeral” in a message.
Brad 10.13.06 at 2:06 pm
Email, webpages, and the like specify a language in the metadata of the message so that the proper character set encoding may be used.
The location data comes from the internet address of the computer from which you’re requesting the page at the moment. (although, for this same reason, Google has a very good idea of where and when you live, work and travel)
yoyo 10.13.06 at 5:30 pm
Well, speaking of languages and thai, gmail isn’t superb with the language identification. Every day i get 3-4 emails in some language that appears to be thai script. This despite the fact that i religiously mark them as spam, and never write or read emails in anything other than the very occasional french.
Eszter 10.14.06 at 2:33 pm
Sorry I’m late to this discussion, I was on the road. First, my messages in Hungarian rarely include the accented letters. They’d be way too tedious to type and a solid level of Hungarian knowledge makes them unnecessary. That is, with the occasional amusing exception, you can guess what the word is supposed to be without the correct letters.
Regarding “Thai food allergy”, you’re right tps12 that it’s not all Thai food. The problem is that I haven’t been able to figure out what it is about Thai food that I am extremely allergic to. I can smell it and certainly taste it (although avoid it now), but I don’t know what it is. It is not peanuts and it is not coconuts (as I can eat both on their own), but it is something sweet so it may be a sauce that has one of those.
Incidentally, I really don’t appreciate it when people suggest that I call it an allergy even though it’s simply a dislike. I assure you that I have very obvious and visible reactions (not to mention the pain associated with it), which I think qualifies as an allergy. I used to like Thai food, this all started happening a few years ago.
Moriarty – my point was that given the allergy, there’s just no way I’m going to be interested in clicking on a link advertising a Thai restaurant. The word “allergy” was right next to the words “Thai food” so the link was there in the message.
Lyrebird 10.14.06 at 3:30 pm
Language identification (even for all-roman-letter languages) is one of the easiest problems to solve w/the techniques mentioned in reply #6. A smidgen of Italian versus Spanish can be accurately pegged; Hungarian is that much easier because of cool consonant sequences like zs and stuff.
Knowing the content of any text proposition is MUCH HARDER than detecting its topic, thus the selling of Thai restaurants to someone who just said she can’t eat Thai food. (Like that wonderful Far Side cartoon about what dogs hear: “blah blah blah blah Ginger blah blah”…)
My fave (ugh) is when gmail gives me garbage dating advice books when my classmate has just written to me about her meeting with a Faculty Senate rep, should she call him, etc.
Cheers!
stuart 10.14.06 at 9:14 pm
I think you might be overestimating what Google (and other ad providers) are doing with this context sensitive ad stuff. They don’t understand the content of the page/email, and they aren’t attempting to process it. They are essentially doing a cross-reference search between the content and their list of advertising terms that people are paying to match ads to, and providing you with (presumably) the highest $ value match. It might also weight the number of matches, and more complex terms, as higher, so that ‘Thai’ might by default give you Thai holiday, but ‘Thai food’ would give the restaurant even if the holiday people would pay a little more, because it’s a better match (and hence more likely to get a click-through).
Now if some health site decided to buy the adword ‘Thai food allergy’ (unlikely they would get that specific), or maybe just ‘food allergy’, then you might get their ad instead if you wrote the same thing again.
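The cross-reference idea above can be sketched in a few lines. This is only an illustration of the guess in this comment, not a description of how any real ad system works: the ad list, the bid amounts, the “Food allergy advice” ad, and the rule of multiplying the bid by the number of words in the matched term are all hypothetical.

```python
import re

# Hypothetical advertiser terms: (keyword phrase, ad shown, bid in dollars).
ADS = [
    ("thai", "Thai holiday", 1.20),
    ("thai food", "Thai Restaurant Iowa", 1.00),
]


def tokenize(text):
    """Lowercase the text and split it into runs of letters."""
    return re.findall(r"[a-z]+", text.lower())


def contains_phrase(tokens, phrase_tokens):
    """True if phrase_tokens occurs as a contiguous run inside tokens."""
    n = len(phrase_tokens)
    return any(tokens[i:i + n] == phrase_tokens for i in range(len(tokens) - n + 1))


def pick_ad(message, ads):
    """Return the ad with the highest score among matching terms, or None.

    Assumed scoring rule: bid multiplied by the number of words in the term,
    so a longer (more specific) match can beat a slightly higher bid.
    """
    tokens = tokenize(message)
    best_ad, best_score = None, 0.0
    for term, ad, bid in ads:
        term_tokens = tokenize(term)
        if contains_phrase(tokens, term_tokens):
            score = bid * len(term_tokens)
            if score > best_score:
                best_ad, best_score = ad, score
    return best_ad


message = "I'm flexible (except the usual Thai food allergy problem)."
print(pick_ad(message, ADS))

# If a health site bought the term 'food allergy', the same message
# would match that ad instead under this scoring rule.
print(pick_ad(message, ADS + [("food allergy", "Food allergy advice", 1.10)]))
```

The word “except” never enters the calculation. A matcher like this sees the phrase “Thai food” and nothing about the sentence around it, which is consistent with the ad Eszter received.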
Jacob Christensen 10.15.06 at 10:06 am
Google’s ads can often be incredibly amusing. I noted that the confirmation of a ticket reservation for the Opera in Frankfurt led Google to believe that I wanted to emigrate to the U.S.
(The opera in question was Smetana’s The Bartered Bride, in case you wonder).
Inspired by an earlier post here on CT, I bought Sheri Berman’s Primacy of Politics. When Amazon sent its shipping message to me, Google took that as a cue to offer investments in Slovakian property.
Huh?
Raven 10.15.06 at 7:27 pm
No doubt the word-pair “Thai food” was sufficient to bring up ads related to “Thai food” (as in Thai restaurants). The context was not considered, and of course in the process there was no sentience that could possibly have considered it, since this only amounted to a search/match algorithm.
You could have written, “I hate Thai food! I loathe and despise Thai food! I won’t go within 100 miles of a city that has Thai food anywhere within its borders! I’ll go on a rampage of mass murder if anyone ever again mentions Thai food to me!”
And you know that Gmail would, accordingly, attach some sort of ad for Thai food.
Raven 10.16.06 at 5:12 am
I expect that Google Groups (the Usenet newsgroup archive) uses the same algorithm as Gmail to attach ads to posts and threads. I also wouldn’t be surprised to learn that the national-security apparatus does its initial quick scans of Internet traffic using tools as rough and clumsy, though with different search targets than advertiser keywords.
With lots of time to kill, the president of our club suggested “adopting” a stretch of road to clean. Other PR ideas had bombed, but he said getting our name on the roadside sign would keep our recognition levels shooting higher every week, as people would drive by it over and over. “Throw out better ideas if you’ve got ’em.” We didn’t, so we spent rounds bagging trash each week, from cans and paper to road-kill. The president praised our efforts every time he gave out assignments, but eventually we compared notes and noticed that none of us ever had him sharing our rounds. After that, our club participation dropped. The bomb this time was a time bomb: the time it took to realize he’d volunteered us, not himself.
Oh… hi, guys! How ya doin’? My, what pretty, shiny badges you have!
Eszter 10.16.06 at 9:12 am
Raven, thanks for putting CT on the “watch closely” list (uhm, not that it wasn’t on it already, I suspect:).
Regarding more general comments about my observation, I realize that the algorithms are not paying close attention to context, thus the resulting ad I mentioned. My point was to note that it is a direction that seems like it would be fruitful to follow. With so many people working at these companies (like Google, but others as well), one would imagine some of these products may improve over time. I suspect they will, they just haven’t gotten too far as of yet.
Peter Clay 10.16.06 at 9:37 am
I’m trying to coin the term “data sharecropping” to refer to the practice of keeping your vital personal data (such as email) on a system you don’t own and have no rights over, such as Gmail.
fyreflye 10.16.06 at 5:55 pm
If you’re seriously concerned about GMail ads switch to Firefox and download its Adblock extension; it will kill ads on Google Groups, GMail, and just about anyplace online you don’t want to see them. Unfortunately I can’t guarantee that your mail isn’t still being read and that you just can’t see the resulting ads.