All men (nearly)

by Chris Bertram on February 19, 2004

Inspired by Michael Brooke’s post on The Gender Genie, a site that analyses text and guesses whether the author is male or female, I’ve just run samples of the Crooked Timber team’s writings through the test. It turns out that Ted is probably a woman and that all the rest of us (including Eszter and Maria) are men! Harry, whom I had down as a caring-sharing type, turns out to have gallons of testosterone coursing through his sentences. Who’d uv thunk it?



Motoko Kusanagi 02.19.04 at 10:22 am

I had a hairdresser once who claimed very seriously that he could tell whether the last person who cut someone’s hair was male or female. In my case it was a man and he guessed wrong, but simply concluded: “Then it was a queer!” An easy escape route, in the case of coiffures. (Is Ted…?)

By the way, I checked the opening paragraph of The Lover by Marguerite Duras: male.


Ophelia Benson 02.19.04 at 3:45 pm

Ha. I’m a man. By a factor of more than three. Female score 536, male 1727.

Mind you, I deliberately chose a particularly, shall we say, acidulous post to try. But then my posts tend to run the gamut all the way from mildly acidulous to very acidulous, so I’m not sure it matters much.

(I know, that’s not how the Gender Genie is testing.)


harry 02.19.04 at 4:28 pm

Because I write in very different ways for different audiences I ran a bunch of things (stung by Chris’s observation). My blog posts are v. male (not as male as Ophelia’s!). My academic papers (all recent, or unpublished — I ran long excerpts from 7) are all male, but much less so than the blog posts, and a couple barely at all. 2 are co-written with a woman, and they came out more male than the 1 co-written with a man. My journalism, like my blogs, is v. male.

Then I ran something from my favourite female solo-blogger (Laura at Apt 11D) and she came out as male as my blog posts, so I no longer trust the algorithm.


Maria 02.19.04 at 5:03 pm

It could also be that the style of writing in a political type blog is quite ‘male’, i.e. didactic, analytical, less about ‘me’ and more about ‘them’. (Though I don’t think it’s an accident that there are so many more male bloggers than female. There’s something about holding forth and expecting an audience that is more typically a male trait…)

But the algorithm they use seems to put the cart before the horse, much the same as those gender tests we were doing last year (can’t remember the link – but mine said I was not only a man, but probably autistic too.)

Instead of starting with the assumption that women’s writing is touchy-feely and all about me, and creating an algorithm on that basis, would it not have made more sense to feed an AI programme with lots of writing, tell it which gender each is written by, and let the thing learn? That’s how the most accurate spam filters are being developed right now – giving the programme lots of ‘good email’ so it can determine the characteristics of same and distinguish it from spam.
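For what it's worth, the "let the thing learn" approach might look roughly like the sketch below: a tiny naive Bayes classifier that counts words in labelled examples and then picks the most likely label for new text. Everything here is an assumption for illustration: the function names, the toy spam/good training sentences, and the choice of naive Bayes itself, which is one common way to do this and not necessarily what any real spam filter or the Gender Genie uses. Swapping the labels for the author's gender would give the kind of learner described above.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase the text and split it into simple word tokens.
    return re.findall(r"[a-z']+", text.lower())


def train(labelled_texts):
    # labelled_texts: iterable of (text, label) pairs.
    # Returns per-label word counts and per-label document counts.
    word_counts = {}
    doc_counts = Counter()
    for text, label in labelled_texts:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    return word_counts, doc_counts


def classify(text, model):
    # Pick the label with the highest log-probability,
    # using add-one smoothing so unseen words do not zero out a label.
    word_counts, doc_counts = model
    vocab = set().union(*word_counts.values())
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        score = math.log(doc_counts[label] / total_docs)
        for word in tokenize(text):
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy, made-up training data (hypothetical, for illustration only).
examples = [
    ("buy cheap pills now", "spam"),
    ("cheap offer buy today", "spam"),
    ("lunch meeting on monday", "good"),
    ("notes from the monday meeting", "good"),
]
model = train(examples)
print(classify("cheap pills today", model))
print(classify("monday meeting notes", model))
```

With only four training sentences this is a toy; the point is just that the word weights come out of the labelled data instead of being chosen in advance.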

Seems to me all that gender genie can do is regurgitate its authors’ dodgy premises.

But I’m probably just a bit ashamed that my first reaction was ‘great, I don’t write like a girl!’.


dsquared 02.19.04 at 5:26 pm

It doesn’t appear to pick up on the fact that I am three separate people, one of whom is female …


Barry 02.19.04 at 6:29 pm

dsquared, then shouldn’t you change your name to ‘dsquared*x’, or something like that?


Ophelia Benson 02.19.04 at 7:50 pm

I thought I’d posted again, but I must have forgotten to hit post after preview – so girly of me.

I said that I tried two or three more of my blog posts (haven’t tried any more formal writing, I should – only I’m supposed to write 3000 words in the next 24 hours and I only have a bit over 700 so far, so what the hell am I doing this for, a break, that’s what!), and got less out of proportion numbers including one just barely female.

“(Though I don’t think it’s an accident that there are so many more male bloggers than female. There’s something about holding forth and expecting an audience that is more typically a male trait…)”

Yeah, and we need to change that! Women need to get more noisy and bossy and opinionated.

That’s only half a joke.

And I often wonder. Maybe not, maybe it’s just that we naturally notice what is said to and about our precious Selves more than we do what’s said to and about other people – but all the same I often wonder if I don’t get more, shall we say, overheated reactions, because I’m a noisy woman. And if so – oh well, you know the rest.

2300 words. I feel sick.


John Quiggin 02.19.04 at 8:32 pm

If you look at the (1) algorithm, you’ll find that maleness is largely determined by the (2) frequency with which the (3) word “the”(4) is used.


Ophelia Benson 02.19.04 at 8:44 pm

I did notice that very thing. (Ooh, ‘that’, how macho.)

Is that based on anything, I wonder? Or just the supposition that women are more indefinite than men?

The. The supposition. The.


PF 02.19.04 at 9:33 pm

I still think the stats are worth a read.

Why is it so good at guessing female non-fiction? (66648 correct answers versus 9400 incorrect of entries longer than 500 words.)

Why is it so bad at female blogs and fiction? (2179 correct versus 3872 incorrect of blog entries longer than 500 words and 3695 correct versus 5200 incorrect of fiction entries longer than 500 words; both significantly worse than random guessing would be.)

Why is the total number of entries in the female non-fiction category, where the genie is best, an order of magnitude greater than the female entries in either the blog category or the fiction category? If we reduced that category to the size of the others, the 64.46% rate of success would certainly go down, and that is the only category where the genie is better than 50% with women. (There are more male non-fiction entries longer than 500 words, and the genie is more successful with them, but not as many as with the women.)

In the “Old Totals”, submissions between August 15 and September 13, the Genie is just about as good as a random guesser would be, assuming there were equal totals of men and women.

And why does everyone always seem so happy when the genie gets them wrong? I think the accent tends to be laid on “I am not the gender I am” rather than “this computer program is crap because counting definite articles won’t get you where you want to go.”

And then all the stats are from self-reporting, so who knows, really.


Motoko Kusanagi 02.20.04 at 10:17 am

I found a use for this thing! I’m writing an article in an internet cafe where the computers have only one word processor, wordpad, which, as far as I know, has no word count. So now, if I want to check how much I’ve written, I feed it to the Genie. Oddly enough, when you enter a Dutch text, you get scores like male: 104, female 0… The only word the Genie recognizes is “is”, which is typically male it seems.


TomD 02.20.04 at 2:05 pm

I think the way the creators set the thing up was exactly to start with a large corpus of writing from both sexes and analyze the word frequencies. Then they use the deduced word frequencies to analyze the input.

They didn’t choose the ‘female’ words *because* they felt ‘touchy-feely-togethery’, they chose them empirically and they just turned out to be ‘touchy-feely-togethery’.

At least I hope that’s the case.

The less-than-stellar success rate may be due to the fact that the people who are entering text at the website are a very unrepresentative sample of the general population of writers. In other words, only freaks and geeks care about the Gender Genie.

(Having said that, I get a solid 2-1 male-to-female ratio… apparently I write very butch! Just look at that “the”-count.)
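A rough sketch of the kind of keyword scoring discussed in this thread, in case it helps to see it concretely. The weights below are invented for illustration and are not the Genie's actual values; only the direction of the keywords ("the", "a" and "is" counting as male, "her" as female) comes from the comments on this post. The function names are hypothetical too.

```python
import re
from collections import Counter

# Hypothetical weights, for illustration only.
# The real Gender Genie weights are not given in this thread.
MALE_WEIGHTS = {"the": 7, "a": 6, "is": 8}
FEMALE_WEIGHTS = {"her": 9}


def score(text):
    # Count each word, then add up weight * count for each keyword list.
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    male = sum(weight * counts[word] for word, weight in MALE_WEIGHTS.items())
    female = sum(weight * counts[word] for word, weight in FEMALE_WEIGHTS.items())
    return male, female


def guess(text):
    # Whichever total is larger wins; a tie is left undecided.
    male, female = score(text)
    if male == female:
        return "undecided"
    return "male" if male > female else "female"


print(score("The cat is on a mat"))
print(guess("She gave her book to her friend"))
```

Because the totals are sums over every keyword occurrence, longer texts give bigger scores on both sides, and a text containing only one recognised keyword scores zero on the other side.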


PF 02.20.04 at 2:14 pm

Why do you suppose the success rate was so low last summer, and why is it better now? Did they change something in the program?


R.Mutt 02.20.04 at 3:06 pm

re: If you look at the (1) algorithm, you’ll find that maleness is largely determined by the (2) frequency with which the (3) word “the”(4) is used. and Is that based on anything, I wonder? Or just the supposition that women are more indefinite than men?

The strange thing is that “a” is also regarded as a masculine keyword. Apparently women try to avoid articles, or replace them with the typically feminine keyword “her”.


Detached Observer 02.20.04 at 9:06 pm

Incidentally, getting the sex of 10 of the 13 writers of this blog correct is pretty impressive: that’s an accuracy of about 77%.


PF 02.21.04 at 4:24 pm

Why is it so bad with the women? 100% failure with the women on this blog. Anyone who just guessed male almost all the time could do well with this blog. And maybe with most blogs. Has anyone done a survey of Blogland?
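To put numbers on that, here is the arithmetic as a small sketch. The counts are inferred from this thread and are an assumption: 13 writers, of whom 2 are women (Eszter and Maria, both guessed male) and 11 are men (with Ted guessed female).

```python
# Counts inferred from the thread (an assumption, not official figures).
men, women = 11, 2
total = men + women

# The Genie got every man right except Ted, and neither of the women.
genie_correct = (men - 1) + 0

# A guesser that always answers "male" gets every man right and every woman wrong.
always_male_correct = men

genie_accuracy = genie_correct / total
baseline_accuracy = always_male_correct / total

print(f"Genie: {genie_accuracy:.1%}")
print(f"Always male: {baseline_accuracy:.1%}")
```

On these counts the always-male guesser scores 11 of 13 against the Genie's 10 of 13, so the 77% figure is below what blind guessing would manage on this particular blog.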


PF 02.21.04 at 4:25 pm

I meant census, of course, not survey.


Jordan 02.21.04 at 7:11 pm

The thing told me Virginia Woolf & Danielle Steel are 65% male. I’m guessing its high success rate owes to more male writing getting entered in the first place.

Maybe the reason it’s good at gendering women’s nonfiction is because they fed it a lot of diary-entry, confessional type stuff (since the algorithm seems to equate personal writing with femininity).


PF 02.22.04 at 7:20 am

Look at the stats again. It gets more female than male entries in every category apart from non fiction of less than five hundred words (where it’s about as good as random guessing for males, and significantly worse for females). If you discount the non-fiction entries of more than five hundred words, you really can’t say much about the success rate – most of it goes away right there.
