Gender genie

by Micah on October 31, 2003

Continuing on the lighter side of things, this “program” claims to predict an author’s gender based on a writing sample. I tried it with a sample of my own over 500 words long and it succeeded. But it failed for some entries on this blog. Only slightly more surprising, it also failed when I tested the last page of Susan Moller Okin’s “Justice, Gender, and the Family” and the first two pages or so of Catharine MacKinnon’s “Toward a Feminist Theory of the State.” It might be interesting to test some longer samples, but my hunch is that this algorithm will usually predict male for samples in the genre of philosophical writing.



Keith M Ellis 10.31.03 at 7:51 pm

I saw this several months ago. Then, and just now, using various long excerpts from email I’ve written, I’ve mostly found it to guess my gender incorrectly. If I am being very analytical, or, perhaps, pedantic, it guesses my gender correctly as “male”. Most of my more informal correspondence it incorrectly identifies as “female”.

This isn’t what I expected, actually, since even my most informal writing tends to be a little “stilted”—that’s always been my own impression. But a lot of my more personal correspondence this algorithm identifies as strongly “female”.

I also recently came across a feedback-normalized test that incorrectly identified me as female. This is somewhat interesting to me as, for example, many years ago when I took the MMPI it supposedly indicated that I was more masculine than the average man and more feminine than the average woman. A result that pleased me, frankly.

In keeping with my old-style feminism, I was in the past strongly on the nurture side of the nature/nurture debate concerning gender differences. However, too many years of solid scientific research leads me to accept that male and female brains are significantly different. Thus, I would expect that such tools could be refined to produce a relatively high degree of accuracy. But there will always be outliers.


J. Ellenberg 10.31.03 at 7:59 pm

“However, too many years of solid scientific research leads me to accept that male and female brains are significantly different. Thus, I would expect that such tools could be refined to produce a relatively high degree of accuracy.”

I don’t see the “Thus.” Male and female brains could be very different, leading to measurably different responses to low-level tasks like recalling words, reading facial expressions, and mentally rotating three-dimensional objects, without there being any measurable differences between men’s and women’s prose styles, right?

I have to admit I’d find it surprising if one could distinguish the two with a “relatively high degree of accuracy,” unless by “relatively high” you mean, say, consistently getting it right 53% of the time; that might convince me that the machine was doing something a coin-flip couldn’t, but I don’t think it would put me in the position of thinking there was very much about prose style that’s inborn.
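As a rough sketch of the coin-flip comparison, the chance that pure guessing reaches a given hit count can be computed exactly from the binomial distribution. The sample sizes below (100 and 10,000) are assumptions chosen for illustration, not figures from the site, and `tail_probability` is a hypothetical helper name.

```python
from math import comb

def tail_probability(correct, trials):
    """Chance that a fair coin gets at least `correct` of `trials` guesses right."""
    favorable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favorable / 2 ** trials

# Illustrative sample sizes only; a 53% hit rate is assumed in both cases.
for trials in (100, 10_000):
    correct = trials * 53 // 100
    print(trials, correct, tail_probability(correct, trials))
```

Under these assumptions, 53 right out of 100 is something a coin does quite often, while 5,300 right out of 10,000 is something it almost never does. That only shows the guesser beats chance; it says nothing about whether the difference is inborn.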


Keith M Ellis 10.31.03 at 8:08 pm

Well, I elided some important points. The most significant would be that male and female brains differ according to localization of language. You’re right that it’s not _necessarily_ the case that the observed differences would result in significant differences in writing, but I think it’s suggestive. That’s why I used the word “expect”.

The linked webpage now has a very large sample, and they’re getting a 70% rate of correct guessing. It’s not rigorous, of course, but I’m inclined to think that it’s doing a lot better than your “flip of the coin”. I suspect that empiricism has trumped your intuition.

Now, it should be noted that there’s no reason to assume that observed differences in writing style are necessarily inherent and not learned. My view is that brain studies and such indicate that gender differences are _also_ inherent as well as being learned.


Matt Weiner 10.31.03 at 9:00 pm


Matt Weiner 10.31.03 at 9:01 pm

I think, maybe, that Andrew meant that doing stuff like this on the internet is stupid. If you search his site for The+internet+is+stupid you get 108 hits, so it’s not such a devastating insult.


sidereal 10.31.03 at 10:06 pm

I had to skim the underlying algorithm description, but it didn’t seem to me there was any suggestion that the implied differences were inborn rather than learned. Clearly any inborn differences would have to be language-neutral, and as long as the study only used a single corpus from a single language, I’d call it a blip, and not indicative of anything (even if it did have better than a 42% success rate).


PF 11.01.03 at 2:11 am

“it failed”
“it also failed”

Please please look at the stats for the gender genie. You’ll find it is worse than coin-flip for female entries across the board in almost all categories. That means something is seriously wrong with the program.

It has a success rate lower than its currently advertised 70% for male entries.

In fact the only reason the rate is currently 70% is that people are feeding it nonfiction entries longer than 500 words with cues that it correctly recognizes as female – it has a pretty good rate there, but the fact that the correctly-identified-as-female submissions are an order of magnitude larger than any other category and two orders larger than some makes me think someone’s trying to make it look better than it is. (The first time I was led to the stats, it was male fiction entries, I think – I might be wrong – that led the pack, both in success rate and in quantity of submissions.)

Another reason to distrust the 70% correct statistic is that this is all self-reportage. Someone needs to do this experiment with independent verification. Anyhow, don’t trust the genie, at least for now.
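To illustrate how one oversized category can carry a headline figure, here is a minimal sketch with invented counts. The numbers and category names are assumptions made up for the example and are not the Gender Genie's real statistics; they only mimic one category being an order of magnitude or two larger than the rest.

```python
# Invented (correct guesses, total submissions) per category.
# These are NOT the Gender Genie's real statistics.
counts = {
    "female nonfiction": (7500, 10000),
    "female fiction": (400, 1000),
    "male nonfiction": (600, 1000),
    "male fiction": (60, 100),
}

def overall_accuracy(counts):
    """Pooled accuracy: every submission counts once, so big categories dominate."""
    correct = sum(c for c, _ in counts.values())
    total = sum(t for _, t in counts.values())
    return correct / total

def per_category_accuracy(counts):
    """Accuracy inside each category, ignoring how many submissions it has."""
    return {name: c / t for name, (c, t) in counts.items()}

def mean_category_accuracy(counts):
    """Unweighted average of the per-category accuracies."""
    rates = per_category_accuracy(counts)
    return sum(rates.values()) / len(rates)

print(overall_accuracy(counts))
print(per_category_accuracy(counts))
print(mean_category_accuracy(counts))
```

With these made-up counts the pooled rate lands near 70% even though one category sits below a coin flip and the unweighted average is under 60%.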


Don Hosek 11.01.03 at 2:27 am

Well it got my non-fiction writing correct, but the first two chapters of my novel came out female. But then I’ve attempted to cultivate that in my fiction. If I ever plug in my PC again, I’ll try an older novel which has a female first person narrator and see if it fools the program as well, for whatever it’s worth.

I’d also point out that the algorithm used is rather transparent and works on scoring certain words by frequency.
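A minimal sketch of that kind of word-frequency scoring might look like the following. The word lists and weights are hypothetical, invented for illustration; they are not the Gender Genie's actual table, and `score_text` and `guess_gender` are made-up names.

```python
import re
from collections import Counter

# Hypothetical keyword weights, invented for this example only.
FEMININE_WEIGHTS = {"she": 6, "her": 9, "hers": 3, "so": 4}
MASCULINE_WEIGHTS = {"the": 2, "a": 3, "these": 4}

def score_text(text):
    """Return (feminine_score, masculine_score) from weighted keyword counts."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    feminine = sum(weight * counts[word] for word, weight in FEMININE_WEIGHTS.items())
    masculine = sum(weight * counts[word] for word, weight in MASCULINE_WEIGHTS.items())
    return feminine, masculine

def guess_gender(text):
    """Pick whichever score is higher; equal scores are reported as balanced."""
    feminine, masculine = score_text(text)
    if feminine > masculine:
        return "female"
    if masculine > feminine:
        return "male"
    return "balanced"

print(guess_gender("She said her book is so good"))
print(guess_gender("The cat sat on a mat near the door"))
```

A scheme like this looks only at which words appear and how often, so the subject of a text can sway the result as much as its author does.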


laura 11.01.03 at 7:29 am

I tried three times. The first time it told me I was male, but I thought well, even though the paper is about gender, it’s my required (hateful) quantitative paper. Maybe that seemed more male, even though there were not actually any numbers in the excerpt I used. Then the second time it told me I was male, even though I fed it part of a fairly fluffy ethnographic conference paper. The third time it finally said I was female, based on a pretty chatty email. But I note that the word “so” is apparently extremely girly, which means that my long-acknowledged overuse in email of that word had a major impact there.


MDtoMN 11.01.03 at 8:19 am

I’m a male (gay, does that matter?). Anyway, I fed it an essay on Orwell and another on Julian of Norwich. I was a male the first time and a female the second time. I wonder if the subject matter of the two essays may have affected my writing and the results.

11 11.01.03 at 8:30 am

My articles about Biochemistry and Jesus are also apparently male. So, my only “female” piece happens to be about a female. Notably, feminine keywords include “her”, “she”, and “hers”. I wonder if men writing about women tend to get misread? I also wonder if men really do write about women less, and therefore it’s an effective tool for predicting gender.


andrew 11.01.03 at 9:39 am

Purely speculative, but I’ve always thought that, in conversation, men are slightly more likely to use the first person and women slightly more likely to use the third – and use of the second person probably equal between the genders.

Stereotypical, sure. I wonder if it’s true – with women paying more attention to group dynamics and men paying more attention to themselves?

When is the last time you overheard a woman, in a restaurant or coffee shop, use the word “I” twenty times in ten minutes?
I’m a sensitive girly-man and I still use it all the time. Like so.


Ophelia Benson 11.01.03 at 6:10 pm

That’s interesting, I was thinking the opposite. It does consider ego-words female – and I was thinking it could be right: I often do think women talk about themselves too damn much and impersonal or general subjects too damn little – and take that to be a feminist move of some sort. ‘The personal is political’ thing. Which was a good point once upon a time (yes who does the dishes is very political) but has gotten degraded into meaning ‘that means I’m supposed to drone about myself all the time.’

I’ve tried two so far: got one female and one male.


Don Hosek 11.02.03 at 1:44 pm

I suspect that its empirical failure rate may be a bit higher than the actual failure rate. The only time I answered the “did I get it right?” question was when it got it wrong. I would imagine that there’s probably a selection bias on that last part of the survey where people are more likely to provide the final feedback when it gets it wrong.
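The selection-bias worry can be put into a small worked calculation. This is only a sketch: the true accuracy and the feedback rates below are assumed numbers, not measurements, and `observed_accuracy` is a hypothetical helper.

```python
def observed_accuracy(true_accuracy, p_report_if_right, p_report_if_wrong):
    """Accuracy as it appears in the feedback, given who bothers to report."""
    reported_right = true_accuracy * p_report_if_right
    reported_wrong = (1 - true_accuracy) * p_report_if_wrong
    return reported_right / (reported_right + reported_wrong)

# Assumed numbers: the guesser is right 80% of the time, but people are twice
# as likely to answer "did I get it right?" after a wrong guess.
print(observed_accuracy(0.8, p_report_if_right=0.2, p_report_if_wrong=0.4))

# With equal feedback rates the reported figure matches the true one.
print(observed_accuracy(0.8, p_report_if_right=0.3, p_report_if_wrong=0.3))
```

In the first case the feedback would show roughly two thirds correct even though the assumed true rate is 80%.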

By the way, another text I gave it came back as a hermaphrodite (no, really! I was perfectly balanced).


jo. 11.03.03 at 2:43 am

I tried three times, twice with academic writing, and once with a personal email.

In all three cases, the program guessed male: I’m a woman.

As far as I can tell, it will assume that you’re female if you write like a valley girl and/or use female pronouns a lot. If you have a moderately authoritative voice, you’re a guy.

what fun!



Tony 11.03.03 at 9:11 pm

I had to laugh when it was wrong about Fantasy writer China Miéville, then explained “she is one butch chick!”

Lingo seems to be learned through mimicry. Our writing vocabularies are different from our spoken ones. For example, I never talk like I’m (formally) writing this. How we write certainly depends on what we’ve read, predominantly, and on our writing models. Nurture rather than inner nature. Also, how comfortable am I with the intended audience? If they’re likely to be ‘hostile’, I may choose a different vocabulary and style.

I don’t see any problem with the algorithm being ‘simple’, if it’s tuned to reality. E=mc² is a simple formula. F=ma is a simple formula. Both are highly predictive.


Tony 11.03.03 at 9:16 pm

Of course, the reason it was ‘wrong’ about Miéville is that I assumed that ‘China’ was a woman’s name. The program’s algorithm seems to be better than mine!
