Inspired by Michael Brooke’s post on The Gender Genie, a site that analyses text and guesses whether the author is male or female, I’ve just run samples of the Crooked Timber team’s writings through the test. It turns out that Ted is probably a woman and that all the rest of us (including Eszter and Maria) are men! Harry, whom I had down as a caring-sharing type, turns out to have gallons of testosterone coursing through his sentences. Who’d uv thunk it?
I had a hairdresser once who claimed very seriously that he could tell whether the last person who cut someone’s hair was male or female. In my case it was a man and he guessed wrong, but simply concluded: “Then it was a queer!” An easy escape route, in the case of coiffures. (Is Ted…?)
By the way, I checked the opening paragraph of The Lover by Marguerite Duras: male.
Ha. I’m a man. By a factor of more than three. Female score 536, male 1727.
Mind you, I deliberately chose a particularly, shall we say, acidulous post to try. But then my posts tend to run the gamut all the way from mildly acidulous to very acidulous, so I’m not sure it matters much.
(I know, that’s not how the Gender Genie is testing.)
Because I write in very different ways for different audiences I ran a bunch of things (stung by Chris’s observation). My blog posts are v. male (not as male as Ophelia’s!). My academic papers (all recent, or unpublished — I ran long excerpts from 7) are all male, but much less so than the blog posts, and a couple barely at all. 2 are co-written with a woman, and they came out more male than the 1 co-written with a man. My journalism, like my blogs, is v. male.
Then I ran something from my favourite female solo-blogger (Laura at Apt 11D) and she came out as male as my blog posts, so I no longer trust the algorithm.
It could also be that the style of writing in a political type blog is quite ‘male’, i.e. didactic, analytical, less about ‘me’ and more about ‘them’. (Though I don’t think it’s an accident that there are so many more male bloggers than female. There’s something about holding forth and expecting an audience that is more typically a male trait…)
But the algorithm they use seems to put the cart before the horse, much the same as those gender tests we were doing last year (can’t remember the link - but mine said I was not only a man, but probably autistic too.)
Instead of starting with the assumption that women’s writing is touchy-feely and all about me, and creating an algorithm on that basis, would it not have made more sense to feed an AI programme with lots of writing, tell it which gender each piece was written by, and let the thing learn? That’s how the most accurate spam filters are being developed right now - giving the programme lots of ‘good email’ so it can determine the characteristics of same and distinguish it from spam.
Seems to me all that gender genie can do is regurgitate its authors’ dodgy premises.
But I’m probably just a bit ashamed that my first reaction was ‘great, I don’t write like a girl!’.
It doesn’t appear to pick up on the fact that I am three separate people, one of whom is female …
dsquared, then shouldn’t you change your name to ‘dsquared*x’, or something like that?
I thought I’d posted again, but I must have forgotten to hit post after preview - so girly of me.
I said that I tried two or three more of my blog posts (haven’t tried any more formal writing, I should - only I’m supposed to write 3000 words in the next 24 hours and I only have a bit over 700 so far, so what the hell am I doing this for, a break, that’s what!), and got less out of proportion numbers including one just barely female.
“(Though I don’t think it’s an accident that there are so many more male bloggers than female. There’s something about holding forth and expecting an audience that is more typically a male trait…)”
Yeah, and we need to change that! Women need to get more noisy and bossy and opinionated.
That’s only half a joke.
And I often wonder. Maybe not, maybe it’s just that we naturally notice what is said to and about our precious Selves more than we do what’s said to and about other people - but all the same I often wonder if I don’t get more, shall we say, overheated reactions, because I’m a noisy woman. And if so - oh well, you know the rest.
2300 words. I feel sick.
If you look at the (1) algorithm, you’ll find that maleness is largely determined by the (2) frequency with which the (3) word “the”(4) is used.
I did notice that very thing. (Ooh, ‘that’, how macho.)
Is that based on anything, I wonder? Or just the supposition that women are more indefinite than men?
The. The supposition. The.
I still think the stats are worth a read.
Why is it so good at guessing female non-fiction? (66648 correct answers versus 9400 incorrect of entries longer than 500 words.)
Why is it so bad at female blogs and fiction? (2179 correct versus 3872 incorrect of blog entries longer than 500 words and 3695 correct versus 5200 incorrect of fiction entries longer than 500 words; both significantly worse than random guessing would be.)
Why is the total number of entries in the female non-fiction category, where the genie is best, an order of magnitude greater than the female entries in either the blog category or the fiction category? If we reduced that category to the size of the others, the 64.46% rate of success would certainly go down, and that is the only category where the genie is better than 50% with women. (There are more male non-fiction entries longer than 500 words, and the genie is more successful with them, but not as many as with the women.)
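For anyone who wants to check the arithmetic, here is a minimal sketch that turns the correct/incorrect counts quoted above into success rates. It assumes the counts are exactly as typed in this comment, and that the third pair refers to fiction, as the question implies; the names `accuracy` and `female_over_500` are made up for the example. Taken at face value, the non-fiction counts give a rate well above the 64.46% mentioned, so one of the quoted figures may have been mistyped.

```python
def accuracy(correct, incorrect):
    """Share of guesses that were right."""
    return correct / (correct + incorrect)

# Counts as quoted in the comment (female entries longer than 500 words).
female_over_500 = {
    "non-fiction": (66648, 9400),
    "blog": (2179, 3872),
    "fiction": (3695, 5200),
}

rates = {name: accuracy(*counts) for name, counts in female_over_500.items()}

for name, rate in rates.items():
    total = sum(female_over_500[name])
    print(f"{name}: {rate:.1%} correct out of {total} entries")
```

Anything below 50% here is worse than tossing a coin, which is the point being made about the blog and fiction categories.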
In the “Old Totals”, submissions between August 15 and September 13, the Genie is just about as good as a random guesser would be, assuming there were equal totals of men and women.
And why does everyone always seem so happy when the genie gets them wrong? I think the accent tends to be laid on “I am not the gender I am” rather than “this computer program is crap because counting definite articles won’t get you where you want to go.”
And then all the stats are from self-reporting, so who knows, really.
I found a use for this thing! I’m writing an article in an internet cafe where the computers have only one word processor, WordPad, which, as far as I know, has no word count. So now, if I want to check how much I’ve written, I feed it to the Genie. Oddly enough, when you enter a Dutch text, you get scores like male: 104, female 0… The only word the Genie recognizes is “is”, which is typically male it seems.
I think the way the creators set the thing up was exactly to start with a large corpus of writing from both sexes and analyze the word frequencies. Then they use the deduced word frequencies to analyze the input.
They didn’t choose the ‘female’ words because they felt ‘touchy-feely-togethery’, they chose them empirically and they just turned out to be ‘touchy-feely-togethery’.
At least I hope that’s the case.
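If that is how it was built, the idea can be sketched roughly as follows. This is only an illustration of deriving word weights from labelled samples, not the Genie's actual method: the function names, the add-one smoothing and the two-sentence toy corpus are all assumptions made for the example.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def learn_weights(samples):
    """samples: list of (text, label) pairs, label being 'male' or 'female'.

    Returns a weight per word: positive if the word is relatively more
    frequent in the male samples, negative if more frequent in the female ones.
    """
    counts = {"male": Counter(), "female": Counter()}
    for text, label in samples:
        counts[label].update(tokenize(text))
    vocab = set(counts["male"]) | set(counts["female"])
    totals = {label: sum(c.values()) for label, c in counts.items()}
    weights = {}
    for word in vocab:
        # Add-one smoothing so a word unseen on one side never gives zero.
        p_male = (counts["male"][word] + 1) / (totals["male"] + len(vocab))
        p_female = (counts["female"][word] + 1) / (totals["female"] + len(vocab))
        weights[word] = math.log(p_male / p_female)
    return weights

def classify(text, weights):
    """Sum the learned weights of the words in the text; unknown words count 0."""
    total = sum(weights.get(word, 0.0) for word in tokenize(text))
    return "male" if total > 0 else "female"

# Toy corpus, invented purely for illustration.
samples = [
    ("the report on the budget is the problem", "male"),
    ("she told her sister about her day", "female"),
]
weights = learn_weights(samples)
print(classify("the budget", weights))
print(classify("her sister", weights))
```

With a corpus this small the weights only echo the two sentences they came from, which is the same worry in miniature: the result is only as representative as the writing that was fed in.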
The less-than-stellar success rate may be due to the fact that the people who are entering text at the website are a very unrepresentative sample of the general population of writers. In other words, only freaks and geeks care about the Gender Genie.
(Having said that, I get a solid 2-1 male-to-female ratio… apparently I write very butch! Just look at that “the”-count.)
Why do you suppose the success rate was so low last summer, and why is it better now? Did they change something in the program?
re: If you look at the (1) algorithm, you’ll find that maleness is largely determined by the (2) frequency with which the (3) word “the”(4) is used. and Is that based on anything, I wonder? Or just the supposition that women are more indefinite than men?
The strange thing is that “a” is also regarded as a masculine keyword. Apparently women try to avoid articles, or replace them with the typically feminine keyword “her”.
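A keyword scorer of the kind being described might look something like this. The keywords are the ones named in this thread (“the”, “a” and “is” as masculine, “her” as feminine); the numeric weights are invented for illustration, since the Genie's real list and numbers are not given here.

```python
import re
from collections import Counter

# Hypothetical weights: only the keywords come from the thread,
# the numbers are made up for this sketch.
MALE_WEIGHTS = {"the": 7, "a": 6, "is": 8}
FEMALE_WEIGHTS = {"her": 9}

def score(text):
    """Return (male_score, female_score) from weighted keyword counts."""
    words = Counter(re.findall(r"[a-z']+", text.lower()))
    male = sum(weight * words[key] for key, weight in MALE_WEIGHTS.items())
    female = sum(weight * words[key] for key, weight in FEMALE_WEIGHTS.items())
    return male, female

def guess(text):
    """Guess 'male' if the male score is strictly higher, else 'female'."""
    male, female = score(text)
    return "male" if male > female else "female"

print(score("The cat is on a mat"))
print(score("Dit is een zin"))  # Dutch: only "is" is recognised
```

A scheme like this would also explain the Dutch result mentioned earlier: any text whose only recognised word is “is” gets a male score and a female score of zero.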
Incidentally, getting the sex of 10 of the 13 writers of this blog correct is pretty impressive: that’s an accuracy of about 77%.
Why is it so bad with the women? 100% failure with the women on this blog. Anyone who just guessed male almost all the time could do well with this blog. And maybe with most blogs. Has anyone done a survey of Blogland?
I meant census, of course, not survey.
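The comparison with a guesser that always says “male” is easy to work through. The sketch below assumes, from what is said in this thread, that the blog has 13 writers, that two of them (Eszter and Maria) are women, and that the Genie called Ted female and everyone else male; the variable names are made up for the example.

```python
# Assumed line-up, inferred from the thread: 2 women, 11 men.
actual = ["F"] * 2 + ["M"] * 11

# The Genie's guesses in the same order: both women called male,
# one man (Ted) called female, the other ten men called male.
genie = ["M"] * 2 + ["F"] + ["M"] * 10

# Baseline that ignores the text entirely.
always_male = ["M"] * 13

def accuracy(guesses, truth):
    """Fraction of guesses that match the truth."""
    hits = sum(g == t for g, t in zip(guesses, truth))
    return hits / len(truth)

print(f"Genie:       {accuracy(genie, actual):.1%}")
print(f"Always male: {accuracy(always_male, actual):.1%}")
```

On those assumptions the Genie gets 10 of 13 right and the always-male guesser gets 11 of 13, so the 77% is less impressive than it looks.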
The thing told me Virginia Woolf & Danielle Steel are 65% male. I’m guessing its high success rate owes to more male writing getting entered in the first place.
Maybe the reason it’s good at gendering women’s nonfiction is because they fed it a lot of diary-entry, confessional type stuff (since the algorithm seems to equate personal writing with femininity).
Look at the stats again. It gets more female than male entries in every category apart from non-fiction of less than five hundred words (where it’s about as good as random guessing for males, and significantly worse for females). If you discount the non-fiction entries of more than five hundred words, you really can’t say much about the success rate - most of it goes away right there.