Continuing on the lighter side of things, this program claims to predict an author’s gender based on a writing sample. I tried it with a sample of my own over 500 words long and it succeeded. But it failed for some entries on this blog. Only slightly more surprising, it also failed when I tested the last page of Susan Moller Okin’s Justice, Gender, and the Family and the first two pages or so of Catharine MacKinnon’s Toward a Feminist Theory of the State. It might be interesting to test some longer samples, but my hunch is that this algorithm will usually predict male for samples in the genre of philosophical writing.
I saw this several months ago. Then, and just now, using various long excerpts from email I’ve written, I’ve mostly found it to guess my gender incorrectly. If I am being very analytical, or, perhaps, pedantic, it guesses my gender correctly as “male”. Most of my more informal correspondence it incorrectly identifies as “female”.
This isn’t what I expected, actually, since even my most informal writing tends to be a little “stilted”—that’s always been my own impression. But a lot of my more personal correspondence this algorithm identifies as strongly “female”.
I also recently came across, somewhere, a feedback-normalized test that incorrectly identified me as female. This is somewhat interesting to me as, for example, many years ago when I took the MMPI it supposedly indicated that I was more masculine than the average man and more feminine than the average woman. A result that pleased me, frankly.
In keeping with my old-style feminism, I was in the past strongly on the nurture side of the nature/nurture debate concerning gender differences. However, too many years of solid scientific research leads me to accept that male and female brains are significantly different. Thus, I would expect that such tools could be refined to produce a relatively high degree of accuracy. But there will always be outliers.
“However, too many years of solid scientific research leads me to accept that male and female brains are significantly different. Thus, I would expect that such tools could be refined to produce a relatively high degree of accuracy.”
I don’t see the “Thus.” Male and female brains could be very different, leading to measurably different responses to low-level tasks like recalling words, reading facial expressions, and mentally rotating three-dimensional objects, without there being any measurable differences between men’s and women’s prose styles, right?
I have to admit I’d find it surprising if one could distinguish the two with a “relatively high degree of accuracy,” unless by “relatively high” you mean, say, consistently getting it right 53% of the time; that might convince me that the machine was doing something a coin-flip couldn’t, but I don’t think it would put me in the position of thinking there was very much about prose style that’s inborn.
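Whether 53% really beats a coin flip depends on how many samples were scored. Here is a rough sketch of that arithmetic using a normal approximation to the binomial; the sample sizes are hypothetical, chosen only to show the effect, and none of this comes from the site itself.

```python
import math

def z_score(correct, total, chance=0.5):
    """How many standard errors the observed hit rate sits above chance
    (normal approximation to the binomial)."""
    observed = correct / total
    std_error = math.sqrt(chance * (1 - chance) / total)
    return (observed - chance) / std_error

def one_sided_p_value(z):
    """Probability of a z at least this large if the guesser were a fair coin."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Hypothetical tallies: the same 53% at two sample sizes, then 70%.
for correct, total in [(53, 100), (5300, 10000), (7000, 10000)]:
    z = z_score(correct, total)
    print(f"{correct}/{total}: z = {z:.1f}, p = {one_sided_p_value(z):.3g}")
```

On these made-up tallies, 53 right out of 100 is about 0.6 standard errors above chance, which a coin could easily manage, while 5,300 right out of 10,000 is about 6 standard errors above. So a consistent 53% can be statistically real and still say very little about how different the two prose styles are.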
Well, I elided some important points. The most significant would be that male and female brains differ according to localization of language. You’re right that it’s not necessarily the case that the observed differences would result in significant differences in writing, but I think it’s suggestive. That’s why I used the word “expect”.
The linked webpage now has a very large sample, and they’re getting a 70% rate of correct guessing. It’s not rigorous, of course, but I’m inclined to think that it’s doing a lot better than your “flip of the coin”. I suspect that empiricism has trumped your intuition.
Now, it should be noted that there’s no reason to assume that observed differences in writing style are necessarily inherent and not learned. My view is that brain studies and such indicate that gender differences are inherent as well as learned.
Andrew Northrup has the definitive take on this program:
Conclusions: this is stupid, and the internet is worthless and dumb. According to the site, it guesses wrong 58% of the time, out of about 10,000 votes. There are only two choices, give or take. I’m going to bed.
I think, maybe, that Andrew meant that doing stuff like this on the internet is stupid. If you search his site for The+internet+is+stupid you get 108 hits, so it’s not such a devastating insult.
I had to skim the underlying algorithm description, but it didn’t seem to me there was any suggestion that the implied differences were inborn rather than learned. Clearly any inborn differences would have to be language-neutral, and as long as the study only used a single corpus from a single language, I’d call it a blip, and not indicative of anything (even if it did have better than a 42% success rate).
it failed
it also failed
Please please look at the stats for the gender genie. You’ll find it is worse than coin-flip for female entries across the board in almost all categories. That means something is seriously wrong with the program.
It has a success rate lower than its currently advertised 70% for male entries.
In fact the only reason the rate is currently 70% is that people are feeding it nonfiction entries longer than 500 words with cues that it correctly recognizes as female - it has a pretty good rate there, but the fact that the correctly-identified-as-female submissions are an order of magnitude larger than any other category and two orders larger than some makes me think someone’s trying to make it look better than it is. (The first time I was led to the stats, it was male fiction entries, I think - I might be wrong - that led the pack, both in success rate and in quantity of submissions.)
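To see how one oversized category can prop up the headline number, here is a small sketch. The counts are entirely made up for illustration (they are not the site's statistics), and the category names are only placeholders.

```python
# Made-up counts, purely illustrative: (category, submissions, correct guesses)
categories = [
    ("female nonfiction, 500+ words", 10000, 8000),
    ("female fiction", 1000, 450),
    ("male nonfiction", 1000, 600),
    ("male fiction", 100, 55),
]

def pooled_accuracy(rows):
    """Headline rate: all correct guesses over all submissions."""
    return sum(correct for _, _, correct in rows) / sum(n for _, n, _ in rows)

def mean_category_accuracy(rows):
    """Average of the per-category rates, each category counted equally."""
    return sum(correct / n for _, n, correct in rows) / len(rows)

print(f"pooled: {pooled_accuracy(categories):.2f}")
print(f"per-category mean: {mean_category_accuracy(categories):.2f}")
```

With these invented numbers the pooled rate comes out around 75% while the average across categories is 60%, because the largest category contributes most of the submissions.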
Another reason to distrust the 70% correct statistic is that this is all self-reportage. Someone needs to do this experiment with independent verification. Anyhow, don’t trust the genie, at least for now.
I’d also point out that the algorithm used is rather transparent and works on scoring certain words by frequency.
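For anyone curious what "scoring certain words by frequency" could look like, here is a minimal sketch. It is not the site's actual code: every weight below is invented, only "she", "her", "hers", and "so" are keywords mentioned in this thread as feminine, and the masculine list is a pure placeholder.

```python
import re
from collections import Counter

# Hypothetical weights, invented for illustration only.
FEMININE_WEIGHTS = {"she": 6, "her": 9, "hers": 3, "so": 4}
# Placeholder masculine keywords; the thread does not say what the real ones are.
MASCULINE_WEIGHTS = {"the": 2, "a": 1, "these": 5}

def score(text):
    """Return (feminine_score, masculine_score): keyword counts times weights."""
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    feminine = sum(weight * counts[word] for word, weight in FEMININE_WEIGHTS.items())
    masculine = sum(weight * counts[word] for word, weight in MASCULINE_WEIGHTS.items())
    return feminine, masculine

def guess(text):
    """Pick whichever side has the higher total; ties count as balanced."""
    feminine, masculine = score(text)
    if feminine == masculine:
        return "balanced"
    return "female" if feminine > masculine else "male"

print(guess("She told her friend so"))
print(guess("The cat sat on a mat"))
```

A scheme like this only looks at which words turn up and how often, so a writer's subject matter (for example, an essay about a woman) can swing the result as much as anything about their style.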
I tried three times. The first time it told me I was male, but I thought well, even though the paper is about gender, it’s my required (hateful) quantitative paper. Maybe that seemed more male, even though there were not actually any numbers in the excerpt I used. Then the second time it told me I was male, even though I fed it part of a fairly fluffy ethnographic conference paper. The third time it finally said I was female, based on a pretty chatty email. But I note that the word “so” is apparently extremely girly, which means that my long-acknowledged overuse in email of that word had a major impact there.
I’m a male (gay, does that matter?) Anyway. I fed it an essay on Orwell and another on Julian of Norwich. I was a male the first time and a female the second time. I wonder if the subject matter of the two essays may have affected my writing and the results.
My articles about Biochemistry and Jesus are also apparently male. So, my only “female” piece happens to be about a female. Notably, feminine keywords include “her”, “she”, and “hers”. I wonder if men writing about women tend to get misread? I also wonder if men really do write about women less, and therefore it’s an effective tool for predicting gender.
Purely speculative, but I’ve always thought that, in conversation, men are slightly more likely to use the first person and women slightly more likely to use the third - and use of the second person probably equal between the genders.
Stereotypical, sure. I wonder if it’s true - with women paying more attention to group dynamics and men paying more attention to themselves?
When is the last time you overheard a woman, in a restaurant or coffee shop, use the word “I” twenty times in ten minutes?
I’m a sensitive girly-man and I still use it all the time. Like so.
That’s interesting, I was thinking the opposite. It does consider ego-words female - and I was thinking it could be right: I often do think women talk about themselves too damn much and impersonal or general subjects too damn little - and take that to be a feminist move of some sort. ‘The personal is political’ thing. Which was a good point once upon a time (yes who does the dishes is very political) but has gotten degraded into meaning ‘that means I’m supposed to drone about myself all the time.’
I’ve tried two so far: got one female and one male.
By the way, another text I gave it came back as a hermaphrodite (no, really! I was perfectly balanced).
I tried three times, twice with academic writing, and once with a personal email.
In all three cases, the program guessed male: I’m a woman.
As far as I can tell, it will assume that you’re female if you write like a valley girl and/or use female pronouns a lot. If you have a moderately authoritative voice, you’re a guy.
what fun!
jo.
I had to laugh when it was wrong about fantasy writer China Miéville, then explained “she is one butch chick!”
Lingo seems to be learned through mimicry. Our writing vocabularies are different from our spoken ones. For example, I never talk like I’m (formally) writing this. How we write certainly depends on what we’ve read, predominantly, and on our writing models. Nurture rather than inner nature. Also, how comfortable am I with the intended audience? If they’re likely to be ‘hostile’, I may choose a different vocabulary and style.
I don’t see any problem with the algorithm being ‘simple’, if it’s tuned to reality. E=mc² is a simple formula. F=ma is a simple formula. Both are highly predictive.
Of course, the reason it was ‘wrong’ about Miéville is that I assumed that ‘China’ was a woman’s name. The program’s algorithm seems to be better than mine!