Word Salad

by Kieran Healy on September 16, 2003

Originating from who-knows-where (Uncle Jazzbeau is looking) but spreading fast comes the following:

Aoccdrnig to a rscheearch at an Elingsh uinervtisy, it deosn’t mttaer in waht oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht frist and lsat ltteer is at the rghit pclae. The rset can be a toatl mses and you can sitll raed it wouthit porbelm. Tihs is bcuseae we do not raed ervey lteter by it slef but the wrod as a wlohe. ceehiro.

Language Hat was my source. There’s also a Slashdot story.

Now this is very neat. But the explanation — “we do not raed ervey lteter by it slef but the wrod as a wlohe” — raises some questions. The original researchers may have answered them, of course, but a post’s reach should exceed its grasp or what’s a blog for? If the first and last letters must always be in the right place, then any word three letters long or less will always be spelled properly. Having those words around adds a lot of context to a sentence, helping the reader to process the other words. To really test the idea, we need samples of text where that kind of context is missing.
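For anyone who wants to generate test text of their own, here is a minimal sketch of the scrambling rule as described above: first and last letters stay put, the interior is shuffled. It is not the script used for this post; the names `scramble_word` and `scramble_text` are made up for illustration, and it assumes plain ASCII letters.

```python
import random
import re

def scramble_word(word, rng):
    """Shuffle the interior letters of a word, keeping first and last fixed."""
    if len(word) <= 3:
        return word  # at most one interior letter, so nothing can move
    middle = list(word[1:-1])
    rng.shuffle(middle)
    return word[0] + "".join(middle) + word[-1]

def scramble_text(text, seed=0):
    """Scramble every run of ASCII letters; punctuation and spaces stay put."""
    rng = random.Random(seed)
    return re.sub(r"[A-Za-z]+", lambda m: scramble_word(m.group(0), rng), text)

sample = "the only important thing is that the first and last letter stay put"
print(scramble_text(sample))
```

Note that the `len(word) <= 3` branch is exactly the loophole discussed above: short words always come through intact.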

Recrsheears souhld csrncotut secntnees unisg olny wodrs edxcieneg terhe lttrees. Tihs wlil psoe seevral polrbems beaucse wwreell-ittn Esglinh sluohd nlurtaaly cointan mnay sorht wrdos iunidnlcg pvrn-eborses, gtienvie csaes, cncoeinvets and (howpos) penrpsoitois, aongmst many ohtres. Lnoegr wrods soluhd povre useufl when tteinsg tihs ieda. Fatiensnredg wdors dviorecd form hplfeul cnotext mhgit aslo mkae fnie cidenadats for (siht) iiulsocnn. Eelhapnt. Preorpritay. Mainargl. Avtrinmdatiise. Boyend. Caainnbl. Wree tsohe tcekriir tahn tpyical sentecens? Ppostecirve linigusts wlil find csnuotntrcig w-llromefed, ativce senetcens fere form tohse mnay hfepull sroht wrods raehtr dcffiuilt. Tihs txet semes edecnive eonguh of (carp) taht ponit. Neevretslhes, linigstus slohud sitrve twoards tihs gaol. Cvioncning sitedus msut searapte ecah slaml wdor’s cepvidnino-troxtg rloe form the (admn) sipecfic ieda taht praticular otparhghiroc tosntrianipsos gaurantee taht sesne wlil reiman eevn toughh itrnael snbairmclg occrus. Fanlily dleabielrty minlaaitpnug sacmrbled lteter order sohlud mkae tihngs eevn mroe duffiilct. Raeeedrs wlil fnid wdros wtih vbres or (fcuk) cooatsnnns aaenrrgd ceiuoesctlnvy mkae uiansmnrbclg mroe dcffliiut.

(Tankhs to Jmaie Zainkswi and Pehobus for saciftoiimrbclan asstasince.)

{ 22 comments }

1 PZ Myers 09.16.03 at 3:30 am: You might try looking on David Harris’s page:
http://blogs.salon.com/0001092/2003/09/15.html#a464
2 Kai von Fintel 09.16.03 at 3:56 am: Indeed, David Harris is where I got it when I posted it to my weblog. David says he originally got it via email. The source dug up by Uncle Jazzbeau is about spoken word recognition and thus not directly relevant. I have not seen a definitive attribution. But there are some fun additions: on wo’s weblog, one can find a simple scramble bookmarklet.
3 Cobb 09.16.03 at 4:45 am: I found this to be an interesting test of sight reading for my kids. Both my third grader and my fourth grader read the para with no problem, except for the last word.
4 Jacob 09.16.03 at 5:02 am: I’ve always understood that we read words–at least common words–by their shape, rather than by actually reading letter-by-letter. This is why reading things WRITTEN ENTIRELY IN CAPITAL LETTERS is irritating and takes longer, since all the words look the same. I suspect that this is also why, in Kieran’s post, I found it difficult to read “edxcieneg,” for instance, since in exceeding, the ‘d’ is at the end of the word, creating a rather different shape. In Kieran’s long paragraph (which I tired of after a sentence or two and stopped reading), some words are easier to read (boyend, fanlily) and some are harder (tteinsg), and I suspect the difference is that some share the shape of the word they’re trying to be, and some don’t.
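The word-shape idea in this comment can be checked crudely. The sketch below reduces a word to an outline of ascenders, descenders and x-height letters; the letter classes are an assumption about a typical lowercase typeface, and `shape` is a name invented here.

```python
ASCENDERS = set("bdfhklt")   # assumed: letters that rise above x-height
DESCENDERS = set("gjpqy")    # assumed: letters that drop below the baseline

def shape(word):
    """Return a coarse outline: 'A' ascender, 'D' descender, 'x' otherwise."""
    outline = []
    for ch in word.lower():
        if ch in ASCENDERS:
            outline.append("A")
        elif ch in DESCENDERS:
            outline.append("D")
        else:
            outline.append("x")
    return "".join(outline)

pairs = [("boyend", "beyond"), ("tteinsg", "testing"), ("edxcieneg", "exceeding")]
for scrambled, intended in pairs:
    same = shape(scrambled) == shape(intended)
    print(scrambled, shape(scrambled), intended, shape(intended), same)
```

By this rough measure "boyend" keeps the outline of "beyond" exactly, while "tteinsg" and "edxcieneg" do not keep theirs, which fits the examples given in the comment as far as it goes.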
5 whatish 09.16.03 at 7:07 am: The last few sentences were significantly easier than the rest. Possibly the ordering of consonants and vowels is less intrusive than you’d think.
Can you give a translation, for those sick of anagramming?
6 Maria 09.16.03 at 9:18 am: Or you could try it in French:

Sleon une édtue de l’Uvinertisé de Cmabrigde, l’odrre des ltteers dnas
un mtos n’a pas d’ipmrotncae, la suele coshe ipmrotnate est que
la pmeirère et la drenèire soit à la bnnoe pclae. Le rsete peut êrte
dnas un dsérorde ttoal et vuos puoevz tujoruos lrie snas porlblème.
C’est prace que le creaveu hmauin ne lit pas chuaqe ltetre elle-mmêe,
mias le mot cmome un tuot.
7 markus 09.16.03 at 9:32 am: Letter order is actually not that important in visual word recognition. Most current models (MROM-p, DRC) can more or less do without it (they also do without semantics and syntax, which means the problem is solvable at the single-word level).
The multiple-read-out-model with phonology (Jacobs, Rey, Ziegler & Grainger 1998) is local connectionist, that is, it uses nodes for features, letters and words. Visual input is fed into the model which then works out the word by bottom up and top down processing, together with lateral inhibition to simulate competition at each level. So what happens is that each letter feeds up to the word level and provided there is only one word with that letter combination it will eventually be recognised.
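The remark that a word is recognised "provided there is only one word with that letter combination" can be illustrated with a toy lookup. This is not MROM-p or any published model, just a sketch: it indexes a small made-up lexicon by first letter, last letter and the sorted interior letters, and all names here are hypothetical.

```python
from collections import defaultdict

def key(word):
    """First letter, last letter, and the interior letters in sorted order."""
    w = word.lower()
    return (w[0], w[-1], "".join(sorted(w[1:-1])))

def build_index(lexicon):
    """Group lexicon words that share the same scramble-invariant key."""
    index = defaultdict(list)
    for word in lexicon:
        index[key(word)].append(word)
    return index

# Tiny made-up lexicon, for illustration only.
LEXICON = ["according", "research", "university", "matter",
           "letters", "salt", "slat", "word"]
INDEX = build_index(LEXICON)

def candidates(scrambled):
    """Return every lexicon word the scrambled form could stand for."""
    return INDEX.get(key(scrambled), [])

print(candidates("uinervtisy"))  # a single candidate
print(candidates("slat"))        # two candidates: the key is ambiguous
```

When the list has one entry the scrambled word is unambiguous at the single-word level; pairs like "salt" and "slat" are where context would have to do the work.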
The DRC (Coltheart, Rastle, Perry, Langdon & Ziegler, 2001) has two routes, hence the name dual route cascaded model. The grapheme-phoneme conversion (GPC) route uses stable combinations of GPC which used to be rules but are now handled by (IIRC) a local connectionist network. This route won’t help you a lot in the present case, though to the extent that bigrams (two-letter combinations) remain intact it will help a little. The other route is the lexical route, which these days is a parallel distributed processing neural net (other models try to use only this route, e.g. Seidenberg & Plaut, XXX; Zorzi, Houghton, Butterworth, 1998). The point here, again, is that among the possible attractors of the jumbled input the real word is the strongest, so the model will eventually reach it. However, the account is not entirely satisfactory. A third group of models, not yet implemented, recurrent neural networks (Van Orden & Goldinger 1994), can of course handle this stuff without problems.

—-
yes, this _is_ what I do for a living, any questions?
8 markus 09.16.03 at 9:54 am: In this Nature article, which I found by following your links, there’s a small error at the end: http://www.nature.com/nsu/990429/990429-2.html

The cocktail party effect (Murray, right-after-the-dawn-of-time, that is around 1980) refers to the fact that you can pick up your name from a conversation you are not listening to. It seems you can also pick up other keywords, but not that many. This is however an attention phenomenon, not one of speech recognition.
However, the mention of background noise brings up memories of Shannon’s (and Shannon & Weaver’s) work on the subject. (short summary; link where you can find the download (check page 7f)) Shannon estimated the redundancy of English to be around 50% (it’s probably even higher).
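For readers unfamiliar with the term, redundancy in this sense is one minus the ratio of the actual entropy per letter to the maximum possible entropy for the alphabet. The sketch below just evaluates that formula; the 27-symbol alphabet (26 letters plus space) is an assumption made here, and `redundancy` is an illustrative name.

```python
import math

def redundancy(entropy_bits_per_letter, alphabet_size=27):
    """Redundancy = 1 - H / H_max, with H_max = log2(alphabet_size)."""
    h_max = math.log2(alphabet_size)
    return 1.0 - entropy_bits_per_letter / h_max

h_max = math.log2(27)
print(f"H_max for 27 symbols: {h_max:.2f} bits per letter")
print(f"Entropy implied by 50% redundancy: {0.5 * h_max:.2f} bits per letter")
```

In other words, a 50% figure says each letter carries about half the information it could if all symbols were equally likely and independent.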
9 markus 09.16.03 at 10:16 am: @Jacob: for all I know the “visual word form system” is a myth. True, some jokers have even located it precisely in the brain (left hemisphere, IIRC inward from the back and top of the ear), but as I laid out above, most models do fine without one.
The most likely assumption is that word shape is “in the brain” as some kind of superstructure emerging from the shapes of letters (which in turn arise from features). That is, there is no shape info in itself, but shape info arises as a possibly necessary by-product of letter info.
Apart from professional opinion I can offer this argument: if shape info were stored directly, that is, quasi-independently of phonological info and semantics (the latter is generally assumed to be stored more or less separately), you’d either need an extra mechanism to reconstruct shape from different fonts, handwriting etc. or have to store them all separately. Either is rather inefficient, especially considering the confusability of letters (e.g. b, d; p, q; a, o), which would require so much shape analysis that you might as well analyse the letters right away.

As to capitals, the key here is frequency. Nürk did some research on that. He concluded words are best recognized in their most common form (perceptual frequency hypothesis) presumably because the most common combination of letters (initial uppercase, rest lower case) has stronger connections to the word than the all uppercase version.
word type Seems to Be relevant as well, as an initial capital letter Will Hinder verb Recognition, while for nouns there Is only a small effect (at least in German, where all nouns are capitalised).
10 Tom Runnacles 09.16.03 at 8:08 pm: Never mind what this brand of mangling does to English, just have a look at JWZ’s perl script.

As if perl wasn’t horrible enough already, he has to run the damn script over itself. Ugh :)

I wonder if that invalidates the Coyprgiht notice?
11 sidereal 09.16.03 at 9:27 pm: The word-as-whole-unit explanation is greatly oversimplified. More accurately, you could think of morphemes as whole units, meaning roots as well as affixes. The reason this works so well here is mostly a trick of English, because a) English is relatively lightly affixed. . notice that most words in the paragraph have 1 or no affixes. Compare that with a heavily agglutinated and inflected language like Inuit and, well. . good luck. Even German would be difficult with the agglutinated nouns. Oh, and b) English has fairly strict syntax, so you can easily infer parts of speech from word position, giving big clues as to the identity of the words. As an example:

It’s very fluffly that you could grizzpoop this mergle despite all of the important smackles being missing.

In more freeform languages like . . uh. . blanking. . Cantonese, probably, you wouldn’t have those clues and would therewhence be screwed.

So, in summary, neat trick, but it depends on massive redundancy and syntactic inflection in the host language.
12 Tom Strong 09.16.03 at 10:26 pm: ?shit ta lxece sxycliesd duloW
13 markus 09.16.03 at 11:11 pm: @sidereal: I agree, but must confess I know too little about that stuff.
@tom strong: No way. The simplest reason is that scrambled text is not easier in any way than normal order. The more complex reason is that part of the language’s redundancy is lost in the process. For instance, letter pairs (and the corresponding sounds/sound pairs) get mixed up, hindering the reader, who usually benefits from them.
The most convincing reason may be that dyslexics do not have distorted versions of words stored (even then we’d need an exact match) but rather have no or too few words stored, or have trouble using the existing redundancy to their advantage.
14 eszter 09.17.03 at 12:03 am: This is very interesting!

I’m thinking that you violated the rule by mixing up letters in hyphenated words across the hyphen. Those were the words I had the hardest time with (perhaps b/c at first I assumed you weren’t mixing them up). I guess I’m assuming that, visually speaking, our minds would see the hyphen as a stop, so the first and last letters on either side of it would count as belonging to a separate image.

Someone once told me that there is a word for how easy it is to understand a language even if some parts are missing (that is, a word for the score a language gets on that). (I realize this is not quite the same issue because no letters are missing here; rather, they’re mixed up.) Hungarian has a LOT of accent marks on letters. (I don’t just say “accents” because once a mark is added it actually counts as a different letter.) But many people have no energy to bother with those when typing. In most cases, it is pretty straightforward to read Hungarian text (for those fluent in the language in any case:) without those marks. It’s unclear whether other accented languages like German are just as easy to read in that way (some are probably easier than others). It’s another interesting way of tweaking the language (granted, not quite as interesting as mixing up the words).
15 eszter 09.17.03 at 12:17 am: BTW, although playing with words in this way may suggest that correct spelling is not necessarily that important, in fact, it seems to suggest that it is significant with respect to having the right letters in the word.
16 Katherine 09.17.03 at 5:51 am: except for the hyphenated words, technical terms and a few long words ending in “s” this wasn’t that bad. But the context does make a tremendous difference.
17 Maureen 09.17.03 at 6:35 am: Word shape? Interesting, but my initial reaction was that the ease of reading the word somewhat corresponded to the frequency of the word’s use in standard English and/or length–very long words seem to be the most difficult to decode, but anything shorter than ten letters isn’t too bad. And words that one’s mind isn’t very used to are also difficult.
18 guitchus 09.22.03 at 11:26 am: So we have (what we could henceforth call) the “msesgae”:

“Aoccdrnig to a rscheearch at an Elingsh uinervtisy, it deosn’t mttaer in waht oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht frist and lsat ltteer is at the rghit pclae. The rset can be a toatl mses and you can sitll raed it wouthit porbelm. Tihs is bcuseae we do not raed ervey lteter by it slef but the wrod as a wlohe”.

Even if it’s funny, this “msesgae” is an improper and excessive generalization that conveys an extremely reductive view. Moreover, whereas it should remain only what it is, i.e. a simple, fanciful and entertaining text, it is taking worrying forms: we see it in mails, weblogs and chat-rooms, where participants, absolutely amazed and amused, venerate this “sensational discovery”, and friends from everywhere (also excited) forward it in different languages. Apparently this “hoaxmeme” (hoax + meme) is floating all over the web.

Let’s try to pin the topic down (not out of haughty pedantry but out of anticonformism and anti-“simplistism”). If you were looking for a serious explanation of it, here is an “anti-hoaxmeme”:

Introduction
Reading is a complex activity that involves many kinds of knowledge, of varying nature and complexity (this is due, besides, to the fact that writing itself is complex). It is an activity that involves cognitive processes but also, simultaneously, perceptual processes: to read is to perceive and to identify words.

Development
Many linguists have worked on describing how the mechanisms of word identification develop, and there are now many developmental models of reading. The principal models comprise three ways of reading, which actually correspond to three chronological stages of acquisition (for this presentation, let’s start with the second one):

– the alphabetical reading (second stage): the reader connects speech with writing; in other words, he learns the correspondences between letters and sounds (e.g. the sound [k] can be written with ‘c’ (cot), ‘k’ (kiss) or ‘ch’ (chord)). At this stage of phonological mediation the code is being learned; the learner enriches his phonological knowledge and transfers it to new words (a form of self-teaching). This stage is called the “indirect way” because the reader reads words through a decoding process.

– the orthographical reading (third stage): the words are analyzed in orthographical units (orthography indicates here the sequence of letters forming the word). There is no phonological conversion; the words are read and recognized directly by reference to a memorized orthographical lexicon. This stage gradually (but not entirely) replaces the alphabetical one. The reader no longer needs to decipher: he recognizes the words through a “direct way”.

– the logographic reading (which is actually the FIRST stage in learning to read): at this stage, the reader uses various kinds of clues to ‘read’ the words, inter alia those provided by the extralinguistic environment. Letter order and phonological factors are not taken into account, but visual clues are. At this stage there can be instantaneous recognition of familiar words (or words somehow ‘learned by heart’), and guesses based on salient visual clues allow a first whole-word vocabulary to be built up. The visual clue can simply be the length of the word, its “silhouette” (outline), or even just one letter. The classic example used to illustrate this stage is “Coca-Cola”, whose logo is easily identified by almost all children of 5-6 years old. If we change only one letter of the word, giving “Coca-Coca”, children will not notice the difference from the original (nor, sometimes, will adults, as some experiments have shown).

The most perspicacious of you may have already understood: what actually happens when we read the “msesgae” is that we, literate readers who have been taught to read and write, use skills acquired and automated through years of reading experience. In other words, we have developed reading “HABITS”.

The “msesgae” experiment might lead us to think that we fall back on logographic reading, in which meaning is accessed directly via the pictorial semantic system (with words treated like images or logos), but this is not completely true.

Actually, we continue to use the orthographical reading system (in which meaning is accessed via the verbal semantic system). If we look at the “msesgae” more closely, we can notice that 34 of its 68 words (short and common ones, by the way) are correctly spelled (50%, half of the text, and most of them are “grammatical words”). Add to that a simple and common syntax (the journalistic style of the “forma brevis”) and the capacity of a more or less experienced reader for anticipation and automatic correction (the system used is close to the one we use for typing errors, and anyway, teachers manage quite well to read our essays stuffed with spelling mistakes; in other words, you don’t have to be a Professor of literature to spot “what” in “waht”!!!), and you get many visual clues!!! (Moreover, there is a syllabic facilitation phenomenon, but I skip the details.)
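The 34-of-68 figure can be checked mechanically. The sketch below does not use a dictionary; it relies on the assumption that, in this particular text, the correctly spelled words are exactly the ones with three letters or fewer, since those cannot be scrambled when the first and last letters are fixed. The names `MSESGAE` and `letter_count` are made up here.

```python
MSESGAE = (
    "Aoccdrnig to a rscheearch at an Elingsh uinervtisy, it deosn't mttaer "
    "in waht oredr the ltteers in a wrod are, the olny iprmoetnt tihng is "
    "taht frist and lsat ltteer is at the rghit pclae. The rset can be a "
    "toatl mses and you can sitll raed it wouthit porbelm. Tihs is bcuseae "
    "we do not raed ervey lteter by it slef but the wrod as a wlohe"
)

def letter_count(token):
    """Count alphabetic characters, ignoring punctuation and apostrophes."""
    return sum(ch.isalpha() for ch in token)

tokens = MSESGAE.split()
short = [t for t in tokens if letter_count(t) <= 3]
print(len(short), "of", len(tokens))  # 34 of 68
```

So half the words in the text are untouched by the scrambling rule before any reading skill comes into play.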

Conclusion
The proposition conveyed by the “msesgae” is not completely false, but it is very reductive, and completely incorrect when it affirms that only the position of the first and the last letter of a word matters. Actually, it has more to do with the word’s “silhouette” (from which our (almost standard) system of abbreviations derives (another facilitating clue)). If we can read the “msesgae” without any problem, it is because we are good readers reading a text that is easily accessible in spite of its spelling mistakes.
To prove it, if I give you the correctly spelled words “acetoxybutynylbithiophene deacetylase” or “carboxymethylenebutenolidase”, dear expert readers, you will resort to an alphabetical analysis (second stage) and will use grapho-phonological decoding for these unknown words (I suppose this experiment may not always work if you are a chemist, druggist or doctor… if that’s the case, sorry for the affront :-).
Another counterexample: if you read the following sentence AT THE FIRST GO, as quickly and fluently as you did the “msesgae”, all my theoretical explanation goes down the drain (or you are a born champion of anagrams!):

“Nreuuoms pmeeononnhs peossss uiapocmltecnd etaaoilxnpn; nwttdtsniinoahg, the pdseuo-snfiiiectc spssliiimtm is not snfiiiectc and eieecndvs are oetfn mdanleiisg”*.

Guillaume Fon Sing,
(alias GUITCHUS)
guitchus@hotmail.com
Linguist

* “Numerous phenomenons possess uncomplicated explanation; notwithstanding, the pseudo-scientific simplistism is not scientific and evidences are often misleading”.

Please forward it, …it can teach sb a thing or two.
19 James Surowiecki 09.23.03 at 3:14 pm: Guitchus —

Your explanation of the “understanding misspelled words” phenomenon is interesting. But the example at the end of your post doesn’t really illustrate your point. Your sentence (ungarbled) is not written in comprehensible English. People don’t say that phenomenons “possess” explanation, although they might say a phenomenon “has” an explanation. “Explanation” is singular in your sentence when it should be plural, since “phenomenons” is plural. Similarly, I don’t think “simplistism” is even a word in English (at least it’s not listed in the OED, and I’ve never heard it before). And finally, no English speaker would write “evidences.” “Evidence” is almost always singular.

Now, this actually makes the point that context is important, since it shows that it’s difficult to make sense of misspelled words in a sentence if the sentence uses those words in an incorrect way. But as a test of how easy it is to read or not read a misspelled (but correct) sentence, it doesn’t prove anything.
20 guitchus 09.24.03 at 8:12 pm: I agree. The last example wasn’t “correct” (the reason: it is a very bad translation of the original one; my first text was for French correspondents).
Somehow, the use of “hvae” would have made it too easy (in other words, the aim here was to “destroy” the context with 3 difficult words from the beginning so that you don’t get the grammatical structure of the sentence). Unfortunately, it’s a “francism”.
“simplistism” is (I think) a neologism quite easy to understand (even without using commas).

“Excuse my French…euh, my English I mean”.

Regards,
Guitchus
21 m. chiff 10.16.03 at 11:28 pm: http://www.mrc-cbu.cam.ac.uk/~matt.davis/Cmabrigde/

has a very nice discussion of this with versions in many different languages… and alphabets. His arguments are easy to read and illustrated by several selections where the jumble approach is not understandable. He does a good job of explaining why – although it may seem to be accurate – the claim that letter order doesn’t matter is just not true. There is a little fact mixed up in the gibberish, but on the whole… it just ain’t so.
22 Maroney Terry 12.10.03 at 5:53 pm: Against boredom even the gods contend in vain.

Comments on this entry are closed.
