What can you not find online?

by Eszter Hargittai on April 28, 2004

There has been much hype about how the Internet and especially search engines (need I name the one in particular?;) are giving everyone everywhere access to anything and everything. I’ve already commented previously about why this simplifies matters (even beyond controlling for mere access issues), but let’s limit our discussion to people who are quite skilled at online information-seeking. What remains – or may increasingly become – hard to access?

Here are some examples. I’d be curious to hear in what other instances people have encountered, or expect to encounter, roadblocks.

I have previously discussed the problem of closed systems with respect to course syllabi. Increasingly they are posted on password-protected sites making them hard to access on the open Web. I also continue to be amazed at how few academics post e-copies of their publications on their Web sites. Is it wrong of me to be especially surprised when the academics in question study the Internet in particular? Sure, my academic institution subscribes to many of the publications (although certainly not all) in which academic articles get published, but such subscriptions are only available to a tiny fraction of Web users. (I know, I know, probably only a tiny fraction of Web users are interested in academic publications in the first place, but still, these are examples of gated content.) Many magazines also do not make their articles available to users who are not subscribers.

How about controversial materials? With increasing pressure from various actors (e.g. groups representing commercial or political interests) will we see more material censored or made harder to access? Already Web sites about certain topics are less directly accessible than one might think. (Is it really mere popularity and linking structure that leads to safersex.org as the first hit when you search for sex on Google? Granted, a search for porn seems to lead to more general sites at the top of the list, although I didn’t click through to verify.) And remember that Friendster profiles are not indexed by search engines as far as I know (and people likely use all sorts of nicknames anyway).

Google – to name just one search engine – has some code-dependent limitations built in that make certain searches difficult. For example, it only allows for a maximum of ten terms in a search query. If you are looking for exact phrases, this can be limiting at times.
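To make the ten-term limit concrete, here is a minimal sketch that checks whether a query would run past it. The limit of ten is the figure quoted above; counting terms by splitting on whitespace is an assumption made for illustration, not a description of how any search engine actually tokenizes queries, and the function names are hypothetical.

```python
# Limit as described in the post; treated here as a plain constant.
MAX_TERMS = 10


def count_terms(query: str) -> int:
    """Count whitespace-separated terms (an assumed, simplified tokenization)."""
    return len(query.split())


def exceeds_limit(query: str, max_terms: int = MAX_TERMS) -> bool:
    """Return True if the query has more terms than the assumed limit."""
    return count_terms(query) > max_terms


phrase = "it was the best of times it was the worst of times"
print(count_terms(phrase), exceeds_limit(phrase))
```

A twelve-word exact phrase like the one above is already over the limit under this counting, which is why long quotations are awkward to search for.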

In what other cases have you faced search challenges or expect things to get more difficult as time goes on and more materials become proprietary or get hidden due to controversial status? Do you think I am exaggerating the difficulty of the cases mentioned above?

{ 39 comments }

1

dsquared 04.28.04 at 11:07 pm

To take the most obvious example, I am absolutely sure that there is loads and loads of pirate software and music out there, but I cannot find it despite trying.

2

Anita Hendersen 04.28.04 at 11:41 pm

Sometimes I’ll have a question that the search engines are not well equipped to answer: e.g. “what percentage of adult Americans attended the University of Michigan?” or “how is hard lemonade produced?”

There are some search engines nominally set up to answer these questions in this format (wondir, answerbus), but they have not worked well in my experience.

Also, because so many websites are set up by commercial interests or by fans, sometimes the information you want, while it may be somewhere on the net, gets drowned out in the search engines by sites trying to sell you something or concentrating on non-relevant aspects. This is true even though I’m pretty good about using negative keywords, etc.

I remember once trying forever to find a definition of “balsamic vinegar” on the web and being unable to do so. There were plenty of sites with recipes involving balsamic vinegar or celebrating how great it was, or selling it, but I could not find out how balsamic vinegar differed chemically or in production origin from other vinegar. (Now I can find this information on the net.)

Another thing is the contents of food or consumer products. For instance, the composition of a particular brand of kitchen cleaner or shampoo. You can find the ingredients in the products, but not the relative amounts of those ingredients, nor other data like pH or specific gravity. Maybe these are trade secrets.

3

Chris Tunnell 04.28.04 at 11:43 pm

Search engines have one major drawback: they rely on phrases. Try searching for sounds, images (being researched) or feelings. This alone makes a lot of data inaccessible.

It is textual data that is searchable, not visuals, and this excludes most of the information about our world.

There are also the legal limitations of accessing copyrighted works, but this seems outside the realm of the discussion.

4

Jonathan Edelstein 04.28.04 at 11:51 pm

I’ve often had trouble finding records dating from before digitization became common practice (a date that varies depending upon the place and the type of document at issue). This is true even in many subscription databases; for instance, LEXIS doesn’t carry many law journal articles from before the mid-1980s and its news media coverage narrows sharply before the 1990s.

Where there is a market or government funding, records from the other side of the cyber-curtain have been digitized. The American case law databases in LEXIS go back much further than the news or academic databases; in fact, it’s possible to find decisions dating to the 1750s. This, however, is due to a combination of two fortunate factors: American court decisions are concentrated in a relatively small and readily available number of volumes, and a large number of lawyers are willing to pay through the nose for a searchable database. Where these conditions are absent, digitization of old records is very spotty, and most such records will probably never be digitized.

5

Nasi Lemak 04.29.04 at 1:05 am

Well, not to be too egocentric, but (at least potentially) stuff about me. I have a common name shared with at least one major political figure and that’s enough to make me hard to find…

6

yabonn 04.29.04 at 1:17 am

Formulas of all kinds, or parts of code : anything beyond the very basic alphabet (or is there a way?) are sometimes hard to locate.

About languages, i often find myself fumbling my first searches before pinning the correct english words, but the scarcity of french material turns more often than you’d expect to an advantage. It reduces the commercial dope flood. Same thing happens for spam filtering, by the way.

Come to think of it, same for escaping outsourcing. Wonder if there’s not a “revenge of the non-standards” brewing somewhere.

7

Mark 04.29.04 at 1:25 am

My research examines various fairly innocuous changes in policy by the United States military during the 1990s (specifically focused on environmental protection) and I know of the existence of a good number of reports, policy evaluations, etc that I have yet to find via search engines (and this doesn’t include the various documents that used to be available but that the DOD, in their wisdom, has seen fit to take offline in the last couple of years.)
On the other hand, a google search of my very common name brings up my CV on the third page (ie: in the top 30 results), despite the fact that I’m a grad student with no significant publications to my name.

8

bob mcmanus 04.29.04 at 1:39 am

Kierkegaard. Most other pre-1930 A-grade philosophy has been typed in.
GE Moore
….
d-squared, Easynews.com and alibis and others have all the binaries newsgroups. Ain’t napster, you don’t go looking for something specific, you check it every day for five years (or whatever you want to devote) and end up with your dreams fulfilled

9

LiL 04.29.04 at 1:40 am

I think a lot of university legal departments are scaring people away from posting e-copies of their publications. Also, when people use course management systems like Blackboard, syllabi may be posted on the sections of course sites open to guest access, i.e. to anyone (as they often are at Princeton, for instance), but these pages are not indexed by search engines unless perhaps the builders of the course sites include metadata on them, which is a clunky process that I’ve never seen anyone bother with. Hence, unless someone tells you their course’s name/number or you look through the courses one-by-one to find syllabi, you won’t know they’re there.

As far as finding definitions go, I think google gives better results in that area than it used to – it now has a dictionary built in.

After reading your post, Eszter, I went on a little wild sex (umm?) chase of my own on google, putting in the names of specific acts, and the results were mostly how-to manualish, links to amazon pages or dictionary definitions – except for the odd humor page like this one. Odd in more ways than one.

10

Bill Tozier 04.29.04 at 3:28 am

Stuff about people in my parents’ generation — the Depression-era and WWII-era people. The best you can find about most of them is their postal address, occasionally. People in earlier generations show up in genealogical databases. But while people in later generations may have their own websites, they refer to their parents as “My Mom and My Dad”. Thus, that generation is more or less non-Googleable and un-indexed.

The only thing that puts my Dad’s name on the Web right now is the gradual transcription of NACA and NASA reports and histories….

11

Keith M Ellis 04.29.04 at 4:46 am

Bill’s point is a good general point. Reading a bunch of the WPA oral history accounts (and thinking what a damn good idea that was), it occurred to me that the 40 and under generation(s) will be unimaginably well-documented, and that the 80 and older (recent) generations in the US are not badly documented (all things considered), but there’s a couple of generations of personal histories that relatively speaking, will not be. I suppose, however, that I could be very, very wrong about this, considering how ease of publishing and whatnot in the 20th century might have single-handedly incredibly increased how much people’s lives are documented.

Still, I wonder. Are we getting our parents’ and grandparents’ oral histories?

12

Keith M Ellis 04.29.04 at 5:15 am

And, to answer Eszter’s question: ευδαιμονια.

13

Matthew Boulos 04.29.04 at 6:22 am

Hey Lil, that page might have had a point. :)

About Eszter’s point, classical texts seem to have a harder time making it online, and when they do, there can be serious issues of quality control. A few typos in a single block are enough to throw off my confidence in the transfer job that created the electronic document I’m reading. I’ve noticed this to be a particular problem of theological texts.

14

Thersites 04.29.04 at 6:28 am

I had a student tell me the other day about her experiences as an Arby’s worker. She said that when she was hired she asked about benefits; she was told that she could get health insurance, but that “you couldn’t afford it anyway.” On that basis she was denied access to any materials describing the company’s health plan. I mean, she was told she wasn’t allowed to even look at literature describing the plan.

She asked me if this were legal. I’m a professor, she thought I would know. I didn’t. So I went online to try to get an answer to her question. I found squat.

16

Dan 04.29.04 at 6:57 am

It seems to be difficult to get date-specific searches without going into subscription databases like Lexis. For example, my sister is an opera singer who likes to monitor the web for new reviews. Google gives you an option to view pages updated within the last 3 months, but that can still result in an uncomfortably large number of hits to sort through in order to find new entries. It would be much easier if, for instance, the results could be sorted with the newest entries at the top.

17

Dan Simon 04.29.04 at 7:03 am

Pancakes. Try as you might, you’ll never find any real, tasty, fluffy, honest-to-goodness griddle-cooked buttermilk pancakes on the Internet–even using Google.

To which one might well respond–as I would to just about all of the above observations about things that are hard to find on the Internet–so freakin’ what?

18

nick 04.29.04 at 8:05 am

Eighteenth-century poetry. The Chadwyck-Healey database has that sewn up, and you need a blessed academic subscription for access. (Which I have, sort of, but it’s all a bit convoluted.)

But there are also things which you can’t find because the web-spammers have overwhelmed the search engines. Proper details on hotels and car rentals, for instance.

19

Al 04.29.04 at 10:17 am

not totally on topic, but presumably you will have seen this – http://news.bbc.co.uk/1/hi/technology/3632757.stm – from the BBC a couple of weeks back (it’s an article about the CitizenLab project in Toronto). If you weren’t already aware of their work, it’ll make an interesting read. I particularly liked the tag ‘Net Ninjas’ in the title. A little too much Snow Crash there perhaps?

20

des 04.29.04 at 11:16 am

Trying to find an online bookshop (“bookstore”) in a foreign language you don’t know very well is extremely hard, in my experience. The registries (Dmoz and what have you) list every antiquarian boutique which had a homepage in 1996 and not much else, and Google isn’t much help either.

In fact, it would be hard even in the Engleesh without knowing the brands.

21

LTH 04.29.04 at 12:33 pm

Daniel: try AltaVista’s audio search – there are plenty of (esp. old) tracks sitting on mostly Eastern European and Oriental servers.

22

John Quiggin 04.29.04 at 1:00 pm

dan, a big problem with all forms of searching is that a lot of sites refresh in such a way that a page that has been unchanged since 1993 may appear to search engines as being recently modified. Some sort of metadata showing a genuine creation and “last substantive modification” date would be very helpful.

On academic sites, my perception is that this is generally left up to individuals and that most don’t bother to keep their sites up to date. It’s hard to tell whether there is any payoff from doing so (true in spades of blogging, of course)

23

Doug Turnbull 04.29.04 at 1:52 pm

I’ve had great trouble trying to find general, college/grad school level introductory materials in any number of technical areas. Just to pick one example, I was looking for info on pattern recognition and information theory and I found a few fairly cursory introductions and a bunch of advanced, very specific journal articles.

I’ve had the same experience with several other fields when searching for info. Very basic intros are online, and a smattering of journal articles that are way too advanced and specific, and nothing in between.

That area in between–what textbooks supply–seems to be missing on the web. If you want it, you have to go buy a book or have access to a good technical library. Not a problem at a university, but I’m not at a university.

It makes sense–if you wrote a book, you’re not going to give it away. And if you didn’t, why bother putting 300 pages on the web?

24

Chad Orzel 04.29.04 at 2:03 pm

I think a lot of university legal departments are scaring people away from posting e-copies of their publications. Also, when people use course management systems like Blackboard, syllabi may be posted on the sections of course sites open to guest access or: anyone (like they often are at Princeton, for instance,) these pages are not indexed by search engines unless perhaps if the builders of the course sites include metadata on them.

A lot of this is a combination of legal and academic issues. There are some questions about whether it’s strictly legal to post copies of publications on the Web (generally, the journal publisher holds the copyright, unless the work was done for the government), so PDF files showing how the article actually appears in the journal are a little dubious. Preprints in a different format are less likely to be a problem, but many people don’t bother.

With regard to Blackboard and such services, the move to password-protected class web sites is often a deliberate one, at least in the sciences. We’ve already had a few cases where students were able to find solutions for homework problems on a class web site at some other institution, which is a problem.

It’s very helpful to be able to post homework solutions as a study aid for students, but at the same time, I don’t want to screw up anyone else who’s teaching the same material. As a result, I’ve moved some of my class web pages to password-protected sites; some of my colleagues have taken to posting them on open web space, but in a way that makes them unlikely to be indexed by search engines.

The ready availability of a huge amount of information is both a blessing and a curse.

25

Jeremy Osner 04.29.04 at 2:04 pm

Matthew — some good sites for classical texts:

I am not a scholar so cannot vouch for their reliability, though.

26

rea 04.29.04 at 3:05 pm

“She asked me if this were legal. I’m a professor, she thought I would know. I didn’t. So I went online to try to get an answer to her question. I found squat.”

You have to use subscription data bases to do legal research online–there are some isolated examples of free stuff (see, e. g., http://www.findlaw.com), but seldom enough to get a definitive answer.

This has always struck me as a little strange, given that, after all, statutes and case law are government documents . . . you shouldn’t have to pay to access them.

27

Netwoman 04.29.04 at 3:20 pm

I do a considerable amount of ‘googling’ daily. I think what would be useful to me is somehow getting my search results thematically. For example, my new internet hobby – genealogy – if i want to do a search in this area, i would like to enter in my search word and get genealogy results – not a living person’s track and field results. Women’s rights – no porn please – just appropriate sites. Or, if i am looking for academic sources on a topic…then my results are only academic sources. I think it may be a matter of indexing sites…I am not sure if this makes sense – or if it is already being done, but it would help me going through pages and pages of material.

28

Alex 04.29.04 at 4:50 pm

Example – despite all the wash about e-government, the other day I needed customs tariffs. Could I get customs tariffs? Bollocks. HM Customs & Excise were happy to give me the classification number for the goods I was thinking of importing, but they could only tell me the rate of tax (and hence how much I’d have to pay) if I wanted to buy (by post!) a copy of the entire UK tariff. Bastards.

29

BP 04.29.04 at 5:28 pm

My car. Many’s the time I’ve stared at a large, crowded lot, wondering where I parked my car, wishing I could search Google or press Control-F and find the damn thing.

30

Jason McCullough 04.29.04 at 5:29 pm

Information about men’s haircuts. Even worse, try finding information about the history of men’s haircuts (styles in the 20s, etc.)

Ok, it’s not controversial, but I sure as hell can’t find it online.

31

eszter 04.29.04 at 8:47 pm

Thanks for all the interesting responses. I’ll address a few of them. First of all, Dan Simon asks “so freakin’ what?”. I think it is worth exploring in a bit of detail what types of material we cannot find online given all the hype about how you can find anything and everything online. The examples in this post and comments show that there’s plenty of material that’s either not online or hard to find and I think that’s worth noting.

Netwoman – I don’t know if I understood your concerns precisely, but perhaps the following would help. If you want just academic sources then using Google, you could specify that all your results be on educational sites. You do this by adding site:.edu to your search query. (Of course, .edu sites may contain non-academic sources, but it should help narrow things down.) For excluding track-and-field and other information that seems to clutter up results to certain queries, you could try excluding those terms with the use of the minus sign right in front of the word (no space), e.g. “women’s rights” -track -porn.
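The two tricks above (restricting to a site and excluding terms with a minus sign) can be sketched as a small query builder. This is only an illustration: `build_query` is a hypothetical helper, the choice to always wrap the phrase in quotes is an assumption, and the search URL shown is the commonly used Google form rather than anything specified in this thread.

```python
from urllib.parse import quote_plus


def build_query(phrase, site=None, exclude=()):
    """Assemble a query: quoted phrase, optional site: filter, minus-prefixed exclusions."""
    parts = [f'"{phrase}"']           # exact phrase in double quotes
    if site:
        parts.append(f"site:{site}")  # e.g. site:.edu for educational sites
    # The minus sign goes right in front of the word, with no space.
    parts.extend(f"-{term}" for term in exclude)
    return " ".join(parts)


query = build_query("women's rights", exclude=["track", "porn"])
print(query)

# URL-encode the query for use in a link (assumed URL pattern).
print("https://www.google.com/search?q=" + quote_plus(query))
```

Calling `build_query("genealogy", site=".edu")` would give the educational-sites variant; as noted, .edu sites can still contain non-academic material, so this narrows results rather than guaranteeing them.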

Regarding syllabi and why they are not public, you guys raise some good points. Nonetheless, I think it is possible to post the syllabus publicly but then post materials for students (whether that’s copies of publications or answers to homework assignments) privately. That’s what I do.

Interesting questions about which generations are being skipped.. whether that’s just online or overall. Related is the point about so much of search being text-based. Audio, video materials are impossible to find unless they are attached to some descriptive text (which is sometimes the case).

And yes bp, I’ve sometimes wanted to press CTRL-F when I’m looking for keys or some document at home or in my office. There’s definitely room for improvement..;)

32

Keith 04.29.04 at 9:46 pm

As a reference Librarian I do high end searches all day long. I’ve found that the most difficult material to retrieve is the highly technical material: medical, psychological or any specialized journal articles, which are usually secured behind subscription vendor sites. It’s not impossible to locate thesis papers or most middle or low end journal articles, it just takes finesse. But the high end stuff, that’s locked away in the deep web and unless you have Dialog or Factiva access, forget about it.

33

HP 04.29.04 at 10:01 pm

I may be not as good a web searcher as I think, but I had a recent search for a single, verifiable piece of information turn up nothing. I had just watched a short film about law enforcement made in the 1940s over at the Prelinger Archives. The film mentioned an illegal drug called “yen shi,” as though everyone knows what “yen shi” is. I googled it, along with as many variant spellings as I could think of. Other than the fact that “Yen Shi” is a fairly common name in some parts of the world, the only relevant link was to a 1913 paper on narcotics enforcement that again mentioned yen shi, only in passing, as a commonly abused illegal drug.

Given the enormous interest in some quarters in controlled substances and drug culture minutiae, I found this really odd.

Anyone know what yen shi is? Hashish? some kind of opiate?

34

Michael 04.29.04 at 11:29 pm

hp, I found this on xrefer.com. xrefer is a subscription based site which contains a lot of reference titles.

yen shee n
opium. A term from the vocabulary of drug users in the 1950s and 1960s which is now rare. It derives from the Cantonese yan or Mandarin yen, meaning smoke and hence opium. Yen shi literally means opium user or addict.

Bloomsbury Dictionary of Contemporary Slang, © Tony Thorne 1997

35

LiL 04.30.04 at 3:27 am

Matthew – that

36

Ampersand 04.30.04 at 3:56 am

This may already have been mentioned, in which case apologies, but the Bush administration has been taking useful pages from the Women’s Bureau of the Labor Department off the web – for instance, they’ve taken down all the information about the wage gap (they used to have a few really good pages of historical wage gap statistics measured a few different ways).

Salon has an article about it.

37

Adam Morris 04.30.04 at 2:59 pm

Google – to name just one search engine – has some code-dependent limitations built in that make certain searches difficult. For example, it only allows for a maximum of ten terms in a search query. If you are looking for exact phrases, this can be limiting at times.

Apple’s Sherlock tool lets you search Google for any amount of entries, and it’s easily controlled in the advanced search section.

Anyway, could it be that some advanced searching capabilities themselves are a part of those “gates” you mentioned? How many Googlers even know about the possibility of searching for exact phrases?

38

rogueclassicist 04.30.04 at 4:46 pm

*Good* translations of a number of ancient authors, especially Pliny the Elder. Most of the translations that are out there are pretty lousy …

39

HP 04.30.04 at 5:27 pm

Michael: Thanks. I feel better that the definition came up in a subscription service. A bit odd that the one citation (1913) I found predates the supposed currency of the term by 40-50 years.
