Redefining Plagiarism

by Henry on May 21, 2009

Groklaw points out some interesting characteristics of the Terms of Service for Wolfram Alpha:

As Wolfram|Alpha is an authoritative source of information, maintaining the integrity of its data and the computations we do with that data is vital to the success of our project. … If you make results from Wolfram|Alpha available to anyone else, or incorporate those results into your own documents or presentations, you must include attribution indicating that the results and/or the presentation of the results came from Wolfram|Alpha. Some Wolfram|Alpha results include copyright statements or attributions linking the results to us or to third-party data providers, and you may not remove or obscure those attributions or copyright statements. Whenever possible, such attribution should take the form of a link to Wolfram|Alpha, either to the front page of the website or, better yet, to the specific query that generated the results you used. … Failure to properly attribute results from Wolfram|Alpha is not only a violation of these terms, but may also constitute academic plagiarism or a violation of copyright law. Attribution is something we expect you to give us in exchange for us having provided you with a high-quality free service. The specific images, such as plots, typeset formulas, and tables, as well as the general page layouts, are all copyrighted by Wolfram|Alpha at the time Wolfram|Alpha generates them.

I wouldn’t read any of this as a considered statement of principle – more an attempt by Wolfram Research, as per usual, to state a maximally aggressive claim for their perceived intellectual property. While they are insisting on proper citation here, in the past they have actually threatened legal action against scholars for citing to the mere existence of a mathematical proof that they claimed was a trade secret. Even so, their policy raises some interesting issues about what plagiarism actually involves in an era where complex data query tools are widely available.

My personal take fwiw – I think it’s very good practice to acknowledge where your data comes from and what your tools of analysis are, if only because it allows people to replicate your results, spot obvious errors and so on. But the suggestion that a failure to make such an acknowledgment constitutes academic plagiarism seems to me to be half-arsed – at worst this would be sloppy citation, and (depending on the exact query and the amount of independent thought I had to put into it), perhaps not even that. If I use Stata to generate a graph, and don’t acknowledge that, it isn’t plagiarism, nor is it plagiarism (I don’t think – others may differ), if I fail to acknowledge that I am using GSS data for the graph, although it is bad citation practice that a reviewer might reasonably complain about. It would be plagiarism, obviously, if someone else had carried out the query, and I copied their graph without acknowledging it. Or, alternatively, it would be plagiarism if I had somehow suggested that GSS data (or data gathered by someone else) had been gathered by me myself.

So for me, at least, plagiarism involves someone duplicating primary transformative work carried out by a real human individual in a manner suggesting that she (the plagiarist) is the originator of the work. Using a fancy automated query tool without acknowledging it directly isn’t plagiarism, any more than it would be plagiarism to fail to acknowledge the manufacturer of your pocket calculator when you add or subtract something. Perhaps this standard should change as query tools become more advanced, but also, perhaps not.



Ahistoricality 05.21.09 at 3:22 pm

So, all my students who cite Google are actually just following the Terms of Service….. good to know.


Henry 05.21.09 at 3:27 pm

Actually, no, as the Groklaw post points out:

Google, in contrast, has no Terms of Use on its main page. You have to dig to find it at all, but here it is, and basically it says you agree you won’t violate any laws. You don’t have to credit Google for your search results.


dsquared 05.21.09 at 3:28 pm

I think they’re in line with the terms of use of similar paid data providers. My research is absolutely studded with references to “Source: the BLOOMBERG PROFESSIONAL service [tm]” and “Source: Datastream International (c) ALL RIGHTS RESERVED”. The underlying issue is whether you can have copyright on a database (ie, on the particular organisation and structure of one, distinct from the actual data), something which all sorts of Lawrence Lessig types apparently view as a horrible imposition but which seems to me to be utterly obvious the other way.


Henry 05.21.09 at 3:34 pm

But (and IANAL and so on), apart from the merits of the issue, isn’t there a big difference between the EU (with a Database Directive) and the US (with no strong copyright on databases), or am I completely out of date on this?


dsquared 05.21.09 at 3:43 pm

Don’t know – suspect it’s contested territory and that as a result everyone’s got the incentive to maximise their claims. For the proprietary data services it’s less of an issue of course as they won’t give you the data feed unless you sign a contract saying how you’re going to credit them; I would guess that putting everything on the web with a “Terms of Use” hidden ten flights down in a cabinet marked “BEWARE OF THE LEOPARD” puts one on a somewhat less solid legal footing, hence the attempts to bring plagiarism into the question.


bianca steele 05.21.09 at 3:51 pm

Similar wording (as I recall) in the front matter of A New Kind of Science


Salient 05.21.09 at 4:03 pm

It’s particularly amusing to me that it’s Stephen Wolfram who is responsible for making these adventurous claims. It’s a little like finding a notice on the front page of a book of short stories “by” Isaac Asimov sternly admonishing re: the importance of properly crediting the original author whenever you quote something from the book.


Kieran Healy 05.21.09 at 4:07 pm

I was wondering how and when some bit of megalomania-crankery would emerge from the Alpha launch. One would expect no less from the Ann Elk of complexity/automata theory.


Tom Scudder 05.21.09 at 4:43 pm

Slight hijack: Salient – is there some suggestion that Isaac Asimov didn’t write his own stories?


kid bitzer 05.21.09 at 4:53 pm

some of them were written by dumas pere.


Anders Widebrant 05.21.09 at 5:29 pm

Well, that’s interesting, considering that Wolfram Alpha’s own search results aren’t exactly crystal clear on the nature of their sources.


Ginger Yellow 05.21.09 at 5:34 pm

But (and IANAL and so on), apart from the merits of the issue, isn’t there a big difference between the EU (with a Database Directive) and the US (with no strong copyright on databases), or am I completely out of date on this?

IANAL also, but as far as I know, yes. Infamously, the Premiership and Football League sued people for publishing fixture lists, claiming that doing so was infringing the copyright of their database, regardless of whether people were commercially exploiting the information or how they obtained it. There’s no way that would fly in the States.


christian h. 05.21.09 at 5:38 pm

What salient said. Wolfram of all people? Come on.


Salient 05.21.09 at 5:48 pm

Salient – is there some suggestion that Isaac Asimov didn’t write his own stories?

Eh, I shouldn’t have been so bold with the analogy. To answer your question, phrased as you phrased it: no.

To answer the milder question “have there been plausible suggestions that Asimov borrowed liberally from others in ways that a strict pedant might call plagiarism brinkmanship” — maybe. I suspect the various rumors of the uses he found for old student assignments (for example) were probably exaggerated (certainly I have no substantiation available). And even granting credulity to such rumors, I believe there’s quite a difference between making use of someone’s plot hooks and plagiarizing their work.

Unfortunately, I was trying to tie in a reference to Asimov precisely because he took such a very different approach to the topic. His essays on what constitutes plagiarism, what constitutes inspiration, and what it means to assert proprietary ownership over an idea, were all reasonable, even-handed, and non-polemic (at least this is true for those essays which I’ve read, all found in old Asimov’s Science Fiction Magazine issues).

So the (mangled) point was, even if Asimov liberally borrowed story/plot ideas from time to time, at least this cohered with his reasonable views on what constitutes legitimate use of another person’s ideas. As opposed to Wolfram, whose views as expressed in these Terms of Use are remarkably inconsistent with his practices (which, I’m assuming, was half the point of the original post).


onymous 05.21.09 at 5:48 pm

Wolfram|Alpha doesn’t know what to do with my input. How do I write to Stephen Wolfram to tell him what he can do with my input?


JSM 05.21.09 at 5:52 pm

I almost never actually LOL at things on sites, but “Ann Elk”! Made my day, Mr Healy.


Ahistoricality 05.21.09 at 7:45 pm

Henry, I was kidding. A citation to Google means they don’t know anything about citations, sources, or research.


Henry 05.21.09 at 7:53 pm

Oh yeah, I got that and should have said so clearly (I have the same problem with my students), but I thought it was interesting that Google explicitly does not try it on in the same ways.


rm 05.21.09 at 8:18 pm

It would be a lot easier to cite Wolfram Vertical Line Alpha if the site’s name was typeable.


rm 05.21.09 at 8:19 pm

Were. If it were. Geez.


steven 05.21.09 at 8:22 pm

Oh cool, a combination of calculator and Wikipedia-scraper.

Fortunately for the silly EULA, if Wolfram|Alpha isn’t transparent as to its sources, which it isn’t very, no one has any very good reason to trust its results, and therefore won’t be inclined to cite it as a sole source anyway.


The Modesto Kid 05.21.09 at 8:27 pm

A vertical line is right over there on the right side of most keyboards, above the Enter key. No?


steven 05.21.09 at 8:43 pm

(Plus, “curated data” makes me want to laugh in its face.)


robertdfeinman 05.21.09 at 9:28 pm

Since the alpha project is continually “learning” new information there is no guarantee that if a future person puts in the same query they will get the same results. So this makes the idea of some immutable piece of knowledge, which is what plagiarism traditionally involves, questionable.

Even without the “learning”, what happens if one asks something like “what is the population of Pakistan” today and then tomorrow the answer reflects the additional new citizens? Since alpha is entirely based upon others’ work for its data, it is really a far-fetched claim.

They could make a claim about their algorithms, if someone managed to decode them and use them without permission. But have they either filed patents for them or claimed trade secret status?


bianca steele 05.21.09 at 9:32 pm

It kind of bugs me to see “data integrity” and similar terms misused just the tiniest bit, but since the above is just legal boilerplate, it probably doesn’t matter.

You would care about data integrity (this is simplified) in two cases, basically: you don’t want uncontrolled data uploaded to the database, and you don’t want people you control using the wrong data. I don’t see any way the first case applies, as there is no conceivable way anything a user does could affect the database. In the second case, I also don’t see it, as the company does not own the data and has no reason to need to control the people who use it after it’s provided–certainly not for data like “the word bigrams in the first sentence of A Tale of Two Cities,” or even for “my risk of heart disease according to the formula given by the Framingham Heart Study.” So “data integrity” here is probably a (meaningless) buzzword, suggesting “corporate responsibility.”

I don’t see any usable citation to the Framingham Heart Study, either. Getting from “Search the Web” to a usable Google query took three steps (two guesses on my part and a suggestion from the search engine). It looks real neat as a science project, though. If they don’t run out of money first, it may someday be useful.


bianca steele 05.21.09 at 9:50 pm

@24: That’s exactly the usual concern about data integrity. People would download financial information to their PC’s, thinking they were being clever, in the old days when everything was still done officially on mainframes, and then would use their personal spreadsheet programs to manipulate the data. This gives old-time IT guys the heebie-jeebies. There was no way to guarantee they’d used the right queries, had up-to-date information, etc. (with then-current technology). They loved it when all the computations were done in their inner sanctum under their control. There’s a reason they were jokingly called a “high priesthood.”

Re. guessing what the system does in detail, so that you could copy their work without paying: sure, you can speculate about this stuff all day. It’s only ones and zeroes.


Julius Beezer 05.21.09 at 10:01 pm

Compare M. Wolfram’s somewhat tedious demonstration of his masterpiece with Google’s brilliantly brief exposition of GoogleDocs.

After several minutes of Wolfram’s anodyne estuary sales patter, me and the blonde glazed over, then quit.

But the layout is very nice, and I have used it since in practical ways. Try “dawn” for example. Quite useful.


Ginger Yellow 05.21.09 at 10:05 pm

“A vertical line is right over there on the right side of most keyboards, above the Enter key. No?”

Not on European keyboards. Backspace is above the Enter key, which is an inverted L shape covering two rows.


nnyhav 05.21.09 at 10:10 pm

It didn’t know what to do with “alpha-stable”, but suggested finance & chemistry …

(I started reading an introduction to inverse theory but gave up after I couldn’t find the first negative footnote.)


Peter 05.22.09 at 2:41 am

“Not on European keyboards. Backspace is above the Enter key, which is an inverted L shape covering two rows.”

Still, I imagine if you find the “\” key, above it you’ll see a (possibly broken) vertical line. That will come out as “|” when you type shift-\.
Not exactly impossible to type. It’s a nearly completely-useless search engine though :)


rm 05.22.09 at 3:33 am

Vertical line over the Enter key . . . let’s see . . . sh*%, there it is. D#$%, I never would have seen that in a thousand years. This is where the tech geeks start blaming the users for not knowing things, but really, who ever uses that key? ||||||||||||, ha ha. I think the pomposity of Wolfram | Alpha’s rhetoric — which extends to the |-containing name — compared to the plain-language rhetoric of Google is symptomatic of the whole enterprise. It’s too techy for most people to want to adopt, and then when you ask the techy people about it, you find that it doesn’t really work for the experts either.


Doug 05.22.09 at 5:28 am

26: There’s a reason they were jokingly called a “high priesthood.”

There. All fixed now.


Omega Centauri 05.22.09 at 5:55 am

The little bit of messing around I did with it gives me the impression that it is perhaps a slightly quicker way to get certain types of information, and perhaps just a bit quicker as a calculator. But being just a little faster at answering relatively trivial questions doesn’t strike me as a reason to assign ownership of the results.


SJ 05.22.09 at 11:10 am

Type in “potrzebie”. Wolfram|Alpha interprets it as a unit of length (about 2.26 mm). There’s no link to any other source, so it appears that W|A is claiming copyright to this information.

But the definition actually comes from a Mad Magazine gag from 1957 written by Donald “Art of Computer Programming” Knuth. Copyright would belong to either Mad or Knuth, depending on the arrangement between them, and certainly not to W|A.

The way I see it, if someone not only copies something without attribution, but goes so far as to assert a fresh copyright over it, they are committing both plagiarism and copyright violation.


Paul 05.22.09 at 4:13 pm

Plagiarism in academia? Say it ain’t so…


Adam Kotsko 05.22.09 at 4:22 pm

The vertical line is used in Unix command lines, right? To “pipe” something?


Righteous Bubba 05.22.09 at 5:02 pm

But being just a little faster at answering relatively trivial questions doesn’t strike me as a reason to assign ownership of the results.

I’ve tried it a bunch, but the number of times it returned some variation of a blank for an answer was off-putting, and no I don’t want to help it learn if another search engine can get me there faster. Not useful at all to me.


Wax Banks 05.22.09 at 5:45 pm

The vertical line is used in Unix command lines, right? To “pipe” something?

Yes, in addition to being an important character for those men of wealth and taste who include ASCII-art maps with their electronic party invitations (and by necessity use fixed-width fonts, God’s fonts, for their email).


joel hanes 05.22.09 at 7:15 pm

The vertical line is used in Unix command lines, right? To “pipe” something?

Yes. Speaking aloud, Unix geeks pronounce it “pipe” or “bar”.

It’s also used as the bitwise-OR arithmetic operator in C and in the Verilog hardware-design language,
and doubled, ||, it’s the logical-OR operator in Unix shells, C, Verilog, and perl.
(The shell logical-OR means roughly “perform the command following if the preceding command fails”.)
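
For anyone who wants to see the difference rather than take it on faith, here is a minimal sketch in Python. It is only an illustration under some assumptions: Python spells logical OR as `or` rather than `||`, and the `step` and `run_with_fallback` helpers are hypothetical stand-ins for shell commands, not anything a real shell provides.

```python
# Bitwise OR combines two integers bit by bit:
# 0101 | 0011 -> 0111
flags = 0b0101 | 0b0011

# Record of which hypothetical "commands" actually ran.
calls = []


def step(name, succeeded):
    """Hypothetical stand-in for a shell command: note that it ran,
    then report whether it succeeded."""
    calls.append(name)
    return succeeded


def run_with_fallback(first_succeeds):
    """Mimic the shell's `first || fallback`: the fallback runs only
    when the first step fails, because `or` short-circuits."""
    calls.clear()
    result = step("first", first_succeeds) or step("fallback", True)
    return result, list(calls)
```

Calling `run_with_fallback(True)` runs only the first step, while `run_with_fallback(False)` runs both, which is roughly what the shell does with exit statuses.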

I type that character hundreds of times every day,
even when not crafting flat-ASCII maps for my emailed party invitations.

Here’s a famous poem written entirely in special characters, with translation:

Re: Stuck shift key poetry (Dave Zobel)

A fragment of a drinking (or financing?) song called “Hatless Atlas”:


hat less at less point at star
backbrace double base pound space bar
dash at cash and slash base rate
wow open tab at bar is great
semi backquote plus cash huh DEL
comma pound double tilde bar close BEL


joel hanes 05.22.09 at 7:18 pm

Argh. html eats special characters, so the poem loses critical bits.

The original is archived here.

Lack of preview considered harmful.


Detlef 05.22.09 at 9:43 pm

Peter wrote:

“Still, I imagine if you find the “\” key, above it you’ll see a (possibly broken) vertical line. That will come out as “|” when you type shift-\.
Not exactly impossible to type. It’s a nearly completely-useless search engine though :)”

Just for the sake of Europeans here. :)
Because I looked for it quite some minutes.
On this European keyboard (in Germany) you’ll find it at the bottom left. The “<” key.
Use “Alt Gr” with the “<” key and it’ll produce a “|”.


ignalina 05.22.09 at 11:14 pm

Or Alt + 7 on a Swedish Mac keyboard.


K Ackermann 05.23.09 at 6:34 pm

It was a different Isaac Asimov that wrote some of the Isaac Asimov stories. Let’s be clear on this.


Nell 05.23.09 at 7:00 pm

Anyone else willing to cop to being a BtVS/Angel fan and unintentionally substituting Wolfram & Hart for every Wolfram|Alpha reference – and having the post and discussion still make sense?

