Self-plagiarising myself on self-plagiarism

by John Q on July 10, 2008

After reading this piece on self-plagiarism in the Times Higher Ed Supplement, I couldn’t think of any better response than to reprint verbatim this piece from 2005 (now with a new improved 2008 publication date), including a self-link to a piece which is simultaneously self-referential and self-plagiarising.

It’s over the fold:

In the Media and Culture journal M/C, Lelia Green has an interesting piece on self-plagiarism, referring to a site called Splat, which asserts

Self-plagiarism occurs when an author reuses portions of their previous writings in subsequent research papers. Occasionally, the derived paper is simply a re-titled and reformatted version of the original one, but more frequently it is assembled from bits and pieces of previous work. It is our belief that self-plagiarism is detrimental to scientific progress and bad for our academic community. Flooding conferences and journals with near-identical papers makes searching for information relevant to a particular topic harder than it has to be. It also rewards those authors who are able to break down their results into overlapping least-publishable-units over those who publish each result only once. Finally, whenever a self-plagiarized paper is allowed to be published, another, more deserving paper, is not.

Splat also refers to

textual self-plagiarism by cryptomnesia (reusing ones own previously published text while unaware of its existence)

(I know all about this.) Green takes a more nuanced view and has some interesting discussion.

I’m surprised by the fact* that self-plagiarism hasn’t been addressed before. I’ve seen quite a few cases where the same author has two papers that differ by one global Find and Replace, plus a corresponding adjustment in the notation.

At the same time, I don’t think this issue can be understood simply in terms of matching blocks of text. If, for example, Professor X writes ten papers on Problem Y, the summary of the literature and the description of the problem are going to be pretty much the same each time, even if there’s a substantial new contribution in each paper. Insisting that these pieces of necessary boilerplate be rewritten for each new paper seems rather pointless, and the alternative of citing or quoting the first paper for such material is silly.

In any case, there are worse sins along these lines than (partial) self-repetition. The biggest problem is the analog of “PhD variation”, papers which derive the consequences of marginal changes in a model the author has already analysed to the point where it can deliver no new insights.

* Actually not a fact, as I could easily have checked. Here’s Pamela Samuelson in 1994 (behind ACM paywall unfortunately).

{ 14 comments }

1 Neil 07.10.08 at 11:41 pm: Self-plagiarisation is a practice in which I engage regularly. It’s called having a research program. In paper X, I establish (to my satisfaction) that p is the case. In paper Y, I repeat verbatim not only some of the boiler plate John refers to, but also a paragraph or two summarising my argument for p before I go on to establish (to my own satisfaction) that given p, q follows. If q is a substantial and disputable point (even given p), then I am engaged in genuine research in writing paper Y. The fact that large slabs of text appear verbatim from my own published work is irrelevant.
2 Adam 07.10.08 at 11:56 pm: I think this debate has less to do with questions of authorship and original contributions to the literature and more to do with copyright and the imperatives of the marketplace.

The following stinks of sour grapes:
“It also rewards those authors who are able to break down their results into overlapping least-publishable-units over those who publish each result only once.”

If the journal reviewers feel the contribution was original and sufficient then the work deserves to be published. Nobody appointed Collberg and Kobourov high poobahs of scientific merit. Also — wanna bet there is overlap in the technical report and paper listed on the website?
3 david 07.11.08 at 12:43 am: The issue that really needs to be addressed here is *why* people self-plagiarize. I myself don’t see anything wrong with it, but I think I have an idea as to why individuals may take the mentioned “least publishable units” approach.

When I was in college, my intermediate macroeconomics professor was up for tenure. Not that great of a professor, but not too many of my econ classes could tout that. At any rate, he was a world class publisher. He had four articles out in leading economic journals. He didn’t get tenure because the department wanted more articles than he had published. Where’s the incentive to write high-quality comprehensive examinations of a subject when the reward schedule favors quantity over quality?

So the solution should be thus: departments should favor not necessarily fewer, but higher-quality (ie non-self plagiarizing or repetitive) works by professors. There’s obviously more to the subject that I’m overlooking for the sake of simplicity, but I think this is comprehensive to what is at hand.
4 vivian 07.11.08 at 3:10 am: Tempest? teapot? Are we supposed to present conference papers or get them published? Different journals have different audiences (and sizes). Theory articles tend to cover a lot of the same ground on the way to somewhat different conclusions. I don’t know many academics who pay cash for any journals. If you get a free article, and discover that for your purposes it’s the same as the one last year, stop reading and move on. Someone closer to that topic probably won’t. Journal space is about as scarce as gemstone-quality diamonds. Blaming prolific colleagues is just more distraction from the real culprit: the lack of good tenure-track jobs.
5 bryan 07.11.08 at 9:23 am: “It also rewards those authors who are able to break down their results into overlapping least-publishable-units over those who publish each result only once.”

yes. If this is self-plagiarism I guess most professional writers of non-fiction do the same thing. It is behavior that is economically motivated in that case, but probably the same argument can be made for academic research with its publishing requirements leading to a similar motivation.
6 Dave 07.11.08 at 1:07 pm: ‘Self-plagiarism’, if it is to mean anything, can only mean that one fails to make reference to previous publications while repeating chunks of them verbatim. Merely being repetitive in one’s arguments or concepts, or circling round the same point from umpteen different directions, is just being dull and desperate to publish – jolly bad form, but hardly an offence.

Simultaneous publication of similar articles tailored to different outlets is, meanwhile, quite different. If one is, say, presenting three case-studies built around the same conceptual frame, one is necessarily going to have to explain that frame, in basically similar ways, in each of the three pieces. Unless there are those who would argue that one should deliberately refrain from seeking to publish more than one such study?
7 trey 07.11.08 at 1:43 pm: With respect to neil’s comment, I definitely notice this within research programs more than anything. When one is reading up on a particular topic and an author is prolific within that topic, one will begin to notice many sentences and paragraphs copied verbatim, though often it is in different media; a journal article, an invited chapter, the introduction to an edited volume, a monograph covering the individual’s research program. I find it irritating, but understandable.
8 anon 07.11.08 at 9:11 pm: When I was in graduate school for political science, I was told, explicitly and repeatedly, that the acceptable amount of material to reuse was 50%. You are expected to cite yourself, of course, but up to 50% of an article’s text can have appeared in a prior publication and still be a legitimately new article. The logic I was taught was exactly along the lines of Neil and Dave’s points.

(They may have been sometimes cold, smug, and harsh, but that was one program that was very thorough about induction into the profession.)

Conference papers are a whole other story – an article could be nothing but a reworked conference paper. That’s exactly what conferences are for.

So I don’t know what these authors mean by clearer rules, because I got very clear rules; they were just rules these particular people don’t happen to like. If journal editors don’t like them, perhaps they should establish specific policies in their submission guidelines. But as Adam said, who appointed Collberg and Kobourov high poobahs of scientific merit?
9 Anonymous 07.13.08 at 3:50 pm: So I don’t know what these authors mean by clearer rules, because I got very clear rules; they were just rules these particular people don’t happen to like.

One of the main issues seems to be how much the standards vary from field to field. I work in pure mathematics, where the standard appears to be 0% reuse. I’ve never discussed it with an authority figure, but I never imagined any substantial degree of reuse was acceptable (and I’ve confirmed this feeling with several colleagues). It’s probably fine to copy a few sentences from a previous paper, but copying numerous paragraphs may open you to charges of unethical behavior, even if you cite the paper they appeared in. If you tried to publish a paper consisting of 50% previously published text, it would actually derail your career. (The one exception I know of is for material not previously published in a journal but rather in conferences in certain fields, like theoretical computer science. There, it can be reused in journal papers, but only if they are clearly labeled as revised versions of the conference papers.)

So when I saw comments like #1, I assumed the author just had really low ethical standards. It wasn’t until comment #8 that I realized there might actually be differences in the publicly articulated standards in different fields. It’s intriguing.

Do people in other fields get guidance like this in graduate school? If so, what’s considered acceptable?
10 John Quiggin 07.13.08 at 8:51 pm: “I work in pure mathematics, where the standard appears to be 0% reuse.”

So how do you deal with (previously-published) axioms, definitions and so on? Omit them, reformulate them each time, or cite them to an earlier paper?
11 Anonymous Mathematician 07.14.08 at 3:11 am: (I’m the same commenter as in comment #9.)

So how do you deal with (previously-published) axioms, definitions and so on? Omit them, reformulate them each time, or cite them to an earlier paper?

If they’re simple, one just restates them on the fly. (They probably come out pretty similarly worded, and perhaps identically. In any case, the important thing is the meaning rather than the wording.) If they’re complicated, I don’t think people would object to copying a definition or theorem statement from another paper, assuming the original source is of course clearly cited.

Theorems and definitions are usually pretty short, though, typically at most a few sentences. My feeling is that copying multiple paragraphs would be really frowned upon (although you could probably get away with it), and copying multiple pages would get you in serious trouble with the journal editor.
12 Neil 07.15.08 at 2:52 am: Anonymous mathematician,

I’m the guy you thought had low ethical standards. To be fair, you changed that view when you realized that different fields have different standards. But now I want to defend those standards as ethical. Suppose you are a scientist and you are developing a hypothesis. To be concrete, let’s talk about psychology. Suppose your hypothesis is that brain modules got more highly interconnected over evolutionary time. You write a paper defending this view, citing neuroanatomical and functional evidence. Then it occurs to you that you can use this thesis to develop a view with regard to human creativity. You write that paper. Now do you really think it is – at best – permissible for you to reuse the conclusions and quick summary of your view in paper 1 in paper 2? Any field that requires you to rewrite those conclusions in new words is a field that requires you to waste your time. Any field that requires you to redefend the view (of course in new words) is not only wasting time, it is taxing readers’ patience beyond endurance. Since both of these practices impede the growth of knowledge, they are unethical.
13 Anonymous Mathematician 07.16.08 at 2:17 am: Sorry for the comment about ethical standards; I should have written more carefully. I guess I meant that at first I thought you worked in an area with the same tacit standards and were flagrantly ignoring them, but then I realized you certainly weren’t.

I agree that there are some advantages to reusing text. On the other hand, there may be advantages to a policy against it as well. One observation is that rewriting some background sections is a very small task. (For me, at least, proving the theorems takes perhaps twenty times as long as writing them up. Rewriting 5 or 10% of a research paper will make only a tiny difference in overall productivity, especially since the background and introduction are by far the fastest parts to write.) So a correspondingly small benefit could make it worthwhile.

One potential benefit is that a newly rewritten introduction may better reflect the very latest views of the author and may be more closely tailored to the rest of the paper, compared to a single piece of boilerplate written years ago and then repeatedly reused. Another is that it encourages authors to use different examples and explanations. There’s no requirement to do so, but if you are writing several brief introductions, you’re likely to include a little more material in total than you would ever have put into a single introduction.

It doesn’t seem completely clear which effects are most important in practice.

From my perspective, the only real ethical issue is in compliance with community standards. If your colleagues expect each paper to be written from scratch, then secretly reusing text is a mild form of misrepresentation. On the other hand, if everyone agrees that reusing text is fine, then there’s of course no ethical issue.

Incidentally, what do you think of copying other people’s backgrounds or introductions? In the last year or two, there was a big scandal about this in physics. It particularly comes up for non-native speakers of English, who sometimes argue that writing background material is a real burden for them, and that if someone has already done a good job of it, what’s the harm in adapting it rather than starting from scratch? Of course, I think most people would consider it unacceptable plagiarism if it is done without explicit notice. However, should people be allowed to write something like “Most of the following section is taken verbatim from the excellent introduction to [1].”? Do you know of any fields where this is explicitly allowed? That would be a really interesting data point.
14 Neil 07.16.08 at 2:38 am: AM,

I agree that if there is a reason to rewrite – if the field has changed, if there is data that better demonstrates the point, or if the author has changed their views, even slightly – then rewriting is called for. My claim is that if it is simply rewriting to satisfy a non-self-plagiarism condition, it is a waste of time; it’s a standard that doesn’t earn its keep.

I guess I have no real problem with the copying of other people’s introductions either, with proper citation. So far as I can see, though it is not widespread, I doubt that most people in the social sciences would have a problem with this.

Comments on this entry are closed.
