Self-plagiarism

by John Q on November 16, 2005

In the Media and Culture journal M/C, Lelia Green has an interesting piece on self-plagiarism, referring to a site called Splat, which asserts:

Self-plagiarism occurs when an author reuses portions of their previous writings in subsequent research papers. Occasionally, the derived paper is simply a re-titled and reformatted version of the original one, but more frequently it is assembled from bits and pieces of previous work.
It is our belief that self-plagiarism is detrimental to scientific progress and bad for our academic community. Flooding conferences and journals with near-identical papers makes searching for information relevant to a particular topic harder than it has to be. It also rewards those authors who are able to break down their results into overlapping least-publishable-units over those who publish each result only once. Finally, whenever a self-plagiarized paper is allowed to be published, another, more deserving paper, is not.

Splat also refers to

textual self-plagiarism by cryptomnesia (reusing one’s own previously published text while unaware of its existence)

(I know all about this).

Green takes a more nuanced view and has some interesting discussion.

I’m surprised by the fact that self-plagiarism hasn’t been addressed before. I’ve seen quite a few cases where the same author has two papers that differ by one global Find and Replace, plus a corresponding adjustment in the notation.

At the same time, I don’t think this issue can be understood simply in terms of matching blocks of text. If, for example, Professor X writes ten papers on Problem Y, the summary of the literature and the description of the problem are going to be pretty much the same each time, even if there’s a substantial new contribution in each paper. Insisting that these pieces of necessary boilerplate be rewritten for each new paper seems rather pointless, and the alternative of citing or quoting the first paper for such material is silly.

In any case, there are worse sins along these lines than (partial) self-repetition. The biggest problem is the analogue of “PhD variation”: papers that derive the consequences of marginal changes in a model the author has already analysed to the point where it can deliver no new insights.

The other problem with the Splat analysis is that it’s very much in the old world where everything that matters is in journal articles. Increasingly, though, important ideas are going to be aired first in newer media like blogs, before being refined into journal articles.

{ 32 comments }

1 Z 11.16.05 at 6:28 am: I think the issue of splat also depends on the subject. In mathematics, my own field of research, splat is very common and not necessarily negative. It helps the reader to understand how the very same mechanism is working in a slightly expanded context. From splat to splat, some mathematicians build truly amazing articles (Ralph Greenberg, and his series of articles on Iwasawa theory, comes to mind).
2 Dan K 11.16.05 at 7:02 am: Actually, splat is unavoidable, given the way academic papers generally are written. First, there is John’s point that there are always boilerplate sections (lit review, methodology in empirical papers, portions of case descriptions). Second, articles always exist in previous semi-drafts, such as conference papers and keynotes, that normally are publicly available. I think the problem here is basically a too anal view of the text. It doesn’t matter that boilerplate portions of a text are recycled, as long as they are connected to original ideas.
3 Kieran Healy 11.16.05 at 8:07 am: The papers covering the development of the SPLaT tool itself might provide an interesting case study. If it continues to develop, won’t the authors have to write papers talking about newer features (or responding to arguments) in such a way that earlier stuff is at least partly recapitulated? It’s not reasonable to ask readers to dig up your whole body of work, or some very large subset of it.
4 Kieran Healy 11.16.05 at 8:09 am: A different way of putting this point is that disciplines are in many ways like ongoing conversations, and there can be a lot of redundancy in conversations.
5 John Emerson 11.16.05 at 8:25 am: While I think that there are innocuous and even beneficial versions of this sort of thing, some of it is an attempt to game the publication-evaluation system. In any case I have known of people who could get five publication units out of one piece: a couple of preliminary versions, a journal article, a chapter in an anthology, and a chapter in a book. This might just barely happen innocently, I suppose, but in the case I’m thinking of the perp knew exactly how many points each unit of publication got her.
6 eudoxis 11.16.05 at 8:27 am: Don’t most journals have guidelines for resubmissions and prior publications? It’s astounding how much of this takes place in medical research.
7 eudoxis 11.16.05 at 8:27 am: multiple submissions
8 Dan K 11.16.05 at 9:01 am: Well, I guess that John Emerson’s point is a main reason why only publications in journal articles tend to be given serious weight. I presume that most academics would think that the chapter in the anthology, and the book, are either more or less loosely based on a published article or deemed too lightweight to be published in a journal. This logic is self-reinforcing, too.
9 eudoxis 11.16.05 at 9:05 am: The trial balloon already has an informal format as abstracts and posters at meetings. They are usually followed by a paper.
10 dsquared 11.16.05 at 9:10 am: hey, have you seen that Guinness ad?
11 Clueless 11.16.05 at 9:37 am: On a related note:

AN ANALYTIC STUDY OF THE LEAST PUBLISHABLE INCREMENT

Abstract

This paper presents an analytic study of the least publishable increment (LPI). The LPI is defined as the smallest acceptable difference between two publishable papers. Two metrics for the LPI are derived. The first metric is based on a generalized distance measure derived from the Hausdorff metric and is used to differentiate between papers on similar topics by different authors. The second metric describes a distance measure for papers from the same author.

Further studies using cross-journal and conference proceedings relations are also discussed. We outline a simple strategy for maximal publication based on these distance measures. An illustrative example of the maximal publication scheme is shown and its correlation to actual publications is also given.

We present a proof that maximal publication based on the LPI is an optimal approach for junior faculty members attempting to get tenure.

Source: rec.humor.funny (RHF)
12 Tom Hurka 11.16.05 at 11:08 am: The answer may vary from discipline to discipline, but isn’t a relevant question how many of the people who should have read a paper when it first appeared (e.g. because they’re working on the same topic) actually did read it? In an ideal world the answer would be “all of them.” In the real world (except maybe in some hard sciences?) it’s usually “less than all” and in my discipline of philosophy it’s often “a lot less than all.” Even if a paper appears in the most prominent possible journal many people writing on the same subject won’t have read it. So re-presenting its main idea along with some new material in a follow-up paper increases the number of people exposed to the idea, not by 100% perhaps but potentially by quite a lot. Ideal writers might never repeat themselves, but then ideal readers would never benefit from repetition. And in the real world many disciplines don’t have ideal readers.
13 sennoma 11.16.05 at 11:11 am: I don’t think that splat, at least in my field, is especially problematic, for reasons noted above — much of it consists of necessary boilerplate, conversational redundancy and less-weighted publications (a la dan k #8). (Pace eudoxis? I’m in medical research too.)

I think the larger problem is the LPI, which I call the MPU (min publishable unit) and which gives rise, via salami publishing, to a good deal of splat. IOW, I’d call splat a symptom of salami publishing (slicing a body of work into publications as thin as possible).
14 paul 11.16.05 at 4:16 pm: Clueless’s post above (9:37 am) reminds me that years ago, probably while we were still in grad school, a friend mentioned that the optimal publication strategy was generally believed to be maximizing the number of papers per idea. He was going to try the dual of this strategy, minimizing the number of papers per idea. In particular he was aiming to overcome the apparent minimum of 1 (idea per paper).
15 paul 11.16.05 at 4:19 pm: JQ wrote:

I’ve seen quite a few cases where the same author has two papers that differ by one global Find and Replace, plus a corresponding adjustment in the notation.

Many years ago, when I took trade theory, my first thought was: this looks just like I/O, with the word ‘firms’ replaced by the word ‘countries’. I suppose this is what Z means in the first comment in this thread.
16 CalDem 11.16.05 at 4:50 pm: I wrestle with this. One problem not mentioned is if you tend to rewrite similar material for very different audiences. I am often writing articles from the same set of original findings for both technical and non-technical audiences. So a lot of the data description and findings are close to identical but the intro, explanation, and the description of methodologies vary a lot. I’ve decided to copy some material verbatim because I like the way I wrote it the first time and think rewriting it would make it less clear.
17 Paul Gowder 11.16.05 at 4:52 pm: Ok, here’s the question. Why does the academic community use this term, “self-plagiarism?” What purpose is this label supposed to serve? After all, it’s not real plagiarism, in the sense of stealing ideas, because, well, the author is entitled to steal his/her own damn ideas. Nobody is being defrauded (unless it’s being done by a student, in a class, for a grade). It seems that the objections quoted above are … rather de minimis, and could easily be levied against a host of other work that doesn’t get that oh-so-condemnatory word “plagiarism” attached. Filling journals with unoriginal crap? Well, gosh. If “self-plagiarism” were stamped out, that would be solved, right?

Calling “self-plagiarism” a form of plagiarism seems to trivialize real plagiarism, which is already far too trivialized as it is.
18 Matt Daws 11.16.05 at 6:16 pm: Slightly off-topic, but here’s a vote for keeping “boilerplate” stuff firmly in papers. I’m also a mathematician (as #1), and my current pet hate is authors who do one of the following:

i) Write “It is well-known that…” as an excuse not to give a reference. Well, it’s not well-known to me, now that I’m reading your paper 20 years later. If it’s really “well-known” then surely there must be a good reference?

ii) Write “The following has a short proof, see” and then give a reference to a paper in Russian in some obscure Japanese journal (okay, I exaggerate). If it’s a short proof, and the best reference is really obscure, then would it really hurt to give the proof (obviously noting that it’s not original)?

I guess it comes down to whether you view papers as very technical, incremental additions to knowledge (in which case maybe background isn’t needed) or as more self-contained, widely read pieces of work. I tend towards the latter view: many papers in maths are more important (IMHO) for the techniques they demonstrate than for the actual results they prove, so anything that improves readability, and gives a guide to where this work is coming from, can only be a good thing.

I do wonder, however, if journals perhaps don’t help in their quest for short papers. I feel a bit of pressure to put a single solid result in a paper and send it off, even if I might really have a result and a half. This keeps the paper shorter, and the hope is that I can work the half-a-result into a full result, and publish it later…
19 Anthony 11.16.05 at 6:34 pm: Increasingly, though, important ideas are going to be aired first in newer media like blogs, before being refined into journal articles.

Not while Impact factors hold sway they won’t.
20 david tiley 11.16.05 at 8:44 pm: From memory, Jack Higgins was caught reusing chapters from his own work again in subsequent novels. It only came up because he changed publishers and the old ones pointed out they owned the rights to the new chapters.

Now that is genuine self-plagiarism because it is deliberately deceptive, and passes something off to the reader.

In the cases which you find distasteful, the academics here are gaming the system, but it is fraud rather than plagiarism – trying to create a false CV with inflated publications.

Mind you, it could also just be narcissism.

I think this is the reference for the Higgins stuff, though it’s not on the net and I’m not at a library: Steinhauer, Yvette, ‘Jack Higgins: If He’s Said It Once, He’s Said It a Thousand Times’, Age Good Weekend, 24 June 1988, 51–55.
21 sennoma 11.16.05 at 9:37 pm: Not while Impact factors hold sway they won’t.

Probably true, but as open access gains traction and online archiving/searching keeps improving, I think impact factors will lose a lot of their power.
22 ArC 11.17.05 at 1:55 am: The Guinness ad affair is one of my favourite running jokes here, or probably anywhere on blogs.
23 Kenny Easwaran 11.17.05 at 3:08 am: I’ve been surprised recently, reading some important books in philosophy of mathematics from the last ten years or so, by just how large a share of these books was once published as papers by the same author, with few changes. However, in many cases this is a good thing – it’s useful to have separate articles that pursue each point independently, and it’s also useful to have a larger unit that ties them together and gives you a good idea of what the author was thinking about in that time span.

But in response to Paul Gowder in #17, I think that the idea is that both splat and “real” plagiarism harm the discipline almost equally. In both cases, work is duplicated and journal space is taken up that could go to other interesting ideas. In the case of “real” plagiarism, there’s the extra issue of credit and career building, while with self-plagiarism there’s at least no victim of that sort. But if the general ethical imperative of academic publishing is to let the world know about new ideas that you have, then both forms of plagiarism are violations.
24 agm 11.17.05 at 4:58 am: I think that a new write-up of the same work, commingled with new work, makes the issue nuanced enough that it isn’t necessarily plagiarism in many of the cases being discussed. In my field, “in Russian, in a Soviet journal” is a real possibility. If Landau did something, and I would find that result terribly useful, but no one’s translated it, it might just be that it got rehashed as part of another paper, which did get translated and saves my bacon. Very quickly it becomes very problematic to claim that every accomplishment should have its 15 minutes of journal time once and only once, when it might be needed 20 or 100 years later, in a different language, by people who are not initiates into the lingua franca that prevailed at the time of original publication. To me, this last is actually the best defense against charges of self-plagiarism. On a road trip this summer I learned that my boss, who got her PhD 30-odd years ago, doesn’t seem to have internalized that fluid dynamics is no longer a part of physics; it was left to the engineers and other non-physicists well before I was born. She and her contemporaries often seem to forget that we don’t necessarily know the things they knew at the same point in their education. (On the other hand, she knows crap all about nanotech, so I’ll call it square.)

What we need is a better definition of self-plagiarism, but this is very difficult to do because it requires some tough calls with not enough information about what others, let alone posterity, will eventually find useful. Though, given that eventually all journals will be online for the sake of space, delay time, and expense, this will become less of an issue.

Plus, it’s like saying that you only get to do one version of the Aristocrats…
25 kaw 11.17.05 at 3:38 pm: I self-plagiarize all the time.
26 kaw 11.17.05 at 3:38 pm: I often self plagiarize.
27 kaw 11.17.05 at 3:39 pm: Often, I self-plagiarize.

(ad nauseam.)
28 jake 11.17.05 at 3:53 pm: I have to admit I have occasionally found splat to be a pleasant surprise–I was thinking I would have to read a whole paper, and as it turns out, only the last half-page of it contains any new information at all.

As regards the exposure comments, I have the impression that in physics now, journal publishing is nearly an afterthought with respect to actual work–needed for promotion decisions, but for nothing else. Everything important gets communicated via preprints. I’m not a physicist, though–this is just what I heard.
29 Zephania 11.17.05 at 5:04 pm: One open source model is … publisher pays.

Is this a strong argument for this model?
30 Ray Davis 11.18.05 at 2:26 pm: After the distribution-to-cost ratio and its sheer immediacy, this is the main reason I moved to web publishing and one of the main reasons I’d like to see more academics move to web publishing. I write short pieces that are all somewhat connected but I loathe repeating myself. Behold the power of the URL! Don’t copy, somewhat revise, and re-publish a previous proof, a previous study, or the first third of your think piece on Donald Rumsfeld’s relationship with Sammy Davis Jr. — just link to it and move on.
31 Ray Davis 11.18.05 at 2:39 pm: Follow-up: Lelia Green points out that self-citation defeats the semi-fiction of blind peer review and raises the possibility of Google-bombing (or its academic equivalent). However, full-text journal search for a “hmm, that sounds familiar” passage would unveil the author just about as easily as an explicit link, and I believe citation index rankings are already getting smart about weighting self-citations lower, much as Google pays less attention to self-linking.
32 sfrefugee 11.19.05 at 2:13 pm: BS.

There is no such thing as splat. If it is your writing and your ideas – refurbishing, republishing, massaging, changing etc. are your right for having done the original work/idea/data collection in the first place. See, e.g., Mickey Mouse.

