Computers will be fighting plagiarism on ArXiv
March 12, 2007

Posted by apetrov in Near Physics, Physics, Science.

I was recently pointed to an interesting article in Physics Today, “Experimenting with plagiarism detection on the ArXiv” by Toni Feder (thank you, Andrei Sidorenko, for pointing it out). Let me shamelessly copy the first paragraph of that article: “Starting this summer, submissions to the arXiv, the online server where many physicists check daily for new preprints, will be compared with the server’s existing 400 000—and counting—manuscripts to check for plagiarism.” Apparently, this news was already discussed a couple of years ago in an interesting article in one of the older issues of The Chronicle of Higher Education (sorry, you might need a subscription).

It is an interesting development. According to this article, Paul Ginsparg (creator of the arXiv) and his graduate student Daria Sorokina did a study which found that about 10% of all papers on the arXiv have blocks of overlapping text! Indeed, this should not be surprising, as many authors reuse their own papers when they write conference proceedings. But even after excluding those, they still found that about 1% of papers were clear copies of other authors’ work! Once again, that’s 1% of 400 000 papers: you can do the math (that is, unless you work at a Honda dealership in Farmington Hills or, as I recently learned, at Home Depot)!
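For the curious, here is one generic way to detect “blocks of overlapping text.” Neither this post nor the summary of the article says which algorithm the arXiv actually uses, so treat this as a minimal sketch of a common textbook technique (comparing overlapping word n-grams, or “shingles”), not as Ginsparg and Sorokina’s method. The shingle length of 5 words, the 0.3 threshold, and the function names are all hypothetical choices for illustration.

```python
import re

def shingles(text, n=5):
    """Return the set of overlapping n-word sequences ("shingles") in text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap(submission, existing, n=5):
    """Fraction of the submission's shingles that also appear in an existing paper."""
    sub = shingles(submission, n)
    if not sub:
        return 0.0
    return len(sub & shingles(existing, n)) / len(sub)

def flag_overlaps(submission, corpus, n=5, threshold=0.3):
    """Return (paper_id, score) pairs at or above a hypothetical threshold, highest score first."""
    hits = []
    for paper_id, text in corpus.items():
        score = overlap(submission, text, n)
        if score >= threshold:
            hits.append((paper_id, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)

# Toy corpus with made-up paper IDs and text.
corpus = {
    "paper_A": "the decay rate of the neutral meson depends on the mixing parameters",
    "paper_B": "we study dark matter annihilation in the galactic halo",
}
submission = "the decay rate of the neutral meson depends on the mixing parameters in a simple way"
print(flag_overlaps(submission, corpus))  # paper_A shares 8 of the submission's 12 shingles
```

Note that a score like this cannot tell an author reusing their own proceedings text from genuine copying, which is presumably why the study’s 10% figure shrinks to about 1% once self-reuse is excluded. Also, looping over every paper is fine for a toy example, but checking against 400 000 manuscripts would likely call for indexing (for instance, hashing) the shingles instead.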

I think this kind of software is really needed. I recall a scandalous case of a fellow who submitted a bunch of papers to the arXiv that were just copies of various papers he had found on the arXiv itself. In particular, he copied several chapters of the BaBar Physics Book. I wrote one of those chapters!

Nowadays, you can even fight self-plagiarism! But then again, it should be easy to remember what is published and what is not… Still, I can recall at least two instances when this happened. I was refereeing a paper submitted to Physics Letters B when I noticed that almost all of its text was exactly the same as the text of a previous paper by the same author! Well, the formulas were different (maybe that’s what counts), as the author talked about a different meson system. But all the text was the same!!! Then the same thing happened when I reviewed another paper for Physical Review D. Apparently, some people just want to improve their paper count…


