Talks and Poster Presentations (with Proceedings-Entry):
M. Zlabinger, A. Hanbury:
"Finding duplicate images in biology papers";
Poster: Symposium on Applied Computing (SAC),
Morocco;
2017-04-04 - 2017-04-06; in: "32nd ACM SIGAPP Symposium On Applied Computing",
SAC '17 Proceedings of the Symposium on Applied Computing,
(2017), 957 - 959.
English abstract:
Duplicated images in biology papers are a possible indicator of plagiarism or data fabrication. Manual detection of such duplicates can be time-consuming or even infeasible for large image collections. In this paper, a semi-automatic duplicate detection approach is proposed. The approach can detect duplicates that cover only a fraction of the full image, that are transformed (e.g. rotated), and that occur either between images or within a single image (i.e. single-image duplicates). In the proposed approach, single-image duplicates are detected between sub-images (i.e. sub-figures) using a connected-component approach, and duplicates between images are detected via the min-hashing technique. The approach was evaluated on 1.7 million images extracted from biology papers. By applying various filtering methods to remove false-positive detections, only a small amount of manual effort was needed to find 3041 potentially serious duplicates in papers that have not been retracted so far.
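The abstract names min-hashing as the technique for finding duplicates between images but does not describe the image features it operates on. The following is a minimal sketch of generic MinHash with locality-sensitive banding, assuming each image has already been reduced to a set of discrete feature tokens (e.g. quantized local descriptors). The token sets, the function names (`minhash_signature`, `candidate_pairs`, etc.) and the parameters `num_hashes=64` and `bands=32` are hypothetical illustrations, not taken from the paper, and the banding step is a common companion to MinHash rather than a confirmed detail of the authors' method. The sketch only produces candidate pairs; in the described approach such candidates would still go through filtering and manual inspection.

```python
import hashlib
import random
from collections import defaultdict
from itertools import combinations

# Mersenne prime 2^61 - 1, used as the modulus of the universal hash family.
PRIME = (1 << 61) - 1


def token_id(token: str) -> int:
    """Map a (hypothetical) feature token to a stable 64-bit integer."""
    return int.from_bytes(hashlib.sha1(token.encode("utf-8")).digest()[:8], "big")


def make_hash_params(num_hashes: int, seed: int = 0) -> list:
    """Draw (a, b) pairs defining hash functions h(x) = (a * x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(num_hashes)]


def minhash_signature(features: set, params: list) -> list:
    """For each hash function, keep the minimum hash value over the feature set."""
    if not features:
        raise ValueError("an image needs at least one feature token")
    ids = [token_id(f) for f in features]
    return [min((a * x + b) % PRIME for x in ids) for a, b in params]


def estimated_jaccard(sig1: list, sig2: list) -> float:
    """Fraction of positions where two signatures agree; estimates Jaccard similarity."""
    return sum(x == y for x, y in zip(sig1, sig2)) / len(sig1)


def candidate_pairs(signatures: dict, bands: int) -> set:
    """LSH banding: two images become a candidate pair if any band of rows matches exactly."""
    rows = len(next(iter(signatures.values()))) // bands
    candidates = set()
    for band in range(bands):
        buckets = defaultdict(list)
        for name, sig in signatures.items():
            buckets[tuple(sig[band * rows:(band + 1) * rows])].append(name)
        for names in buckets.values():
            candidates.update(combinations(sorted(names), 2))
    return candidates


# Toy input (hypothetical): each image is already reduced to a set of feature tokens.
images = {
    "fig1a": {"w12", "w40", "w77", "w91", "w105", "w230"},
    "fig3b": {"w12", "w40", "w77", "w91", "w105", "w231"},  # shares 5 of 7 distinct tokens with fig1a
    "fig5c": {"w3", "w8", "w150", "w400", "w512", "w600"},   # unrelated image
}
params = make_hash_params(num_hashes=64, seed=42)
sigs = {name: minhash_signature(feats, params) for name, feats in images.items()}
pairs = candidate_pairs(sigs, bands=32)  # 32 bands of 2 rows each

for a, b in sorted(pairs):
    print(a, b, round(estimated_jaccard(sigs[a], sigs[b]), 2))
```

The band count trades recall against precision: with b bands of r rows each, a pair with Jaccard similarity s becomes a candidate with probability 1 - (1 - s^r)^b, so fewer rows per band flags more (and less similar) pairs for later filtering.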
Keywords:
duplicate images, machine learning
"Official" electronic version of the publication (accessed through its Digital Object Identifier - DOI)
http://dx.doi.org/10.1145/3019612.3019875
Electronic version of the publication:
http://publik.tuwien.ac.at/files/publik_264250.pdf
Created from the Publication Database of the Vienna University of Technology.