
PLOS ONE publication and citation data


Petersen, Alexander (2018), PLOS ONE publication and citation data, UC Merced Dash, Dataset, https://doi.org/10.6071/M39W8V


Merged PLOS ONE publication metadata and Web of Science citation data, compiled in .dta files produced by Stata 13. Also included is a Do-file for reproducing the regression model estimates reported in Tables I and II of "Megajournal Mismanagement" (Petersen, 2018). Each observation (.dta row) corresponds to a single PLOS ONE article, with various article-level and editor-level characteristics used as explanatory and control variables. This summary briefly describes each variable and its data source.


We gathered the citation information for all PLOS ONE articles, indexed by A, from the Web of Science (WOS) Core Collection. From these data we obtained a master list of each article's unique digital object identifier, DOI_A, and its number of citations, c_A, as of the data download (census) date, December 3, 2016. We then used each DOI_A to access the online XML version of the corresponding article at PLOS ONE by visiting the unique web address “http://journals.plos.org/plosone/article?id=” + DOI_A. After parsing the full-text XML (primarily the author byline data and the reference list), we merged the PLOS ONE publication information with the WOS citation data by matching on DOI_A.
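The collection-and-merge procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's code: the function names, the JATS-style tag names used to locate the author byline and reference list, the case-insensitive DOI matching, and the inner-join treatment of unmatched articles are all assumptions. The sample DOI and citation count are hypothetical. Network access is omitted, so the parser takes XML text that has already been downloaded.

```python
import xml.etree.ElementTree as ET

# Base address quoted in the data description; each article URL is this + DOI_A.
BASE_URL = "http://journals.plos.org/plosone/article?id="


def article_url(doi: str) -> str:
    """Build the per-article web address by appending DOI_A to the base URL."""
    return BASE_URL + doi


def parse_article_xml(xml_text: str) -> dict:
    """Extract the DOI, author count, and reference count from one article.

    ASSUMPTION: JATS-style tags (<article-id pub-id-type="doi">,
    <contrib contrib-type="author">, <ref-list>/<ref>); these tag names are
    not taken from the dataset documentation.
    """
    root = ET.fromstring(xml_text)
    doi_node = root.find(".//article-id[@pub-id-type='doi']")
    authors = root.findall(".//contrib[@contrib-type='author']")
    refs = root.findall(".//ref-list/ref")
    doi = doi_node.text.strip() if doi_node is not None and doi_node.text else None
    return {"doi": doi, "n_authors": len(authors), "n_refs": len(refs)}


def merge_on_doi(wos_citations: dict, plos_records: list) -> list:
    """Attach the WOS citation count c_A to each parsed PLOS record by DOI_A.

    ASSUMPTIONS: DOIs are compared case-insensitively, and articles with no
    WOS match are dropped (an inner join).
    """
    lookup = {doi.lower(): count for doi, count in wos_citations.items()}
    merged = []
    for rec in plos_records:
        doi = rec.get("doi")
        if doi is not None and doi.lower() in lookup:
            merged.append({**rec, "citations": lookup[doi.lower()]})
    return merged


# Hypothetical sample article (invented DOI, two authors, three references).
SAMPLE_XML = (
    "<article><front><article-meta>"
    '<article-id pub-id-type="doi">10.1234/example.0001</article-id>'
    "<contrib-group>"
    '<contrib contrib-type="author"><name><surname>Doe</surname></name></contrib>'
    '<contrib contrib-type="author"><name><surname>Roe</surname></name></contrib>'
    "</contrib-group></article-meta></front>"
    '<back><ref-list><ref id="r1"/><ref id="r2"/><ref id="r3"/></ref-list></back>'
    "</article>"
)

if __name__ == "__main__":
    record = parse_article_xml(SAMPLE_XML)
    print(article_url(record["doi"]))
    print(merge_on_doi({"10.1234/EXAMPLE.0001": 7}, [record]))
```

Matching case-insensitively is a defensive choice, since DOIs are case-insensitive and two sources may render them differently. Whether the original merge kept or dropped unmatched articles is not stated here.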

#allofplos: PLOS has since made all full-text XML data freely available at https://www.plos.org/text-and-data-mining; this option was not available at the time of our data collection.

Usage Notes

A description of the data is provided in the enclosed file UC-DASH_DataDescription_Petersen.pdf.