
Large dataset of disambiguated publication profiles for studying researcher mobility


Petersen, Alexander (2019), Large dataset of disambiguated publication profiles for studying researcher mobility, UC Merced Dash, Dataset, https://doi.org/10.6071/M3VM1D


Researcher mobility facilitates the exchange of scientific, institutional, and cultural knowledge. Yet whether globalization and advances in virtual communication technologies have altered the impact of researcher mobility remains an open question. To address it, we developed and leveraged a large and internationally broad disambiguated dataset of researchers in physics over the period 1985-2009, using the article-level metadata contained in the American Physical Society "Article and Citations dataset" as the starting point. After clustering research articles into disambiguated research profiles, we focused on the 10-year period centered on each mobility event attributed to an individual researcher in order to assess the impact of mobility on research outcomes. We accounted for secular globalization trends by splitting the analysis into three non-overlapping periods, calculating within each period a set of scalar measures representing researchers' citation impact, research topic diversity, collaboration networks, and geographic coordination. Herein we document the disambiguated researcher profile data used in the accompanying research article. For each time period we provide two versions of the parsed data that differ in their level of aggregation: the disaggregated publication data contains 84,324 total profiles, and the aggregated data used in the propensity score matching analysis contains 26,943 total profiles.

If you use these data, please cite:

A. M. Petersen, Multiscale Impact of Researcher Mobility. Journal of the Royal Society Interface 15, 20180580 (2018). DOI:10.1098/rsif.2018.0580


Raw APS data: We analyzed the 2009 American Physical Society (APS) Physical Review article and citations dataset, which is freely available by submitting a request at: https://journals.aps.org/datasets. After parsing the XML data files provided by the APS, we extracted researcher profiles by applying a network-based author name disambiguation method developed in:

Schulz et al., Exploiting citation networks for large-scale author name disambiguation. EPJ Data Science, 2014; DOI:10.1140/epjds/s13688-014-0011-3

Researcher data: Application of the author name disambiguation algorithm identified 208,734 distinct researcher profiles over the 30-year period 1980-2009. We parsed the data into CSV data files for mobility events occurring in three non-overlapping periods (each corresponding to a separate data file): T1=1990-1997; T2=1998-2003; T3=2004-2007. The data files collect the article-level metadata for the 5-year periods before and after the mobility event, concentrating on the ~26,000 researchers with Ni ≥ 10 publications who also meet additional career longevity and productivity criteria. See the ReadMe file for additional details.
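
The period assignment and the before/after windows described above can be illustrated with a minimal Python sketch. This is not the code used to build the dataset. The column names (researcher_id, pub_year, move_year) and the in-memory sample rows are hypothetical; the actual variable names are documented in the ReadMe file. The sketch also assumes that the "before" window covers the 5 calendar years preceding the move year and that the "after" window starts in the move year itself, which the description here does not specify.

```python
import csv
import io

# Period boundaries as stated in the dataset description.
PERIODS = {"T1": (1990, 1997), "T2": (1998, 2003), "T3": (2004, 2007)}


def period_of(move_year):
    """Return the label of the period containing move_year, or None."""
    for label, (start, end) in PERIODS.items():
        if start <= move_year <= end:
            return label
    return None


def split_window(rows, half_width=5):
    """Split article rows into the windows before and after the move.

    Assumption: 'before' is [move_year - half_width, move_year - 1] and
    'after' is [move_year, move_year + half_width - 1].
    """
    before, after = [], []
    for row in rows:
        offset = int(row["pub_year"]) - int(row["move_year"])
        if -half_width <= offset < 0:
            before.append(row)
        elif 0 <= offset < half_width:
            after.append(row)
    return before, after


# Hypothetical sample standing in for one of the CSV data files.
SAMPLE = """researcher_id,pub_year,move_year
r1,1990,1995
r1,1993,1995
r1,1995,1995
r1,1999,1995
r1,2001,1995
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
before, after = split_window(rows)
print(period_of(1995), len(before), len(after))
```

To apply the same logic to a downloaded file, replace io.StringIO(SAMPLE) with an open file handle and adjust the column names to those listed in the ReadMe file.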

Usage Notes

ReadMe file: The enclosed PDF file Mobility_DataDescription.pdf describes the source and organization of the researcher profile variables contained in each CSV file.


