Moving Faster and Further in Corona Research Using Open Data

When the global COVID-19 pandemic was declared, scientists and organizations worldwide responded swiftly by volunteering research time and data. Some are studying the spread of the disease, others the implementation of policies, and still others psychosocial factors, such as the consequences of social distancing or adherence to policy. This community response clearly demonstrates that Open Science practices can accelerate scientific progress.

Many of these research projects have an international scope: differences between countries can help us understand which policies are effective at halting the spread of the disease and at mitigating its psychological impact on citizens.

My team at Data versus Corona set out to facilitate the cross-national psychological study “PsyCorona”. Our goals were to:

1. Identify differences between countries that might be relevant for psychosocial responses to the pandemic and adherence to government policy, and find relevant data sources.
2. Make these data sources interoperable, so that each can be linked to a country code.
3. Use machine learning algorithms to perform variable selection and identify the main predictors for different outcomes.

While addressing the first two points, we quickly realized that the data we were curating could prove valuable for many different research projects, whether related to COVID-19 or not. Moreover, we expected that some projects might lack the expertise required to extract data that are available elsewhere, but not in a usable format. We therefore decided to share our ongoing progress, to invite both data users and contributors, and to follow best practices for data sharing.

We started out with a solid foundation: a fully reproducible project template based on WORCS, my Workflow for Open Reproducible Code in Science (https://osf.io/zcvbs). From there, we set out to curate relevant data sources. For every data source, we wrote a function that accesses the data and homogenizes it. This makes it possible to run automatic data queries every day and to update the static files used for analysis.
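To make the idea of "one function per data source" concrete, here is a minimal sketch in Python. It is not the project's actual code: the two raw extracts, the column names, the lookup table, and the function names (get_cases, get_policy, merge_sources) are all hypothetical, chosen only to illustrate how differently formatted sources can be homogenized and linked through a shared country code.

```python
import csv
import io

# Hypothetical raw extracts standing in for two differently formatted sources.
# In a real pipeline these would be downloaded, not hard-coded.
RAW_CASES = "Country,Cases\nNetherlands,100\nGermany,250\n"
RAW_POLICY = "iso3;stringency\nNLD;40.5\nDEU;55.0\n"

# Illustrative lookup from country names to ISO 3166-1 alpha-3 codes.
NAME_TO_ISO3 = {"Netherlands": "NLD", "Germany": "DEU"}


def get_cases():
    """Read the hypothetical case counts and key them by country code."""
    rows = csv.DictReader(io.StringIO(RAW_CASES))
    return {NAME_TO_ISO3[r["Country"]]: {"cases": int(r["Cases"])} for r in rows}


def get_policy():
    """Read the hypothetical policy index, which already uses country codes."""
    rows = csv.DictReader(io.StringIO(RAW_POLICY), delimiter=";")
    return {r["iso3"]: {"stringency": float(r["stringency"])} for r in rows}


def merge_sources(*sources):
    """Combine homogenized sources into one record per country code."""
    merged = {}
    for source in sources:
        for code, values in source.items():
            merged.setdefault(code, {}).update(values)
    return merged


combined = merge_sources(get_cases(), get_policy())
print(combined["NLD"])
```

Because every function returns the same shape (a mapping from country code to variables), adding a new source only requires writing one more function, and a scheduled job can simply call all of them again to refresh the static files.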

Thanks to this interoperability, we are currently working with two additional research projects using these data. Moreover, as our database is indexed on several GitHub lists, there might be others out there using them without us even knowing.

The project is still ongoing, but it is exciting to see how Open Science plays into every aspect of it, from conception to execution, collaboration, and deliverable output. Key insights include the importance of making everything reproducible and interoperable, and of linking to the licenses of the primary data sources we index. Finally, working with a diverse team of data scientists is an excellent learning experience, as everyone brings their own expertise to the table.

Here is the repository in its current form; if you are looking to use these data for research, go ahead, and do reach out if you need support in using them!