Open Science Innovation Award

To promote awareness, foster exchange, and encourage collaboration in the realm of Open Science, the Institute has established an annual Open Science Innovation Award, starting in 2023.

The award will recognize and celebrate innovative ideas to enhance the accessibility, transparency, and reproducibility of the Institute’s research. Each year, two award winners (one individual and one team) will be invited to present their work in a designated event.

Here are this year's winner:

Aaron Peikert - Reproducible Research in R: A Workshop on How to Do the Same Thing More Than Once

Further team members:
Hannes Diemerling
Andreas M. Brandmaier
Maximilian S. Ernst

Over the past five years, we have developed and repeatedly taught a hands-on workshop on computational reproducibility, targeting researchers across a range of disciplines and career stages. The workshop introduces dynamic document generation, workflow orchestration, containerization, version control, and continuous delivery???providing participants with a state-of-the-art, end-to-end reproducible workflow. We have delivered the workshop more than 20 times, including at the DagStat Conference, Berlin Oxford Berlin | Oxford Summer School on Open and Responsible Research, LMU, Neuroscience School of Advanced Studies, Humboldt-Universit??t zu Berlin, University of Basel, SIPS, and multiple IMPRSs, reaching nearly a thousand researchers.

What makes our approach innovative is the integration research on open science, open source software development and pedagogical design: rather than treating tools as separate from research, we co-develop them alongside the workshop materials and gain insight for our research. This has enabled rapid improvements, particularly in accessibility and time-efficiency. From a two-day intensive in the early years, we now support a self-paced, modular learning format that can be completed in as little as two hours.

Our motivation stems from a simple but pervasive challenge in science: even when researchers want to make their work reproducible, the technical overhead often discourages them. After publishing a rather technical article/preprint in 2019 on how software engineering principles could drastically improve reproducibility, we received immediate interest in teaching these methods???despite having no workshop ready. Over the period of weeks, we worked on the workshop but finally we had to realize that stopping short of a full week of workshop, it is simply not possible teach the expertise we wanted to convey.

To reduce this barrier, we developed the R package repro, which automates tedious tasks (e.g., do I have ssh keys on my computer? What was the best way to install docker on a windows machine again?). Originally built to support workshop participants, repro is now used more broadly to streamline reproducible workflows in active research projects. By packaging best practices into a tool with minimal setup requirements, we help researchers spend less time troubleshooting and more time doing science.

Throughout the workshop???s evolution, several insights emerged. First, relying on researchers??? personal machines created major reproducibility issues???even within the controlled environment of a workshop. This led us to rebuild the workflow around cloud-based infrastructure and continuous integration pipelines. This has been a major step in research on reproducible workflows because it means researchers can adept and reproduce analysis code without leaving their browser, while not being bound to any particular computing infrastructure or service provider.

Second, moving online during the pandemic forced us to rethink how we deliver content. The challenge was as simple as screen real estate: participants with only one screen could either follow along or work on their code, but not both. In response, we created a fully open, self-paced version of the workshop. All materials???including videos, exercises, datasets, and ready-to-use containers???are now freely available under a CC0 license via our GitHub repo and course website. The materials are modular, reusable, and designed to work independently of us, enabling broader uptake and long-term sustainability.

By tightly integrating research, tool development, and open education, we have created a reproducibility teaching framework that is practical, scalable, and impactful. Our approach has influenced hundreds of researchers directly and continues to do so through its openly licensed materials and software.

Relevant links:
Workshop source
Workshop materials
Workshop slides
Paper inspired by workshop
Paper that inspired the workshop
Helper software

Here are this year's submissions:

Yujia Yang - the Decision from Experience Database

Further team members:
Mikhail Spektor
Anna Thoma
Ralph Hertwig
Dirk U. Wulff

Introduction:
An important way to learn about the consequences of decisions is by experiencing outcomes. Such decisions from experience are intensely investigated in various fields, including psychology, economics, and neuroscience. However, there is little cross-talk between these fields, which can be attributed in part to poorly understood differences in experimental design concerning aspects such as whether the observation of outcomes is consequential or whether outcomes are available for non-chosen options. To help integrate research across disciplines, we gathered a large number of data sets from published research and compiled the Decisions-from-Experience Database, which will be open access. Our database contains the raw data of trial-level from 180 experiments from 127 papers. The experiments include various paradigms (e.g., sampling, repeated choice, or probability learning), research methodology (e.g., behavioral and neuroscientific), and fields (e.g., psychology and economics). Our database standardizes various types of data into a common format, enhancing usability and enabling cross-paradigm analyses. While data sharing has been widely advocated in recent years, and many new studies have made significant efforts to promote data openness, accessing past research data remains a challenge. Our database addresses this gap by systematically collecting, organizing, and publicly sharing a large volume of experimental data on decisions from experience conducted before 2018. Moreover, large-scale open databases dedicated to specific research topics remain relatively scarce. By sharing our database, we aim to facilitate the integration of empirical findings and theoretical perspectives on decisions from experience, ultimately advancing research in this field.

Motivation:
The primary goal of our project is to establish a large-scale shared database on decisions from experience, serving as an exemplary practice of open research. Based on a systematic literature search, we collected and integrated data from different fields and research paradigms related to this topic, standardizing them into a reusable format. This effort breathes new life into previously overlooked data, making them accessible and valuable once again. On the one hand, the project redistributes attention to past studies and datasets, maximizing their research impact. On the other hand, it consolidates diverse studies and data under a common theme, enabling future researchers to conduct comparative analyses on a broader scale with larger datasets and much more easily. This integration facilitates analyses that would be difficult to achieve with individual experimental studies alone.

Lessons Learned:
This was my first experience working with a dataset of such scale and diversity in data formats. To process it, I had to carefully review the original publications corresponding to each dataset, along with prior correspondence with the original authors, in order to understand the research design. I worked with raw data stored in various software formats, extracting the necessary information and converting it into a standardized, comprehensible format. I also explored different methods to validate the accuracy of my conversions. This process required great patience and attention to detail. At the same time, it allowed me to quickly gain a deep understanding of the field of decisions from experience, including key research topics, common experimental and analytical methods, and prominent researchers in the area. This experience significantly accelerated my academic growth. In the meantime, the process of creating a publicly accessible database deepened my understanding of the importance and value of data sharing. It also taught me how to use proper data storage and encoding methods to ensure that future users can easily understand and efficiently utilize the dataset. Throughout the process, my supervisor provided invaluable guidance, offering clear direction and strong support, which greatly boosted my confidence.

Relevant links:
This project is ongoing. We have already built the database and are working on the manuscript. Currently, this database is a private project on GitHub. Based on our commitment to the original author, this database will be made publicly available along with the article after publication. We can add reviewers to this project as required. To give some basic information about the database, here is the introduction document of the Decision from Experience Database
.

List of last year’s winners

2024: Stefan Appelhoff - Pushing the Boundaries: Creating the Open Tools I Needed for my PhD Dissertation

In this application for the Open Science Innovation Award, I want to present my involvement with the Brain Imaging Data Structure (BIDS).

BIDS is a very popular community-developed data standard for neuroimaging data. However, while BIDS currently supports a vast array of neuroimaging modalities such as MRI, MEG, EEG, and NIRS, the situation was different when I started my PhD in 2018: EEG data was not a supported BIDS modality at that time, yet for my PhD I wanted to work with EEG data. Back then, my supervisor encouraged me to look into BIDS and see whether it could be extended to support EEG data. I thought that was a great idea, because data standards are indispensable as a foundation for good research data management. Through such standards, we can make sure that reusability, reproducibility, and research efficiency are well enabled.

Luckily, I found a work group that had just started on preparing an inclusion of the EEG modality into BIDS. I started contributing consistently in discussions surrounding the project, and was eventually invited as a co-lead of the project. We added EEG as a modality for BIDS and it has been a very popular extension (in the last 5 years, our paper describing the extension has been cited over 250 times). Ever since, I have been heavily involved with BIDS and have overseen and facilitated the inclusion of many additional modalities into BIDS in my role as a lead maintainer of the project.

However, a data standard by itself can be cumbersome: Dataset curators have to read many pages of documentation and potentially write their own code to organize their data. To support BIDS users with tools, colleagues and I have extended the popular EEG analysis software ""MNE-Python"" to work with BIDS. Our result is a software called ""MNE-BIDS"". It is meant to simplify the conversion of EEG data to the BIDS format, and to read and analyze entire BIDS datasets. We published a paper on MNE-BIDS in the Journal of Open Source Software (JOSS; over 60 times in the last 5 years). At the institute I have given workshops for both BIDS (RDM workshop in 2022) and MNE-BIDS (MPDCC workshop in 2023). All data that I collected during my PhD are organized in BIDS, well documented, and published openly.

Throughout my work on BIDS (including MNE-BIDS as a supporting software), I have learned several important lessons. I want to close this application by listing three of these lessons below:

1) Support by my supervisors to work on these ""research adjacent"" projects was crucial. I wish that in the future, more and more supervisors provide their students with the opportunity to take time off their main research projects and invest in ""public good"" open science initiatives.

2) Working at ""open science junctions"" like the BIDS and MNE-Python projects provides you with opportunities to (i) network outside your own field, and (ii) become a part of a ""greater whole"" and to obtain a sense of concrete impact that is often lacking when working on smaller-scale, incremental basic research papers.

3) Sticking with the BIDS and MNE-BIDS projects as a maintainer has proven to be very valuable: The projects remain healthy and usable, and even keep growing. And at the same time, it is very gratifying to see people use these tools on a daily basis. I strongly advocate for applying the concept of ""maintaining your projects"" to all scientific projects, way beyond their commonly accepted ""end of life"" (which is all too often the date of the paper publication).

Relevant links:
Dissertation
Proof of being a BIDS maintainer
Proof of being on the MNE-Python steering council
BIDS paper on EEG extension
Other (recent) BIDS paper of relevance
Paper on MNE-BIDS software
Slides RDM workshop on BIDS
Proof MNE-Python workshop at MPDCC
Google Scholar profile with citation counts