New rational athletics for boys and girls.

Ruben Arslan

31. Oktober 2018

Accounting for mistakes in my scientific work and announcing a bug bounty program. A story of few papers and many corrections.

Achtung: Nur zu Testzwecken. Nicht veröffentlichen.

My first published paper had a typo in the title. I had gone over every use of the word “parental” and “paternal” in my draft and asked a friend to do the same, because I knew I tended to mix them up. I forgot the title. I emailed PLoS ONE frantically when I noticed, as the paper had not yet been published, but apparently it slipped their mind, and I had to ask again after publication.

So, my academic career started with a correction. And it was not going to be the last.

Figure 1. Figure from the book New rational athletics for boys and girls (1917) in the Internet Archive Book Images

Here, we accidentally reported a lower cutoff than we actually used owing to some miscommunication to my co-authors on my part.

And recently, my co-authors and I replied to a critical commentary of one my dissertation papers. I actually love this kind of stuff, as I personally feel that I learn the most when seeing a nice debate between two parties of opposite viewpoints. ¹However, this was a fundamentally dissatisfying experience because I spent more time battling journal submission forms than counter-arguments, because the journal caused a version problem and then hardly owned up to the mistake (see Appendix).

I only noticed the version problem because an anonymous submission via my Tell me I’m wrong form alerted me to the fact that I was misquoting the authors. Thank you, anonymous commenter.

And lately, I’ve received some useful post-publication peer review of my ovulatory changes paper. I had put all my code online, and a previous peer reviewer was actually motivated to check it for errors. Others like Dan Engber or Mike Wood just noticed problems right away, by virtue of their different background in journalism and statistics. This is great, but I admittedly dread another corrections process. I would love to work in a system where we publish and update our knowledge base more continuously than through the antiquated system of publication, correction, and retraction. But I don’t and here’s an attempt to reckon with that.

Consequences

This is the story of the mistakes I know about so far. It is not an impressively low rate for a fledgling scientist, but I’m at my limit in terms of reducing my own error rate through best practices on my own. ²

I also know that for every new release of formr.org that involves more than a few lines of code, I make mistakes. With formr.org I usually find out quite early, because users notice the ensuing problems. Some of my scientific data analyses also involve a lot of code. Do I magically make fewer errors when working with R than with PHP and JavaScript?

No. If I’m honest with myself, I think there are probably even more errors in my work than the ones that have been found. Unfortunately, psychological science does not have a well-developed error culture and it is rare for people to reanalyse others’ data, even if they build on it.

Code review

It is probably a co-incidence, but my paternal and fitness paper was the only one where one of my co-authors (thanks, Kai) independently re-did some of my central analyses completely independently ³– and it is also the only one where I haven’t yet had to issue a correction, even though I was primarily responsible for analysis.

So, I want to further promote code review in my work. I urge it on the people I supervise, I’m collaborating on a draft to advertise the practice. But there is also one other avenue that I want to try.

A bug bounty program

So, I’m announcing a bug bounty program. Briefly, I will pay you to report errors in my work to me (up to 760€). It generally only applies to first-authored scientific work I publish from now on, but my co-author and friend Laura Botzet volunteered her first-authored preprint on birth order effects in Indonesia as a guinea pig. I’m senior author on this paper, and tried my best to check the code (I also wrote some parts myself). Data and code are openly available, so if you feel like playing with a cool dataset and an interesting research question, we encourage you to check and review our code. Please see here for the exact conditions ⁴

Appendix: A bad error culture at Proceedings of the Royal Society B

I was invited to peer review the critique ⁵and pointed out a few factual errors. As a result, the critique that was accepted for publication already partially addressed some counterpoints, leading to a quite jumbled argument-counterargument sequence.

Then, the journal messed up. After a first round of peer review and then acceptance, they allowed the authors to revise their critique further. The authors substantially changed the text and added two new arguments. But the journal sent me the older version to comment, not the one they published. As a result, our reply quotes the commenters three times – and none of those quotes are in the published version – we looked quite sloppy.

Mistakes happen (I know!), but it took a while for the journal to acknowledge the mistake, and even longer for them to decide how to deal with it. We then had to submit an erratum (apparently, in their parlance, an erratum is for journal errors, corrections are for author errors). The journal then desk-rejected the erratum, because they felt replying to the newly added arguments didn’t fit in the erratum. I then added a sentence to reflect the fact that we were not allowed to respond to these additional points. They removed this sentence without my consent, and then rephrased the erratum to put it in the journal’s voice. In the end, this is all they published. The whole ordeal after I pointed out the error, took from March to August, in excess of 40 emails, and a lot of nerves.

The original authors also did not permit me to simply post the versions that we were reacting to, including the review process, to actually allow those who care to see the whole back and forth. For posterity, the director’s cut of the erratum is here.

Footnotes:

I still need to read The Enigma of Reason which makes the point that arguing is how we reason best.
Or at least at the level that collaborators and employers will tolerate with respect to how much this slows me down.
different data cleaning, used Stata instead of R, used a different model in some places.
They are phrased rather lawyerly, because I would expect a bug bounty program to attract nitpicky people.
which I didn’t like because I didn’t feel I should get to influence whether criticism of my work is published.