Software Development in Scientific Research

Martin Ueding

2020-10-14

Code & Zahlen

I've worked in research for the past four years, during my master's and PhD theses. And although I did research in physics, my main day-to-day work has been developing software. The software projects were either large-scale simulation code, post-processing code or analysis code.

There is a discrepancy between what a researcher is judged by and the work that is needed to write good software. A good researcher produces physical results and publishes a bunch of papers which rake in a lot of citations. The only things that one describes in the papers are the theoretical methods and the results. The details of the implementation are of no concern to the audience of the paper.

The people who work to obtain the results are very much interested in the implementation, as they have to work with the code on a daily basis. They also have to implement new features in order to produce the next physical results in the future. Usually the physical results somehow build upon previous results, making it a necessity to have maintainable and extensible code. And this is where the trouble begins.

There is an experimentalist adage saying that if an apparatus survives beyond the experiment, it was overengineered. A prototype only has to yield the result that one wants; after that it can fall apart. Spending additional time on making the apparatus more robust will not be rewarded in any way by the physics community because all they see are the results in the paper. This is the mindset that seems to prevail in science.

Simple problems can be solved with a few hundred or thousand lines of code. In physics, this could be an implementation of the harmonic oscillator with the Metropolis algorithm. A PhD student can write this in a few hours. A simulation for SU(2)-QCD can be programmed in two days; I've done that myself once. This can then be used to produce some results, even if the code is written in a horribly bad way, tons of shortcuts are taken and it is neither maintainable nor extensible. The student will work on something completely different as a next project, so the code can just be thrown away.
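To give an idea of how small such a throwaway program can be, here is a minimal sketch of the harmonic oscillator on a periodic Euclidean time lattice, updated with the Metropolis algorithm. It is not code from any of the projects mentioned here. The function name, the parameter names and the default values (lattice spacing, step size, number of sweeps) are assumptions chosen for illustration, and there is no proper error analysis.

```python
import math
import random


def metropolis_harmonic_oscillator(n_sites=32, n_sweeps=2000, n_thermalize=200,
                                   spacing=0.5, mass=1.0, omega=1.0,
                                   step=1.0, seed=0):
    """Estimate <x^2> for the lattice harmonic oscillator with Metropolis.

    All names and defaults are illustrative assumptions, not tuned values.
    Returns the estimate of <x^2> and the acceptance rate.
    """
    rng = random.Random(seed)
    x = [0.0] * n_sites  # cold start: the path is zero everywhere
    accepted = 0
    x2_sum = 0.0
    n_measurements = 0

    def local_action(i, xi):
        # Terms of the Euclidean action that depend on site i.
        # The path is periodic, hence the modulo for the neighbours.
        left = x[(i - 1) % n_sites]
        right = x[(i + 1) % n_sites]
        kinetic = mass / (2 * spacing) * ((right - xi) ** 2 + (xi - left) ** 2)
        potential = spacing * 0.5 * mass * omega ** 2 * xi ** 2
        return kinetic + potential

    for sweep in range(n_sweeps):
        for i in range(n_sites):
            proposal = x[i] + rng.uniform(-step, step)
            delta = local_action(i, proposal) - local_action(i, x[i])
            # Metropolis step: always accept if the action decreases,
            # otherwise accept with probability exp(-delta).
            if delta <= 0 or rng.random() < math.exp(-delta):
                x[i] = proposal
                accepted += 1
        # Only measure after the thermalization sweeps.
        if sweep >= n_thermalize:
            x2_sum += sum(xi * xi for xi in x) / n_sites
            n_measurements += 1

    acceptance = accepted / (n_sweeps * n_sites)
    return x2_sum / n_measurements, acceptance


if __name__ == "__main__":
    x2, acceptance = metropolis_harmonic_oscillator()
    print(f"<x^2> = {x2:.3f}, acceptance rate = {acceptance:.2f}")
```

With mass and frequency set to one, the estimate should land in the vicinity of the continuum value 1/(2mω) = 0.5, up to discretization and statistical errors. Everything lives in one function with hard-coded physics, which is fine for a one-off result and exactly what becomes painful once somebody else has to extend it.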

For larger problems, this approach does not work. It takes years to build a full-scale lattice QCD simulation code. One needs to put significant effort into it if one wants to have good performance on GPUs, support different lattice actions, work with different file formats, provide different gauge smearing procedures and orthogonal features like Wilson flow. It is a software engineering project that needs to be planned for at least a decade. Short-term results are fine, but if a group wants to stay competitive, it must have a solid foundation.

Planning for a decade is hard, because in Germany most positions are temporary. PhD students have three-year contracts, and postdocs are employed for somewhere between one and four years. In some rare cases a position gets extended, such that a postdoc can stay in the same group for multiple consecutive periods, but that is not clear in the beginning and depends on the whim of the funding agencies. One can see the problem emerge: The time scale of a large software project and the usual time that a researcher works on it do not match. I have seen this in action with various programs that I worked on, which have been passed from one PhD student to the next. At the moment I am in the process of passing down knowledge to a new researcher, but there is just so little time to do it.

Although we don't call our software packages products, that is exactly what they are. They are only used internally by a small number of users. Still, they are used in the production of data and therefore need to work and be maintained. Each product has a product owner. That is a person who is responsible for the product, likely sets the roadmap and monitors development. Then there are developers who fix bugs and implement new features. They should also document the features such that the product can be picked up by new developers.

I have yet to work at a professional software company, but I am sure that the product owner is somebody who stays with the company for a longer time period. Of course even these people may be replaced, but there will be a certain period during which the products get handed over. The company leadership will also make sure that the products are well documented such that nobody is irreplaceable.

In academia, it seems to be much different. Each PhD student is the product owner of their products. They often are the sole developer as well. There is no need to write good documentation, as the student will know everything. And if they forget, they read the code and remind themselves. Onboarding developers to the product is something that does not happen regularly, and it is not really planned for. However, as the student only has a temporary contract, they will eventually go. This is clear from the beginning, yet it is not properly planned for. The student will leave the group, and the products will not have an owner any more; they are orphaned. The next student will look at an abandoned project, and likely be overwhelmed by it. Everyone else has seen the physical results and assumed that extending the code could not be too hard. But the student then faces the sad reality of a one-shot contraption which has to be refactored intensively before one can use it.

As at most 25 % of staff have a permanent position, and these are usually administrative jobs, there are virtually no researchers with a permanent position who are not professors. It seems sensible to make the professors the product owners. They stay with the “company” the longest, and they have an interest in having maintainable software. But this misses the reality of a professor's schedule. I have never been one, so I can only guess at what it is in detail. From what I have been told, so much time is spent on administration, teaching, writing research grants, supervising students and writing papers that there is little time to also be the product owner of a bunch of software products on top of all that.

When the only permanent employee is the professor, but they don't have the time, who should supervise the products? The next sensible position would be a postdoc who would focus on the software side of things. They could be given some guidelines by the professor, and would then flesh them out and enforce them among the students. They would make sure that all source code is tracked in git and uploaded to an organization on a website like GitHub, GitLab or Bitbucket. They would also set up continuous integration for all projects and encourage unit and integration tests. A shortlist of libraries and formats should be compiled, such that all binary data is stored as (for instance) HDF5, text data as JSON, and auxiliary scripts are all in Python. This person could also serve as a Scrum master for the projects. This product owner would eventually leave as well, but this would be planned for. It would be their responsibility to write documentation such that the next product owner can pick it up. A smooth transition would be one of the goals for this position.
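What such a group convention could look like in practice is sketched below: results are written as JSON text together with their metadata, and a unit test checks that they can be read back. This is only an illustration under my own assumptions. The function names `save_result` and `load_result`, the record layout and the metadata fields are hypothetical and not taken from any existing project.

```python
import json
import tempfile
import unittest
from pathlib import Path


def save_result(path, observable, value, error, metadata):
    """Write one measured observable with its metadata as JSON text.

    The record layout is a hypothetical group convention.
    """
    record = {
        "observable": observable,
        "value": value,
        "error": error,
        "metadata": metadata,
    }
    Path(path).write_text(json.dumps(record, indent=2, sort_keys=True))


def load_result(path):
    """Read a record that was written by save_result."""
    return json.loads(Path(path).read_text())


class ResultRoundTripTest(unittest.TestCase):
    """Unit test: whatever is saved must come back unchanged."""

    def test_round_trip(self):
        metadata = {"n_sites": 32, "spacing": 0.5}
        with tempfile.TemporaryDirectory() as tmp:
            path = Path(tmp) / "x_squared.json"
            save_result(path, "x_squared", 0.5, 0.01, metadata)
            loaded = load_result(path)
        self.assertEqual(loaded["observable"], "x_squared")
        self.assertEqual(loaded["value"], 0.5)
        self.assertEqual(loaded["error"], 0.01)
        self.assertEqual(loaded["metadata"], metadata)
```

If this lives in a file whose name starts with `test`, running `python -m unittest` in that directory discovers and runs it, and a continuous integration job can run the same command on every push. The point is less the specific format than the fact that the next student finds one documented way of storing results instead of five undocumented ones.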

The second part which needs to be changed is the incentive for students and also for this postdoc. As a PhD student, my advisor eventually grades my work. He can include non-science work like software development in my grade, but that only works for the PhD. If I applied for a postdoc position somewhere, he would perhaps also write a good letter of recommendation for me. But if I eventually wanted to become a professor, I'd need to have a solid set of publications. Of course I know that I need good software to get there, but I cannot do all the software development alone and still publish results. I would need to cut some corners in order to get stuff done in the short term as well. And I could for instance drop the documentation. Documentation is something I write to help other people, but I am penalized for not working for myself all the time. This fosters an environment where code is not viewed as a group asset but rather as something that every student has to bother with on their own. Without a change here, students have no incentive to write proper code that can be used by generations of other students to come. Also, nobody would take this product owner postdoc position, as they would not publish scientific papers. Their scientific career would end right after this postdoc stay.

All in all, I am a bit frustrated with the way that software development works in science. My working standards are such that it pains me to deliver bad code riddled with hacks. Of course I don't spend endless time polishing it in irrelevant areas, but actively cutting corners excessively demotivates me. Yet I managed to finish my PhD thesis in slightly less than the allocated three years. Currently, during the last days of my contract, I am trying to pass on the products that I have either inherited or started from scratch. And it becomes clear that three years of experience cannot be transferred within two weeks. Once I leave, there won't be much formal documentation, as I had neither an incentive nor an example to follow for writing a book about the program details.

Maybe this way is still the most efficient one to organize work in a field with changing requirements. Maybe encouraging people to write documentation for prototypes is a waste of time. But perhaps these aren't prototypes any more, and the scientific community needs to realize that operationalization is necessary. I am happy to move to industry, where incentives are hopefully much better aligned.