A Code Rewrite is almost never the Solution

Every now and then people look at a codebase and say that it is horrible and needs a complete rewrite. This might sound appealing at first, but it is almost never the solution to the problems of the code.

The story usually starts with somebody who is new to an existing codebase. They find that it is hard to understand, and over time they grow increasingly frustrated with it. Eventually they figure out that the code has grown in an unsustainable fashion. There are various hacks, legacy bits that nobody wants to touch, and so on. They tack on a feature here and there, but eventually they reach a conclusion: This codebase is a mess, we should rewrite it from scratch to get rid of all that technical debt!

At first, this might sound like a sensible idea. The old code is a mess and it would take forever to refactor into something sensible. It is easier to just rewrite it all. And then we can finally write it the way that it should be written!

But here is the kicker: How do you make sure that the new codebase will not suffer the same fate as the old one? Do you really think that the team writing the old code intended it to look the way it does now?

It is important to learn why the old code looks as bad as it does. Usually it is because people just append features and do not refactor as they go, nor do they refactor before adding new features. As the requirements for the code evolve, the code has to be adapted, and new requirements often call for new abstractions. If one doesn't introduce them, the mental model of the code no longer fits the code itself, and the code becomes a mess.

Maintaining code quality requires a lot of effort, and I have seen various work contexts where this effort is not put in. In academia, students have only limited time for their thesis and do not really care about the code once it has produced all the data that they need. Therefore they just build the code until it satisfies their particular needs. The problem arises when their advisor likes the results and assigns the next student to continue the work. The requirements change drastically, the old developer is gone, and the new student is overwhelmed by the old code. They will dig through the code and find it to be a mess. And then they will rewrite it for their own project.

For a PhD student, this can be a sensible idea. They might set out to write the new code in a way that is extensible for the next student. But eventually they will have created a mess of their own, because they get no reward for the code itself. They also don't know what the next student is going to do with the code. At the end of their time, there is another convoluted codebase with cut corners, which the student after them will likely just throw away.

In a business setting you fortunately have more stakeholders than just a single student, and there is much more continuity. This makes it easier to maintain the code that one has instead of rewriting it. One just needs to learn how.

It starts by identifying the issues that made the codebase as bad as it currently is. This can be a perceived lack of time to do quality work, or a general lack of testability in the code. If there are such fundamental issues with the way that the team works, a rewrite will just create another such codebase. It might look good at first, but as soon as the first requirement change comes around, it will spiral into a mess as well.

Robert C. Martin wrote about this in Clean Code [1], Chapter 1, in »The Total Cost of Owning a Mess«:

A new tiger team is selected. Everyone wants to be on this team because it’s a green-field project. They get to start over and create something truly beautiful. […]

Now the two teams are in a race. The tiger team must build a new system that does everything that the old system does. Not only that, they have to keep up with the changes that are continuously being made to the old system. […]

This race can go on for a very long time. I’ve seen it take 10 years. And by the time it’s done, the original members of the tiger team are long gone, and the current members are demanding that the new system be redesigned because it’s such a mess.

If you have experienced even one small part of the story I just told, then you already know that spending time keeping your code clean is not just cost effective; it’s a matter of professional survival.

The most important point here is that the old code has all the features, and the new code doesn't have any of them at the beginning. Refactoring the old code is hard, but it is still much easier than writing new code (easy), keeping it in good shape (hard), and keeping it in sync with all the new features added to the old code in the meantime (very hard). So although it is hard to maintain that old code, it is still easier than keeping the old code alive and developing new code in parallel.
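One way to make refactoring the old code less risky is a characterization test, a technique described in Feathers' book listed below: before touching anything, record what the code currently does and pin it down in a test. The following is only a minimal sketch. The function `legacy_shipping_cost` and its pricing rules are made up for illustration and stand in for any piece of old code whose exact behavior nobody remembers.

```python
# Hypothetical legacy function; the name and the rules are invented for this sketch.
def legacy_shipping_cost(weight_kg, express):
    cost = 5 + weight_kg * 2
    if express:
        cost = cost * 2 - 1  # nobody remembers why 1 is subtracted here
    if weight_kg > 20:
        cost += 10
    return cost


def test_characterize_shipping_cost():
    # The expected values were obtained by running the current code,
    # not taken from a specification. They document what the code does
    # today, including its quirks.
    assert legacy_shipping_cost(1, False) == 7
    assert legacy_shipping_cost(1, True) == 13
    assert legacy_shipping_cost(25, False) == 65
    assert legacy_shipping_cost(25, True) == 119


test_characterize_shipping_cost()
```

With such a test in place, the function can be restructured step by step, and any accidental change in behavior shows up immediately.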

Regarding refactoring, Kent Beck put this nicely:

[F]or each desired change, make the change easy (warning: this may be hard), then make the easy change.
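As a small illustration of this advice, consider the following sketch. It assumes a made-up pricing function with discount rules buried in a conditional; all names and numbers are hypothetical. The desired change is a new discount for employees. Instead of appending another branch, the code is first restructured without changing its behavior, and only then is the new rule added.

```python
# Before: the discount rules are hard-coded in a chain of conditionals.
def price_before(amount: float, customer: str) -> float:
    if customer == "student":
        return amount * 0.8
    elif customer == "senior":
        return amount * 0.9
    return amount


# Step 1: make the change easy. The behavior stays the same,
# but the rules now live in a single table.
DISCOUNTS = {"student": 0.8, "senior": 0.9}


def price(amount: float, customer: str) -> float:
    return amount * DISCOUNTS.get(customer, 1.0)


# Check that the refactoring did not change any existing behavior.
for customer in ("student", "senior", "regular"):
    assert price(100.0, customer) == price_before(100.0, customer)

# Step 2: make the easy change. The new requirement is one line.
DISCOUNTS["employee"] = 0.5
```

The first step is where the actual work happens. In a real codebase it may be considerably harder than here, which is exactly what the warning in the quote refers to.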

Therefore I recommend using resources like Refactoring Guru and books like the following to learn how to do architecture and design in a continuous fashion.

  1. Feathers, M. Working Effectively with Legacy Code. (Pearson, 2004).
  2. Fowler, M. Refactoring. (Addison Wesley, 2018).
  3. Gamma, E., Helm, R., Johnson, R. E. & Vlissides, J. Design Patterns: Elements of Reusable Object-Oriented Software. (Prentice Hall, 1994).
  4. Martin, R. C. Clean Architecture: A Craftsman’s Guide to Software Structure and Design. (Pearson, 2017).
  5. Ousterhout, J. A Philosophy of Software Design. (Yaknyam Press, 2018).
  6. van Deursen, S. & Seemann, M. Dependency Injection: Principles, Practices, and Patterns. (Manning, 2019).

This will keep the code maintainable and the overall pain of working with it lower. It will still be work, of course; that is never going to change.

  1. Martin, R. C. Clean Code: A Handbook of Agile Software Craftsmanship. (Pearson, 2008).