C++ Anti-patterns¶

Date: 2016-12-04

A collection of anti-patterns seen in C++ projects.

C++¶

A header file should include all the other headers it needs to compile on its own. Something I have seen a couple of times is this:

f.hpp:

#pragma once

void f(MyClass x);


Notice that the type MyClass is not defined here, and the header that defines it is not included in f.hpp at all.

It is defined in myclass.hpp:

#pragma once

class MyClass {
public:
    int member;
};


The implementation of the function f is in f.cpp. There the header for MyClass is included:

#include "myclass.hpp"
#include "f.hpp"

void f(MyClass x) {
    x.member = 0;
}


And that actually compiles because the compiler only works on the .cpp files. And given the ordering of the #include statements, the C++ compiler sees this:

class MyClass {
public:
    int member;
};

void f(MyClass x);

void f(MyClass x) {
    x.member = 0;
}


This will work as long as that header f.hpp is always included after myclass.hpp. It will start to fail when you use f.hpp somewhere else. There should be an #include "myclass.hpp" in f.hpp.

The order of the header files in f.cpp in the above example made it possible to hide the missing include inside f.hpp such that it still compiles. One can make this fail earlier by having the following order of includes in the .cpp files:

1. The “own” header file first; for X.cpp that would be X.hpp

This way, missing include statements in the header will become directly apparent. You might find a bug in a third-party library this way.

Lowercase Preprocessor Constants¶

Preprocessor constants should always be in capital letters. Otherwise you will get the weirdest bugs, which take hours or days to track down. Say there is some feature that you want to switch on or off at compile time. One way to do it is #ifdef, like this:

int make_query(Query const &q) {
    // Do some query stuff here.

#ifdef VERIFY_RESULT
    // Perform some verification.
#endif

    // Some other tasks and return statement.
}


When compiling, you can give the option -DVERIFY_RESULT and the preprocessor will keep the verification code in. In one project I have seen the same thing, but with #ifdef verify_result. That is legal C++ and also works with the -Dverify_result command line option to the compiler.

One member of the project, who had recently joined and did not know about the many compilation flags, created a new function bool verify_result(Result const &result). The compiler did not complain that the name was already taken. Instead, when the -Dverify_result option was given, the preprocessor replaced verify_result with nothing (the flag carries no value), so the compiler saw bool (Result const &result). GCC merely complained that it did not expect the ( there.

I do not know how long they tried to track that down, but it was long enough to be a real pain and certainly a big waste of time. This would not have happened if the preprocessor constants had been uppercase all the time.

One should not use leading underscores for names in normal code. Those are reserved for the standard library and the compiler. When programming with intrinsics like AVX, one has to use functions and types that start with underscores, but one is not allowed to define new names with leading underscores.

Two consecutive underscores anywhere in a name are also reserved.

A pattern I often see is the usage of leading underscores in include guards:

#ifndef __MY_AWESOME_HEADER_H__

// Code

#endif


There is no need for the __, really. Just having MY_AWESOME_HEADER_H would be fine as well. Or perhaps just #pragma once, but that is a different debate.

Mixing up System and Local Headers¶

There are two distinct include paths: the one that is searched when you use #include <…> and another for #include "…". The former is for system and third-party library headers that are installed globally; you can add to that path using -I. The latter is for your own project; with GCC and Clang it can be amended with -iquote. One should not mix the two and use -I. in order to include local headers with #include <…>.

Inconsistent Indentation¶

For several years now, there has been clang-format. If you are not using it, you are seriously missing out. It will format your whole codebase perfectly and also comes with editor plugins. You do not have to worry about indentation; it will just be consistent.

Unfortunately, it is very hard to apply to a codebase that has a lot of active branches in the git repository. Reformatting all the code will lead to merge conflicts, and resolving them is no fun. So start your project with it right away, or pick a good moment to introduce it.

Inconsistent formatting will make the code harder to read and reason about, something you do not want with code that is already complex.

NULL, 0, and nullptr¶

In C++ (and C) there are unsigned integer numbers (uint64_t, or perhaps unsigned long) and pointers (void *). On a typical 64-bit machine they are represented the same way, as 64-bit integers. For the type system, they are still very different things.

Although they are both just numbers to the computer, one should help the programmers read the source code and write

uint64_t number = 0;
void *pointer = NULL;


This way, it is clear that one is the number zero and the other is the null pointer. The preprocessor constant NULL is typically just 0 (or 0L), so the compiler sees a plain integer there. Still, it is helpful for the programmer to read. Since C++11 there even is nullptr, which has its own correct type, std::nullptr_t.

In some code you see horrible things like these:

uint64_t number = NULL;
void *pointer = 0x0;


using namespace Inside Headers¶

One can argue about the usage of using namespace. In most cases I do not use it, to keep the global namespace somewhat clean. Using using namespace std; in a .cpp file is tolerable because the namespace is only dumped into that single compilation unit; no other file is affected.

It becomes a totally different story once you put that using namespace std; into a header file. Every other header or source file that includes it will then have the whole namespace dumped into it. Even other projects using that library will suffer from the library programmer's attempt to save some typing.

include Inside Namespaces¶

For reasons unknown to me, a few standard library headers were included inside of namespaces. The only reason I can think of is that writing ::std:: instead of std:: seemed like too much work.

GCC 6.0 changed the standard headers such that they no longer work when included inside a namespace. This way one can catch this anti-pattern.

Duplicate Explicit Instantiation¶

Templates can drive up the compilation time a lot. One way around this is to define the template functions in a source file and explicitly instantiate them there. This works if you already know all the template parameters, say you only want to instantiate for float and double. But if you do it, instantiate each specialization only once in the whole project; a duplicate explicit instantiation is a compilation error.

Casting Pointers to Integers¶

For some reason, in one project pointers were cast to integers. They were converted to a special integer type whose width had to be chosen via a configuration flag to match the pointer width exactly, otherwise the compiler would give an error. (The standard already provides std::uintptr_t in <cstdint> for exactly this purpose.)

I still do not understand why one would write this sort of brittle code.

Subtle Dependence on Operator Precedence¶

What is the value of x?

uint32_t x = 1 << 3 - 1;


I would read it as $$2^3 - 1$$, which is seven. The value actually is four. When compiling with Clang, it conveniently says this:

precedence.cpp:7:25: warning: operator '<<' has lower precedence than '-'; '-' will be evaluated first [-Wshift-op-parentheses]
    uint32_t x = 1 << 3 - 1;
                   ~~ ~~^~~
precedence.cpp:7:25: note: place parentheses around the '-' expression to silence this warning
    uint32_t x = 1 << 3 - 1;
                      ^
                      (    )


What did the original author of that line really mean? Were they aware that the - operator binds more tightly than <<? Can I trust that code without completely understanding where all those magic numbers come from?

Multiple Files with Same Include Guard¶

I do not like the #ifndef X, #define X, #endif style of include guards because they are so error-prone. They are hard to read, and even worse, one can accidentally choose the same include guard twice in a project. One could even pick a guard that some other library already uses.

If that happens, the errors are painfully subtle. Depending on the order of inclusion, the body of the second header is simply skipped. Then you get errors about undefined types and undefined functions and might pull out your hair before you realize what is going on.

#pragma once is a good alternative except if you are working with IBM XL 12. But that compiler cannot do C++11, so it is not that interesting for me anyway.

Cyclic Dependencies¶

I saw a case where the file A.hpp provided the classes A1 and A2, and the file B.hpp provided B1. For some reason, B1 depended on A1, and A2 depended on B1. That is not a cyclic dependency between the types, but it is one between the header files. This was “solved” in the project like this, in file A.hpp:

class A1 { … };

#include "B.hpp"

class A2 { … };


Sure, that compiled just fine. But again, header B.hpp is not a standalone header and can only be used in this sandwich. I have resolved this by creating the files A1.hpp and A2.hpp and properly including them inside each other to break the cyclic dependency of the files.

Using C Headers in C++¶

Using C functions in C++ might be a reasonable thing to do. If you do, do not include X.h but rather cX. So instead of <stdint.h>, use <cstdint>. This way the C functions are properly placed in the std namespace.

Functions like printf, scanf, atoi, fopen, malloc, and a bunch of others should not be used in C++. There are so many better alternatives.

Not Using RAII¶

I have seen a couple of people program Java in C++ syntax. By that I mean they wrote stuff like this:

T *t = new T();

// Some code with t-> everywhere.

// No delete.


They did not release the allocated memory again. If they were handing the object out of the function, a heap allocation would be needed; for a plain local variable, it is nonsense. I asked one of them why he did not just write T t; and be done with it. He said that he was used to the T *t = new T(); syntax. Then I pointed out that he had a memory leak. He replied that the runtime would take care of that. Too bad there is no such runtime in C++ …

Another person doing this style of programming was actually supposed to teach us students C++.

Since C++11 there are std::unique_ptr and std::shared_ptr together with std::make_shared (and, since C++14, std::make_unique), which eliminate virtually every case where you need a raw owning pointer.

Preprocessor Over Scopes¶

Another project makes heavy use of preprocessor flags to change the way the code works. That in itself is not a problem. However, there are pieces of code like this:

#ifdef FLAG
}
#endif

A closing brace whose presence depends on a preprocessor flag means the braces only match for certain flag combinations. Neither editors nor humans can parse the file on its own any more; the code should be restructured so that every { has an unconditional }.


Including Source Files¶

This was in one of the header files:

#include "other.cpp"


Including source files in header files is usually just wrong. Edge cases are unity builds and certain template instantiation tricks. Said project did it for no apparent reason.

Global Variables Everywhere¶

The const keyword is there for a reason. It is a pity that it is not enforced in C as strictly; but that is why we have C++. The project I have in mind consists almost entirely of global variables and routines. One cannot even call them functions; they work through side effects only.

This needlessly increases the number of things the programmer has to juggle in his head simultaneously. Unit testing is impossible with such a monolithic code. How do you reason about such code?

Not Using const in C++¶

const is an awesome thing. Perhaps Rust, where variables are immutable unless marked mut, approaches the problem even better, although I am a fan of the minimal-annotation concept that seems to prevail in C++. Whenever I program in C++, I try to put const wherever I can. It helps me reason about my code (fewer moving parts) and helps the compiler optimize.

If you do not even use const for your global variable pi, you are just begging for trouble. I could simply add the following somewhere:

void indiana() {
    pi = 4;
}


And then the whole program would change its behavior and it would be a pain to debug.

No Comments¶

Actually, I think that having no comments can be fine up to a point. Compare

double f(double x) {
    double c = 235;
    double z = c / x;
    return z;
}


with

// Convert {miles per gallon} to {liters per 100 km}.
double f(double x) {
    // Magic conversion factor.
    double c = 235;
    // Do the conversion.
    double z = c / x;
    // Return the result.
    return z;
}


and with

double mpg_to_l100km(double const mpg) {
    double const conversion_factor = 235;
    double const l100km = conversion_factor / mpg;
    return l100km;
}


I think that good variable and function names can make a huge difference and eliminate the need for many comments. If you can write your code so expressively that you do not need the comments, that is better than commenting bad code.

No Formal Docstrings¶

Python has docstrings, Java has JavaDoc, and C++ has documentation comments that tools like Doxygen or Sphinx can process. Not having any of those makes tricky functions harder to use than necessary whenever a function is not as trivial as its name might suggest.

Even worse than no comments are comments that do not match the code. Then you are left to figure out what happens. Hint: The compiler only reads the code.

Complex Variable Names¶

What does the variable nyHP1 contain? Isn’t it obvious that it is “number of elements in y-direction halved plus one”?

Bug Tracking Inside the Source File¶

In the (to me) ancient days, when one did not have version control or a bug tracking system, writing those notes in text files next to the source seemed to make sense. Today there is no excuse to track your bugs at the top of the one source file.

3400 Lines in a Single File¶

When a file is too long, you cannot keep its structure in your mind. After a couple of hundred lines, you should start a new file. That is also a good opportunity to refactor the code into smaller pieces.

Of course this might increase compilation time. Then something like a unity build might be appropriate again. However, this is a distinct step and makes the file more readable.

Single Function with 800 Lines¶

Having a single function with 800 lines is just impossible to reason about. You cannot even remember the variables you have declared. It is almost as bad as if they were global.

Unreachable Code¶

Some code is just plain unreachable. Say you have a contraption like this:

if (condition x) {
    // something
}
else {
    // something else
    if (condition x) {
        // What?
    }
}


There are cases with multiple threads and locks or atomic variables where the inner branch might be reached because the condition changed in between. Otherwise it is just bloat.

Faulty Buffer Implementation¶

Using other people’s tested code is beautiful. So when the standard library of your programming language supports something like a buffer (std::vector perhaps), use it!

One project contained its own buffer implementation with a serious bug: clear() did not reset the internal cursor to the beginning, so the buffer would eventually run into other memory. That might corrupt the data or (hopefully) crash the program. Just use the standard library!

Superfluous Blank Lines¶

Occasionally I see lots and lots of blank lines in code. One or perhaps even two blank lines can help to group the various parts of a file. When there are five or more blank lines in a row, the information density on screen is diluted; I have to scroll to read the code and cannot see a whole function at once. Getting rid of those superfluous blank lines makes the code easier to read.

Version Control¶

Not Using Version Control¶

If you are not using version control, collaborating and developing new features is probably harder than it needs to be.

I have seen a group of four people send around tar archives and run the diff tool manually. That was in 2015; Git was already ten years old, so there is no excuse, really.

Unstable “master”¶

The master branch is supposed to be the branch that you can check out and compile, always. If you want to develop a new feature and it is not stable yet, keep it in a development branch or some feature branch.

I expect that a simple git clone (which checks out master) gives me working code.

Outdated “master”¶

Doing all the work on develop and never merging back into master keeps master stable, but it is not really useful either. In one project the master was both outdated and unstable. Great combination.

Abandoned Branches¶

A lot of git repositories have branches that seem abandoned. The commit messages promise some great feature that the develop branch lacks, yet the branches have not seen a commit in over a year. Are they still relevant?

Checking in Build Files¶

Way too often, build artifacts like .o files are checked into git. This means that every time you merely compile the code, git reports changes to those files. You can either commit them or leave them uncommitted. Usually they get committed, and then the git log is polluted with changes to files that should never have been tracked in the first place.

With git submodules this is really nasty, because a simple compilation of the project leaves all the submodules in a dirty state. You have to go through the submodules recursively and do a git reset --hard HEAD in order to get rid of the changes. But that throws away your compilation again.

The only clean solution is to git rm --cached those files and add them to a .gitignore.

Inconsistent Version Tags¶

There are many conventions for version tags: a leading v; periods, hyphens, or underscores between the digits; leading zeros. Whatever you pick, be consistent, otherwise it is a nightmare to find the latest version.

Git Submodules via GitHub SSH¶

When repositories are hosted on GitHub, one can use them as submodules over either SSH or HTTPS. The advantage of HTTPS is that reading does not need any sort of authentication. For SSH, you need an SSH key registered with GitHub. On your personal workstation, where your SSH key is already configured, that is not a problem at all.

On clusters and supercomputers, I do not have my personal SSH key, and I am not going to copy it to a machine I do not control. I could create an SSH key on the cluster and add it to my GitHub account, but then that key could be used to change all the repositories of my account. Not cool.

Creating a whole second GitHub account is possible, but you cannot have more than one free account. So I would have to get a second, paid account for my master thesis work. Also something I would like to avoid.

The easiest solution is to convert the submodules over to HTTPS. Then you can clone them everywhere. Pushing changes over HTTPS is less pleasant because you need to enter your password every single time, so I can understand why SSH is appealing. Apparently there are also setups where HTTPS is blocked but SSH is not.

Switching the submodules over to HTTPS would make it easier for people to clone the repository. If you then want to push something, just add an SSH-based remote to the repository and push over that. That way, both workflows remain usable.

Unmergable Forks¶

Say one has a project that uses the double data type everywhere. One wants to let it run with float because that is a bit faster and one does not need the precision. The easy way would be to fork the project, create a new branch and just replace every double with float. Problem solved.

That cannot be sensibly merged, though. The upstream author probably wants to keep double in their version of the code; the other person wants float. Now the two forks have diverged and there is no sensible way to ever bring them together again. Even worse, further changes cannot be shared without manually replacing every double with float, or the other way around.

It is better to solve this via abstraction. Work with the original author to abstract away the actual floating point data type used. Something like this, perhaps:

#ifdef SINGLE_PRECISION
typedef float Float;
#else
typedef double Float;
#endif


Then you change every double to Float in the codebase once and you can easily share the code back and forth. Actually, there is no need for a long running fork any more, the two people can just use the same code all the time and compile it how they like.

One should be careful not to overdo this, though. Exposing an option for everything will make the program a nightmare to compile, and the number of combinations grows exponentially. In this particular case, where two people have actual needs, it makes more sense to solve it with abstraction than with a hard fork of the whole code.

Commented-Out Code¶

Code might not be needed any more at some point. To get rid of it quickly, one can comment it out. In the long run, however, one should just delete the code and trust the version control to remember it. Otherwise the codebase is littered with old code that no longer serves any purpose.

Build System¶

Hardcoding Paths¶

Some parts of the build systems might have hardcoded paths in them. It might work well on the developer’s machine but will lead to unresolvable paths on every other machine.

Specific Version of GNU Autotools¶

On one supercomputer, I have GNU Autotools 1.11 and 1.14 available. The project that I have downloaded ships with some files that can easily be generated (see above). Those files explicitly refer to GNU Autotools 1.13 and therefore fail to build for me.

I had to remove files like configure and aclocal.m4 and recreate those with some calls to automake --add-missing and autoreconf. After that I was able to run configure.

Patch the Code to Build / Compile Branches¶

One project has to run on different architectures, with different MPI implementations and compiler versions. Since the MPI implementations differ in the const qualifiers of functions like MPI_Send, const-correct code cannot be used with an implementation that is unaware of const. The code had to be patched before building on each architecture, and there were a lot of git branches just for compiling.

This makes bisecting impossible, and packaging for a distribution becomes hard; Debian and especially Fedora frown upon patching the source for reasons like this. The build system should expose all the options needed to build on the target platforms.