A Useless and Harmful Abstraction

Martin Ueding

2023-09-12

Code & Zahlen

In a codebase that I have looked into I've seen an abstraction which is so broad that it is useless and also harmful. I'll show what it is, point to the problem and give some ideas to improve it.

Say we want to build something that offers a REST API via HTTP. We could have some sort of class which holds an incoming request. We could use a simple Python dictionary for that. The requests that we want to handle are complicated multi-step things that require some state to be passed along.

In this code all the state is put into a single dictionary and passed along. This in itself is not too bad, we could think of some pseudo code like this:

def handle_request(request: dict) -> dict:
    step_1(request)
    step_2(request)
    step_3(request)
    return request

But that code was not considered clean enough, so an abstraction was made. I have renamed it to make it clearer what it really does, the name in the code sounds a bit more innocuous. This is the interface:

class StateModifier(abc.ABC):
    @abc.abstractmethod
    def modify_state(state: dict) -> None:
        ...

And then there was an aggregate like this:

class AggregateStateModifier(StateModifier):
    def __init__(self, modifiers: list[StateModifier]) -> None:
        self._modifiers = modifiers

    def modify_state(state: dict) -> None:
        for modifier in self._modifiers:
            modifier.modify_state(state)

This then enabled to rewrite the main business logic thing like this:

def handle_request(request: dict) -> dict:
    aggregate_state_modifier(request)
    return request

That seems to be clean. But actually one has no idea what this does. Also you have no chance of inspecting it. Where was aggregate_state_modifier created? Somewhere else. Which exact StateModifier instances does it contain? We don't know. One has to find the factory function and dig through it. And there are various decorators implemented as well.

The Problem

So what is the fundamental issue here? This abstraction violates the Liskov substitution principle. The principle states that all implementations of an interface must be substitutable with each other. We must not care about the specific type, we must only care about the interface.

Take for instance something that can emit log messages. It will have a really simple interface:

class Logger(abc.ABC):
    @abc.abstractmethod
    def info(message: str) -> None:
        ...

    @abc.abstractmethod
    def warning(message: str) -> None:
        ...

Now we can form a FilesystemLogger that logs to a file, a JournalLogger for Linux journald or even have something like a TelegramLogger which pushes messages via the Telegram messenger. The implementations are different, but when I use one of them, they all work exactly the same. I pass a message and it will be handled as a side effect.

The StateModifier interface doesn't encapsulate anything except the call signature. That is a coincidental abstract and not a purposeful one. The problem here is that the modify_state methods need to be called in the right order because one of them might set fields in the state dict which are needed by the next state modifier. This couples the implementations of the interface with each other, although the abstraction was supposed to make things easier.

Ideas towards a solution

The inherent problem is coupling and the “leaky abstraction”. We here are working with different phases of an algorithm. The phases are not interchangeable, so there is no point in creating an abstraction for these steps. The steps should be just simple functions if possible.

The other thing is that the state dictionary is a blob of data and one doesn't see which parts are used by which phase of the algorithm. We pass in a dictionary and each method might read values, modify values or create new values. We just don't see it in the code.

As a first step I would undo this abstraction and make the used fields very explicit. Perhaps something like this:

def handle_request(request: dict) -> dict:
    user = verify_api_token(request["token"])
    temperature = weather_forecast(request["location"])
    bill_api_call(user)
    return {"temperature": temperature}

This also has three phases, but now the fields are more clear.

If one wants to abstract some things away, one could start to encapsulate the request object and have some things as property methods (Python getters). This way one would not have certain values as explicit states but rather compute them from the other values when requested. This would ensure that invariants in the request object would be much easier to enforce.

Another thing one could do is to split up the request object into more focused objects which hold intermediate state.

Of course it is hard to refactor something in a larger code base, as of course there are not only three steps but likely many more.

In conclusion we can say that it is important to check whether the chosen abstractions fulfil the SOLID principles. Although not perfect, they form a good start to check a software design.