A Useless and Harmful Abstraction
In a codebase that I have looked into I've seen an abstraction which is so broad that it is useless and also harmful. I'll show what it is, point to the problem and give some ideas to improve it.
Say we want to build something that offers a REST API via HTTP. We could have some sort of class which holds an incoming request. We could use a simple Python dictionary for that. The requests that we want to handle are complicated multi-step things that require some state to be passed along.
In this code all the state is put into a single dictionary and passed along. This in itself is not too bad, we could think of some pseudo code like this:
def handle_request(request: dict) -> dict: step_1(request) step_2(request) step_3(request) return request
But that code was not considered clean enough, so an abstraction was made. I have renamed it to make it clearer what it really does, the name in the code sounds a bit more innocuous. This is the interface:
class StateModifier(abc.ABC): @abc.abstractmethod def modify_state(state: dict) -> None: ...
And then there was an aggregate like this:
class AggregateStateModifier(StateModifier): def __init__(self, modifiers: list[StateModifier]) -> None: self._modifiers = modifiers def modify_state(state: dict) -> None: for modifier in self._modifiers: modifier.modify_state(state)
This then enabled to rewrite the main business logic thing like this:
def handle_request(request: dict) -> dict: aggregate_state_modifier(request) return request
That seems to be clean. But actually one has no idea what this does. Also you have no chance of inspecting it. Where was aggregate_state_modifier
created? Somewhere else. Which exact StateModifier
instances does it contain? We don't know. One has to find the factory function and dig through it. And there are various decorators implemented as well.
The Problem
So what is the fundamental issue here? This abstraction violates the Liskov substitution principle. The principle states that all implementations of an interface must be substitutable with each other. We must not care about the specific type, we must only care about the interface.
Take for instance something that can emit log messages. It will have a really simple interface:
class Logger(abc.ABC): @abc.abstractmethod def info(message: str) -> None: ... @abc.abstractmethod def warning(message: str) -> None: ...
Now we can form a FilesystemLogger
that logs to a file, a JournalLogger
for Linux journald or even have something like a TelegramLogger
which pushes messages via the Telegram messenger. The implementations are different, but when I use one of them, they all work exactly the same. I pass a message and it will be handled as a side effect.
The StateModifier
interface doesn't encapsulate anything except the call signature. That is a coincidental abstract and not a purposeful one. The problem here is that the modify_state
methods need to be called in the right order because one of them might set fields in the state dict which are needed by the next state modifier. This couples the implementations of the interface with each other, although the abstraction was supposed to make things easier.
Ideas towards a solution
The inherent problem is coupling and the “leaky abstraction”. We here are working with different phases of an algorithm. The phases are not interchangeable, so there is no point in creating an abstraction for these steps. The steps should be just simple functions if possible.
The other thing is that the state dictionary is a blob of data and one doesn't see which parts are used by which phase of the algorithm. We pass in a dictionary and each method might read values, modify values or create new values. We just don't see it in the code.
As a first step I would undo this abstraction and make the used fields very explicit. Perhaps something like this:
def handle_request(request: dict) -> dict: user = verify_api_token(request["token"]) temperature = weather_forecast(request["location"]) bill_api_call(user) return {"temperature": temperature}
This also has three phases, but now the fields are more clear.
If one wants to abstract some things away, one could start to encapsulate the request object and have some things as property methods (Python getters). This way one would not have certain values as explicit states but rather compute them from the other values when requested. This would ensure that invariants in the request object would be much easier to enforce.
Another thing one could do is to split up the request object into more focused objects which hold intermediate state.
Of course it is hard to refactor something in a larger code base, as of course there are not only three steps but likely many more.
In conclusion we can say that it is important to check whether the chosen abstractions fulfil the SOLID principles. Although not perfect, they form a good start to check a software design.