Microservices simplified: Exception handling

May 16, 2022

In this article I’d like to discuss how exception handling can be implemented at application level, without the need for try-catch blocks at component or class level, while still producing exceptions that point to the root cause. This is part of a series of articles about how microservice architecture can be applied in a domain-centric architecture without constantly dealing with technical aspects.

If you haven’t done so yet, I recommend checking out the other articles in this series.

First of all, let’s discuss what exceptions are in software development.

An exception is an abnormal or unexpected event that occurs during the execution of a program. It is a runtime error or undesired event that disrupts the normal program flow.

Incorporating exception handling into the application’s ecosystem follows the same principles as logging and other cross-cutting concerns. Because exception handling is an application-level behavior, it is a good approach to implement exception-related code in a few generic infrastructure components instead of mixing it into the domain-specific implementations, which often leads to repetitive and unstructured exception tracking.

To reduce redundant code and the risk of exception masking, exception handling is best implemented in as few places as possible, such as generic infrastructure components that wrap all the processes the system can execute and that can potentially raise an exception.

Exceptions, just like other system events, have to be caught and logged. In some cases a retry mechanism or compensatory logic can be executed to bring the execution back to its main flow. Exception logs usually get the most attention during incident analysis, so it’s important to log relevant information about the cause of the exception while eliminating false positives and duplicates. If you have ever investigated a production issue by looking through logs containing thousands of exceptions every minute, you understand why eliminating redundant and false positive exceptions is crucial.
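As a rough illustration of the retry idea, here is a minimal Python sketch. The names `run_with_retry` and `TransientError`, the default of three attempts and the zero delay are assumptions made for this example, not something prescribed by the article. Only failures marked as transient are retried, and the last failure is allowed to propagate so that it is logged once at application level.

```python
import logging
import time

logger = logging.getLogger("retry")


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def run_with_retry(operation, attempts=3, delay_seconds=0.0):
    """Call operation(); retry on TransientError, re-raise after the last attempt."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == attempts:
                # Out of attempts: let the exception propagate to the
                # application-level handler, which logs it once.
                raise
            logger.warning(
                "Transient failure, retrying (attempt %d of %d)", attempt, attempts
            )
            time.sleep(delay_seconds)
```

Any other exception type passes straight through without a retry, which keeps the retry logic from masking errors it does not understand.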

Implementing exception handling at application level reduces the risk of exception masking and of duplicate log entries, while still allowing exception logs to point back to the root cause by including the stack trace.

The following is an example of exception handling implemented in the command dispatcher:
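A minimal Python sketch of what such a dispatcher could look like is shown below. It is an illustration only, not the article's original listing: the `register` and `dispatch` method names and the handler registry are assumptions. The point is that the single `try`/`except` in `dispatch` wraps every command handler, logs the failure once with its stack trace, and re-raises.

```python
import logging

logger = logging.getLogger("dispatcher")


class CommandDispatcher:
    """Hypothetical dispatcher: routes each command to its registered handler."""

    def __init__(self):
        self._handlers = {}

    def register(self, command_type, handler):
        self._handlers[command_type] = handler

    def dispatch(self, command):
        try:
            # A missing handler raises KeyError and is logged like any other failure.
            handler = self._handlers[type(command)]
            return handler(command)
        except Exception:
            # The one place where command failures are logged. logger.exception
            # attaches the stack trace, so the entry points at the root cause.
            logger.exception("Command %s failed", type(command).__name__)
            raise
```

Because the handlers themselves contain no `try`/`except`, the logged stack trace leads directly to the line in the domain code that failed. Re-raising leaves it to the caller (for example an API layer) to decide how to report the failure.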

Followed by the EventSubscriber:
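The subscriber side can be sketched in a similar way. Again this is a hedged Python illustration with assumed names (`subscribe`, `on_event`), not the original listing. One design assumption is made explicit here: a failing handler is logged and skipped rather than re-raised, so that one bad handler does not stop the others, and the boolean result can be used by the surrounding messaging code to decide whether to acknowledge the message.

```python
import logging

logger = logging.getLogger("subscriber")


class EventSubscriber:
    """Hypothetical subscriber: invokes every handler registered for an event."""

    def __init__(self):
        self._handlers = {}

    def subscribe(self, event_type, handler):
        self._handlers.setdefault(event_type, []).append(handler)

    def on_event(self, event):
        """Run all handlers; return True only if every one of them succeeded."""
        succeeded = True
        for handler in self._handlers.get(type(event), []):
            try:
                handler(event)
            except Exception:
                # Log once with the stack trace and keep going, so one failing
                # handler does not stop the remaining handlers.
                logger.exception("Handling %s failed", type(event).__name__)
                succeeded = False
        return succeeded
```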

The CommandDispatcher and EventSubscriber wrap all the processes executed by the system and are the result of the abstractions discussed in this article. These two are the ideal places to implement logging and exception handling, together with other cross-cutting concerns.

Sometimes exception handling has to be implemented at class level, either to handle component-specific behavior or because the exception can be handled on the spot without any propagation. A good example is a cache service that is unreachable, where the service itself can deal with the issue:
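A minimal Python sketch of this idea follows. It is not the article's original listing: `ResilientCache` and `EmptyCache` are hypothetical names, and `remote_cache` is assumed to be any object with `get` and `set` methods that raises Python's built-in `ConnectionError` when unreachable. A real Redis client library raises its own exception types, so the `except` clause would need to be adapted.

```python
import logging

logger = logging.getLogger("cache")


class EmptyCache:
    """In-memory stand-in that never holds anything: every lookup is a miss."""

    def get(self, key):
        return None

    def set(self, key, value):
        pass


class ResilientCache:
    """Hypothetical wrapper that falls back to EmptyCache on connection errors."""

    def __init__(self, remote_cache):
        self._remote = remote_cache
        self._fallback = EmptyCache()

    def get(self, key):
        try:
            return self._remote.get(key)
        except ConnectionError as error:
            # Handled locally: a cache miss is an acceptable outcome, so the
            # exception is not propagated and only a warning is logged.
            logger.warning("Cache unreachable, treating as a miss: %s", error)
            return self._fallback.get(key)

    def set(self, key, value):
        try:
            self._remote.set(key, value)
        except ConnectionError as error:
            logger.warning("Cache unreachable, value not stored: %s", error)
            self._fallback.set(key, value)
```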

In the above example, the Redis cache implementation is replaced by an in-memory cache (always empty) as a reaction to connection-related errors. The exception is not propagated; a warning is logged instead.

So far we have covered how and where to handle exceptions. What is left is to see how we can improve the exceptions themselves so that they better communicate the root cause of the issue.

My favorite way to improve exceptions in this sense is the principle called fail fast, which lets the exception occur immediately whenever it’s known that it cannot be handled. This contrasts with what fail-silently or defensive programming practices would dictate, namely trying to continue the execution of the process under unforeseen circumstances.
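As an illustration of the defensive style, here is a minimal Python sketch of a lookup that checks for existence before mapping. It is not the article's original listing: the `Attachment` fields, the `STORED_ROWS` dictionary standing in for the persistence layer and the `get_attachment` name are assumptions. The dictionary's `get` method plays the role that `FirstOrDefault` plays in .NET: it yields `None` when nothing matches.

```python
from dataclasses import dataclass


@dataclass
class Attachment:
    id: int
    file_name: str


# Hypothetical in-memory stand-in for the persistence layer.
STORED_ROWS = {1: {"file_name": "invoice.pdf"}}


def get_attachment(attachment_id):
    """Defensive lookup: returns None when no row matches the id."""
    row = STORED_ROWS.get(attachment_id)  # None when the id is unknown
    if row is None:
        # The missing id is silently passed on to the caller.
        return None
    return Attachment(id=attachment_id, file_name=row["file_name"])
```

A caller that later evaluates `get_attachment(2).file_name` fails with an `AttributeError` on `None`, at a point that says nothing about where the invalid id came from.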

The above code returns an entity of type attachment by its Id. This is one of the most common examples of hiding the root cause of an issue: the existence of the entity is checked before it is mapped into a domain object. In this case, receiving an identifier for a nonexistent attachment means that something already went wrong earlier in the flow, when the id was determined. By returning null for a missing attachment, the above method just postpones the exception to the point where the attachment is actually used, moving even further away from the root cause.
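A fail-fast variant of such a lookup might look like the following Python sketch. As before, the names are hypothetical and this is not the original listing. Indexing the dictionary directly is the Python counterpart of calling `First` instead of `FirstOrDefault`: in this sketch an unknown id raises a `KeyError` on the first line of the function.

```python
from dataclasses import dataclass


@dataclass
class Attachment:
    id: int
    file_name: str


# Hypothetical in-memory stand-in for the persistence layer.
STORED_ROWS = {1: {"file_name": "invoice.pdf"}}


def get_attachment(attachment_id):
    """Fail-fast lookup: an unknown id raises KeyError immediately."""
    row = STORED_ROWS[attachment_id]  # no default, no None check
    return Attachment(id=attachment_id, file_name=row["file_name"])
```

The function is shorter, never returns `None`, and the exception carries the offending id, so the application-level log entry starts one step away from the code that produced the id.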

The above code no longer protects the execution by calling FirstOrDefault and checking for null. Instead, it fails at the first line with the exception “Sequence contains no matching element”, which is caught and logged at the application level. This way the code has a better signal-to-noise ratio and the exception is thrown from a place closer to the root cause.

The first example contains some elements of defensive coding, which I consider valuable for shared libraries that are written without any knowledge of or control over how they are called. For implementations that are part of the same system as their callers, I prefer the fail fast approach over defensive coding, unless the business domain dictates otherwise.

I hope the above examples highlight the importance of handling exceptions (and other cross-cutting concerns) at application level, so that they are no longer part of the business domain. The result is implementations that follow clean code principles and a common structure for exception logs.
