January 30, 2020•360 words
A wild issue appears
At my current client, we have been chasing a frustrating issue between our NodeJS frontend and a specific .NET service. Adding distributed tracing from front- to back-end services didn't give us any clues. We sprinkled logging everywhere, but this also didn't give us any clues. Frustrated and burnt-out developers roamed the office, crying tears of failure.
Pinpointing a possible source
Going through the massive amount of logging, we finally pinpointed the issue to a request failing with ECONNABORTED because a timeout of 3000ms was being hit. A test script that hit the .NET service pod from the front-end Kubernetes pod seemed to point us in the direction of net.core.somaxconn, a kernel-level setting that caps how many pending connections can wait in a listening socket's queue. Reproducing it through Docker seemed to yield the same results, especially since running the .NET service directly didn't give us the error.
All was not well though: one of our developers decided to up the number of requests our test script fired at the local .NET service, and you can probably guess what happened. The requests started failing there too. Removing the Docker/k8s layer gave the service better performance, but it still ended up hitting some limit!
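For the curious, a test script along these lines can be quite small. The following is a minimal sketch, not our actual script: the names runLoad and withTimeout, the concurrency parameter and the way outcomes are tallied are all assumptions made for illustration. Only the 3000ms timeout and the ECONNABORTED code come from what we saw.

```typescript
// Tally of outcomes for one load-test run.
interface LoadResult {
  succeeded: number;
  timedOut: number;
  failed: number;
}

// Hypothetical helper: rejects with an ECONNABORTED-style error if `work`
// has not settled within `ms` milliseconds. It does not cancel the request itself.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => {
      const err = Object.assign(new Error(`timeout of ${ms}ms exceeded`), {
        code: "ECONNABORTED",
      });
      reject(err);
    }, ms);
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

// Fire `total` requests with at most `concurrency` in flight and count the outcomes.
// `sendRequest` is whatever issues a single request to the service under test.
async function runLoad(
  sendRequest: () => Promise<unknown>,
  total: number,
  concurrency: number,
  timeoutMs = 3000, // the timeout we were hitting
): Promise<LoadResult> {
  const result: LoadResult = { succeeded: 0, timedOut: 0, failed: 0 };
  let started = 0;

  // Each worker keeps pulling the next request until `total` have been started.
  async function worker(): Promise<void> {
    while (started < total) {
      started++;
      try {
        await withTimeout(sendRequest(), timeoutMs);
        result.succeeded++;
      } catch (err) {
        const code = (err as { code?: string } | null)?.code;
        if (code === "ECONNABORTED") result.timedOut++;
        else result.failed++;
      }
    }
  }

  const workers = Array.from({ length: Math.min(concurrency, total) }, () => worker());
  await Promise.all(workers);
  return result;
}
```

Pointing it at a service would then look something like runLoad(() => fetch("http://localhost:5000/health"), 1000, 100), where the URL is made up and fetch assumes a recent Node.js version. Upping total is exactly the kind of change that made the local service start failing too.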
Several days of pair-debugging ensued. A hybrid mix of front- and backenders formed and tried out all kinds of scenarios. Then, during a moment of despair, one of the backenders peered at his Rider window, saw the logging coming in way slower than the NodeJS test script was firing requests, and suddenly remembered an article he had read.
We changed the settings according to the article, fired up the test script and... bliss. We had fixed the error. Now, if you're wondering what exactly this magical fix entailed, I'll give you a very, very simple TL;DR:
DISABLE INFORMATION LEVEL LOGGING IN YOUR .NET SERVICES
This immediately increased the successful response count from 1000 to about 7000. That's roughly a free sevenfold performance boost for this specific back-end service.
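If you're wondering why a log level can matter this much: with a minimum level set, every message below it is dropped before any formatting or I/O happens. Here is a toy sketch of that idea in TypeScript. It is not the .NET logging API and not the settings we changed; createLogger, the level names and the sink callback are all made up for illustration.

```typescript
// Severity order, lowest to highest. The names are illustrative only.
const LEVELS = ["debug", "information", "warning", "error"] as const;
type Level = (typeof LEVELS)[number];

// Build a logger that drops every message below `minLevel` before doing any work.
// `sink` stands in for whatever actually writes the line (console, file, network).
function createLogger(minLevel: Level, sink: (line: string) => void) {
  const threshold = LEVELS.indexOf(minLevel);
  return (level: Level, buildMessage: () => string): boolean => {
    if (LEVELS.indexOf(level) < threshold) {
      return false; // skipped: the message is never built and nothing is written
    }
    sink(`[${level}] ${buildMessage()}`);
    return true;
  };
}
```

With the minimum level at "warning", a call like log("information", () => "handled request") returns false without ever building the string, which is the per-request work the service no longer has to do.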
Please forward this article to all your colleagues, friends, grandmothers, pals and other assorted acquaintances.