How did it do that? — nettTracker

One of the beautiful features of nettTracker is its ability to perform automated tasks for our users. For example, all of the monthly balance sheet posting takes place automatically on the last day of the month. But it isn’t quite so beautiful when things go wrong. When systems perform tasks automatically, they unleash the nightmare that is diagnostic tracing.

It can be hard enough for a developer to work out why something went wrong when they can step through each line of code in a controlled environment and see what happened at each point. When the code is scheduled to run at specific times, usually overnight when processing loads & network traffic are at their lowest, tracing faults and understanding their causes becomes a critical challenge for the dev team.

We built two key architectural components when we started nettTracker. The first of these was a flexible task scheduling component that we could use to plug new automated tasks into the system as we created them. This is a really cool system and we can plug in new automated features without having to redeploy everything.
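As a rough illustration of the plug-in idea (not the actual nettTracker code), a scheduler can keep a registry of tasks, each paired with a rule that says when it is due. Everything named here (`register_task`, `run_due_tasks`, `is_last_day_of_month`, the task name) is hypothetical, and this sketch does not cover loading new tasks without a redeploy.

```python
from datetime import date, timedelta
from typing import Callable, Dict, List, Tuple

TaskFn = Callable[[date], str]   # a task receives the run date and returns a summary
DueFn = Callable[[date], bool]   # a rule deciding whether the task is due on that date

# Hypothetical registry: task name -> (due rule, task function).
_registry: Dict[str, Tuple[DueFn, TaskFn]] = {}


def register_task(name: str, is_due: DueFn) -> Callable[[TaskFn], TaskFn]:
    """Decorator that plugs a task into the scheduler without editing the scheduler."""
    def decorator(fn: TaskFn) -> TaskFn:
        _registry[name] = (is_due, fn)
        return fn
    return decorator


def is_last_day_of_month(today: date) -> bool:
    # The day after the last day of a month falls in a different month.
    return (today + timedelta(days=1)).month != today.month


@register_task("monthly_balance_sheet_posting", is_due=is_last_day_of_month)
def post_balance_sheets(today: date) -> str:
    # Placeholder body: the real posting logic is not described in this post.
    return f"posted balance sheets for {today:%Y-%m}"


def run_due_tasks(today: date) -> List[str]:
    """Run every registered task whose rule says it is due today."""
    return [fn(today) for is_due, fn in _registry.values() if is_due(today)]
```

Adding another automated task in this sketch means writing one more decorated function; `run_due_tasks` itself never changes.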

The second key component was our logging platform. And it saves our lives!!

This sophisticated tool allows each of the different sub-systems to put their logs and trace records in a single repository that can be analysed to give clear trends on how the system is being used, where we need to invest enhancement effort and, most importantly, what was happening before and after an error occurred.
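To make the "single repository" idea concrete, here is a minimal sketch using Python's standard `logging` module. It is an assumption-laden stand-in, not the real platform: the `RepositoryHandler` class, the `netttracker.*` logger names, and the in-memory list used as the repository are all invented for illustration.

```python
import logging
from typing import List


class RepositoryHandler(logging.Handler):
    """Collects records from every sub-system in one place (an in-memory list here)."""

    def __init__(self) -> None:
        super().__init__()
        self.records: List[logging.LogRecord] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)

    def context_around_errors(self, before: int = 2, after: int = 2) -> List[List[logging.LogRecord]]:
        """For each error, return the records logged just before and after it."""
        windows = []
        for i, rec in enumerate(self.records):
            if rec.levelno >= logging.ERROR:
                windows.append(self.records[max(0, i - before): i + after + 1])
        return windows


repository = RepositoryHandler()
root = logging.getLogger("netttracker")  # hypothetical logger namespace
root.setLevel(logging.DEBUG)
root.addHandler(repository)
root.propagate = False  # keep the demo output out of the global root logger

# Each sub-system logs under its own child name; records propagate to the shared handler.
scheduler_log = logging.getLogger("netttracker.scheduler")
posting_log = logging.getLogger("netttracker.posting")

scheduler_log.info("starting overnight run")
posting_log.debug("loading ledger")
posting_log.error("posting failed")
scheduler_log.info("run finished")
```

Because every record lands in the same place with its sub-system name attached, `context_around_errors` can show what each part of the system was doing either side of a failure.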

We capture thousands of log entries each day that we can then analyse for errors & repeated faults. In many cases we actually identify problems before they even result in an error. We have daily error reports distributed to the team so we can quickly pick up something that might have happened during the previous 24 hours. In fact, it is common for our support team to reach out to a customer about a problem before the customer has even become aware of it!
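A daily report of this kind can be as simple as counting errors from the last 24 hours and listing the most repeated ones first. The sketch below assumes a hypothetical `LogEntry` shape and a `daily_error_report` helper; neither comes from the real system.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Iterable, List, NamedTuple, Tuple


class LogEntry(NamedTuple):
    # Hypothetical record shape; the real log schema is not described in this post.
    timestamp: datetime
    subsystem: str
    level: str
    message: str


def daily_error_report(
    entries: Iterable[LogEntry], now: datetime, min_repeats: int = 1
) -> List[Tuple[Tuple[str, str], int]]:
    """Count errors from the 24 hours before `now`, most frequent first."""
    cutoff = now - timedelta(hours=24)
    counts = Counter(
        (e.subsystem, e.message)
        for e in entries
        if e.level == "ERROR" and e.timestamp >= cutoff
    )
    # Keep only faults seen at least `min_repeats` times.
    return [(key, n) for key, n in counts.most_common() if n >= min_repeats]
```

Raising `min_repeats` narrows the report to repeated faults, which are usually the ones worth chasing first.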

In most engineering environments, it is putting effort into the boring stuff that gives you reliability and stability, but for us programmers, nothing can beat a damn good log.