Log Statements in Automated Checks are Not Optimal

Near the historical origins of the hardware-software interface, and early in the history of software services, logs were born: discrete packets of information with time stamps and some quanta of information, e.g., the voltage at some instrument at an electrical substation, or basic information from a web server about an incoming HTTP request.

Log statements satisfied the need for system status and operating information in a basic way, and they still do. They are a simple and highly performant solution to the problem. Given that such log statements are persisted in a linear list, I call this the Linear Logging pattern.

Linear Logging works for applications like web servers that are expected to continue running for an indefinite time, where the event series does not usually have a definite beginning or end, and the variety in data and format of logged events is limited.

When “test automation” began – what one could more accurately call “automated checks” of product behavior - it was natural to apply the Linear Logging pattern again when people needed to know what was going on; it was familiar, and easy to implement.

Unfortunately, there are downsides when applying Linear Logging to automated checks:

There is no natural fit to beginning and ending points of the procedure, the time span of the procedure, or the time span of an individual step.

The more log statements occur during a repeatable automated check, the greater the number and variety of log statements in the ordered list structure of a check artifact, which tends to frustrate analysis of the check run, whether manual or automated.

To get more information on what is going on at check runtime, e.g., to add notes in the procedure on information from instrumentation or telemetry, team members have only two options:

  1. Add a new log statement, making a long list of log statements in the artifact even longer.
  2. Add data in the log statement itself, which tends to mix data formats and mix data – the facts you need to know! - with presentation of the data, e.g., HTML markup or simple English grammar.

Linear Logging with automated checks has value when team practice assumes manual follow-up on failure of a check, but it does not scale well to the more demanding software quality practices of today when the team needs the quality data faster and in greater detail. For example, a common pattern in software development today is “gated check-in,” where many bottom-up or end-to-end checks need to run in a time span that is acceptable for code submissions to the system. For timely handling of a large set of E2E verifications, handling of check results needs to be automated, especially, but not exclusively, when a check fails. Linear Logging does not work well in this context.

Logs are simply the wrong tool for runtime documentation of quality automation code.

In my next post here, I will present a much more effective way to approach structuring and persisting data on what happens during the check run, including how to make the check procedure self-documenting in robust detail and in a way that works nicely with automated analysis.

No Comments

Add a Comment

Sign up here for emails with bites of wisdom on quality automation and MetaAutomation

Recent Blog Posts

  • The differences: Manual Test vs. quality automation

    In my last post I describe out the two kinds of automation that fit in the quality automation space.

    People who do quality automation (at least, the part of quality automation that drives and … more

  • The two halves of quality automation

    Quality automation is the domain (or problem space) of driving the SUT, measuring and recording data on SUT behavior and communicating that data to the business. I also use “quality automation” to … more

  • Fixing the false negative problem

    False negatives happen when these three things happen in order:

    Operations (ops) promotes the software to the next level, or ships it to end-users

    Someone (or, some automated process) discovers a … more

  • Fixing the false positive problem

    With all the quality automation that is your responsibility, a run of a check failed. It is your job to check it out.

    After 30 minutes or so of investigation, you find that the failure happened … more