Prelude
This post is part of a series designed to make you think about your own design philosophy on different topics. If you haven’t read the following post yet, please do so first:
Develop Your Design Philosophy
Introduction
Systems cannot be developed assuming that human beings will be able to write millions of lines of code without making mistakes, and debugging alone is not an efficient way to develop reliable systems. - Al Aho (inventor of AWK)
When a developer reaches for a debugger out of habit, the debugger begins to cause more harm than good. At that point, the debugger is hindering the developer’s ability to step back, slow down and reason about the problem. A developer with this habit never practices reading code carefully and thus, cannot maintain a good mental model of the code for a project and how it runs on the machine. Simply put, reliance on a debugger hinders a developer’s ability to read code and write reliable software.
As Al said, developers are going to make mistakes. It’s human nature. But there needs to be a way to capture these mistakes and fix them, without depending on a debugger. This is where logging can help. Yet, what, when and how best to log is complicated. What makes matters worse, the answers to these questions can be different from project to project.
Purpose
There is one question you need to ask before you start any project, what is the purpose of the logs for this application? I can’t tell you how many developers I meet who can’t answer this question. When this question can’t be answered, it’s a smell that the logging is probably inconsistent, mostly noise, being utilized as an insurance policy and/or lacking any guidelines for developers to follow.
I believe logging, when reasonable and practical, should be isolated to the single purpose of debugging errors. This is the same whether the application is running in production or development. Logging can’t always be isolated to the debugging of errors, but when it can, it will simplify things tremendously. If you move concerns like metrics and informational data out of the logs, the guidelines for how, what and when to log become clear to every developer and can be applied in a consistent manner. This also extends to code reviews and specialized linters.
Design Philosophy
If the purpose of the logs can be isolated to debugging errors, we only need to log information that a person needs in order to debug any given error. Logging information that is not associated with an error and does not help a person in debugging, should be considered noise and should be questioned. Remember, the reason to log is because something is failing and the logs hold the answer.
It’s human nature to want to capture every little thing that happens inside the logs. But how many times have you looked at logs when the application didn’t need your immediate attention? This is why warnings or other points of interest don’t belong in the logs. For this type of information, I believe dashboarding should be used. Use dashboarding to publish points of interest for an administrator or developer. As Gabriel Janson says, “you should question the usefulness of any information you publish to validate you are establishing a clear relationship between the problem and its cause”.
I have tried for years to separate logging from error handling but I now believe this was a mistake. Logging and error handling are coupled and not separate concerns. This is a critical design philosophy; handling an error means logging the error. Our code must apply the practice of logging as part of the error handling logic. This will allow for strict but simple guidelines that can be applied consistently by any developer. There is no guesswork. If a function/method is handling the error, then it must log the error.
What does handling an error mean? It means three things:
- The error has been logged.
- The application is back to 100% integrity.
- The current error is not reported any longer.
Language Mechanics
The key to applying this design philosophy is having the mechanics to log all the context we need in the function/method that is handling the error. This can be accomplished by using Dave Cheney’s errors package:
Let’s show an example of passing the error up the call stack and eventually handling the error. These are the only two mechanics we need to have in place.
Here is a function that calls into the standard library to download content from the web. The standard library will be providing the root cause error value. Because this function will not be handling the error, it will wrap the error and pass it up the call stack for future handling.
Listing 1
20 func Download(url string) ([]byte, error) {
21
22     // Download the tar file.
23     r, err := http.Get(url)
24     if err != nil {
25         return nil, errors.Wrapf(err, "get url: %s", url)
26     }
27     defer r.Body.Close()
28
29     // Read in the entire contents of the file.
30     body, err := ioutil.ReadAll(r.Body)
31     if err != nil {
32         return nil, errors.Wrap(err, "writing file")
33     }
34
35     // Return the file.
36     return body, nil
37 }
From a developer’s perspective, this is an easy guideline to follow. We ask, does this function handle the error? If not, wrap and pass it up. If yes, then log and pass nil if necessary.
Let’s look at a function that handles the error.
Listing 2
11 func main() {
12    
13     data, err := Download("https://ftp.gnu.org/gnu/binutils/binutils-2.7.tar.gz")
14     if err != nil {
15         log.Printf("%+v", err)
16         return
17     }
18
19     ...
20 }
Here in main, the call to Download is made and then the error is checked. If there is an error, it is handled by logging it with the "%+v" format option. This will generate a stack trace with all the context we need.
Listing 3
2017/05/04 15:07:00 Get https://ftp.gnu.org/gnu/binutils/binutils-2.7.tar.gz: dial tcp: lookup ftp.gnu.org: no such host
get url: https://ftp.gnu.org/gnu/binutils/binutils-2.7.tar.gz
main.Download
    /Users/bill/code/go/src/github.com/ardanlabs/gotraining/topics/temp/main.go:25
main.main
    /Users/bill/code/go/src/github.com/ardanlabs/gotraining/topics/temp/main.go:13
runtime.main
    /usr/local/go/src/runtime/proc.go:185
runtime.goexit
    /usr/local/go/src/runtime/asm_amd64.s:2197
Guidelines
Handling an error means:
- The error has been logged.
- The application is back to 100% integrity.
- The current error is not reported any longer.
Packages that are reusable across many projects only return root error values.
The choice to wrap an error is a policy that only applications can choose to apply. Packages that are going to have the highest level of reusability can only return root error values. This mechanic is the same that is used in the Go standard library.
If the error is not going to be handled, wrap and return up the call stack.
This is the basic question we ask about every error that is returned from a function/method call. If the function/method is not going to handle the error, then wrap the error with enough context and return it up the call stack. That extra context can be, for example, the input values that were used or the query that failed. A good way to determine if you are logging enough context or too much context is by inspecting the logs and validating that they are working for you during development.
Once an error is handled, it is not allowed to be passed up the call stack any longer.
Once it is decided that the function/method will handle the error, the error ceases to be an error. If the function/method is still required to issue a return, it can’t return an error value. It should only return nil.



