Haunting Down Errors in Go

posted 06 Aug 2023

There are two kinds of Go developers:

The ones that feel they write
```
if err != nil {
  return nil, err
}
```
too often, and
Liars

Well, at least that used to be the case. Now we have changed it to something like

if err != nil {
  return nil, fmt.Errorf("Could not read user: %w", err)
}

This is slightly better, but in my experience, it still leads to code that is harder to debug than a plain old stack trace. You end up with a chain of prefixes to an error, and it’s too easy to copy-paste an error prefix by accident. So if you use the pattern above just to add more context to an error, consider using go-errors/errors or something similar instead¹.

Not Every Error is Worth Handling

Now, some Go developers really hate stack traces attached to errors. The usual rebuttal is something along these lines:

Adding stack traces only treats the pain and not the root cause. The Go proverbs say that we should not just check errors, but handle them gracefully. If we consistently do that, there’s no need for stack traces.

This is correct, but ignores the cost of checking and handling every single error that may pop up. Consider this piece of code:

func GetUser(tx *sqlx.Tx, uid UserID) (*User, error) {
  var u User
  err := tx.Get(&u, `
SELECT fields, for, user
FROM users
WHERE id = $1`, uid)
  if err != nil {
    // TODO: handle error
  }
  return &u, nil
}

If this were code running against a Postgres database, then we’d “have to” check over 250 error codes. Most are clearly not relevant, but some are, and it’s not always obvious which are and which aren’t. If an error is not expected to happen and has very little impact on the system, then bubbling the error up and reporting it instead of trying to handle it saves development time.

This is especially true if it’s hard or impossible to produce the error yourself, which is often the case for third-party APIs and external systems.
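In practice, this could look something like the sketch below: handle the one error code you care about and bubble up everything else. `sqlStateError` is a hypothetical stand-in for a driver error that exposes a Postgres SQLSTATE code through an `SQLState()` method, so check what your driver actually provides. `ErrEmailTaken` and `classify` are likewise made-up names. `23505` is the Postgres code for `unique_violation`, and `40P01` is `deadlock_detected`.

```go
package main

import (
	"errors"
	"fmt"
)

// sqlStateError is a hypothetical stand-in for a database driver error
// that carries a Postgres SQLSTATE code.
type sqlStateError struct {
	code string
	msg  string
}

func (e sqlStateError) Error() string    { return e.code + ": " + e.msg }
func (e sqlStateError) SQLState() string { return e.code }

// ErrEmailTaken is a hypothetical domain error the caller knows how to handle.
var ErrEmailTaken = errors.New("email already registered")

// classify translates the single error code we expect (unique_violation)
// and returns every other error unchanged, so it can be reported upstream.
func classify(err error) error {
	var se interface{ SQLState() string }
	if errors.As(err, &se) && se.SQLState() == "23505" {
		return ErrEmailTaken
	}
	return err
}

func main() {
	dup := sqlStateError{code: "23505", msg: "duplicate key"}
	fmt.Println(classify(dup)) // → email already registered

	deadlock := sqlStateError{code: "40P01", msg: "deadlock detected"}
	fmt.Println(classify(deadlock)) // → 40P01: deadlock detected
}
```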

That being said, you can’t just ignore errors because it’s hard to test them. As I said, the error has to have very little impact on the system: Your payment integration has to be more thoroughly tested than your Gravatar integration, even if the latter is easier to test. That your users’ profile pictures default to some fallback image for some time is way more acceptable than duplicate payments.

But Error Messages Must be Debuggable

Even if the errors aren’t that big of a deal in isolation, you still have to be able to discover the “unknown unknown” errors. For example, you don’t usually need to think about transaction deadlocks in the early phases of a company, but a continuous rise in deadlocks warrants your attention at some point. That’s hard to do if you don’t know that they have started to happen.

To know that, the error has to be a bit more descriptive than

"something bad happened" when tx.Get was called inside GetUser

and that’s typically the case for most errors you get back. But I think this is the bare minimum, and a little (emphasis on little) future-proofing goes a long way when debugging in production².

For example, if you have SQL constraints, be sure to output the name of the constraint that was violated. If you generate SQL on the fly, perhaps ensure that the generated query is provided whenever you end up with an “ambiguous row” or a similar kind of error.

Now With More Phantoms

Image: A Gopher doing detective work on a "sql: no rows in result set" message, in the style of the "this is fine" meme. Created from assets in the Free Gophers Pack by Maria Letta.

One that’s frequently forgotten is sql.ErrNoRows. While I have enough information to debug an issue with "sql: no rows in result set" and a stack trace, it would be much faster to debug if the error message were "sql: no rows to create mypkg.Users". Adding a prefix would help out here, but you can get an improved error message “for free” with phantom types, as long as you provide the type:

type NotFound[T any] struct{}

func (NotFound[T]) Error() string {
  var zero T
  return fmt.Sprintf("sql: no rows to create %T", zero)
}

func main() {
  fmt.Println(NotFound[MyType]{})
  // → sql: no rows to create main.MyType
}

You could use this error by manually replacing instances of sql.ErrNoRows, but I’m personally a big fan of sqlx: It relieves you from the worst pains of using database/sql directly without being a fully fledged ORM. The function I used above in the GetUser example – Get – can deserialise a single SQL result row into a struct value.

We can make a small wrapper around sqlx.Get to attach that type information whenever we get back an sql.ErrNoRows error:

func Get[T any](
  tx *sqlx.Tx, query string, args ...interface{},
) (*T, error) {
  var dest T
  err := tx.Get(&dest, query, args...)

  // I skip stack traces here, but you'd add
  // them to the errors where it's relevant.
  if errors.Is(err, sql.ErrNoRows) {
    return nil, NotFound[T]{}
  }
  // handle more types of errors here
  if err != nil {
    return nil, err
  }
  return &dest, nil
}

And using it actually makes the code a tiny bit shorter:

func GetUser(tx *sqlx.Tx, uid UserID) (*User, error) {
   return dbutil.Get[User](tx, `
SELECT fields, for, user
FROM users
WHERE id = $1`, uid)
}

A nice side effect of this is that you can use errors.Is to find out what failed in a bigger query:

switch {
case errors.Is(err, dbutil.NotFound[User]{}):
  // do something if we could not find user
case errors.Is(err, dbutil.NotFound[Team]{}):
  // do something if we could not find team
case err != nil:
  // ...
}

though I have personally not needed to utilize that yet, and I can imagine it being a bit fragile in general.

Incrementally Spooky

A cool thing about this change is that it can be “completely” backwards compatible. Right now it is close to backwards compatible, but not quite: If you have errors.Is(err, sql.ErrNoRows) in the code base, then the new type will not match. However, you can implement the Is(target error) bool interface as specified in the documentation for errors.Is to fix that:

// for backwards compatibility, so that
//
//     errors.Is(err, sql.ErrNoRows)
//
// still returns true
func (NotFound[T]) Is(target error) bool {
  return target == sql.ErrNoRows
}

This only works if you don’t use err == fixedError comparisons, but there are linters out there to detect those for you.

And that’s it! A small dash of phantom types for slightly better error messages.

¹ I prefer to have my own error type where the stack trace is always generated whenever I encounter an unexpected error, but that is a different blog post.
² Of course, you will have some unknown unknowns here as well, so there will be situations where you have to deploy a new version of your services to get enough debug data out. That’s not usually a problem, but if your deployment schedule is on the order of weeks, then a bit more future-proofing may be reasonable.