Introduction
I was guided for many years to write functions that are generalized and to create layers upon layers of abstraction so things don’t break as business requirements change. That the cost of breaking a function signature, for example, is expensive and something that should be avoided. Therefore, write functions that take more generic parameters or hide things in a receiver or context to be less prone to breakage.
On the surface this seems like a good and reasonable idea. However, I have come to believe that this practice leads to engineering problems that I consider to be much worse than the supposed benefit. For private code bases, which is the majority of the code I work on, breaking an API should be encouraged if it will make the code base better. It’s better to refactor and keep the code base clear than to head down a path of legacy code.
As an example, a generalized function API would ask for a car when it needs a set of tires. You might say, well one day I might need a steering wheel, so asking for an entire car now means I have everything I need today and in the future. The problem is, it’s not obvious what things the car must be equipped with over time in order for this function not to fail. The whole situation is setting everyone up for mistakes, fraud and nasty bugs. All because you don’t want to break the API and refactor when you do actually need a steering wheel.
I believe you want to move away from writing generalized code and adding extra layers of abstractions, and focus on being precise. Precise means being clear, consistent and efficient while minimizing, reducing and simplifying the code base. Even though being precise may break code and require refactoring when things are changing, the benefits outweigh the cost. These are the benefits of being precise:
- Write less code.
- Less possibility for misuse.
- Better defense against fraud.
- More accurate testing and debugging.
This philosophy stems from both personal experience and the wisdom of others.
Quotes
These are quotes from people I respect in and out of the Go community that I believe provide a strong basis for this philosophy.
“The purpose of abstraction is not to be vague, but to create a new semantic level in which one can be absolutely precise.” - Edsger W. Dijkstra
“A good API is not just easy to use but also hard to misuse.” - JBD
“Making things easy to do is a false economy. Focus on making things easy to understand and the rest will follow.” - Peter Bourgon
“This is a cardinal sin amongst programmers. If code looks like it’s doing one thing when it’s actually doing something else, someone down the road will read that code and misunderstand it, and use it or alter it in a way that causes bugs. That someone might be you, even if it was your code in the first place.” - Nate Finch
Readability Code Reviews
Code reviews which are specifically geared toward readability are a critical aspect of software development and something you need to be doing on a regular basis. Focusing on API and abstraction decisions can filter out things like type pollution, vagueness and unnecessary complexity. In the remainder of this post, I will present a piece of code that was being prototyped and presented to me for review. I will provide my review with a focus on being precise.
You don’t need to agree with any of the opinions I present in this code review. These are my thoughts, philosophies and opinions about readable code in Go and the engineering tradeoffs I am willing to make.
Code
Here is the code that was being prototyped. The design of this prototype was to abstract the concept of a set of named SQL statements for the execution of different business related logic. The idea was to declare multiple concrete types like AddUser
that implemented the business logic. This is all I knew when the code was presented to me.
In this exercise, I will focus on a readability review of the prototype code and not the merit of the design idea. Here is the code.
Listing 1
01 type Query struct {
02 Raw string
03 Named *sqlx.NamedStmt
04 }
05
06 func (q *Query) Build(db *sqlx.DB) error {
07 var err error
08 q.Named, err = db.PrepareNamed(q.Raw)
09 return err
10 }
11
12 func initQF(db *sqlx.DB, qf func() *Query) error {
13 q := qf()
14 return q.Build(db)
15 }
16
17 func defineQ(q *Query, raw string) *Query {
18 if q != nil {
19 return q
20 }
21 q = &Query{
22 Raw: raw,
23 }
24 return q
25 }
26
27 type AddUser struct {
28 checkUserExists *Query
29 insertUser *Query
30 }
31
32 func NewAddUser(db *sqlx.DB) (*AddUser, error) {
33 var au AddUser
34 if err := initQF(db, au.CheckUserExists); err != nil {
35 return nil, err
36 }
37 if err := initQF(db, au.InsertUser); err != nil {
38 return nil, err
39 }
40 return &au, nil
41 }
42
43 // Here is the nice part: You can define your queries in their methods,
44 // which seems clean and easily inspectable.
45 func (au *AddUser) CheckUserExists() *Query {
46 return defineQ(au.checkUserExists,
47 `SELECT COUNT(*) FROM users ...`)
48 }
49
50 func (au *AddUser) InsertUser() *Query {
51 return defineQ(au.insertUser,
52 `INSERT INTO users ...`)
53 }
54
55 type UserQuery interface {
56 CheckUserExists() Query
57 InsertUser() Query
58 }
59
60 func SignupUser(uq UserQuery) {
61 uq.CheckUserExists()
62 uq.InsertUser()
63 }
I am a big believer in prototyping and flushing out design through pairing and review, especially when something isn’t obvious. Though this code would have worked, there is type pollution, unnecessary abstractions and no clear delineation of responsibility between the API and the user.
It can seem like a daunting task to know where to start in a code review. Usually I will try to reorder the declaration/implementation of things in the order of when they are used. It helps to see dependencies early on. I have already done that so let’s start with the first type Query
.
Listing 2
01 type Query struct {
02 Raw string
03 Named *sqlx.NamedStmt
04 }
05
06 func (q *Query) Build(db *sqlx.DB) error {
07 var err error
08 q.Named, err = db.PrepareNamed(q.Raw)
09 return err
10 }
The type named Query
on line 01 seems reasonable. It is declaring an encapsulation for the combination of a query string and a prepared named statement. The use of pointer semantics for the Build
method on line 06 seems reasonable as well, but the implementation of the method smells a bit to me.
It’s bothering me that on line 07 the err
variable is being declared standalone and not during the method call to PrepareNamed
on line 08. Since the code wants to assign the returned *sqlx.NamedStmt
value to the Named
field of the received Query
value during the call, the short variable declaration operator can’t be used. This bubbles up a serious issue for me. The construction of a Query
value is not complete until this Build
method is called.
I don’t like seeing the construction/initialization of a value separated like this. The Query
value is obviously constructed, (in an incomplete state), prior to the method call to Build
. Seeing the Raw
field being passed into the PrepareNamed
call and the assignment to the Named
field provides validation of this disconnect. This is just riddled with potential problems and fraud.
There is something else, technically Build
is executing a single statement, the function call to PrepareNamed
. Why does the call to PrepareNamed
need to be abstracted at all? This method Build
is not providing a new semantic level or simplifying anything. I would argue it is taking away from readability. At this point, I question the entire existence of the Query
type and its construction/initialization semantic.
Listing 3
12 func initQF(db *sqlx.DB, qf func() *Query) error {
13 q := qf()
14 return q.Build(db)
15 }
There are more bad smells with the initQF
function. It seems this function is trying to abstract away the complexity of initializing a Query
value. What is troubling is a Query
value is not passed in or out of the function. It’s seems an existing Query
value is pulled into the function through a function call on line 13. To add more smells, the qf
function is defined and passed in by the caller. Talk about hiding information and cost!
There is no understanding of what the qf
function is doing and how it’s doing it. Why do we need a user defined function to create a Query
value? Nothing is passed into the call so it isn’t clear why initQF
can’t construct its own Query
value.
If you step back 10k feet, you can see that initQF
is only making a single call to Build
, which could be done by the caller directly. Now we see that initQF
is not necessary since Build
was not necessary. The caller can call PrepareNamed
on their own.
Listing 4
17 func defineQ(q *Query, raw string) *Query {
18 if q != nil {
19 return q
20 }
21 q = &Query{
22 Raw: raw,
23 }
24 return q
25 }
The declaration of the defineQ
function is bothering me. The API has the feel of a value semantic mutation API but it is using pointer semantics. What do I mean?
An example of a value semantic mutation API is the append
function.
Listing 5
slice := append(slice, "some value")
A slice value is passed into append
, then append
mutates its own copy and a new slice value is returned. When using value semantics this is a necessary API design. It maintains the semantics of immutability.
The Query
type has been using pointer semantics so there is nothing necessarily wrong with sharing a Query
value with the function. If that is the choice, then why is a Query
value also being shared on the return? Either you pass it in or construct and return. Then on line 21, you
see the function is constructing a new Query
value. This is a major inconsistency in semantics.
Listing 6
21 q = &Query{
22 Raw: raw,
23 }
24 return q
The API decision is resulting in an implementation that is sketchy. We will get back to the construction code in a second. Look at the first 3 lines of code first.
Listing 7
18 if q != nil {
19 return q
20 }
This really bothers me. It suggests the caller can call this factory function even if it already has a Query
value! Why would this ever be the correct thing to do? It means the caller is clueless about the state of the data they are working with. This code is here to prevent an accident because the API doesn’t prevent fraud. This if
statement should be the responsibility of the caller. In fact, the API should make it unnecessary to have this if
statement at all.
Now go back to the construction of the Query
value.
Listing 8
21 q = &Query{
22 Raw: raw,
23 }
24 return q
The use of pointer semantics for the partial construction of the Query
value really bothers me. I want to use value semantics on construction since construction doesn’t tell you where a value is in memory. However, because the caller is passing in q
and it’s a pointer, the code is forced to use pointer semantics on the construction and separate that from the return. All of this code is a sign that the function’s API is incorrect and any code calling it must also be questioned.
At this point the Query
type and the two Query
related functions initQF
and defineQ
seem sketchy and unnecessary.
Listing 9
27 type AddUser struct {
28 checkUserExists *Query
29 insertUser *Query
30 }
31
32 func NewAddUser(db *sqlx.DB) (*AddUser, error) {
33 var au AddUser
34 if err := initQF(db, au.CheckUserExists); err != nil {
35 return nil, err
36 }
37 if err := initQF(db, au.InsertUser); err != nil {
38 return nil, err
39 }
40 return &au, nil
41 }
The AddUser
type is one of the business related types that would be implemented for the application. This type defines the queries that must be run for adding a new user. In this case, checking that a user exists and then inserting that user in the database. The declaration of the AddUser
type on lines 27 through 30 looks harmless enough. The declaration of the factory function NewAddUser
on line 32 also looks reasonable.
The implementation of the NewAddUser
function however has a few bad smells. First, it’s not obvious that the AddUser
value constructed on line 33 is being initialized by the initQF
function calls on lines 34 and 37. All I know is a AddUser
value is constructed to its zero value state and that is being returned on line 40. The calls to initQF
do not make it clear that the Query
pointer fields inside of the au
variable are being initialized during those calls.
The biggest readability problem might be the use of the same name for both the fields and methods.
Listing 10
27 type AddUser struct {
28 checkUserExists *Query
29 insertUser *Query
30 }
45 func (au *AddUser) CheckUserExists() *Query {
46 return defineQ(au.checkUserExists,
47 `SELECT COUNT(*) FROM users ...`)
48 }
49
50 func (au *AddUser) InsertUser() *Query {
51 return defineQ(au.insertUser,
52 `INSERT INTO users ...`)
53 }
This is super confusing. When I first looked at the call to initQF
on line 34, I thought there was a mistake with the call. The field checkUserExists
is not of the right type to be passed into initQF
. The call is not using lowercase checkUserExists
but uppercase CheckUserExists
. This is not immediately obvious.
When we look at the CheckUserExists
and InsertUser
methods, I have the same problem I had before with the implementation of the Build
method. The implementation is nothing more than a single function call that doesn’t need to be encapsulated. The CheckUserExists
and InsertUser
methods are adding no value.
Anytime I don‘t see compact construction being leveraged when zero value is not valid, it’s an initial smell. It can mean that the initialization sequence is over complicated or over abstracted. This is what I believe we have here. To me, the NewAddUser
function is completely wrong and a result of bad API decisions that were made with the Query
type.
When your foundational types are sketchy, anything using them will be sketchy. Sarah Mei once said, “We think awful code is written by awful devs. But in reality, it’s written by reasonable devs in awful circumstances.” Here is a small example of how this can happen.
Interface pollution runs rampant in Go and what I see next is no exception.
Listing 11
55 type UserQuery interface {
56 CheckUserExists() Query
57 InsertUser() Query
58 }
59
60 func SignupUser(uq UserQuery) {
61 uq.CheckUserExists()
62 uq.InsertUser()
63 }
This code scares me on multiple levels. First, the UserQuery
interface is not defining a reusable contract. It is very specific to the AddUser
requirements. From what little I know, I don’t expect to see another implementation of this behavior. Second, the SignupUser
function is asking for concrete data based on the UserQuery
interface and not doing anything else but calling the methods.
Granted, this is prototype code and I expect this was not fully thought out when I got the code. But it shows what happens when you don’t have a working or clear concrete solution in place and you begin to think about contracts.
Possible Refactoring
Within several minutes we were able to start getting bad surface smells which identified deeper issues with the code. With the goal of prototyping a more precise solution for this problem, here is one possible refactoring.
Listing 12
01 type AddUser struct {
02 db *sqlx.DB
03 checkUserExists *sqlx.NamedStmt
04 insertUser *sqlx.NamedStmt
05 }
06
07 func NewAddUser(db *sqlx.DB) (*AddUser, error) {
08 query := "SELECT COUNT(*) FROM users ..."
09 cue, err := db.PrepareNamed(query)
10 if err != nil {
11 return nil, err
12 }
13
14 query = "INSERT INTO users ..."
15 iu, err := db.PrepareNamed(query)
16 if err != nil {
17 return nil, err
18 }
19
20 au := AddUser{
21 db: db,
22 checkUserExists: cue,
23 insertUser: iu,
24 }
25
26 return &au, nil
27 }
28
29 func (au *AddUser) Execute() error {
30 // DO STUFF HERE
31 }
The code in listing 12 accomplishes the same goals as the original code. This prototype is half the code size and reduces everything down to a single type, factory function and method. I would argue that the average Go developer can read this code and understand everything that is going on quickly.
The Query
type and all associated functions and methods have been removed. All that is left is the AddUser
type and it contains the database related fields it needs to perform the business logic. It contains a database connection and the two named statements for the database operations.
The factory function NewAddUser
on line 07 is simplified and construction is done in a single compact construction statement on line 20. Local variables are declared and used during the calls to PrepareNamed
. This allows for the use of standard Go idioms on error handling. Only if both calls succeed is there an AddUser
value constructed. Everything is cleaner and easier to read and reason about.
The Execute
method on line 29 serves two purposes. First, it contains the specific implementation code and business logic for adding a user. Each new type can do the same and it’s an obvious pattern. Second, it allows for an interface to be declared if it becomes reasonable and practical to have one. I don’t know how much common or boilerplate code is involved in executing this business logic. If there is enough, then an interface can be declared around this API.
Now let’s compare the new code changes to the four benefits of being precise:
Write less code
We reduced the lines of code in half and the code is more comprehensible.
Less possibility for misuse
It is not possible to misuse the factory function NewAddUser
or the Execute
method. Initialization is clear and performed in a single call. The construction of the AddUser
value on line 20 is done in one statement using compact construction.
Better defense against fraud
There is no way to cause the AddUser
value to be in a bad state. The call to NewAddUser
could be performed with a nil
value for the database connection, but this isn’t anything I would want to add code for because it represents an integrity issue. I want the code to panic if it happens since it’s not a valid option for calling the function. I wouldn’t add an if
statement for this scenario.
More accurate testing and debugging
Testing is now reduced and clear. If there are bugs in executing this business logic, there is one place to go to resolve it. This has helped to keep the mental model of the code clear.
Conclusion
The idea that developers should generalize code and create layers upon layers of abstraction so things don’t break as business requirements change creates more problems than it solves. I believe writing code with a focus on being precise lends itself to code that is more clear, consistent and efficient while minimizing, reducing and simplifying the code base.
You need to decide for yourself what your priorities are for the code you are writing and the tradeoffs you’re willing to take. It’s not possible to review code effectively without a clear understanding of the philosophies you will apply. For me, integrity, readability and simplicity rule everything.
Brian Kernighan once said,
“Everyone knows that debugging is twice as hard as writing a program in the first place. So if you’re as clever as you can be when you write it, how will you ever debug it?”
This is about writing simple code that is easy to read and understand without the need of mental exhaustion. Just as important, it’s about not hiding the cost/impact of the code you write, be it per line, per function, per package or the overall ecosystem it runs in.