Go and SQL: Pitfalls With Existing Libraries (and a Better Solution)

Meet KSQL — A library I wrote to address the problems of Golang and SQL

Vinícius Garcia
Better Programming


Photo by Chinmay Bhattar on Unsplash

In my years working with Golang, I came to the conclusion that working with SQL is not a well-solved problem yet. The libraries we have available are just not easy to use, and there are too many ways to shoot yourself in the foot.

To illustrate that, in this article I will describe all these problems in detail and then briefly present the solution I came up with for them: a library I wrote called KSQL.

In the next sections we’ll discuss three widely adopted libraries in the Golang ecosystem:

  • The standard library: database/sql
  • A wrapper of the standard library that adds some cool features like struct scan: sqlx
  • A specialized driver for Postgres that is faster than the alternatives and that can be used together with database/sql or sqlx: pgx

Problem 1: Checking errors 4 times for a single query

This problem is most apparent when writing queries for reading multiple rows from the database.

When doing that, we actually have to check for errors 4 times when using database/sql or sqlx, and 3 times when using pgx.

And the scary part is that most people don't even know this: they usually only check whether the call to Query() succeeded and whether each call to .Scan() succeeded.

So if we would actually check for all of these errors our code would end up looking like this:

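The sketch below uses database/sql with a hypothetical users table and columns; sqlx would look essentially the same.

```go
package main

import "database/sql"

type User struct {
	ID   int
	Name string
}

// listUsers shows the four separate error checks that database/sql requires
// when reading multiple rows.
func listUsers(db *sql.DB) ([]User, error) {
	rows, err := db.Query("SELECT id, name FROM users")
	if err != nil {
		return nil, err // 1st check: the query itself
	}
	// Close is safe to call more than once, so deferring it here also
	// protects the early returns below.
	defer rows.Close()

	var users []User
	for rows.Next() {
		var u User
		if err := rows.Scan(&u.ID, &u.Name); err != nil {
			return nil, err // 2nd check: scanning each row
		}
		users = append(users, u)
	}
	if err := rows.Err(); err != nil {
		return nil, err // 3rd check: errors that interrupted the iteration
	}
	if err := rows.Close(); err != nil {
		return nil, err // 4th check: releasing the connection
	}
	return users, nil
}
```
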
Note that when using pgx the .Close() function doesn't return an error, so you would have to check 3 errors, not 4.

And if you have ever used any of these libraries I bet you don’t test all 4 of these errors, and it is usually fine.

But I would still prefer a library that did not force us to choose between very verbose error checking and ignoring errors that will happen very rarely.

Problem 2: Forgetting to call defer rows.Close()

So as shown in the example above, we do have to call defer rows.Close(). If we don't, we end up with a very serious problem: the connections are never released back to the pool, so after some time running, our program will eventually stall with no error messages.

Even worse, if you forget it in a single query in your whole system, the code might still pass all your integration tests and even code review, since a missing line is not easy to spot, and it can then end up causing problems in production.

Problem 3: Calling defer rows.Close() at the wrong time

Even if you never forget to call rows.Close(), there is still another problem that can easily go unnoticed in a code review and in integration tests. This problem is illustrated in the code sample below:

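(Again a sketch, reusing the hypothetical users query from the first example.)

```go
package main

import "database/sql"

// (same User type as in the previous sketch)
func listUsersBroken(db *sql.DB) ([]User, error) {
	rows, err := db.Query("SELECT id, name FROM users")
	// BUG: the defer is placed before the error check, so if Query fails
	// rows is nil and this deferred call panics when the function returns.
	defer rows.Close()
	if err != nil {
		return nil, err
	}

	var users []User
	for rows.Next() {
		var u User
		if err := rows.Scan(&u.ID, &u.Name); err != nil {
			return nil, err
		}
		users = append(users, u)
	}
	return users, rows.Err()
}
```
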
In this last example, many readers might not have spotted the problem, and even if you did, I bet someone on your team could have missed it.

If you run this code it will work most of the time in production, but it will panic occasionally. In other words, one of the worst types of problems to debug: it only happens sometimes. So what is going on there?

The problem lies in where we are calling defer rows.Close(): before checking the error, not after. If the call to db.Query(...) returns an error, the rows variable will be nil, which means the deferred rows.Close() will be calling a method on a nil value, and it will panic at the end of the function, when the defers are executed.

Again what makes this problem particularly alarming is that it can slip through a code review and many types of integration tests easily.

Problem 4: Code duplication with big structs

This is probably a very familiar situation: we have a struct, let's say a User struct, and we need to list all of its attributes both in the query and in the arguments passed to .Scan(), every time we write a query that reads users.

To illustrate this suppose that one of these queries looked like this:

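(A sketch with hypothetical column names and a still fairly small User struct.)

```go
package main

import (
	"database/sql"
	"time"
)

// Every attribute appears once in the struct, once in the SELECT list,
// and once more in the call to .Scan().
type User struct {
	ID        int
	Name      string
	Email     string
	Age       int
	Address   string
	CreatedAt time.Time
}

func getUser(db *sql.DB, id int) (User, error) {
	var u User
	err := db.QueryRow(
		"SELECT id, name, email, age, address, created_at FROM users WHERE id = $1",
		id,
	).Scan(&u.ID, &u.Name, &u.Email, &u.Age, &u.Address, &u.CreatedAt)
	return u, err
}
```
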
In this small example it doesn't look too bad, but it can get really bad if you have structs with more than 50 attributes, and even worse if you have, say, 5 to 7 queries scanning into each of these structs.

I hope I was able to convey how bad this problem can get in big code bases.

But before continuing: it is important to mention that when using sqlx you are not forced to use the .Scan() function; you can use .StructScan() instead, so this example would be improved by roughly 50%.
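
For illustration, a sketch of the same kind of listing using sqlx and .StructScan() (the db tags and column names are hypothetical):

```go
package main

import (
	"time"

	"github.com/jmoiron/sqlx"
)

type User struct {
	ID        int       `db:"id"`
	Name      string    `db:"name"`
	Email     string    `db:"email"`
	CreatedAt time.Time `db:"created_at"`
}

// The argument list of .Scan() disappears, but the column names still
// have to be spelled out in the query itself.
func listUsers(db *sqlx.DB) ([]User, error) {
	rows, err := db.Queryx("SELECT id, name, email, created_at FROM users")
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	var users []User
	for rows.Next() {
		var u User
		if err := rows.StructScan(&u); err != nil {
			return nil, err
		}
		users = append(users, u)
	}
	return users, rows.Err()
}
```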

There is also a great library called scany that offers the same kind of struct scanning and can be used on top of database/sql and pgx to extend these two libraries with that feature.

Still, both of these solutions require you to list all the attribute names of your struct in every query, so you don't really get rid of all the code duplication.

So if none of the solutions mentioned above are used, these are the main consequences of this problem:

  1. A very tight coupling between the order of the attributes in the SELECT part of the query and the order of the arguments passed to .Scan(). If these two lists don't match, you might get unexpected scan errors or, worse, a silent error where your variables are loaded with incorrect values while the program keeps running. This is a particularly bad problem because it can easily slip through a code review process, so always having integration tests is recommended.
  2. Adding or removing attributes from the struct will likely force you to update all these values in all the queries which is a delicate operation because of the ordering problem mentioned above.
  3. And also a minor problem of readability: big chunks of code with trivial information are not helpful when you are trying to find particular pieces of logic in your code.

So is it even possible to have a better SQL library?

Yes, it is. I wrote this very cool library called KSQL. And I believe it has very good solutions to the problems mentioned above.

Since explaining the reasoning behind all of KSQL's features is not the purpose of this post, I will try to be brief and restrict this demonstration to a single example of how KSQL makes it easier to deal with those problems while adding very little complexity on top.

The simple example below will query several rows into a slice of structs, and it will use the struct tags to:

  1. Create the SELECT part of the query for you (don’t worry this is an optional feature, and it is cached so it is very efficient, see our benchmarks for comparison).
  2. Map which column should be scanned into which attribute of the struct (again the reflection part is done only once and cached, so it's efficient, see our benchmarks for comparison).

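The sketch below follows the examples in the KSQL README; the exact adapter and config names (kpgx.New, ksql.Config) may vary between versions, and the table, columns, and connection string are just illustrative.

```go
package main

import (
	"context"
	"fmt"

	"github.com/vingarcia/ksql"
	"github.com/vingarcia/ksql/adapters/kpgx"
)

// The ksql struct tags tell KSQL which column maps to which attribute.
type User struct {
	ID   int    `ksql:"id"`
	Name string `ksql:"name"`
	Age  int    `ksql:"age"`
}

func main() {
	ctx := context.Background()

	// kpgx is the Postgres adapter; the connection string is hypothetical.
	db, err := kpgx.New(ctx, "postgres://user:pass@localhost:5432/mydb", ksql.Config{})
	if err != nil {
		panic(err)
	}

	// Because the query starts with FROM, KSQL builds the SELECT part from
	// the struct tags, and a single error check covers the whole operation.
	var users []User
	err = db.Query(ctx, &users, "FROM users WHERE age > $1", 18)
	if err != nil {
		panic(err)
	}

	fmt.Println(users)
}
```
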
OK, so now you can probably see that this is already a lot simpler than what we had with any of the other libraries, but let's get into more detail about each of the problems mentioned before:

Solution to Problem 1: Too much error handling

The KSQL API was designed around the idea of only returning after all operations are done, which means it returns a single error while making sure that all the possible sources of error are checked internally, including the calls to rows.Err(), .Scan(), and .Close().

Solution to Problem 2 and 3: Calling defer rows.Close()

There is no need to call rows.Close().

Precisely because the function only returns after the whole process is finished, we can call it internally and save the user from having to deal with this extra complexity.

Solution to Problem 4: Unnecessary duplication of attribute names

To eliminate this problem of duplication of names KSQL has two features:

  1. It will optionally take care of creating the SELECT part of the query for you. This behavior is only triggered if your query starts with FROM, so in situations where you want a more elaborate expression in the SELECT part of the query, just write it and it will work as expected (see the sketch after this list).
  2. It will automatically map any columns returned in the query into your struct by the names in the tags.
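
As a sketch (assuming the ksql.Provider interface and the same User struct, imports, and hypothetical columns from the example above), the two forms look like this:

```go
// Form 1: the query starts with FROM, so KSQL generates the SELECT part
// from the ksql struct tags.
func listAdults(ctx context.Context, db ksql.Provider) ([]User, error) {
	var users []User
	err := db.Query(ctx, &users, "FROM users WHERE age >= $1", 18)
	return users, err
}

// Form 2: the query has an explicit SELECT, so it is used as written and the
// returned columns are mapped to the struct attributes by their tag names.
func listAdultNames(ctx context.Context, db ksql.Provider) ([]User, error) {
	var users []User
	err := db.Query(ctx, &users, "SELECT id, name FROM users WHERE age >= $1", 18)
	return users, err
}
```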

If you know how complicated dealing with SQL can be, you are probably still skeptical, and with good reason.

For example, what will happen if we load more users than we can fit in memory?

Or what if we are losing a lot of efficiency to gain these abstractions?

KSQL does address these issues, and until I write an article focused on explaining the KSQL design decisions in more detail, I invite you to read the KSQL README file.

Conclusion

KSQL is a new idea, so it is healthy to be skeptical of its features, efficiency, test coverage, and stability (not that these concerns aren't being addressed too), but it does tackle a few serious problems that we see in some of the most used Go libraries for SQL.

It is also worth noting that KSQL is not an ORM: its abstractions were kept simple and limited precisely to avoid the common ORM pitfalls. It is just a smart replacement for libraries such as sqlx and pgx, with as little query building and efficiency loss as possible.

In our benchmarks it is easy to see that, in practical terms, we perform the same as these other libraries. Benchmark values vary a lot on each run, so it's hard to be precise, but I would speculate that KSQL is about 1% slower than pgx on Postgres and than sqlx on all other databases.
