
When I say databases, I think about relational database systems. It doesn’t matter which flavour of relational database you choose. They are all mature and capable enough to do what you need them to do: store and manipulate data. Yes, I said manipulate.

During the previous decade or two, the development of new frameworks and ORMs (object-relational mappers, wiki LINK) made us treat databases as “bags to store data in”. Most tutorials are one-dimensional, putting all business logic in the same place (your backend/API layer). Sometimes this works fine, sometimes it is slower than a pack of snails. The slowdown doesn’t happen immediately, and we usually figure it out only when it is too late to revert poor decisions. You write a function that processes some simple data, stores the result in the database and displays it to the user. Fifty change requests and 20 tax law changes later, you have 500 lines of code doing nested loops through data to calculate the tax amount. And the function that completed the job in a millisecond now takes 20 minutes to run.
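To make that concrete, here is a minimal sketch of the set-based alternative: instead of looping over records in application code, the tax calculation is expressed as a single query. The `orders` and `tax_rates` tables and their columns are made up purely for illustration.

```sql
-- Hypothetical schema: orders(id, country, ordered_on, net_amount),
-- tax_rates(country, valid_from, valid_to, rate).
-- The per-row tax calculation happens in one set-based pass,
-- instead of nested loops in application code.
SELECT o.id,
       o.net_amount,
       o.net_amount * t.rate AS tax_amount
FROM   orders o
JOIN   tax_rates t
  ON   t.country = o.country
 AND   o.ordered_on BETWEEN t.valid_from AND t.valid_to;
```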

During my first consulting gig for a big bank, I was tasked with reducing the running time of a certain procedure, which sometimes took more than 24 hours (which meant the reporting replica would get recreated and the procedure’s connection cut off). It was an old Oracle client-server app and all the code was written on the client side. The biggest irony was that the Oracle Forms LINK tool we were using had the same programming language (PL/SQL) as the underlying database. The logical step was to move the procedure to the database and run it there with a couple of parameters. That step alone reduced the running time to 11 minutes. No other optimisations were done, just moving a huge chunk of code from the client directly to the database. With additional optimisations and some refactoring, we got the whole procedure to run in under 3 minutes.

Later on I fell under the influence (hating Oracle and everything related to it didn’t help) of ORMs and frameworks that handled all of the database-related things. Imagine going from writing stored procedures and packages one day to not touching SQL for a few years. Don’t get me wrong, those tools are great, handling a lot of underlying complexity and security so you don’t have to think about them. But there comes a time when you just can’t optimise certain things through them. Those times call for certain database skills.

I also fell in love with Ruby on Rails being database agnostic, letting you easily work with multiple database systems across environments. Think of having SQLite for your development and test environments and Postgres in production. This was a good scenario if your app was running on Heroku, for example: you didn’t have to have a production database engine running locally. Although I’m a great proponent of developing in the same environment as you deploy to, that isn’t always possible.

While various ORMs abstracted SQL away from developers, it became some esoteric super language (it isn’t) reserved for wizards (it’s not). SQL is super simple and easy to learn, and it lets us manipulate the data at the source. The same piece of code completed in 11 minutes on the database compared to 23-24 hours (when it didn’t break) on the client. When you write your data processing in the database itself, you can think less about N+1 issues or the memory bloat of instantiating every record under the sun just to sum a few columns.
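As a rough sketch of what “manipulating the data at the source” looks like: instead of loading every record into the application just to add up a few columns, you let the database aggregate them. The `invoices` table here is hypothetical.

```sql
-- Hypothetical table: invoices(customer_id, net_amount, tax_amount).
-- One query returns the per-customer totals; no objects are instantiated
-- in the application, and there is no N+1 access pattern.
SELECT customer_id,
       SUM(net_amount)              AS total_net,
       SUM(tax_amount)              AS total_tax,
       SUM(net_amount + tax_amount) AS total_gross
FROM   invoices
GROUP  BY customer_id;
```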

I have been using only PostgreSQL for the last decade, and the power it gives you is amazing. You can construct and return complex query results using common table expressions, return them in JSON format and then serve those responses from one of your API endpoints. You can serialise a JSON representation of an object (with its relations) into a column for display/reporting purposes, something I did a couple of years ago to boost the performance of a complex page. The possibilities it gives you (other SQL databases have similar features) are endless, and the documentation is great for finding a solution to most of your problems.
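For example, here is a sketch of building a JSON response directly in PostgreSQL with a common table expression and the built-in `json_build_object`/`json_agg` functions. The `authors` and `posts` tables are hypothetical; this is not the exact query from that project, just the shape of the technique.

```sql
-- Hypothetical schema: authors(id, name), posts(author_id, title, published_on).
-- The CTE narrows the data set, and the outer query shapes it into JSON
-- that an API endpoint could return as-is.
WITH recent_posts AS (
  SELECT p.author_id, p.title, p.published_on
  FROM   posts p
  WHERE  p.published_on > now() - interval '30 days'
)
SELECT json_build_object(
         'author', a.name,
         'posts',  json_agg(
                     json_build_object('title', rp.title,
                                       'published_on', rp.published_on)
                     ORDER BY rp.published_on DESC
                   )
       ) AS payload
FROM   authors a
JOIN   recent_posts rp ON rp.author_id = a.id
GROUP  BY a.id, a.name;
```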

As an end note, I want to say that we should not be so strictly focused on “the way things should be done” in a given framework. It’s okay to follow the “best use scenario” for a while, and sometimes you can solve pretty complex problems using “just the tools provided”. But there comes a time when performance starts degrading or the queries become too complex to understand. It makes little sense to start returning JSON from the database when your application has 5 tables and 7 users. Use common sense and optimise only when it’s necessary, but optimise in the correct place. The database is a huge tool in your toolset, and it doesn’t deserve to be used “just as a data bag”. You can go a long way without SQL, but you’ll reach your destination faster (and easier) if you accept that your database is not just dumb data storage, but a powerful (and often underused) tool in your tool belt.

Thanks to Judith for helping me improve this article.
