I’ve been installing servers, deploying code to them, and exposing them to customers for most of my career. In the beginning, I did it all by myself. Those were Windows servers, and the installers back then were impossible to script. I started building my script toolbox when I got the opportunity to install a Linux-based server (an Oracle database on SUSE). I had fiddled with Linux a bit before that, even running one of the oldest (and hardest) Linux distributions: Slackware.
For a long time, I used bash to script the provisioning of my servers. It was handy and convenient, and it worked, although the scripts were rarely idempotent. Since the products I worked on never had drastic loads hitting them, I never had a reason to migrate from bash. I was also careful to keep enough capacity for at least 5x the regular traffic on the applications at all times. I did try Chef, Puppet, and Ansible, but since I didn’t need them, I never picked up any of them. When I joined intuo, everything was already set up in Ansible. Since I was the only person interested in DevOps and servers, I took over that part from our CTO. I learned how Ansible works, rewrote the complete deployment strategy, and things ran strong for years.
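To make "idempotent" concrete, here is a minimal sketch in Python rather than bash. It is not one of my actual provisioning scripts; the helper name `ensure_line` and the file it edits are made up for illustration. The point is that running the step twice leaves the server in the same state as running it once, which is what tools like Ansible aim for and what a naive `echo ... >> file` in bash does not give you.

```python
from pathlib import Path


def ensure_line(path: Path, line: str) -> bool:
    """Make sure `line` is present in the file at `path`.

    Returns True if the file had to be changed and False if it was
    already in the desired state. (Hypothetical helper, for illustration.)
    """
    existing = path.read_text().splitlines() if path.exists() else []
    if line in existing:
        # Desired state already reached: do nothing.
        return False
    existing.append(line)
    path.write_text("\n".join(existing) + "\n")
    return True
```

Calling `ensure_line(Path("limits.conf"), "deploy soft nofile 65535")` a second time returns `False` and does not touch the file, whereas appending blindly would add a duplicate line on every run.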
But something had been bugging me for a long time: the system wasn’t scalable enough. I would increase the computing power behind our app whenever a big new customer was being onboarded, and then tweak it down once I got some real usage numbers, but it was always done on feel. If we experienced 50x more load than usual and had to scale the system, it would take a few hours to scale horizontally. Vertical scaling means that you actually have to scale down for a few minutes while one instance reboots with a more powerful config. Both have their drawbacks, and both were used at certain points.
Another issue was that there are always a few different instances running behind a load balancer. We manage everything with Ansible, which should ensure that at least the stuff we configure is the same on all instances, but that is not always the case. Security updates are run overnight on those instances, and small discrepancies might creep in from time to time.
This is why I wanted to go for a more modular, stateless approach using Docker. We tried it in development a few years ago, but because we couldn’t get Ember.js to run fast enough inside a Docker container (it was sluggishly slow), we gave up on the idea. A few attempts (and a few rounds of giving up) later, I decided to go through with it, and it finally worked.
The thing I love about Docker is that a container captures the application in a fixed state, as it was at a certain time and place. You can version-tag containers, and you can go back and forth through their history. For code, you can do the same with git, where it’s easy to roll back to a certain version. But a deployed system isn’t only the code it runs. It also includes the OS and all the libraries on the OS that support the code. Sometimes a security upgrade is released that breaks your app without you knowing.
Automatic security updates are mandatory, but I remember one instance where Red Hat decided to change the nginx SELinux rules in a kernel update that installed overnight. That caused a bit of downtime (and a lot of fiddling before we were able to enable SELinux again). With Docker, you can test that everything works before deploying it, use a canary deployment approach, and roll back if something fails. This way your app can be more resilient and stable. I love open source, and I wouldn’t be at this point in my career if it didn’t exist, but sometimes it’s a nightmare. The community is there to fix issues when they arise, but it can take a few days to get back on track. Sometimes you don’t have the skills to fix some open-source library yourself. Being able to keep running the working version of the system while you wait for the fix is crucial. I don’t like downtime, for some weird reason.
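The canary idea can be sketched roughly like this. It is a simplified illustration in Python, not our actual deployment tooling: the function name `canary_deploy`, the traffic steps, and the `health_check` callback are all assumptions made for the example, and in a real setup the traffic shifting would be done by the load balancer or the orchestrator.

```python
from typing import Callable, Sequence


def canary_deploy(
    current: str,
    candidate: str,
    health_check: Callable[[str, int], bool],
    steps: Sequence[int] = (5, 25, 100),
) -> str:
    """Gradually shift traffic to `candidate` and return the image tag
    that ends up serving production.

    `health_check(tag, percent)` is a hypothetical callback that reports
    whether `tag` behaves well while receiving `percent` of the traffic.
    """
    for percent in steps:
        # A real system would reroute `percent` of the traffic here.
        if not health_check(candidate, percent):
            # Roll back: the known-good image keeps serving everything.
            return current
    # The candidate survived every step and takes over completely.
    return candidate
```

Because the old image is still around under its own tag, "rolling back" is just pointing traffic at it again, which is much easier than undoing an overnight package update on a live server.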
I thought a lot about using ECS (Elastic Container Service), where I could get a similar containerised app approach with less “buy-in”. But in the end it would mean that I’d have to learn another infrastructure (which wasn’t a problem by itself, but Kubernetes looks like the future to me). Diving head first into k8s on AWS was quite an adventure. I plan to dive deeper into the architecture used and the choices I made. This is still (and will remain) a living creature, where we improve the setup as we gain new experience. Like all software, infrastructure has to be improved all the time, in search of the mythical reliable system.