Mitigating infrastructure risk
When you are building a product on the internet, you should aim to get it out as soon as humanly possible. Things you don’t want to worry about at that moment include the infrastructure you are running your app on. You want to get it in front of people, ideally with a payment form so they can pay for your product right away. That early stage is not what this post is about: while you are acquiring your valuable first customers, you should start with whatever gives you the most leverage and peace of mind. You definitely don’t want to fiddle with network policies, so pick an out-of-the-box solution as much as possible. If you are capable of provisioning your own virtual instances, you could try out Kamal deployment, which builds on top of the Docker I’m gonna talk about later. Otherwise, stick with some platform-as-a-service solution like Heroku.

Some of the things you do have to think about upfront are basic security and backup policies: you don’t want to lose your customers’ data, or even worse, get it leaked somewhere. After your product is established, you should also consider the chance that the service provider you depend on goes out of business. This happens a lot with companies over time, and you can never expect a provider to keep working indefinitely.
Of course AWS or Google or Azure won’t go down tomorrow, but having a good and resilient infrastructure strategy is something that can help you in the future. Maybe your company ends up acquired by a bigger player and you have to migrate your infrastructure to another provider. Maybe you have to migrate outside of the cloud altogether. Maybe you do something controversial and end up de-platformed and have to figure out a way to get back online as soon as possible. Let’s discuss some of the options we have here.
First things first, make the application artefact as portable as possible. My advice here is to stick with Docker: do whatever you can to mangle your code into a Docker image and run it from there. This way you are forced to “control” the app using environment variables, and the same Docker container produces the same outcome on any machine imaginable. It’s like Java’s promise of “write once, run anywhere”, except this time you don’t have to debug it on every platform you run your app on. Using Docker forces you to build a stateless image, completely controllable by environment variables. This approach doesn’t cost you anything upfront, but it can come in handy later if you have to rebrand, add different regions, or even offer an enterprise self-hosted option of your product.
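To make that concrete, here is a minimal sketch of what “completely controllable by environment variables” looks like in practice. The variable names are made up for illustration; the point is that nothing environment-specific lives inside the image itself:

```python
# A minimal 12-factor style configuration object: everything that differs
# between environments comes from environment variables, nothing is baked
# into the image. Variable names here are illustrative, not a prescription.
import os


class Config:
    def __init__(self) -> None:
        # Fail fast if a required variable is missing rather than limping along.
        self.database_url = os.environ["DATABASE_URL"]
        self.redis_url = os.environ.get("REDIS_URL", "redis://localhost:6379/0")
        self.storage_bucket = os.environ.get("STORAGE_BUCKET", "")
        self.debug = os.environ.get("DEBUG", "false").lower() == "true"


config = Config()
```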
Okay, since most applications won’t run without a database, you need to put a database engine somewhere as well. SQLite has become a production-ready database, but if you want scalability and more powerful features, stick with the battle-tested relational database systems: MySQL and PostgreSQL are amazing choices. You can run them in a Docker container or as a managed service on every cloud provider, and they are almost infinitely scalable if you know how to set up replication and/or clusters. Choose whichever you like and know best; this technology all works pretty much the same way. The same goes for other supporting services: you might be using Redis or some other key/value store for caching, search, or managing background jobs.
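Wiring those services up follows the same pattern: the app only knows a connection string handed to it from the environment, so it doesn’t care whether the database runs in a container next door or as a managed service. A small sketch, assuming the psycopg2 and redis packages and the hypothetical variable names from above:

```python
# Connect to the supporting services purely from environment variables.
# The connection strings are hypothetical examples.
import os

import psycopg2  # PostgreSQL driver
import redis     # Redis client

# e.g. postgres://app:secret@db.internal:5432/app_production
db = psycopg2.connect(os.environ["DATABASE_URL"])

# e.g. redis://cache.internal:6379/0
cache = redis.Redis.from_url(os.environ["REDIS_URL"])
```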
Back to our resilience strategy, what can we do to make sure we can keep the lights on if a cloud provider we are using goes down?
- Off-site database and file backups. Cloud file storage is really cheap nowadays, and there is no excuse for not having the production database regularly synced to a backup location. The same goes for user-uploaded files. This falls into the realm of disaster recovery, so be sure that your backups can actually be restored, unless you are using Super Simple Storage Service of course. A sketch of such a backup job follows this list.
- Have the ability to build the Docker image on any computer with access to the internet. Ideally you’d have multiple engineers in your company who are able to run a script and build the image. In my opinion, the image built from any git SHA must be able to run in any environment; see the twelve-factor app. Funnily enough, this is exactly the situation we have when we start a project, and then we lose this capability along the way by adding production-only dependencies. Sticking to a single image, controlled only by environment variables, keeps us in check.
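Here is a rough sketch of the kind of off-site backup job I mean: dump the database and push the dump to an S3-compatible bucket at a different provider or region. The bucket name and environment variables are made up, and boto3 is assumed to be installed; it’s a sketch, not a drop-in script:

```python
# Off-site database backup: dump the database and upload the dump to an
# S3-compatible bucket living outside the primary cloud provider.
import os
import subprocess
from datetime import datetime, timezone

import boto3

timestamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
dump_path = f"/tmp/backup-{timestamp}.dump"

# Produce a compressed, custom-format dump that pg_restore can restore later.
subprocess.run(
    ["pg_dump", "--format=custom", f"--file={dump_path}", os.environ["DATABASE_URL"]],
    check=True,
)

# endpoint_url lets this point at any S3-compatible storage, not just AWS.
s3 = boto3.client("s3", endpoint_url=os.environ.get("BACKUP_S3_ENDPOINT"))
s3.upload_file(dump_path, os.environ["BACKUP_BUCKET"], f"db/backup-{timestamp}.dump")
```

Run something like this on a schedule, and, just as importantly, test restoring the dump with pg_restore every so often.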
If you have both of those things sorted out, restoring from a catastrophic event should take a few hours tops. Not that much in the grand scheme of things. Remember how I wrote about getting acquired and having to move to another cloud provider? That happened while I was working for intuo, now Unit4. We (mostly I) had to move the whole infrastructure from AWS to Azure. What saved my ass in the end was Docker (and Kubernetes), which I leveraged to have an infrastructure-independent deployment. Yes, there were intricacies: S3 differs from Azure Blob Storage and network management is not the same, but in the end it’s more or less the same thing whichever provider you use.
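Those provider differences hurt a lot less if the provider-specific bits sit behind a thin interface of your own. A minimal sketch, with made-up class names and boto3 assumed for the S3 side:

```python
# Keep file storage provider-agnostic: the app talks to a tiny interface,
# and only the implementations know about S3 or Azure Blob Storage.
from typing import Protocol

import boto3


class FileStore(Protocol):
    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...


class S3FileStore:
    def __init__(self, bucket: str) -> None:
        self.bucket = bucket
        self.client = boto3.client("s3")

    def put(self, key: str, data: bytes) -> None:
        self.client.put_object(Bucket=self.bucket, Key=key, Body=data)

    def get(self, key: str) -> bytes:
        return self.client.get_object(Bucket=self.bucket, Key=key)["Body"].read()


# An AzureBlobFileStore implementing the same two methods is all it takes to
# swap providers; the rest of the application never notices the difference.
```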
Keep in mind that you are renting someone else’s computer in the “cloud” and running your stuff on it. If you are big enough, you could be wasting a lot of money using one provider instead of another. You could even move off the cloud and run your own servers if you find that more cost effective. Having the ability to do both whenever you want gives you leverage over the cloud provider (so you can get the best rates) and the peace of mind of not worrying about someone hiking up the rates without a valid reason. Changing infrastructure is going to cost you, but the savings can be huge, and nothing can replace the peace of mind of knowing you can do it on a whim over a weekend if you really want to.