Tuesday, March 1, 2016

Big stack, little stack

Of the many discussions I've been a part of in my career, one of my favorites is the one around AWS::CloudFormation.  Specifically, the question of which is better: big stacks with lots of things, or many smaller stacks, each with fewer things.

I'm going to tackle each side of the argument, and then give what I think is a reasonable conclusion to the question.

tl;dr: Big stacks are better.

Let's start with the argument for many smaller stacks.  The conversation usually opens with the claim that smaller stacks are better because they're easier to change down the road when something needs upgrading or replacing.

Here is a use case from one of the larger companies I've worked for.  In this case I was working as a DevOps engineer managing five mobile services.  Each service had its own prod and pre-prod AWS account.  Each account was created and managed by the company's AWS security team.  The security team was responsible for ensuring that every service running in any of the AWS accounts (including pre-prod) adhered to the company-wide security standards.

This team was also responsible for making sure that everyone was using their VPN rig, which included things like routes and security groups.  Each element (VPN, route, gateway, NAT) was configured as its own AWS::CloudFormation stack.

The thinking here was that the security team could swap out components as needed.  So, for example, if at some point they needed to change a route, they would simply update the route stack and everything would be fine.
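To make that intent concrete, the workflow looked roughly like this.  This is just a sketch; the stack and file names are hypothetical, not the team's actual names:

    import boto3

    cfn = boto3.client("cloudformation")

    # The routes live in their own stack, so changing a route means updating
    # just that one stack, leaving the VPN, gateway, and NAT stacks untouched.
    with open("routes.yaml") as f:
        cfn.update_stack(StackName="security-routes", TemplateBody=f.read())

    cfn.get_waiter("stack_update_complete").wait(StackName="security-routes")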

Good in theory, but we'll see how this plays out later on...

Now let's contrast that with the big stack model, where we put the majority of our resources into a single stack.  We do separate the database from the cache from the application itself, so we still end up with a few stacks.  However, the number of stacks with this model is noticeably different from the small-stack approach.
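As a rough sketch of the shape of this (the resources and AMI below are placeholders, not anything from the actual service), the related pieces live in one template and a single call launches it.  AWS::CF can tell from the Ref that the security group has to exist before the instance:

    import json
    import boto3

    # Placeholder big-stack template: a security group plus an instance that
    # references it.  AWS::CF derives the creation order from the Ref.
    template = {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "AppSecurityGroup": {
                "Type": "AWS::EC2::SecurityGroup",
                "Properties": {"GroupDescription": "application traffic"},
            },
            "AppInstance": {
                "Type": "AWS::EC2::Instance",
                "Properties": {
                    "ImageId": "ami-12345678",  # placeholder AMI id
                    "SecurityGroups": [{"Ref": "AppSecurityGroup"}],
                },
            },
        },
    }

    cfn = boto3.client("cloudformation")
    cfn.create_stack(StackName="mobile-service-app", TemplateBody=json.dumps(template))
    cfn.get_waiter("stack_create_complete").wait(StackName="mobile-service-app")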

The security team would often argue that the problem with this approach is that if you update the application stack, something might go wrong.  That's an unfortunate line of thinking, because it comes from not understanding what actually happens to resources during an update.  It's also far too common in the world of cloud automation.

The argument usually comes down to fear of not knowing what's going to happen when something is changed, so everything gets broken up into smaller chunks so that a single update can't take down the whole thing.  Fear-based cloud automation should be treated like a crime.  What's worse is that the AWS::CF documentation spells out, for each resource property, whether a change updates the resource in place, interrupts it, or replaces it, so everything in this domain is predictable.
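You can even ask AWS::CF up front what an update is going to do.  Here's a minimal sketch, assuming a boto3 and CloudFormation setup that supports change sets, with hypothetical stack and file names:

    import boto3

    cfn = boto3.client("cloudformation")

    # Ask AWS::CF to compute what this update would change, without executing it.
    with open("app.yaml") as f:
        cfn.create_change_set(
            StackName="mobile-service-app",   # hypothetical stack name
            ChangeSetName="preview-update",
            TemplateBody=f.read(),
        )

    cfn.get_waiter("change_set_create_complete").wait(
        StackName="mobile-service-app", ChangeSetName="preview-update"
    )

    # Each entry says which resource is touched and whether it gets replaced.
    result = cfn.describe_change_set(
        StackName="mobile-service-app", ChangeSetName="preview-update"
    )
    for change in result["Changes"]:
        rc = change["ResourceChange"]
        print(rc["LogicalResourceId"], rc["Action"], rc.get("Replacement", "N/A"))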

The little stack approach, which was utilized by the security team, has a significant drawback: it moves the logic for what gets created, and in what order, away from AWS::CF and onto the end user.  When a stack is created, the AWS::CF engine makes its own decisions about what gets created and when, and uses its internal APIs to build the stack in the most optimal way possible.  That's not necessarily the fastest way, either; optimal can mean different things in different scenarios.

The little stack approach forces the end user to reinvent that logic.  Now the end user has to come up with the rules for what gets created and when, as well as handle optimizations like parallelization.
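This is the kind of orchestration code that ends up being written.  The stack names, template files, and ordering here are hypothetical, following the setup described above:

    import boto3

    cfn = boto3.client("cloudformation")
    wait = cfn.get_waiter("stack_create_complete")

    # Hand-rolled ordering: each piece is its own stack, so the caller has to
    # know that the VPN must exist before the routes, the routes before the
    # NAT, and so on.  Inside a single stack, AWS::CF would derive all of this
    # itself and parallelize where it can.
    for name, template_file in [
        ("security-vpn", "vpn.yaml"),
        ("security-routes", "routes.yaml"),
        ("security-gateway", "gateway.yaml"),
        ("security-nat", "nat.yaml"),
    ]:
        with open(template_file) as f:
            cfn.create_stack(StackName=name, TemplateBody=f.read())
        wait.wait(StackName=name)  # fully serialized: no parallelism unless we build it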

In my view, the entire reason I like using AWS::CF is that I get to hand off all of the work of creating my infrastructure to my cloud vendor.  This means I don't have to spend any time writing rules, or building an engine for creating things.  I simply create my stack and fire away.

The little stack approach is dangerous, and it reveals a lack of understanding of this particular domain of cloud automation.




