Tuesday, March 1, 2016

GCP - Create or Acquire

Create or Acquire

“Create or acquire” is quite possibly the most painful and annoying feature of the GCP compute cloud. It bit us in our production environment and took down our production stack for half an afternoon. Thankfully this was before we had fully launched as a product, so the impact was limited.

I created a DM (Deployment Manager) stack called “prod-gateway-0,” which created a compute instance also called “prod-gateway-0.” This was very early in our development of the DM bits, so we were still learning how everything worked. I had created all of the “gateway” stacks in the lower environments without incident, so I was confident that this wouldn’t be a problem in production.

prod-gateway-0 came up without a problem, but I noticed that I had made a mistake in the bootup sequence. It wasn’t anything catastrophic, but I wanted to take the stack down, rebuild it, and make sure it came up the way I intended. It’s important that everything comes up properly so that we don’t have danglers or one-offs that might bite us down the road.

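I deleted the stack and immediately noticed that prod had basically blown its brains out. Upon investigation we realized that our old management system (Ansible) had already created the original prod gateway instance, also named “prod-gateway-0.” The DM stack had quietly taken over that existing instance instead of creating a new one, and deleting the stack deleted the instance along with it.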

I had assumed two things:

  1. Absolutely everything tied to a stack is unique to that stack.  This is not true in the case of things that already exist.
  2. An error would be thrown if the stack tried to create something with the same name as something that already exists.

Neither of these assumptions is true when it comes to GCP-Compute. Strangely enough, both hold when it comes to disks. Apparently the create-or-acquire behavior only applies to compute instances.
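In hindsight, a pre-flight check along these lines would have caught the collision before the stack ever went up. This is only a minimal sketch in Python: the function names are made up for illustration, and it assumes you have already collected the names of the instances that exist in the project yourself (for example from the output of `gcloud compute instances list --format="value(name)"`).

```python
def find_name_collisions(planned_names, existing_names):
    """Return the planned names that already exist, in planned order."""
    existing = set(existing_names)
    return [name for name in planned_names if name in existing]


def assert_safe_to_create(planned_names, existing_names):
    """Raise instead of letting a stack silently acquire an existing instance."""
    collisions = find_name_collisions(planned_names, existing_names)
    if collisions:
        raise RuntimeError(
            "refusing to deploy, these names already exist: "
            + ", ".join(collisions)
        )


if __name__ == "__main__":
    # Hypothetical data: what the new stack wants vs. what Ansible already built.
    planned = ["prod-gateway-0"]
    existing = ["prod-gateway-0", "prod-db-0"]
    try:
        assert_safe_to_create(planned, existing)
    except RuntimeError as err:
        print(err)
```

The point is simply to fail loudly on a name clash, which is the error I had assumed the platform would throw for me.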

It’s difficult to understand the design of a system that would decide to take ownership of something that already exists and, what’s more, would remove that object when the stack is removed. I would expect that even if the stack took ownership of an existing object, it wouldn’t then delete that object, since the stack didn’t create it.
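To make the difference concrete, here is a toy in-memory model in Python. It is not GCP's implementation or any real API; `ToyCloud`, `ToyStack`, and the `delete_acquired` switch are all invented for illustration. With `delete_acquired=True` it mimics what we observed, and with `delete_acquired=False` it mimics what I would have expected.

```python
class ToyCloud:
    """In-memory stand-in for a project's compute instances (not a real API)."""

    def __init__(self, instances=()):
        self.instances = set(instances)


class ToyStack:
    """Toy stack that tracks what it created versus what it merely adopted."""

    def __init__(self, cloud, delete_acquired):
        self.cloud = cloud
        self.delete_acquired = delete_acquired
        self.created = set()   # instances this stack actually made
        self.acquired = set()  # pre-existing instances it adopted by name

    def create_or_acquire(self, name):
        if name in self.cloud.instances:
            self.acquired.add(name)         # silently adopt, no error raised
        else:
            self.cloud.instances.add(name)  # genuinely new instance
            self.created.add(name)

    def delete(self):
        doomed = set(self.created)
        if self.delete_acquired:
            doomed |= self.acquired         # the behavior that took prod down
        self.cloud.instances -= doomed
        self.created.clear()
        self.acquired.clear()


if __name__ == "__main__":
    cloud = ToyCloud(["prod-gateway-0"])    # already built by the old tooling
    stack = ToyStack(cloud, delete_acquired=True)
    stack.create_or_acquire("prod-gateway-0")
    stack.delete()
    print("prod-gateway-0" in cloud.instances)
```

Keeping separate “created” and “acquired” sets is the whole design choice here: a stack that remembers the difference can leave adopted resources alone on delete.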

GCP support apparently thinks otherwise. It’s important to remember that things are not as unique as they seem in GCP-Compute land.
