Thursday, August 5, 2010

Why I love Zenoss

In the kitchen of the sysadmin you'll generally see some mix of tools like Nagios, Cacti, and collectd, and maybe some custom stuff as well. In most cases people are looking to fulfill two core requirements: monitoring of services, and trend analysis.

In the past I've found Nagios + Cacti to be a fantastic mix for satisfying these requirements. Recently, however, I have found Zenoss to be a much more satisfying tool.

The problem with Nagios is that you end up with lots of configuration files. Granted, most good admins will have these organized in a way that makes sense which ultimately makes them easy to manage and maintain. However, with Zenoss you have no configuration files ( at least not in the sense of host/services/etc... ). This is nice since I don't have to restart the service if I add a host, nor do I end up having to edit anything on the server itself. Tracking changes via svn/git is nice and all, but having all of the change log information in the interface is even better.

As for Cacti, I've found it to be rather prickly to set up, and it doesn't seem to work all that well in large-scale environments.

Zenoss combines both of these tools into one tool and adds some very nice polish to the entire process. For example, if I want to ensure that Zenoss is monitoring any service on any host that matches: /^thin.*[0-9]{4}$/ ( thin server port 6900 ), I can add a service rule. This service rule then watches the process table on each host and will 'catch' any process matching my regex.

This has several benefits:
* Monitoring of the process is picked up automatically. If it crashes, an alert will be sent out.
* Along with the state monitoring, Zenoss will also start profiling the memory and CPU usage of this process.
* I created one object, and that object was automatically propagated to all hosts.

The last point there is the most important. Even if I have a mix of hosts and services, I still get the trending and monitoring regardless of the role of any given host.

For example, let's say you have a mix of memcache instances, some that run on m1.small instances in AWS, and some that run on bare metal in a datacenter. In this case, as long as Zenoss can reach the SNMP port on all instances, it can watch for any service matching something like /^memcache/. Again, regardless of the role of the given memcache instance, it'll be picked up and monitored automatically without you having to configure anything beyond the initial service.
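To make the matching idea concrete, here is a rough Python sketch of one rule catching processes across hosts with different roles. This is not Zenoss code, just an illustration: the two regexes come from this post, while the host names, process lists, and the match_processes helper are all made up for the example.

```python
import re

# The two patterns mentioned in this post; the dict keys are hypothetical labels.
PATTERNS = {
    "thin": re.compile(r"^thin.*[0-9]{4}$"),
    "memcache": re.compile(r"^memcache"),
}

# Hypothetical process tables keyed by host name (illustrative data only).
PROCESS_TABLES = {
    "aws-small-1": ["memcached -m 64 -p 11211", "sshd"],
    "dc-metal-1": ["thin server port 6900", "memcached -m 1024 -p 11211"],
}


def match_processes(tables, patterns):
    """Return {host: [(label, process), ...]} for every process matching a pattern."""
    found = {}
    for host, processes in tables.items():
        found[host] = [
            (label, proc)
            for proc in processes
            for label, rx in patterns.items()
            if rx.search(proc)
        ]
    return found


if __name__ == "__main__":
    for host, hits in match_processes(PROCESS_TABLES, PATTERNS).items():
        for label, proc in hits:
            print(f"{host}: rule '{label}' caught '{proc}'")
```

The point is that the rule is defined once and applied to whatever each host reports, so a host's role never has to be spelled out anywhere.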

Once I have this item configured and running, I can 'lock' the service, and any changes made from that point on are tracked by the system. So, if someone ( perhaps a Jr. Admin ) goes and fat-fingers my regex, I'll know who did it and when. This is slightly more convenient than having to dig through commit logs.

Nginx + Sinatra + MongoDB

I created a document to help explain the how and why of my web app setup.

Document

I use this in production, so I guess you could say that I eat my own dog food on this one.