While Amazon CloudWatch is the starting point for many administrators, a cottage industry of cloud-based monitoring tools has sprung up. Many services and solutions are available, each with its own strengths and weaknesses, but Nagios is perhaps the best known of them all. It significantly predates AWS: this popular open source monitoring tool has been around since 1999, seven years before Amazon introduced EC2.
Its developers designed it for server monitoring long before "cloud" was even a buzzword!
Nagios Core, the free edition, may not be anything fancy, but it gets the job done. Its biggest advantage over other mature monitoring tools, however, is its surrounding ecosystem.
A wide array of add-ons provides features such as graphing, reporting, and integrations with third-party services. With these tools you can build a CloudWatch replacement that may fit your specific needs better and lets you work around some of the limitations of Amazon's service.
However, before you commit to Nagios for monitoring your EC2 instances, keep in mind that there is no such thing as a free lunch, at least not in the high-stakes world of cloud computing. Every hour spent implementing a custom monitoring solution is an hour taken away from building your core cloud application infrastructure.
If you do go this route, the Nagios Exchange is the place to find plugins that integrate Nagios with AWS. Some plugins monitor your EC2 instances directly and report detailed status; others query CloudWatch and other AWS services, giving you several different ways to monitor your AWS infrastructure.
Nagios can monitor network services such as HTTP, FTP, SMTP, POP3, and SSH, as well as host resources such as processor load and disk usage. Host resource checks rely on monitoring agents running on the monitored hosts.
Monitoring is also possible through remotely run scripts. When a service or host problem occurs, this network monitoring system can notify contacts through several user-defined methods, such as email, pager, and SMS.
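Whether a check runs on the Nagios server or through an agent on the monitored host, Nagios treats it as a small plugin program: the exit code carries the status (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and the first line of standard output carries a human-readable message, optionally followed by `|` and performance data. The sketch below is a hypothetical disk usage plugin written against that convention, not one taken from the Nagios Exchange; the 80% and 90% thresholds are arbitrary example values.

```python
"""Hypothetical Nagios-style disk usage check (illustrative sketch only)."""
import shutil
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(percent_used: float, warn: float, crit: float) -> int:
    """Map a usage percentage onto a Nagios status using warn/crit thresholds."""
    if percent_used >= crit:
        return CRITICAL
    if percent_used >= warn:
        return WARNING
    return OK


def check_disk(path: str = "/", warn: float = 80.0, crit: float = 90.0) -> tuple[int, str]:
    """Return (exit code, status line) for the filesystem containing `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - cannot stat {path}: {exc}"
    if usage.total == 0:
        return UNKNOWN, f"DISK UNKNOWN - {path} reports zero total size"
    percent_used = 100.0 * usage.used / usage.total
    status = evaluate(percent_used, warn, crit)
    # Text after "|" is performance data that graphing add-ons can pick up.
    message = (
        f"DISK {STATUS_NAMES[status]} - {path} {percent_used:.1f}% used"
        f" | used_pct={percent_used:.1f}%;{warn};{crit}"
    )
    return status, message


if __name__ == "__main__":
    code, text = check_disk(sys.argv[1] if len(sys.argv) > 1 else "/")
    print(text)
    sys.exit(code)
```

Keeping the threshold logic in `evaluate` separate from the filesystem call makes the status mapping easy to test without touching a real disk.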
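Notification methods are themselves just commands that Nagios runs, with details about the problem passed in through macros such as `$NOTIFICATIONTYPE$`, `$HOSTNAME$`, `$SERVICEDESC$`, `$SERVICESTATE$`, and `$SERVICEOUTPUT$`. The following is a minimal sketch of such a command, assuming a command definition that passes those five macros as arguments in that order; the script name `notify.py` is hypothetical, and actual delivery to email, pager, or SMS is deliberately left out.

```python
"""Hypothetical Nagios notification command: format an alert from macro arguments."""
import sys


def format_notification(notification_type: str, host: str, service: str,
                        state: str, output: str) -> tuple[str, str]:
    """Build a subject line and body from the values Nagios substitutes for its macros."""
    subject = f"** {notification_type} Service Alert: {host}/{service} is {state} **"
    body = (
        f"Notification Type: {notification_type}\n"
        f"Host: {host}\n"
        f"Service: {service}\n"
        f"State: {state}\n\n"
        f"Info:\n{output}\n"
    )
    return subject, body


def main(argv: list[str]) -> int:
    if len(argv) != 6:
        print("usage: notify.py TYPE HOST SERVICE STATE OUTPUT", file=sys.stderr)
        return 3
    subject, body = format_notification(*argv[1:])
    # Assumption: hand subject/body to your mail, pager, or SMS gateway here.
    print(subject)
    print(body)
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```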
It is worth mentioning that any tool that pulls data from CloudWatch does so through the CloudWatch API, and CloudWatch charges for API requests. Polling too frequently therefore has a direct effect on your AWS bill.
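A quick way to see how polling frequency drives cost is to count requests. The sketch below assumes one billable request per metric, per instance, per poll, and a 30-day month; `price_per_1000` is a placeholder to fill in from the current CloudWatch pricing page rather than a real rate.

```python
"""Back-of-the-envelope estimate of CloudWatch API polling volume (illustrative)."""
SECONDS_PER_DAY = 86_400


def monthly_requests(instances: int, metrics_per_instance: int,
                     interval_seconds: int, days: int = 30) -> int:
    """Assumes one billable request per metric per instance per poll (no batching)."""
    polls_per_day = SECONDS_PER_DAY // interval_seconds  # whole polls only
    return instances * metrics_per_instance * polls_per_day * days


def monthly_cost(requests: int, price_per_1000: float) -> float:
    """price_per_1000 is a placeholder; use the current published rate."""
    return requests / 1000 * price_per_1000


if __name__ == "__main__":
    for interval in (60, 300):
        print(interval, monthly_requests(50, 4, interval))
```

For 50 instances with 4 metrics each, polling every 60 seconds works out to 50 × 4 × 1,440 × 30 = 8,640,000 requests a month; stretching the interval to 300 seconds cuts that to 1,728,000, a fivefold reduction at whatever the per-request rate happens to be.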
There are a couple of different ways to integrate a tool like Nagios with a dynamic AWS infrastructure, where instances come and go. The first is to use a configuration management tool like Puppet, which allows the configuration of one node to influence the configuration of another. Puppet explicitly supports Nagios through its exported resources feature, which is commonly used to implement monitoring: each node exports its own Nagios definitions, and the Nagios server collects them.
The second approach is to use a custom script that queries the AWS API and writes Nagios configuration files based on the retrieved data. The downside of this approach shows up when instances are terminated as part of Auto Scaling operations: Nagios must then be told that the instance no longer exists and should no longer be monitored.
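One way to handle terminated instances in the script-based approach is to regenerate the whole host file on every run, so instances that are no longer running simply drop out. The sketch below assumes input shaped like the response of boto3's `describe_instances()` (a `Reservations` list, each holding `Instances` with `InstanceId`, `PrivateIpAddress`, and `State`); it leaves out the AWS call itself so it runs without credentials. The `linux-server` host template comes from Nagios's sample configuration and is an assumption about your setup.

```python
"""Hypothetical sketch: regenerate Nagios host definitions from EC2 instance data."""
import os
import tempfile
from typing import Any

# Assumption: hosts inherit from the "linux-server" template in the sample templates.cfg.
HOST_TEMPLATE = "linux-server"


def monitored_instances(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Keep running instances that have a private IP; terminated ones drop out."""
    instances = []
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            if instance["State"]["Name"] == "running" and "PrivateIpAddress" in instance:
                instances.append(instance)
    # Sort for stable output, so unchanged fleets produce identical files.
    return sorted(instances, key=lambda i: i["InstanceId"])


def host_definition(instance: dict[str, Any]) -> str:
    """Render one Nagios host object, using the instance ID as the host name."""
    name = instance["InstanceId"]
    return (
        "define host {\n"
        f"    use        {HOST_TEMPLATE}\n"
        f"    host_name  {name}\n"
        f"    alias      {name}\n"
        f"    address    {instance['PrivateIpAddress']}\n"
        "}\n"
    )


def render_config(response: dict[str, Any]) -> str:
    return "\n".join(host_definition(i) for i in monitored_instances(response))


def write_config(path: str, text: str) -> None:
    """Write to a .tmp file in the same directory, then atomically replace the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        handle.write(text)
    os.replace(tmp_path, path)
```

In practice you would feed `render_config` the result of `boto3.client("ec2").describe_instances()` (or its paginator for larger fleets), write the output over a dedicated `.cfg` file, and reload Nagios. Writing to a temporary file first means Nagios never reads a half-written configuration.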
Both approaches have their merits. How you implement Nagios must be weighed against the scale and scope of your cloud, and against whether you actually need these third-party tools at all. You always have the option of learning to love CloudWatch, limitations and all.