Hack 61 Monitor Availability

Use Nagios to keep tabs on your network.

Since remote exploits can often crash the service that is being broken into or cause its CPU use to skyrocket, you should monitor the services that are running on your network. Just looking for an open port (such as by using Nmap [Hack #42]) isn't enough. The machine may be able to respond to a TCP connect request while the service itself is unable to respond (or worse, has been replaced by a different program entirely!).

One tool that can help you verify your services at a glance is Nagios (http://www.nagios.org). Nagios is a network-monitoring application that monitors not only the services running on the hosts on your network, but also the resources on each host, such as CPU usage, disk space, memory usage, running processes, log files, and much more. In the event of a problem it can notify you through email, pager, or any other method that you define, and you can check the status of your network at a glance by using the web GUI. Nagios is also easily extensible through its plug-in API.

To install Nagios, download the source distribution from the Nagios web site. Then unpack the source distribution and change into the directory it creates:

    $ tar xfz nagios-1.1.tar.gz

Before running Nagios's configure script, you should create a user and group for Nagios to run as (e.g., nagios). Then run the configure script with a command similar to this:

    $ ./configure --with-nagios-user=nagios --with-nagios-grp=nagios

This will install Nagios in /usr/local/nagios. As usual, you can modify this behavior by using the --prefix switch. After the configure script finishes, compile Nagios by running make all. Then become root and run make install to install it. In addition, you can optionally install Nagios's initialization scripts by running make install-init.

If you take a look into the /usr/local/nagios directory right now, you will see that there are four directories.
The bin directory contains a single file, nagios, that is the core of the package. This application does the actual monitoring. The sbin directory contains the CGI scripts that will be used in the web-based interface. Inside the share directory, you'll find the HTML files and documentation. Finally, the var directory is where Nagios will store its information once it starts running.

Before you can use Nagios, you will need a couple of configuration files. These files go into the etc directory, which will be created when you run make install-config. This command also creates a sample copy of each required configuration file and puts it into the etc directory.

At this point the Nagios installation is complete. However, it is not very useful in its current state, because it lacks the actual monitoring applications. These applications, which check whether a particular monitored service is functioning properly, are called plug-ins. Nagios comes with a default set of plug-ins, but they must be downloaded and installed separately.

Download the latest Nagios Plugins package and decompress it. You will need to run the provided configure script to prepare the package for compilation on your system. The plug-ins are installed in a fashion similar to the actual Nagios program. To compile the plug-ins, run commands similar to these:

    $ ./configure --prefix=/usr/local/nagios \

You might get notifications about missing programs or Perl modules while the script is running. These can usually be ignored, unless you specifically need the mentioned applications to monitor a service.

After compilation is finished, become root and run make install to install the plug-ins. The plug-ins will be installed in the libexec directory of your Nagios base directory (e.g., /usr/local/nagios/libexec).

There are a few rules that all Nagios plug-ins should implement, making them suitable for use by Nagios.
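To make the idea of plug-in rules concrete, here is a minimal sketch of the convention that plug-ins commonly follow: print one line of status text and report the result through the exit status (0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN). The function name check_usage, the "USAGE" label, and the idea of classifying a percentage are all hypothetical choices for illustration; this is not one of the plug-ins shipped in the package.

```shell
#!/bin/sh
# Hypothetical plug-in sketch: classify a usage percentage against two
# thresholds. Exit status follows the usual plug-in convention:
#   0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN
check_usage() {
    used=$1
    warn=$2
    crit=$3

    # Anything that is not a non-negative integer cannot be judged.
    case $used in
        ''|*[!0-9]*)
            echo "USAGE UNKNOWN - not a number: $used"
            return 3
            ;;
    esac

    if [ "$used" -ge "$crit" ]; then
        echo "USAGE CRITICAL - ${used}% used"
        return 2
    elif [ "$used" -ge "$warn" ]; then
        echo "USAGE WARNING - ${used}% used"
        return 1
    fi

    echo "USAGE OK - ${used}% used"
    return 0
}
```

A real plug-in would measure the value itself and finish with something like exit $?, so that Nagios sees the status of the check as the status of the script.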
All plug-ins provide a --help option that displays information about the plug-in and how it works. This feature is very helpful when you're trying to monitor a new service using a plug-in you haven't used before. For instance, to learn how the check_ssh plug-in works, run the following command:

    $ /usr/local/nagios/libexec/check_ssh --help

Now that both Nagios and the plug-ins are installed, we are almost ready to begin monitoring our servers. However, Nagios will not even start until it is configured properly. The sample configuration files provide a good starting point:

    $ cd /usr/local/nagios/etc

Since these are sample files, the Nagios authors added a .cfg-sample suffix to each file. First, we need to copy or rename each one to end in .cfg, so that the software can use them properly. (If you don't change the configuration filenames, Nagios will not be able to find them.) You can either rename each file manually or use the following command to take care of them all at once. Type the following script on a single line:

    # for i in *cfg-sample; do mv $i `echo $i | \

First there is the main configuration file, nagios.cfg. You can pretty much leave everything as is; the Nagios installation process will make sure the file paths used in the configuration file are correct. There's one option, however, that you might want to change: check_external_commands, which is set to 0 by default. If you would like to be able to run commands directly through the web interface, you will want to set this to 1. Depending on your network environment, this may or may not be an acceptable security risk, as enabling this option will permit the execution of scripts from the web interface. Other options, which you need to set in cgi.cfg, configure which usernames are allowed to run external commands.

To get Nagios running, you must modify all but a few of the sample configuration files. Configuring Nagios to monitor your servers is not as difficult as it looks.
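The renaming step described above can also be sketched with shell parameter expansion. This is a different technique from the one-liner in the text, offered only as an illustration; strip_sample_suffix is a hypothetical helper name, and the sketch assumes every sample file ends in exactly .cfg-sample.

```shell
#!/bin/sh
# Rename every *.cfg-sample file in a directory so that it ends in .cfg.
# strip_sample_suffix is a hypothetical helper, not part of Nagios.
strip_sample_suffix() {
    dir=$1
    for f in "$dir"/*.cfg-sample; do
        # If nothing matched, the pattern is left as a literal string; skip it.
        [ -e "$f" ] || continue
        # ${f%-sample} removes the trailing "-sample" from the filename.
        mv -- "$f" "${f%-sample}"
    done
}
```

Calling strip_sample_suffix /usr/local/nagios/etc as root would then rename the sample files in place. Use cp instead of mv in the loop if you would rather keep the originals for reference.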
To help you, the Nagios binary can verify your configuration when run with the -v switch:

    # /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

This command will go through the configuration files and report any errors. Fix the errors one by one, running the command again each time to find the next one.

For testing purposes, it is easiest to disable all host and service definitions in the sample configuration files and merely use the files as templates for your own hosts and services. You can keep most of the files as is, but remove the following, which will be created from scratch:

    hosts.cfg

Start by configuring a host to monitor. We first need to add our host definition and configure some options for that host. You can add as many hosts as you like, but we will stick with one for the sake of simplicity. Here are the contents of hosts.cfg:

    # Generic host definition template

The first host defined is not a real host but a template from which other host definitions are derived. This mechanism can be seen in other configuration files as well, and it makes configuration based on a predefined set of defaults a breeze.

With this setup we are monitoring only one host to see if it is alive. The host_name parameter is important because other configuration files will refer to this server by this name.

Now the host needs to be added to a hostgroup, so that the application knows which contact group to send notifications to. Here's what hostgroups.cfg looks like:

    define hostgroup{

This defines a new hostgroup and associates the flcd-admins contact group with it. Now you'll need to define that contact group in contactgroups.cfg:

    define contactgroup{

Here the flcd-admins contact group is defined with two members, oktay and verty. This configuration ensures that both users will be notified when something goes wrong with a server that flcd-admins is responsible for. The next step is to set the contact information and notification preferences for these users.
Here are the definitions for those two members in contacts.cfg:

    define contact{

In addition to providing contact details for a particular user, the contact_name in the contacts.cfg file is also used by the CGI scripts (i.e., the web interface) to determine whether a particular user is allowed to access a particular resource.

Now that your hosts and contacts are configured, you can start to configure monitoring for individual services on your server. This is done in services.cfg:

    # Generic service definition template

This setup configures monitoring for two services. The first service definition, which has been called HTTP, will monitor whether the web server is up and will notify you if there's a problem. The second definition monitors the ping statistics from the server and notifies you if the response time or packet loss becomes too high. The commands used are check_http and check_ping, which were installed into the libexec directory during the plug-in installation. Take your time to familiarize yourself with the other available plug-ins and configure them similarly to the previous example definitions.

Once you're happy with your configuration, run Nagios with the -v switch one last time to make sure everything checks out. Then run it as a daemon by using the -d switch:

    # /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg

That's all there is to it. Give Nagios a couple of minutes to generate some data, and then point your browser to the machine and look at the pretty service warning lights.
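The final two steps (check with -v, then start with -d) can be combined so that the daemon is never started with a broken configuration. The following is a minimal sketch, not part of the Nagios distribution: start_if_valid is a hypothetical helper name, and the sketch assumes that nagios -v exits with a nonzero status when it finds errors.

```shell
#!/bin/sh
# Start the daemon only if the configuration check passes.
# Assumption: "nagios -v" exits nonzero when the configuration has errors.
# start_if_valid is a hypothetical helper, not part of Nagios.
start_if_valid() {
    nagios_bin=$1
    cfg=$2

    if "$nagios_bin" -v "$cfg" >/dev/null 2>&1; then
        # Configuration verified; run Nagios as a daemon.
        "$nagios_bin" -d "$cfg"
    else
        echo "configuration check failed: $cfg" >&2
        return 1
    fi
}
```

With the paths used in this hack, the call would be start_if_valid /usr/local/nagios/bin/nagios /usr/local/nagios/etc/nagios.cfg, run as root. When the check fails, rerun the -v command by hand to see the actual error messages, since the sketch discards them.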