A Pool Checking Plugin For Nagios
Nagios is a popular network monitoring tool that works by running various plugin programs. I'm not going to talk about how to configure Nagios, so please refer to the website for its documentation. The Nagios/XenAPI plugin is a nice simple example of XenAPI programming in Python, which may actually be useful to anyone administering a network containing XenServer hosts.
Definition of problem
A Nagios plugin should be a program which can be run from the command line to check on the status of a 'service'. It communicates through its return value, which should be 0 if the service is OK, 1 for a warning, 2 if the service is broken, and 3 for unknown or unexpected errors. It may also print a single line of text to standard output.
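As a rough illustration of that convention, the sketch below maps the four states to their exit codes and prints the single status line. The names `STATUS_NAMES` and `report` are invented for this example and are not part of Nagios; the "STATE - message" layout of the line is likewise just one common choice.

```python
import sys

# Exit codes a Nagios plugin is expected to return.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING",
                CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def report(status, message, out=None):
    """Write the one-line summary and hand back the exit code.

    `out` is a hypothetical hook so the output can be captured;
    it defaults to standard output.
    """
    stream = sys.stdout if out is None else out
    stream.write("%s - %s\n" % (STATUS_NAMES[status], message))
    return status
```

A plugin built this way would end with something like `sys.exit(report(CRITICAL, "1 host dead"))`, so that the process exit status carries the result back to Nagios.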
We will say that a resource pool of XenServer hosts is OK if the pool master reports that all slaves are live. If any are dead then we should signal a problem.
Let's ignore the problem of passing in the server address, user name and password for the moment. The core of the program is as follows (fire up the Python interpreter to try it):
hostname, username, password = "ivory", "root", "password"

# usual boilerplate login
import XenAPI
session = XenAPI.Session('https://' + hostname)
session.login_with_password(username, password)
sx = session.xenapi

# partition the hosts according to whether they're alive or not
hosts = sx.host.get_all()
hosts_with_status = [(sx.host.get_name_label(x),
                      sx.host_metrics.get_live(sx.host.get_metrics(x)))
                     for x in hosts]
live_hosts = [name for (name, status) in hosts_with_status if status]
dead_hosts = [name for (name, status) in hosts_with_status if not status]

# our one line of output
print("live hosts %s dead hosts %s" % (live_hosts, dead_hosts))

# retcode is the value we should return to the system
retcode = 2 if dead_hosts else 0
Occasionally, the pool master can change. (You can force this to happen from the server command line with xe pool-designate-new-master). If you run the above code and ivory is not the master, then you'll get this error:
XenAPI.Failure: ['HOST_IS_SLAVE', '10.80.224.105']
So a better way of logging in is to catch this exception and then use the address it returns (which is that of the real master) to log in instead.
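hostname, username, password = "ivory", "root", "password"
import XenAPI
try:
    session = XenAPI.Session('https://' + hostname)
    session.login_with_password(username, password)
except XenAPI.Failure as e:
    if e.details[0] != 'HOST_IS_SLAVE':
        raise
    # e.details[1] is the address of the real master
    session = XenAPI.Session('https://' + e.details[1])
    session.login_with_password(username, password)
sx = session.xenapi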
This should automatically redirect the call to whichever host has taken over as master. However, a change of master is probably worth a warning to Nagios, since the administrator will most likely want to fix whatever caused it and put everything back to normal.
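One way to fold the redirect into the result is sketched below. The function name `pool_status` and the `redirected` flag are assumptions made for illustration, and the precedence (a dead host outranks a moved master) is a design choice for this example rather than something Nagios prescribes.

```python
OK, WARNING, CRITICAL = 0, 1, 2


def pool_status(dead_hosts, redirected):
    """Return a Nagios exit code for the pool.

    dead_hosts -- names of hosts the master reports as not live
    redirected -- True if login had to follow HOST_IS_SLAVE to another host
    """
    if dead_hosts:
        return CRITICAL   # a dead host is the more serious condition
    if redirected:
        return WARNING    # pool is healthy, but the master has moved
    return OK
```

With this ordering, a pool that has both a dead host and a changed master is reported as broken, so the warning never hides the more serious state.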
Putting it all together:
Now we need to take the first program, add the new login method with the possible redirect, and wrap it all in some code to accept command-line arguments and return values to the system (Nagios). This is my version:
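What follows is not the original listing but a minimal sketch of how such a plugin might be assembled from the pieces above. The function names, the `api` parameter (which lets a stand-in for the XenAPI module be passed in for testing) and the `-l`/`--login` option with its `root` default are all assumptions of this sketch; only `-H` and `-p` appear in the text.

```python
import argparse

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def login(api, hostname, username, password):
    """Log in, following HOST_IS_SLAVE once. Returns (session, redirected)."""
    try:
        session = api.Session("https://" + hostname)
        session.login_with_password(username, password)
        return session, False
    except api.Failure as e:
        if e.details[0] != "HOST_IS_SLAVE":
            raise
        # e.details[1] is the address of the real master
        session = api.Session("https://" + e.details[1])
        session.login_with_password(username, password)
        return session, True


def check_pool(sx):
    """Return (live_names, dead_names) as reported by the pool master."""
    live, dead = [], []
    for host in sx.host.get_all():
        name = sx.host.get_name_label(host)
        is_live = sx.host_metrics.get_live(sx.host.get_metrics(host))
        (live if is_live else dead).append(name)
    return live, dead


def main(argv=None, api=None):
    parser = argparse.ArgumentParser(description="Check a XenServer pool")
    parser.add_argument("-H", "--hostname", required=True)
    parser.add_argument("-l", "--login", default="root")  # assumed option
    parser.add_argument("-p", "--password", required=True)
    try:
        args = parser.parse_args(argv)
    except SystemExit:
        # argparse exits with status 2 on bad usage, which Nagios
        # would read as "broken"; report "unknown" instead.
        return UNKNOWN

    if api is None:
        import XenAPI as api  # imported lazily so the logic runs without it

    try:
        session, redirected = login(api, args.hostname,
                                    args.login, args.password)
        try:
            live, dead = check_pool(session.xenapi)
        finally:
            session.xenapi.session.logout()
    except Exception as e:
        print("UNKNOWN - %s" % e)
        return UNKNOWN

    message = "live hosts %s dead hosts %s" % (live, dead)
    if dead:
        status = CRITICAL
    elif redirected:
        status = WARNING
        message += " (master is not %s)" % args.hostname
    else:
        status = OK
    print(message)
    return status

# To run this as a script, finish the file with: raise SystemExit(main())
```

Keeping `main` free of any direct call to `sys.exit` means the return code can be checked from the interpreter, and passing a fake `api` object makes it possible to exercise the redirect and dead-host paths without a real pool.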
An example Unix command line would be:
$ ./nagios_plugin.py -H ivory -p password ; echo $?
which will print the return code as well.