Tools for debugging and development on XenServer

Tools

Bonnie++

Bonnie++ is an open source benchmark that measures disk and file system performance. It includes tests for large file I/O and for the creation and deletion of small files. There are lots of user guides and blogs on this tool.

CPU-Z

CPU-Z can be useful particularly if considering: How to investigate and use Turbo mode, C-States and P-States in XenServer

IOMeter

IOMeter is an open source tool, an I/O subsystem measurement and characterization tool for single and clustered systems. IOMeter is an easy way to generate stress on the I/O system and as such can be very useful within development test of products interacting with or generating load on the OVS.

iperf

iperf is an open source utility that can be very useful for diagnosing network issues in a XenServer environment. A large number of how-to guides and introductory tutorials are available.

OProfile

OProfile is an open source tool available from http://oprofile.sourceforge.net/news/. A Xen-specific variant is shipped in XenServer 6.1 and later. It is detailed here: http://xenoprof.sourceforge.net/xenoprof_2.0.txt

thc-ipv6

This is an IPv6 protocol attack suite for Linux, available from www.thc.org/thc-ipv6. It can generate many varieties of malicious and corrupt packets, allowing vendor developers to assess the robustness of their solutions.

vhd-util

vhd-util is an unsupported tool shipped with XenServer, and as such it should never be used as an "API" around which to construct an application that relies on its provision or on the stability of its results. However, it is very useful for working with VHDs and snapshots: it can be used to check, display and understand VHD files, including snapshot chains. There is limited documentation, so you will probably need to refer to the command line help by typing "vhd-util" at the XenServer command line and then requesting help for the desired option, e.g.

[root@dot56 ~]# vhd-util
usage: vhd-util COMMAND [OPTIONS]
COMMAND := { create | snapshot | query | read | set | repair | resize | fill | coalesce | modify | scan | check | revert }
[root@dot56 ~]# vhd-util check -h
options: -n <file> [-i ignore missing primary footers] [-I ignore parent uuids] [-t ignore timestamps] [-p check parents] [-b check bitmaps] [-s stats] [-h help]
[root@dot56 ~]#
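
If you drive vhd-util from a test script, it can help to build the command line in one place. The following is a minimal sketch only, using just the check flags listed in the help output above (-n, -p, -b). The function name and the example path are made up for illustration, and the sketch only builds the argument list rather than running the tool.

```python
def vhd_check_argv(vhd_path, check_parents=False, check_bitmaps=False):
    """Return the argument list for 'vhd-util check' on one VHD file."""
    argv = ["vhd-util", "check", "-n", vhd_path]
    if check_parents:
        argv.append("-p")   # -p: check parents
    if check_bitmaps:
        argv.append("-b")   # -b: check bitmaps
    return argv

# The path below is a made-up example, not a real SR location.
print(" ".join(vhd_check_argv("/path/to/disk.vhd", check_parents=True)))
```

On a host, the returned list could be passed to subprocess.call. Since vhd-util is unsupported, treat its output as diagnostic information rather than something to parse in a product.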

Some further information is available:

WinDbg

WinDbg is one of a number of tools available for Debugging Windows Guests on XenServer

xentop

xentop displays real-time information about a Xen system. It is shipped with xen tools. This Citrix Support article details its use further: http://support.citrix.com/article/CTX127896.

xl

The xl utility is part of the upstream open source Xen hypervisor. As such, it is maintained by Xen.org rather than by Citrix in XenServer.

If you find issues or have enquiries about xl, address them to Xen via their mailing list process and they will investigate. This page details how to do that, and the kind of reports they specifically like to receive: http://wiki.xen.org/wiki/Reporting_Bugs_against_Xen.

It can help to reference similar issues or post to them as you might access an expert quickly via existing threads: http://lists.xen.org/archives/html/xen-devel/2009-11/msg00885.html

We generally do not recommend that XenServer developers use xl routinely to configure test cases or similar. It affects only part of the toolstack, so the XenAPI and XAPI will be unaware of the changes, which can lead to a very confused toolstack and some rather strange effects.

xl can however be of some use for debug and diagnosis, particularly the inquiry options such as info [-n, --numa], by which you can query hardware information such as cores_per_socket and threads_per_core and similar data that you might want to log or keep in benchmarking assessments.
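
For logging such values alongside benchmark results, a small parser is usually enough. The sketch below assumes the "key : value" line layout that xl info prints. The sample text is made up for illustration and is not output from a real host.

```python
def parse_xl_info(text):
    """Turn 'key : value' lines into a dictionary of strings."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so values containing colons survive.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info

# Illustrative sample only; real output has many more fields.
SAMPLE = """\
host                   : example-host
nr_cpus                : 8
cores_per_socket       : 4
threads_per_core       : 2
"""

info = parse_xl_info(SAMPLE)
print("%s cores per socket, %s threads per core"
      % (info["cores_per_socket"], info["threads_per_core"]))
```

On a host, the text could be captured with subprocess.check_output(["xl", "info"]) and decoded before parsing.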

Benchmarking Software

Documentation on performance analysis and tools 

Feedback

If you think there is a tool that should be added to this page, or have additional information or comments regarding the tools already listed, please contribute your experience to the discussion thread and we will endeavour to incorporate your feedback.

The 'xenoprof' variant of OProfile is available by default in tampa, clearwater and trunk:
http://xenoprof.sourceforge.net/xenoprof_2.0.txt

Advice for developers on login to XenServer and debug of connections to XenServer

session.login_with_password

You should normally use session.login_with_password to log in to a XenServer host. If you try to connect to a slave you should receive a HOST_IS_SLAVE error code. You should also be able to parse the address of the true master from the supplementary error description that is returned. There are many examples of this in C# within the XenCenter code, such as:

            if (error is Failure)
            {
                Failure f = (Failure)error;
                if (f.ErrorDescription[0] == Failure.HOST_IS_SLAVE)
                {
                    string oldHost = connection.Name;
                    ((XenConnection)connection).Hostname = f.ErrorDescription[1];
 

The method session.slave_local_login_with_password is provided as an API for use in failure situations, to log in to a slave when the master host has truly gone down. It should not be used to log in to slave hosts routinely, and the resulting sessions are only good for use on that particular host.

 

Best practice session management and debugging session leaks

The XenAPI process (XAPI) has a session limit of 400. When the limit is exceeded, the oldest session is terminated, that is, the session that has gone unused for the longest time. That session might still be active and in use. When it is terminated, the client using it is disconnected without notification. The session has expired, so its session ID can no longer be used to make XenAPI calls and the client needs to log in again. This may affect you even when you don’t have a connection open at the time, because session IDs can be reused across TCP connections. Inactive sessions are terminated after 24 hours.
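
The eviction behaviour described above can be illustrated with a toy model. This is a sketch only and not XAPI's implementation: the class name, the session ID format and the small limit used in the demonstration are all made up for illustration.

```python
from collections import OrderedDict

class SessionTable(object):
    """Toy model: at most `limit` sessions, least recently used evicted first."""

    def __init__(self, limit=400):
        self.limit = limit
        self._sessions = OrderedDict()   # least recently used first
        self._next_id = 0

    def login(self):
        if len(self._sessions) >= self.limit:
            self._sessions.popitem(last=False)   # evict least recently used
        self._next_id += 1
        sid = "session-%d" % self._next_id
        self._sessions[sid] = True
        return sid

    def call(self, sid):
        """Use a session; returns False if it has already been evicted."""
        if sid not in self._sessions:
            return False
        del self._sessions[sid]          # re-insert as most recently used
        self._sessions[sid] = True
        return True

table = SessionTable(limit=3)
first = table.login()
second = table.login()
third = table.login()
table.call(first)    # 'first' is now the most recently used session
table.login()        # table is full, so 'second' is evicted
print(table.call(first), table.call(second))
```

Note that the client holding the evicted session only finds out when its next call fails, which matches the "disconnected without notification" behaviour described above.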

Note: Clients can be any of the following: slaves in the pool, users running XenCenter, third-party applications that leverage XAPI, internal Dom0 processes, cloud management applications, distributed desktop controllers, or any systems interacting with the pool through XAPI.

A session is created when a client logs on to the server using the session.login_with_password call. The client gets a session ID to make API calls. Once an application is finished interacting with a XenServer Host it is good practice to call session.logout. This invalidates the session reference (so it cannot be used in subsequent API calls) and simultaneously de-allocates server-side memory used to store the session object.

Although inactive sessions will eventually time out (after 24 hours), the server has a hardcoded limit of 400 concurrent sessions. Once this limit has been reached, fresh logins will evict the oldest session objects, causing their associated session references to become invalid. It is essential that application developers release sessions that they no longer need. The best policy is to create a single session at start-of-day, use it throughout the application's lifetime (note that sessions can be used across multiple separate client-server network connections) and then explicitly log out when possible. Developers should take care to handle session release during exception handling and application failure where appropriate.
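
One way to make sure a session is released on failure paths is to wrap login and logout in a context manager. The sketch below is one possible pattern, not a prescribed one. FakeSession is a hypothetical stand-in so that the example runs without a host; in real use you would pass in a XenAPI.Session object.

```python
from contextlib import contextmanager

@contextmanager
def logged_in(session, username, password):
    """Log in, hand the session to the caller, and always log out afterwards."""
    session.login_with_password(username, password)
    try:
        yield session
    finally:
        session.logout()   # runs on normal exit and on exceptions

class FakeSession(object):
    """Hypothetical stand-in for XenAPI.Session, for illustration only."""
    def __init__(self):
        self.events = []
    def login_with_password(self, username, password):
        self.events.append("login")
    def logout(self):
        self.events.append("logout")

fake = FakeSession()
try:
    with logged_in(fake, "root", "password"):
        raise RuntimeError("simulated application failure")
except RuntimeError:
    pass
print(fake.events)
```

For a long-running application the with block would enclose the whole main loop, so that a single session is still used from start-of-day to shutdown.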

More details on the consequences of leaking sessions, and on how to detect leaks using the XenServer logs, are given in http://support.citrix.com/article/CTX137488.

An overview of sessions can also be found in Chapter 3 of the XenServer SDK guide.

Debugging Session Leaks

Many session issues are detected by Citrix's free TaaS log analysis service. In the first instance, we suggest that developers who suspect a session leak follow the advice on Server Status Report collection and run the logs through the auto-analysis tool.

Debugging Session Leaks - Example Developer Script

A script written internally by one of our developers, while debugging a session issue in a third-party application on XenServer 6.1, is available as is. The script gives an overview of all sessions that were created but not destroyed by the caller (in the timespan covered by the logs). It also shows whether or not the XAPI Garbage Collection (GC) process destroyed the session (because the limit of 400 sessions was exceeded, or because the session was not used for 24 hours). It also shows the user name used when logging in, and whether the session was created within dom0 on a unix domain socket (UNIX) or over a TCP connection (INET). In the latter case, the user-agent is listed as well. Note that the user name in this case is not relevant for authentication, but can in some cases be used to work out which application created the session.
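
The core idea of such a script, matching session creation against session destruction, can be sketched as follows. This is not the script referred to above, and it does not parse the real XenServer log format: the event tuples are a hypothetical, already-parsed representation made up for illustration.

```python
def find_leaked_sessions(events):
    """Return {session id: origin} for sessions created but never destroyed.

    `events` is a list of (action, session_id, origin) tuples, where action
    is "create" or "destroy". This layout is an assumption for the sketch.
    """
    open_sessions = {}
    for action, session_id, origin in events:
        if action == "create":
            open_sessions[session_id] = origin
        elif action == "destroy":
            open_sessions.pop(session_id, None)
    return open_sessions

# Made-up events: s1 is cleaned up properly, s2 and s3 are never destroyed.
EVENTS = [
    ("create", "s1", "INET"),
    ("create", "s2", "UNIX"),
    ("destroy", "s1", "INET"),
    ("create", "s3", "INET"),
]

leaked = find_leaked_sessions(EVENTS)
for session_id in sorted(leaked):
    print("%s %s" % (session_id, leaked[session_id]))
```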

 

Connection Problems

Many connection problems can arise if Network Time Protocol (NTP) synchronisation is wrong. It can often be worth checking your host synchronisation. This forum thread provides a lot of helpful advice on how to do this.

 

Client side timeout

Implementing a client-side timeout is not something we would wish to do, and we do not encourage third parties to attempt to implement such an algorithm. Some tasks can take a long time or, in the case of a genuine server or network issue, be temporarily blocked. If a client simply drops the connection and attempts to re-perform the original call to the server, this can in fact compound issues. It is very likely that the original call is still active server-side, so repeated calls to perform the same action are likely to cause more problems than they solve.

Instead, we recommend that developers include logic in their applications to launch asynchronous XAPI calls in another thread, poll the task, and then call the XAPI Task.cancel function if the initial task has definitely stalled or the server has failed. Only as a last resort should the connection be cut off. Some advice on asynchronous API calls and task polling is also available.
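
The poll-then-cancel pattern might look like the sketch below. It assumes an object offering get_status and cancel calls, as session.xenapi.task does in the Python bindings. FakeTaskAPI is a hypothetical stand-in so that the example runs without a host, and reducing "definitely stalled" to a simple deadline is a simplification made for illustration.

```python
import time

def wait_for_task(task_api, task_ref, timeout_s, poll_interval_s=1.0):
    """Poll a task until it leaves 'pending'; request cancellation on deadline."""
    deadline = time.time() + timeout_s
    while True:
        status = task_api.get_status(task_ref)
        if status != "pending":
            return status
        if time.time() >= deadline:
            # Ask the server to cancel rather than dropping the connection.
            task_api.cancel(task_ref)
            return "cancel requested"
        time.sleep(poll_interval_s)

class FakeTaskAPI(object):
    """Hypothetical stand-in for session.xenapi.task, for illustration only."""
    def __init__(self, statuses):
        self._statuses = list(statuses)
        self.cancelled = []
    def get_status(self, task_ref):
        # Replay the scripted statuses, repeating the last one forever.
        if len(self._statuses) > 1:
            return self._statuses.pop(0)
        return self._statuses[0]
    def cancel(self, task_ref):
        self.cancelled.append(task_ref)

api = FakeTaskAPI(["pending", "pending", "success"])
print(wait_for_task(api, "task-ref", timeout_s=5, poll_interval_s=0))
```

The deadline should be generous, since some tasks legitimately take a long time. The point of the pattern is that the client never silently abandons a call that is still running on the server.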

 

Commonly used ports in XenServer and other Citrix Products

Setting Up a Test or Teaching Environment For XenServer

XenServer on XenServer

This is not a usage formally supported by Citrix, however it is possible (with limitations) to use it in some instances. It is not advised for use in a production environment but many developers find it useful especially if they have limited access to hardware.

Use cases are typically around providing training or demo environments, providing the ability to provision virtualised XenServer hosts.

Running XenServer on top of XenServer isn't a supported or sensible production system. However, developers writing products or scripts against the API sometimes do this to create a test environment: it is better to blow away some XenServer VMs with your coding bugs than hosts. This is mentioned in the XenServer release notes: http://support.citrix.com/article/CTX134582
"As an alternative sandbox testing environment, you can install XenServer as a generic HVM Guest using the Other install media template (2048MB of memory and a disk size of at least 12GB is recommended). Note that the Guest's IP address will not be reported though the CLI or XenCenter. "

Please note that because “Other Install Media” is an HVM install the performance will correspondingly be unsuitable for production systems. Some users use this as part of their hardware compatibility testing.

One of my colleagues has explored scripting this type of installation on his blog: http://blogs.citrix.com/2013/03/18/virtual-hypervisor/

 

Checking Which Version Of XenServer Or Its Components Are Installed

If you are testing with multiple versions or frequently re-installing XenServer it may be useful to familiarise yourself with the CLI xe command:

xe host-param-get uuid=<host-uuid> param-name=software-version

 

which will report the version of all components of a XenServer installation.
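
If you want to record these versions in test logs, the output can be split into a dictionary. The sketch below assumes that xe prints the map as "key: value" pairs separated by "; ". Check this against the output on your own host, as the layout is an assumption here. The sample string is illustrative and shows only a few of the keys.

```python
def parse_map_param(text):
    """Split 'key: value; key: value' text into a dictionary of strings."""
    result = {}
    for item in text.strip().split("; "):
        key, sep, value = item.partition(": ")
        if sep:
            result[key.strip()] = value.strip()
    return result

# Illustrative sample only, not captured from a real host.
SAMPLE = "product_version: 6.2.0; product_brand: XenServer; build_number: 70446c"

versions = parse_map_param(SAMPLE)
print("%s %s (build %s)" % (versions["product_brand"],
                            versions["product_version"],
                            versions["build_number"]))
```

When working through the Python bindings instead of the CLI, the same information is returned directly as a dictionary, so no parsing is needed.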

Install / Upgrade and RollBack of Test Environments

Some developer tools intended for use in test environments (non-production) are available, here.

Testing A Product For Commercial Supported Citrix XenServer

Many developers and vendors create tools and products that they subsequently seek Citrix Certification for. Developers aspiring to official Citrix partner status for their products are encouraged to familiarise themselves with the certification criteria and kits http://www.citrix.co.uk/partner-programs/citrix-ready/test.html and to restrict themselves to fully supported architectural designs and interfaces. Access to Citrix’s partnering program http://www.citrix.co.uk/partner-programs/citrix-ready.html is free for XenServer Developers and Vendors. 

Demo Licenses (formerly known as NFR licenses)

XenServer is now open source and as such developers have free access to the functionality, source code and builds. Commercially supported Citrix XenServer licensing offers some usability tools, in particular those associated with upgrades and hotfix application. Developers wishing to access demo licenses for commercially supported XenServer can do so by joining the Citrix Ready Partnering Program.

Demo Licenses for Citrix Products

Developers producing products for XenServer as a Platform for products such as Citrix CloudPlatform or XenDesktop and for use in mobility solutions can obtain free demo licenses for test and development via the Citrix Ready Partner program. Licenses are available for

  • Citrix XenDesktop (with XenApp)
  • Citrix VDI-in-a-Box
  • Citrix XenServer
  • Citrix XenServer Per Socket
  • Citrix Access Gateway VPX
  • Citrix NetScaler VPX
  • Citrix Access Gateway Universal
  • Citrix AppDNA Enterprise
  • Cloud Gateway Enterprise

 

Virtual Test Labs and Infrastructure

Vendors with limited experience or infrastructure for the testing of virtualised solutions may wish to explore third-party provision, including Citrix Ready’s virtual test labs or consultancy solutions. Details of these can be found on the Citrix site: http://www.citrix.co.uk/partner-programs/citrix-ready/test.html. Citrix Ready Virtual Lab is an online lab that makes it simple for Citrix ecosystem partners to verify their products for the Citrix Ready program. The lab environment is preconfigured according to the requirements of select Citrix Ready verifications and contains the servers that most ISV partners require to verify their products.

 

Community Experience

The SDK and General XenServer User forums are frequented by many developers and XenServer users with experience of testing and using products for XenServer, and can be a useful place to ask questions about testing products written against the APIs. Vendors and developers often recruit beta testers on the forums, and many forum users welcome the chance to give developers feedback whilst products are in development.

 

 

A Pool Checking Plugin For Nagios

Nagios is a popular network monitoring tool that works by running various plugin programs.  I'm not going to talk about how to configure Nagios, so please refer to the website for its documentation.  The Nagios/XenAPI plugin is a nice simple example of XenAPI programming in Python, which may actually be useful to anyone administering a network containing XenServer hosts.

Definition of problem

A Nagios plugin should be a program which can be run from the command line to check on the status of a 'service'. It communicates through its return value, which should be 0 if the service is OK, 1 for a warning, 2 if the service is broken, and 3 for unknown or unexpected errors. It may also print a single line of text to standard output.

We will say that a resource pool of XenServer hosts is OK if the pool master reports that all slaves are live. If any are dead then we should signal a problem.
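
The decision itself can be separated from the XenAPI calls, which makes it easy to test without a pool. The following is a minimal sketch. The constant names follow the usual Nagios status names, while the function name, the message wording and the host name "ebony" are made up for illustration.

```python
# Nagios plugin return values, as described above.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

def pool_check(live_by_host):
    """Map {host name: is_live} to (return value, single line of output)."""
    dead = sorted(name for name, live in live_by_host.items() if not live)
    if dead:
        return CRITICAL, "dead hosts: " + ", ".join(dead)
    return OK, "all %d hosts live" % len(live_by_host)

code, line = pool_check({"ivory": True, "ebony": False})
print(line)
```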

First try

Let's ignore the problem of passing in the server address, user name and password for the moment. The core of the program is: (fire up the python interpreter)

hostname, username, password = "ivory", "root", "password"

# usual boilerplate login

import XenAPI
session=XenAPI.Session('https://'+hostname)
session.login_with_password(username, password)
sx=session.xenapi
 
# partition the hosts according to whether they're alive or not
hosts=sx.host.get_all()
hosts_with_status=[(sx.host.get_name_label(x),sx.host_metrics.get_live( sx.host.get_metrics(x) )) for x in hosts]
live_hosts=[name for (name,status) in hosts_with_status if (status==True)]
dead_hosts=[name for (name,status) in hosts_with_status if not (status==True)]

 
#our one line of output
print "live hosts", live_hosts, "dead hosts", dead_hosts
#retcode is the value we should return to the system
retcode = 2 if (len(dead_hosts) != 0) else 0

A problem:

Occasionally, the pool master can change. (You can force this to happen from the server command line with xe pool-designate-new-master). If you run the above code and ivory is not the master, then you'll get this error:

XenAPI.Failure: ['HOST_IS_SLAVE', '10.80.224.105']

So a better way of logging in is to catch this exception, and then use the address which is returned (which is the real master) to login instead.

hostname, username, password = "ivory", "root", "password"
 
import XenAPI
 
try:
    session=XenAPI.Session('https://'+hostname)
    session.login_with_password(username, password)
except XenAPI.Failure, e:
    if e.details[0]=='HOST_IS_SLAVE':
        session=XenAPI.Session('https://'+e.details[1])
        session.login_with_password(username, password)
    else:
        raise

sx=session.xenapi

This should automatically redirect the call to whatever host has taken over as master. However, this is probably worth a warning to Nagios, since the administrator will probably want to fix whatever the problem is and put everything back to normal.

Putting it all together:

Now we need to take the first program, add the new login method with the possible redirect, and wrap it all in some code to accept command line arguments and return values to the system (nagios). This is my version:

#!/usr/bin/python

#This is an example plugin for the popular network monitoring program nagios.

#Check if all the hosts in a pool are live.
#If we log in to a slave by mistake (the master can sometimes change)
#then redirect the request to the real master

#example command line: ./check_pool.py -H ivory -p password -l root

#So: return codes
# 0 : everything is ok
# 1 : named host is slave, but all hosts in pool are up
# 2 : some of the hosts in the pool are down
# 3 : unexpected error

#entire program wrapped in try/except so that we can send exit code 3 to nagios on any error
#sys is imported first so that the final except clause can always call sys.exit
import sys

try:

    import XenAPI

    from optparse import OptionParser

    #Parse command line options
    #Python's standard option parser won't do what I want, so I'm subclassing it.
    #firstly, nagios wants exit code three if the options are bad
    #secondly, we want 'required options', which the option parser thinks is an oxymoron.
    #I on the other hand don't want to give defaults for the host and password, because nagios is difficult to set up correctly,
    #and the effect of that may be to hide a problem.
    class MyOptionParser(OptionParser):
        def error(self,msg):
            print msg
            sys.exit(3)
        #stolen from python library reference, add required option check
        def check_required(self, opt):
            option=self.get_option(opt)
            if getattr(self.values, option.dest) is None:
                self.error("%s option not supplied" % option)

    parser = MyOptionParser(description="Nagios plugin to check whether all hosts in a pool are live")

    parser.add_option("-H", "--hostname", dest="hostname", help="name of pool master")
    parser.add_option("-l", "--login-name", default="root", dest="username", help="name to log in as (usually root)")
    parser.add_option("-p", "--password", dest="password", help="password")

    (options, args) = parser.parse_args()

    #abort if host and password weren't specified explicitly on the command line
    parser.check_required("-H")
    parser.check_required("-p")


    #get a session. set host_is_slave true if we need to redirect to a new master
    host_is_slave=False
    try:
        session=XenAPI.Session('https://'+options.hostname)
        session.login_with_password(options.username, options.password)
    except XenAPI.Failure, e:
        if e.details[0]=='HOST_IS_SLAVE':
            session=XenAPI.Session('https://'+e.details[1])
            session.login_with_password(options.username, options.password)
            host_is_slave=True
        else:
            raise
    sx=session.xenapi

    #work out which hosts in the pool are alive, and which dead
    hosts=sx.host.get_all()
    hosts_with_status=[(sx.host.get_name_label(x),sx.host_metrics.get_live( sx.host.get_metrics(x) )) for x in hosts]

    live_hosts=[name for (name,status) in hosts_with_status if (status==True)]
    dead_hosts=[name for (name,status) in hosts_with_status if not (status==True)]


    #log out
    session.logout()

    #nagios wants a single line of output
    print "live hosts", live_hosts, "dead hosts", dead_hosts,
    if host_is_slave:
        print "(%s is not the master)" % options.hostname,
    print

    #and an exit code
    if (len(dead_hosts) != 0):
        exitcode=2
    elif host_is_slave:
        exitcode=1
    else:
        exitcode=0

except Exception, e:
    print "Unexpected Exception [", e.__repr__(), "]"
    sys.exit(3) #Nagios wants error 3 if anything weird happens

sys.exit(exitcode)

An example unix command line would be:

$ ./nagios_plugin.py -H ivory -p password ; echo $?

which will print the return code as well.

Undocumented Option Structures in the API

Options structures are often provided for future proofing, and in many cases no options are actually exposed to the user. Even where options are exposed, the defaults are usually the optimal set.

To work round the documentation limitations you can resort to reading the underlying source code:
https://github.com/xen-org/xen-api/tree/master/ocaml/xapi

For example, in https://github.com/xen-org/xen-api/blob/master/ocaml/xapi/xapi_host.ml
you can see that migrate_receive doesn't use the options dictionary.

In https://github.com/xen-org/xen-api/blob/master/ocaml/xapi/xapi_vm_migrate.ml you can with some effort work out that the only option available is an option "force" which can be true or false. These are passed as dictionaries so it would be force="true"
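
Because these dictionaries carry strings rather than native booleans, a small helper can avoid passing True where "true" is expected. This is a sketch only, and the helper name is made up for illustration.

```python
def to_options(**kwargs):
    """Build an options dictionary of strings, with booleans as 'true'/'false'."""
    options = {}
    for key, value in kwargs.items():
        if isinstance(value, bool):
            options[key] = "true" if value else "false"
        else:
            options[key] = str(value)
    return options

print(to_options(force=True))
```

The resulting dictionary would then be passed wherever the call takes its options argument.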

If you can work out the xe equivalent the tab completion available will let you know what options can be set:


root@dt56 ~# xe vm-migrate // double tab here to see....
destination-sr-uuid= live= remote-username=
force= remote-master= vdi:
host= remote-network= vif:
host-uuid= remote-password= vm=
root@dt56 ~# xe vm-migrate force= // leave it empty and tells you the options
Error: Failed to parse parameter 'force': expecting 'true' or 'false'

 
