The Book of Xen - Part 12

Once the aoetools package is installed, you can test the exported AoE device on the client by doing:

# aoe-discover
# aoe-stat
      e0.0         1.073GB   eth0 up
# mount /dev/etherd/e0.0 /mnt/aoe

In this case, the device is 1GB (or thereabouts) in size, has been exported as slot 0 of shelf 0, and has been found on the client's eth0. If it mounts successfully, you're ready to go. You can unmount /mnt/aoe and use /dev/etherd/e0.0 as an ordinary phy: device for domU storage. An appropriate domU config disk= line might be:

disk = ['phy:/dev/etherd/e0.0,xvda,w']
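The aoe-stat device name encodes the shelf and slot (eSHELF.SLOT). As a minimal sketch, assuming the whitespace-separated column layout shown above, here's how you might split such a line and build the matching disk= entry; parse_aoe_stat_line and disk_entry are hypothetical helpers, not part of aoetools.

```python
import re
from typing import NamedTuple


class AoEDevice(NamedTuple):
    shelf: int
    slot: int
    size: str
    interface: str
    state: str


def parse_aoe_stat_line(line: str) -> AoEDevice:
    """Parse one aoe-stat line such as 'e0.0  1.073GB  eth0 up' (hypothetical helper).

    Assumes the four-column layout shown above; other aoetools versions may differ.
    """
    fields = line.split()
    if len(fields) < 4:
        raise ValueError(f"unexpected aoe-stat line: {line!r}")
    name, size, interface, state = fields[:4]
    match = re.fullmatch(r"e(\d+)\.(\d+)", name)
    if match is None:
        raise ValueError(f"not an AoE device name: {name!r}")
    return AoEDevice(int(match.group(1)), int(match.group(2)), size, interface, state)


def disk_entry(dev: AoEDevice, vdev: str = "xvda", mode: str = "w") -> str:
    """Build the phy: target for a domU disk= line (hypothetical helper)."""
    return f"phy:/dev/etherd/e{dev.shelf}.{dev.slot},{vdev},{mode}"


dev = parse_aoe_stat_line("e0.0  1.073GB  eth0 up")
print(dev.shelf, dev.slot, dev.interface)  # 0 0 eth0
print(f"disk = ['{disk_entry(dev)}']")     # disk = ['phy:/dev/etherd/e0.0,xvda,w']
```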

If you run into any problems, check /var/log/xen/xend.log. The most common problems relate to the machine's inability to find devices, either block devices or network devices. In that case, errors will show up in the log file. Make sure that the correct virtual disks and interfaces are configured.

iSCSI

AoE and iSCSI share a lot of similarities from the administrator's perspective: they're both ways of exporting storage over a network without requiring special hardware. They both export block devices, rather than filesystems, meaning that only one machine should use an exported device at a time. iSCSI differs from AoE in that it's a routable protocol, based on TCP/IP. This makes it less efficient in both CPU and bandwidth, but more versatile, since iSCSI exports can cross the boundaries of a layer 2 network, whereas AoE traffic stays on its Ethernet segment.

iSCSI divides the world into targets and initiators. You might be more familiar with these as servers and clients, respectively. The servers function as targets for SCSI commands, which are initiated by the client machines. In most installations, the iSCSI targets will be dedicated devices, but if you need to set up an iSCSI server for testing on a general-purpose server, here's how.

Setting Up the iSCSI Server

For the target we recommend the iSCSI Enterprise Target implementation (http://sourceforge.net/projects/iscsitarget/). Other software exists, but we're less familiar with it.

Your distro vendor most likely provides a package. On Debian it's iscsitarget. Red Hat and friends use the related tgt package, which is configured somewhat differently. Although we don't cover the details of setting up tgt, there is an informative page at http://www.cyberciti.biz/tips/howto-setup-linux-iscsi-target-sanwith-tgt.html. For the rest of this section, we'll assume that you're using the iSCSI Enterprise Target.

If necessary, you can download and build the iSCSI target software manually. Download the target software from the website and save it somewhere appropriate (we dropped it onto our GNOME desktop for this example). Unpack it:

# tar xzvf Desktop/iscsitarget-0.4.16.tar.gz
# cd iscsitarget-0.4.16

Most likely you'll be able to build all of the components (both the kernel module and the userspace tools) via the usual make process. Ensure that you've installed the OpenSSL headers, probably as part of the openssl-devel package or similar:

# make
# make install

make install will also copy the default config files into /etc. Our next step is to edit them appropriately.

The main config file is /etc/ietd.conf. It's liberally commented, and most of the values can safely be left at their defaults (for now). The bit that we're mostly concerned with is the Target section:

Target iqn.2001-04.com.prgmr:domU.orlando
        Lun 0 Path=/opt/xen/orlando.img,Type=fileio

There are many other variables that we could tweak here, but the basic target definition is simple: the word Target followed by a conforming iSCSI Qualified Name, with a logical unit definition. Note the Type=fileio. In this example we're using plain files, but you'll most likely also want to use this value with whole disk exports and LVM volumes too.
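A "conforming" IQN has the shape iqn.YYYY-MM.reversed.domain, optionally followed by a colon and a locally chosen name (RFC 3720). The sketch below, a rough check rather than a full validator, tests that shape and emits a one-LUN Target stanza like the one above; is_valid_iqn and target_stanza are hypothetical helpers, and the tab indentation of the Lun line is an assumption about acceptable ietd.conf layout.

```python
import re

# Rough IQN shape: "iqn." + yyyy-mm + "." + reversed domain name,
# optionally followed by ":" and a locally assigned string.
IQN_RE = re.compile(
    r"iqn\.(\d{4})-(\d{2})\."
    r"([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*)"
    r"(?::(.+))?"
)


def is_valid_iqn(name: str) -> bool:
    match = IQN_RE.fullmatch(name)
    return match is not None and 1 <= int(match.group(2)) <= 12


def target_stanza(iqn: str, path: str, lun: int = 0, io_type: str = "fileio") -> str:
    """Emit a minimal ietd.conf Target section with one LUN (hypothetical helper)."""
    if not is_valid_iqn(iqn):
        raise ValueError(f"not a plausible IQN: {iqn!r}")
    return f"Target {iqn}\n\tLun {lun} Path={path},Type={io_type}\n"


print(target_stanza("iqn.2001-04.com.prgmr:domU.orlando", "/opt/xen/orlando.img"), end="")
```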

The init script etc/iscsi_target should also have been copied to the appropriate place. If you want iSCSI to be enabled on boot, create appropriate start and kill links as well.

Now we can export our iSCSI devices:

# /etc/init.d/iscsi_target start

To check that it's working:

# cat /proc/net/iet/volume
tid:1 name:iqn.2001-04.com.prgmr:domU.orlando
        lun:0 state:0 iotype:fileio iomode:wt path:/opt/xen/orlando

You should see the export(s) that you've defined, along with some status information.
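If you want to check exports from a script rather than by eye, the volume file is just whitespace-separated key:value pairs. Here is a minimal parser, assuming the layout shown above (an unindented tid line per target, indented lun lines beneath it); parse_iet_volume is a hypothetical helper.

```python
from typing import Dict, List


def parse_iet_volume(text: str) -> List[Dict]:
    """Parse /proc/net/iet/volume text into a list of targets with their LUNs.

    Assumes each target line is unindented and its LUN lines are indented.
    """
    targets: List[Dict] = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first colon only: IQNs themselves contain colons.
        fields = dict(token.split(":", 1) for token in line.split())
        if line[0].isspace():
            targets[-1]["luns"].append(fields)  # LUN belongs to the latest target
        else:
            fields["luns"] = []
            targets.append(fields)
    return targets


sample = """tid:1 name:iqn.2001-04.com.prgmr:domU.orlando
\tlun:0 state:0 iotype:fileio iomode:wt path:/opt/xen/orlando
"""
for target in parse_iet_volume(sample):
    for lun in target["luns"]:
        print(target["name"], lun["lun"], lun["iotype"], lun["path"])
```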

iSCSI Client Setup

For the initiator, a variety of clients exist. However, the best-supported package seems to be Open-iSCSI, available at http://www.open-iscsi.org/. Both Red Hat and Debian make a version available through their package manager, as iscsi-initiator-utils and open-iscsi, respectively. You can also download the package from the website and work through the very easy installation process.

When you have the iSCSI initiator installed, however you choose to do it, the next step is to say the appropriate incantations to instruct the machine to mount your iSCSI devices at boot.

The iSCSI daemon, iscsid, uses a database to specify its devices. You can interact with this database with the iscsiadm command. iscsiadm also allows you to perform target discovery and login (here we've used the long option forms for clarity):

# iscsiadm --mode discovery --type sendtargets --portal 192.168.1.123
192.168.1.123:3260,1 iqn.2001-04.com.prgmr:domU.orlando

Note that portal, in iSCSI jargon, refers to the IP address (and TCP port, 3260 by default) via which the resource can be accessed. In this case it's the exporting host. iscsiadm tells us that there's one device being exported, iqn.2001-04.com.prgmr:domU.orlando. Now that we know about the node, we can update the iSCSI database:

# iscsiadm -m node -T iqn.2001-04.com.prgmr:domU.orlando -p 192.168.1.123:3260 -o update -n node.conn[0].startup -v automatic

Here we use iscsiadm to update a node in the iSCSI database. We specify a target, a portal, and the operation we want to perform on the database node: update. We name the setting to change with the -n option and give its new value with the -v option. Other operations we can perform via the -o option are new, delete, and show. See the Open-iSCSI documentation for more details.
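Each discovery line packs the portal, a target portal group tag (the number after the comma), and the target name together. A small sketch, assuming an IPv4 portal in exactly the form shown above, splits it into the pieces the follow-up iscsiadm -m node command needs; parse_sendtargets_line is a hypothetical helper.

```python
from typing import NamedTuple


class DiscoveredTarget(NamedTuple):
    address: str
    port: int
    tpgt: int  # target portal group tag, the number after the comma
    iqn: str


def parse_sendtargets_line(line: str) -> DiscoveredTarget:
    """Split 'address:port,tag iqn' discovery output (IPv4 only; hypothetical helper)."""
    portal, iqn = line.split()
    host_port, tag = portal.rsplit(",", 1)
    address, port = host_port.rsplit(":", 1)
    return DiscoveredTarget(address, int(port), int(tag), iqn)


t = parse_sendtargets_line("192.168.1.123:3260,1 iqn.2001-04.com.prgmr:domU.orlando")
# Arguments for the follow-up "iscsiadm -m node" command:
print(["-T", t.iqn, "-p", f"{t.address}:{t.port}"])
```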

Restart iscsid to propagate your changes. (This step may vary depending on your distro. Under Debian the script is open-iscsi; under Red Hat it's iscsid.)

# /etc/init.d/open-iscsi restart

Note the new device in dmesg:

iscsi: registered transport (iser)
scsi3 : iSCSI Initiator over TCP/IP
  Vendor: IET       Model: VIRTUAL-DISK      Rev: 0
  Type:   Direct-Access                      ANSI SCSI revision: 04
SCSI device sda: 8192000 512-byte hdwr sectors (4194 MB)
sda: Write Protect is off
sda: Mode Sense: 77 00 00 08
SCSI device sda: drive cache: write through
SCSI device sda: 8192000 512-byte hdwr sectors (4194 MB)

Note that this is the first SCSI device on the dom0, and thus becomes /dev/sda. Further iSCSI exports become sdb, and so on. Of course, using local SCSI device nodes for network storage presents obvious management problems, since the names depend on the order in which devices are detected. We suggest mitigating this by using the devices under /dev/disk/by-path. Here /dev/sda becomes /dev/disk/by-path/ip-192.168.1.123:3260-iscsi-larry:domU.orlando. Your device names, of course, will depend on the specifics of your setup.
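The size in that dmesg output is just sectors times sector size, reported in decimal megabytes. This sketch, assuming the line format shown above (capacity_bytes is a hypothetical helper), reproduces the kernel's figure and shows why tools that use binary units report the same disk as about 3.9GiB.

```python
import re

SECTOR_LINE = "SCSI device sda: 8192000 512-byte hdwr sectors (4194 MB)"


def capacity_bytes(line: str) -> int:
    """Recover a disk's capacity from a dmesg sector-count line (hypothetical helper)."""
    match = re.search(r"(\d+) (\d+)-byte hdwr sectors", line)
    if match is None:
        raise ValueError("no sector count found")
    sectors, sector_size = int(match.group(1)), int(match.group(2))
    return sectors * sector_size


size = capacity_bytes(SECTOR_LINE)
print(size)           # 4194304000 bytes
print(size // 10**6)  # 4194, the decimal "MB" figure the kernel prints
print(size / 2**30)   # 3.90625, the same capacity in GiB
```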

Now that you're equipped with the device, you can install a Xen instance on it, most likely with a disk= line similar to the following:

disk = ['phy:/dev/disk/by-path/ip-192.168.1.123:3260-iscsi-larry:domU.orlando,xvda,rw']

Since the domain is backed by shared iSCSI storage, you can then migrate the domain to any connected Xen dom0.

[55] A natural extension would be to have the domU mount the network storage directly by including the driver and support software in the initrd. In that case, no local disk configuration would be necessary.

Quo Peregrinatur Grex

So that's migration. In this chapter we've described:

- How to manually move a domain from one host to another
- Cold migration of a domain between hosts
- Live migration between hosts on the same subnet
- Shared storage for live migration

Apply these suggestions, and find your manageability significantly improved!

Chapter 10. PROFILING AND BENCHMARKING UNDER XEN

Disraeli was pretty close: actually, there are Lies, Damn lies, Statistics, Benchmarks, and Delivery dates.
-Anonymous, attributed to Usenet

We've made a great fuss over how Xen, as a virtualization technology, offers better performance than competing technologies. However, when it comes to proofs and signs, we have been waving our hands and citing authorities. We apologize! In this chapter we will discuss how to measure Xen's performance for yourself, using a variety of tools.

We'll look closely at three general classes of performance monitoring, each of which you might use for a different reason. First, we have benchmarking Xen domU performance. If you are running a hosting service (or buying service from a hosting service), you need to see how the Xen image you are providing (or renting) stacks up to the competition. In this category, we have general-purpose synthetic benchmarks.

Second, we want to be able to benchmark Xen versus other virtualization solutions (or bare hardware) for your workload, because Xen has both strengths and weaknesses compared to other virtualization packages. These application benchmarks will help to determine whether Xen is the best match for your application.

Third, sometimes you have a performance problem in your Xen-related or kernel-related program, and you want to pinpoint the bits of code that are moving slowly. This category includes profiling tools, such as OProfile. (Xen developers may also ask you for OProfile output when you ask about performance issues on the xen-devel list.) Although some of these techniques might come in handy while troubleshooting, we haven't really aimed our discussion here at solving problems; rather, we try to present an overview of the tools for various forms of speed measurement. See Chapter 15 for more specific troubleshooting suggestions.

A Benchmarking Overview

We've seen that the performance of a paravirtualized Xen domain running most workloads approximates that of the native machine. However, there are cases where this isn't true or where this fuzzy simulacrum of the truth isn't precise enough. In these cases, we move from prescientific assertion to direct experimentation, that is, using benchmarking tools and simulators to find actual, rather than theoretical, performance numbers.

As we're sure you know, generalized benchmarking is, if not a "hard problem,"[56] at least quite difficult. If your load is I/O bound, testing the CPU will tell you nothing you need to know. If your load is IPC-bound or blocking on certain threads, testing the disk and the CPU will tell you little. Ultimately, the best results come from benchmarks that use as close to real-world load as possible.

The very best way to test, for example, the performance of a server that serves an HTTP web application would be to sniff live traffic hitting your current HTTP server, and then replay that data against the new server, speeding up or slowing down the replay to see if you have more or less capacity than before.

This, of course, is rather difficult both to do and to generalize. Most people go at least one step toward "easier" and "more general." In the previous example, you might pick a particularly heavy page (or a random sampling of pages) and test the server with a generalized HTTP tester, such as Siege. This usually still gives you pretty good results, is a lot easier, and raises fewer privacy concerns than replaying live traffic.

There are times, however, when a general benchmark, for all its inadequacies, is the best tool. For example, if you are trying to compare two virtual private server providers, a standard, generalized test might be more readily available than a real-world, specific test. Let's start by examining a few of the synthetic benchmarks that we've used.

UnixBench

One classic benchmarking tool is the public domain UnixBench released by BYTE magazine in 1990, available from http://www.tux.org/pub/tux/niemi/unixbench/. The tool was last updated in 1999, so it is rather old. However, it seems to be quite popular for benchmarking VPS providers: by comparing one provider's UnixBench number to another's, you can get a rough idea of the capacity of the VMs they're providing.

UnixBench is easy to install: download the source, untar it, build it, and run it.

# tar zxvf unixbench-4.1.0.tgz
# cd unixbench-4.1.0
# make
# ./Run

(That last command is a literal "Run": it's a script that cycles through the various tests, in order, and outputs results.) You may get some warnings, or even errors, about the -fforce-mem option that UnixBench uses, depending on your compiler version. If you edit the Makefile to remove all instances of -fforce-mem, UnixBench should build successfully.
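If you'd rather not hand-edit the Makefile, the edit is a single substitution. A minimal sketch follows; strip_force_mem and patch_makefile are hypothetical helpers, and patch_makefile assumes it's run from the unpacked unixbench-4.1.0 directory.

```python
import re
from pathlib import Path


def strip_force_mem(makefile_text: str) -> str:
    """Drop every -fforce-mem flag (and the whitespace before it) from Makefile text."""
    return re.sub(r"[ \t]*-fforce-mem\b", "", makefile_text)


def patch_makefile(path: str = "Makefile") -> None:
    """Rewrite the Makefile in place (hypothetical helper; run before make)."""
    p = Path(path)
    p.write_text(strip_force_mem(p.read_text()))
```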

We recommend benchmarking the Xen instance in single-user mode if possible. Here's some example output:

                            INDEX VALUES
TEST                                    BASELINE      RESULT    INDEX

Dhrystone 2 using register variables    116700.0   1988287.6    170.4
Double-Precision Whetstone                  55.0       641.4    116.6
Execl Throughput                            43.0      1619.6    376.7
File Copy 1024 bufsize 2000 maxblocks     3960.0    169784.0    428.7
File Copy 256 bufsize 500 maxblocks       1655.0     53117.0    320.9
File Copy 4096 bufsize 8000 maxblocks     5800.0    397207.0    684.8
Pipe Throughput                          12440.0    233517.3    187.7
Pipe-based Context Switching              4000.0     75988.8    190.0
Process Creation                           126.0      6241.4    495.3
Shell Scripts (8 concurrent)                 6.0       173.6    289.3
System Call Overhead                     15000.0    184753.6    123.2
                                                            =========
FINAL SCORE                                                     264.5

Armed with a UnixBench number, you at least have some basis for comparison between different VPS providers. It's not going to tell you much about the specific performance you're going to get, but it has the advantage that it is a widely published, readily available benchmark.
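It helps to know where the final number comes from when comparing providers. The figures above are consistent with each INDEX being RESULT / BASELINE × 10 and the FINAL SCORE being the geometric mean of the indexes, so one unusually fast or slow test can't dominate. This sketch recomputes them from the table; treat the formulas as inferred from the numbers, not as a description of UnixBench's source.

```python
import math

# (baseline, result) pairs copied from the UnixBench run above.
RESULTS = {
    "Dhrystone 2 using register variables": (116700.0, 1988287.6),
    "Double-Precision Whetstone": (55.0, 641.4),
    "Execl Throughput": (43.0, 1619.6),
    "File Copy 1024 bufsize 2000 maxblocks": (3960.0, 169784.0),
    "File Copy 256 bufsize 500 maxblocks": (1655.0, 53117.0),
    "File Copy 4096 bufsize 8000 maxblocks": (5800.0, 397207.0),
    "Pipe Throughput": (12440.0, 233517.3),
    "Pipe-based Context Switching": (4000.0, 75988.8),
    "Process Creation": (126.0, 6241.4),
    "Shell Scripts (8 concurrent)": (6.0, 173.6),
    "System Call Overhead": (15000.0, 184753.6),
}


def index(baseline: float, result: float) -> float:
    # Matches the INDEX column: result relative to baseline, scaled by 10.
    return result / baseline * 10


def final_score(pairs) -> float:
    # Geometric mean of the individual indexes, computed via logarithms.
    logs = [math.log(index(b, r)) for b, r in pairs]
    return math.exp(sum(logs) / len(logs))


for name, (baseline, result) in RESULTS.items():
    print(f"{name:40s} {index(baseline, result):7.1f}")
print(f"{'FINAL SCORE':40s} {final_score(RESULTS.values()):7.1f}")
```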

Other tools, such as netperf and Bonnie++, can give you more detailed performance information.

Analyzing Network Performance

One popular tool for measuring low-level network performance is netperf. This tool supports a variety of performance measurements, with a focus on measuring the efficiency of the network implementation. It's also been used in Xen-related papers. For one example, see "The Price of Safety: Evaluating IOMMU Performance" by Muli Ben-Yehuda et al.[57]

First, download netperf from http://netperf.org/netperf/DownloadNetperf.html. We picked up version 2.4.4.

# wget ftp://ftp.netperf.org/netperf/netperf-2.4.4.tar.bz2

Untar it and enter the netperf directory.

# tar xjvf netperf-2.4.4.tar.bz2
# cd netperf-2.4.

Configure, build, and install netperf. (Note that these directions are a bit at variance with the documentation; the documentation claims that /opt/netperf is the hard-coded install prefix, whereas it seems to install in /usr/local for me. Also, the manual seems to predate netperf's use of Autoconf.)

# ./configure
# make
# su
# make install

netperf works by running the client, netperf, on the machine being benchmarked. netperf connects to a netserver daemon and tests the rate at which it can send and receive data. So, to use netperf, we first need to set up netserver.

In the standard service configuration, netserver would run under inetd; however, inetd is obsolete. Many distros don't even include it by default. Besides, you probably don't want to leave the benchmark server running all the time. Instead of configuring inetd, therefore, run netserver in standalone mode:

# /usr/local/bin/netserver
Starting netserver at port 12865
Starting netserver at hostname 0.0.0.0 port 12865 and family AF_UNSPEC

Now we can run the netperf client with no arguments to perform a 10-second test with the local daemon.

# netperf
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost (127.0.0.1) port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.01    10516.33

Okay, looks good. Now we'll test from the dom0 to this domU. To do that, we install the netperf binaries as described previously and run netperf with the -H option to specify a target host (in this case, .74 is the domU we're testing against):

# netperf -H 192.0.2.74,ipv4
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.0.2.74 (192.0.2.74) port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.00      638.59

Cool. Not as fast, obviously, but we expected that. Now from another physical machine to our test domU:

# netperf -H 192.0.2.66
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.0.2.66 (192.0.2.66) port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.25       87.72

Ouch. Well, so how much of that is Xen, and how much is the network we're going through? To find out, we'll run the netserver daemon on the dom0 hosting the test domU and connect to that:

# netperf -H 192.0.2.74
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.0.2.74 (192.0.2.74) port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.12       93.66

It could be worse, I guess. The moral of the story? xennet introduces a noticeable but reasonable overhead. Also, netperf can be a useful tool for discovering the actual bandwidth you've got available. In this case the machines are connected via a 100Mbit connection, and netperf lists an actual throughput of 93.66Mbits/second.
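To put a number on "noticeable but reasonable," compare the last two runs: both cross the same 100Mbit link, but only the domU run goes through xennet. This back-of-the-envelope sketch treats the dom0 run as the baseline, which assumes the link and dom0 behaved the same during both tests; the helper names are hypothetical.

```python
# Throughput figures (10^6 bits/sec) from the two cross-machine netperf runs above.
TO_DOM0 = 93.66     # other physical machine -> dom0 hosting the test domU
TO_DOMU = 87.72     # other physical machine -> test domU
LINK_MBITS = 100.0  # the 100Mbit connection between the machines


def overhead_percent(baseline: float, measured: float) -> float:
    """Percentage of throughput lost relative to the baseline run."""
    return (baseline - measured) / baseline * 100


def parse_throughput(result_line: str) -> float:
    """Take the last column (Throughput) of a netperf TCP_STREAM result line."""
    return float(result_line.split()[-1])


print(f"xennet overhead: {overhead_percent(TO_DOM0, TO_DOMU):.1f}%")
print(f"link utilization: {TO_DOM0 / LINK_MBITS:.1%}")
```

On these numbers the domU loses roughly 6 percent of the throughput the dom0 sees, while the dom0 itself gets about 94 percent of the nominal link speed.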

Measuring Disk Performance with Bonnie++

One of the major factors in a machine's overall performance is its disk subsystem. By exercising its hard drives, we can get a useful metric to compare Xen providers or Xen instances with, say, VMware guests.

We, like virtually everyone else on the planet, use Bonnie++ to measure disk performance. Bonnie++ attempts to measure both random and sequential disk performance and does a good job of simulating real-world loads. This is especially important in the Xen context because of the degree to which domains are partitioned: although domains share resources, there's no way for them to coordinate resource use.

One illustration of this point is that if multiple domains are trying to access a platter simultaneously, what looks like sequential access from the viewpoint of one VM becomes random accesses to the disk. This makes things like seek time and the robustness of your tagged queuing system much more important. To test the effect of these optimizations on domU performance, you'll probably want a tool like Bonnie++.

The Bonnie++ author maintains a home page at http://www.coker.com.au/bonnie++/. Download the source package, unpack it, build it, and install it:

# wget http://www.coker.com.au/bonnie++/bonnie++-1.03c.tgz
# tar zxvf bonnie++-1.03c.tgz
# cd bonnie++-1.03c
# make
# make install

At this point you can simply invoke Bonnie++ with a command such as:

# /usr/local/sbin/bonnie++

This command will run some tests, printing status information as it goes along, and eventually generate output like this:

Version 1.03        ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
alastor       2512M 20736  76 55093  14 21112   5 26385  87 55658   6 194.9   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                256 35990  89 227885  85 16877  28 34146  84 334227  99  5716  10

Note that some tests may simply output a row of pluses. This indicates that the machine finished them in less than 500 ms, too quickly to measure meaningfully. If that happens, make the workload more difficult. For example, you might specify something like:

# /usr/local/sbin/bonnie++ -d . -s 2512 -n 256

This specifies writing 2512MB files for I/O performance tests. (This is the default file size, which is twice the RAM size on this particular machine. This is important to ensure that we're exercising the disk rather than just RAM.) It also tells Bonnie++ to create 256*1024 files in its file creation tests.
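The "twice the RAM" rule is easy to apply on any Linux box, since /proc/meminfo reports MemTotal in kB. Here is a minimal sketch that turns it into a Bonnie++ command line; the helpers are hypothetical, and SAMPLE_MEMINFO is an illustrative excerpt for a machine with 1256MB of RAM (the amount the 2512MB default above implies), not captured output.

```python
def bonnie_size_mb(meminfo_text: str, factor: int = 2) -> int:
    """Return a Bonnie++ -s value in MB: factor x MemTotal from /proc/meminfo text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            kib = int(line.split()[1])  # /proc/meminfo reports sizes in kB
            return factor * kib // 1024
    raise ValueError("MemTotal not found")


def bonnie_command(meminfo_text: str, directory: str = ".", files_k: int = 256) -> list:
    # -d: test directory, -s: file size in MB, -n: number of files in multiples of 1024
    return ["bonnie++", "-d", directory, "-s", str(bonnie_size_mb(meminfo_text)),
            "-n", str(files_k)]


# Illustrative /proc/meminfo excerpt (hypothetical machine with 1256MB of RAM).
SAMPLE_MEMINFO = "MemTotal:        1286144 kB\nMemFree:          215040 kB\n"
print(" ".join(bonnie_command(SAMPLE_MEMINFO)))  # bonnie++ -d . -s 2512 -n 256
```

On a real machine you would pass in the contents of /proc/meminfo instead of the sample string.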

We also recommend reading Bonnie++'s online manual, which includes a fair amount of pithy benchmarking wisdom, detailing why the author chose to include the tests that he did, and what meanings the different numbers have.

[56] The phrase "hard problem" is usually used as dry and bleak humor. Classic "hard problems" include natural language and strong AI. See also: "interesting."

[57] See http://ols.108.redhat.com/2007/Reprints/ben-yehuda-Reprint.pdf.

Application Benchmarks

Of course, the purpose of a server is to run applications; we're not really interested in how many times per second the VM can do absolutely nothing. For testing application performance, we use the applications that we're planning to put on the machine, and then throw load at them.

Since this is necessarily application-specific, we can't give you too many pointers on specifics. There are good test suites available for many popular libraries. For example, we've had customers benchmark their Xen instances with the popular web framework Django.[58]

httperf: A Load Generator for HTTP Servers

Having tested the effectiveness of your domain's network interface, you may want to discover how well the domain performs when serving applications through that interface. Because of Xen's server-oriented heritage, one popular means of testing its performance in HTTP-based real-world applications is httperf. The tool generates HTTP requests and summarizes performance statistics. It supports HTTP/1.1 and SSL protocols and offers a variety of workload generators. You may find httperf useful if, for example, you're trying to figure out how many users your web server can handle before it goes casters-up.

First, install httperf on a machine other than the one you're testing. It can be another domU, but we usually prefer to install it on something completely separate. This "load" machine should also be as close to the target machine as possible, preferably connected to the same Ethernet switch.

You can get httperf through your distro's package-management mechanism or from http://www.hpl.hp.com/research/linux/httperf/.

If you've downloaded the source code, build it using the standard method. httperf's documentation recommends using a separate build directory rather than building directly in the source tree. Thus, from the httperf source directory:

# mkdir build
# cd build
# ../configure
# make
# make install

Next, run appropriate tests. What we usually do is run httperf with a command similar to this:

# httperf --server 192.168.1.80 --uri /index.html --num-conns 6000 --rate 1500

In this case we're just demanding a static HTML page, so the request rate is obscenely high; usually we would use a much smaller number in tests of real-world database-backed websites.

httperf will then give you some statistics. The important numbers, in our experience, are the connection rate, the request rate, and the reply rate. All of these should be close to the rate specified on the command line. If they start to decline from that number, the server has reached its capacity.
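That rule of thumb can be written down as a check you apply after each run while stepping up --rate. In this sketch the 5 percent tolerance is an arbitrary assumption (choose your own), and you fill in the measured rates from httperf's summary by hand; keeping_up and run_duration_seconds are hypothetical helpers.

```python
def keeping_up(requested_rate: float, measured: dict, tolerance: float = 0.05) -> bool:
    """True if every measured rate is within `tolerance` of the requested --rate.

    `measured` maps labels such as "connection", "request", "reply" to rates per second.
    """
    return all(value >= requested_rate * (1 - tolerance) for value in measured.values())


def run_duration_seconds(num_conns: int, rate: float) -> float:
    """Approximate length of a run that opens num_conns connections at rate per second."""
    return num_conns / rate


print(run_duration_seconds(6000, 1500))  # 4.0, so the example run above is only a few seconds
```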

However, httperf isn't limited to repeated requests for a single file. We prefer to use httperf in session mode by specifying the --wsesslog workload generator. This gives a closer approximation to the actual load on the web server. You can create a session file from your web server logs with a bit of Perl, winding up with a simple formatted list of URLs:

/newsv3/
        /style/crimson.css
        /style/ash.css
        /style/azure.css
        /images/news.feeds.anime/sites/ann-xs.gif
        /images/news.feeds.anime/sites/annpr-xs.gif
        /images/news.feeds.anime/sites/aod-xs.gif
        /images/news.feeds.anime/sites/an-xs.gif
        /images/news.feeds.anime/header-lite.gif
/index.shtml
        /style/sable.css
        /images/banners/igloo.gif
        /images/temp_banner.gif
        /images/faye_header2.jpg
        /images/faye-birthday.jpg
        /images/giant_arrow.gif
        /images/faye_header.jpg
/news/
/events/
        /events/events.css
        /events/summergathering2007/coverimage.jpg

(And so forth.) This session file lists files for httperf to request, with indentation to define bursts: each unindented URL starts a burst, and the indented lines that follow it belong to that burst. When run, httperf will request the first burst, wait a certain amount of time, then move to the next burst. Equipped with this session file, we can use httperf to simulate a user:

# httperf --hog --server 192.168.1.80 --wsesslog=40,10,urls.txt --rate=1

This will start 40 sessions at the rate of one per second. The new parameter, --wsesslog, takes urls.txt as input and runs through it in bursts, pausing 10 seconds between bursts to simulate the user thinking.
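We use Perl for the log conversion; as an alternative, here is a rough Python sketch of the formatting half. It assumes you've already pulled each visitor's request paths out of the logs (for example, grouped by client address, which isn't shown), and it treats anything with the listed file extensions as an embedded object belonging to the preceding page. That extension list is a guess; adjust it for your site. to_wsesslog is a hypothetical helper.

```python
import posixpath
from typing import Iterable, List

# Assumption: these extensions mark embedded objects fetched along with a page.
EMBEDDED = {".css", ".js", ".gif", ".jpg", ".jpeg", ".png", ".ico"}


def to_wsesslog(sessions: Iterable[List[str]]) -> str:
    """Render per-visitor lists of request paths as an httperf --wsesslog file.

    Pages start a burst at column 0; embedded objects are indented with a tab so
    they join the burst of the page before them. Sessions are separated by blank lines.
    """
    blocks = []
    for paths in sessions:
        lines: List[str] = []
        for path in paths:
            ext = posixpath.splitext(path.split("?", 1)[0])[1].lower()
            # An embedded object can only be indented if a page line precedes it.
            lines.append("\t" + path if ext in EMBEDDED and lines else path)
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


print(to_wsesslog([["/newsv3/", "/style/crimson.css", "/index.shtml", "/style/sable.css"]]), end="")
```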

Again, throw this at your server, increasing the rate until the server can't meet demand. When the server fails, congratulations! You've got a benchmark.

Another Application Benchmark: POV-Ray

Of course, depending on your application, httperf may not be a suitable workload. Let's say that you've decided to use Xen to render scenes with the popular open source raytracer POV-Ray. (If nothing else, it's a good way to soak up spare CPU cycles.) The POV-Ray benchmark is easy to run. Just give the -benchmark option on the command line:

# povray -benchmark

This renders a standard scene and gives a large number of statistics, ending with an overall summary and rendering time. A domU with a 2.8 GHz Pentium 4 and 256MB of memory gave us the following output:

Smallest Alloc:                 9 bytes
Largest Alloc:            1440008 bytes
Peak memory used:         5516100 bytes
Total Scene Processing Times
  Parse Time:    0 hours  0 minutes  2 seconds (2 seconds)
  Photon Time:   0 hours  0 minutes 53 seconds (53 seconds)
  Render Time:   0 hours 43 minutes 26 seconds (2606 seconds)
  Total Time:    0 hours 44 minutes 21 seconds (2661 seconds)

Now you've got a single number that you can easily compare between various setups running POV-Ray, be they Xen instances, VMware boxes, or physical servers.
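When you're collecting that number from several machines, it's handy to pull it out automatically. A minimal sketch, assuming the "... (N seconds)" format shown above; povray_times is a hypothetical helper and SAMPLE is the statistics block from our run.

```python
import re

# Matches lines like "  Total Time:    0 hours 44 minutes 21 seconds (2661 seconds)".
TIME_RE = re.compile(r"^\s*(\w+) Time:.*\((\d+) seconds\)", re.MULTILINE)


def povray_times(stats: str) -> dict:
    """Map each phase name (Parse, Photon, Render, Total) to its time in seconds."""
    return {phase: int(seconds) for phase, seconds in TIME_RE.findall(stats)}


SAMPLE = """\
  Parse Time:    0 hours  0 minutes  2 seconds (2 seconds)
  Photon Time:   0 hours  0 minutes 53 seconds (53 seconds)
  Render Time:   0 hours 43 minutes 26 seconds (2606 seconds)
  Total Time:    0 hours 44 minutes 21 seconds (2661 seconds)
"""
times = povray_times(SAMPLE)
print(times["Total"])  # 2661
```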

Tuning Xen for Optimum Benchmarking

Most system administration work involves comparing results at the machine level: analyzing the performance of a Xen VM relative to another machine, virtual or not. However, with virtualization, there are some performance knobs that aren't obvious but can make a huge difference in the final benchmark results.

First, Xen allocates CPU dynamically and attempts to keep the CPU busy as much as possible. That is, if dom2 isn't using all of its allocated CPU, dom3 can pick up the extra. Although this is usually a good thing, it can make CPU benchmark data misleading. While testing, you can avoid this problem by specifying the cap parameter to the scheduler. For example, to ensure that domain ID 1 can get no more than 50 percent of one CPU:

# xm sched-credit -d 1 -c 50

Second, guests in HVM mode absolutely must use paravirtualized drivers for acceptable performance. This point is driven home in a XenSource analysis of benchmark results published by VMware, in which XenSource points out that, in VMware's benchmarks, "XenSource's Xen Tools for Windows, which optimize the I/O path, were not installed. The VMware benchmarks should thus be disregarded in their entirety."
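The cap is expressed as a percentage of one physical CPU: 50 is half a CPU, 100 is a whole CPU, 200 is two CPUs, and 0 means no cap at all. This small sketch converts a CPU budget into the -c value; cap_for and sched_credit_args are hypothetical helpers that only build the argument list, without running xm.

```python
def cap_for(cpus: float) -> int:
    """Credit-scheduler cap for a budget of `cpus` physical CPUs (1.0 -> 100)."""
    if cpus <= 0:
        raise ValueError("a cap of 0 means 'no cap'; pass a positive CPU budget")
    return round(cpus * 100)


def sched_credit_args(domid: int, cpus: float) -> list:
    """Argument list for the xm command shown above."""
    return ["xm", "sched-credit", "-d", str(domid), "-c", str(cap_for(cpus))]


print(" ".join(sched_credit_args(1, 0.5)))  # xm sched-credit -d 1 -c 50
```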

Also, shared resources (like disk I/O) are difficult to account for, can interact with dom0 CPU demand, and can be affected by other domUs. For example, although paravirtualized Xen can deliver excellent network performance, it requires more CPU cycles to do so than a nonvirtualized machine. This may affect the capacity of your machine.

This is a difficult issue to address, and we can't really offer a magic bullet. One point to note is that the dom0 will likely use more CPU than an intuitive estimate would suggest; it's very important to weight the dom0's CPU allocation heavily, or perhaps even devote a core exclusively to the dom0 on boxes with four or more cores.
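The credit scheduler's other knob, weight, is what weighting dom0 heavily means in practice. Every domain starts at a weight of 256, and under contention the scheduler hands out CPU time in proportion to weight; you can raise dom0's share with, for example, xm sched-credit -d 0 -w 512. The sketch below is an idealized model of that division, not Xen's actual accounting. It assumes every domain is busy and uncapped, and that a domain can never use more physical CPUs than it has VCPUs, with any excess redistributed to the others. The function name, domain names, and numbers are hypothetical.

```python
def contended_shares(domains: dict[str, tuple[int, int]], pcpus: int) -> dict[str, float]:
    """Estimate how many physical CPUs each busy domain receives.

    `domains` maps a domain name to (weight, vcpus). Idealized model: CPU
    time is split in proportion to weight, but a domain cannot use more
    CPUs than it has VCPUs; what it cannot use goes to the others.
    """
    shares: dict[str, float] = {}
    remaining = dict(domains)
    free = float(pcpus)
    while remaining:
        total_weight = sum(weight for weight, _ in remaining.values())
        # Domains whose proportional share meets or exceeds their VCPU count.
        saturated = {
            name: vcpus
            for name, (weight, vcpus) in remaining.items()
            if free * weight / total_weight >= vcpus
        }
        if not saturated:
            # Nobody is VCPU-limited: split what is left by weight.
            for name, (weight, _) in remaining.items():
                shares[name] = free * weight / total_weight
            break
        # Saturated domains get one full CPU per VCPU; redistribute the rest.
        for name, vcpus in saturated.items():
            shares[name] = float(vcpus)
            free -= vcpus
            del remaining[name]
    return shares


if __name__ == "__main__":
    guests = {"Domain-0": (512, 4), "web1": (256, 1), "web2": (256, 1), "web3": (256, 1)}
    for name, cpus in contended_shares(guests, pcpus=4).items():
        print(f"{name}: {cpus:.2f} CPUs")
```

With four busy cores, doubling dom0's weight against three single-VCPU guests gives dom0 about 1.6 CPUs and each guest about 0.8 in this model.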

For benchmarking, we also recommend minimizing error by benchmarking with a reasonably loaded machine. If you're expecting to run a dozen domUs, then they should all be performing some reasonable synthetic task while benchmarking to get an appreciation for the real-world performance of the VM.

[58] http://journal.uggedal.com/vps-comparison-between-slicehost-and-prgmr uses Django among other tools.

Profiling with Xen

Of course, there is one way of seeing shared resource use more precisely. We can profile the VM as it runs our application workload to get a clear idea of what it's doing and, with a Xen-aware profiler, how other domains are interfering with us.

Profiling refers to the practice of examining a specific application to see what it spends time doing. In particular, it can tell you whether an app is CPU or I/O limited, whether particular functions are inefficient, or whether performance problems are occurring outside of the app entirely, perhaps in the kernel.
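Before reaching for a full profiler, a crude first check is to compare the CPU time a program used with the wall-clock time it took, which is essentially what the user and system versus elapsed figures from /usr/bin/time tell you. The Python sketch below does that for a single function call; sleeping stands in for waiting on I/O in the demo. The helper names and the 0.8 threshold are arbitrary assumptions, and a low ratio only says the program spent time waiting (on I/O, locks, or anything else), not what it was waiting for. Answering that is the profiler's job.

```python
import time
from typing import Callable


def cpu_share(cpu_seconds: float, wall_seconds: float) -> float:
    """CPU time divided by elapsed time.

    About 1.0 for a busy single-threaded program; multithreaded programs can
    exceed 1.0 because CPU time is summed over all threads.
    """
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return cpu_seconds / wall_seconds


def rough_bottleneck(cpu_seconds: float, wall_seconds: float, threshold: float = 0.8) -> str:
    """Coarse guess: 'cpu' if mostly computing, else 'waiting' (I/O, locks, ...)."""
    return "cpu" if cpu_share(cpu_seconds, wall_seconds) >= threshold else "waiting"


def measure(fn: Callable, *args) -> tuple[float, float]:
    """Run fn(*args); return (CPU seconds used by this process, wall seconds)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    fn(*args)
    return time.process_time() - cpu_start, time.perf_counter() - wall_start


if __name__ == "__main__":
    for label, fn, arg in [("summing", sum, range(5_000_000)), ("sleeping", time.sleep, 0.5)]:
        cpu, wall = measure(fn, arg)
        print(f"{label}: cpu={cpu:.2f}s wall={wall:.2f}s -> {rough_bottleneck(cpu, wall)}")
```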

Here, we'll discuss a sample setup with Xen and OProfile, using the kernel compile as a standard workload (and one that most Xen admins are likely to be familiar with).

Xenoprof

OProfile is probably the most popular profiling package for Linux.[59] The kernel includes OProfile support, and the user-space tools come with virtually every distro we know. If you have a performance problem with a particular program and want to see precisely what's causing it, OProfile is the tool for the job.

OProfile works by incrementing a counter whenever the program being profiled performs a particular action. For example, it can keep count of the number of cache misses or the number of instructions executed. When the counter reaches a certain value, it instructs the OProfile daemon to sample the counter, using a non-maskable interrupt to ensure prompt handling of the sampling request.
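The sampling scheme described above can be modeled in a few lines. This is a toy Python simulation, not OProfile code: each element of the event stream names the function that was running when a hardware event occurred, and every count-th event triggers a sample that records that function. With a count of 750000, as in the opcontrol example later in this chapter, one sample is taken per 750,000 events. The function name sample_every and the demo workload are hypothetical, and the interval of 997 in the demo is a prime so that samples don't line up with the strictly periodic toy workload.

```python
from collections import Counter
from typing import Iterable


def sample_every(events: Iterable[str], count: int) -> Counter:
    """Toy model of event-based sampling.

    Each item in `events` is the name of the function executing when one
    hardware event occurred. A counter is loaded with `count` and decremented
    per event; when it hits zero we record the running function (standing in
    for the interrupt handler noting the current instruction address) and
    reload the counter.
    """
    if count <= 0:
        raise ValueError("count must be positive")
    samples: Counter = Counter()
    remaining = count
    for function in events:
        remaining -= 1
        if remaining == 0:
            samples[function] += 1
            remaining = count
    return samples


if __name__ == "__main__":
    # 100,000 events: 9 of every 10 occur in hot_loop, 1 in parse.
    events = ("hot_loop" if i % 10 < 9 else "parse" for i in range(100_000))
    print(sample_every(events, 997))
```

Here hot_loop receives 90 of the 100 samples and parse 10, matching the 9:1 split of the underlying events even though only one event in 997 was examined.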

Xenoprofile, or Xenoprof, is a version of OProfile that has been extended to work as a system-wide profiling tool under Xen, using hypercalls to enable domains to access hardware performance counters. It supports analysis of complete Xen instances and accounts for time spent in the hypervisor or within another domU.

Getting OProfile

As of recent versions, Xen includes support for OProfile versions up to 0.9.2 (0.9.3 will require you to apply a patch to the Xen kernel). For now, it would probably be best to use the packaged version to minimize the tedious effort of recompilation.

If you're using a recent version of Debian, Ubuntu, CentOS, or Red Hat, you're in luck; the version of OProfile that they ship is already set up to work with Xen. Other distro kernels, if they ship with Xen, will likely also incorporate OProfile's Xen support.

Building OProfile

If you're not so lucky as to have Xen profiling support already, you'll have to download and build OProfile, for which we'll give very brief directions just for completeness.

The first thing to do is to download the OProfile source from http://oprofile.sourceforge.net/. We used version 0.9.4.

First, download and untar OProfile, like so:

# wget http://prdownloads.sourceforge.net/oprofile/oprofile-0.9.4.tar.gz
# tar xzvf oprofile-0.9.4.tar.gz
# cd oprofile-0.9.4

Then configure and build OProfile:

# ./configure --with-kernel-support
# make
# make install

Finally, do a bit of Linux kernel configuration if your kernel isn't correctly configured already. (You can check by issuing gzip -d -c /proc/config.gz | grep PROFILE.) In our case that returns:

CONFIG_PROFILING=y
CONFIG_OPROFILE=m

Note: /proc/config.gz is an optional feature that may not exist. If it doesn't, you'll have to find your configuration some other way. On Fedora 8, for example, you can check for profiling support by looking at the kernel config file shipped with the distro:

# cat /boot/config-2.6.23.1-42.fc8 | grep PROFILE

If your kernel isn't set up for profiling, rebuild it with profiling support. Then install and boot from the new kernel (a step that we won't detail at length here).
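If you want to script that check, the Python sketch below does the same thing as the grep commands above. It reads /proc/config.gz when the kernel provides it and otherwise falls back to /boot/config-<release>, where <release> is what uname -r prints; that fallback path is an assumption that matches the Fedora layout shown above but may differ on other distros. It then reports whether CONFIG_PROFILING and CONFIG_OPROFILE are built in (y) or built as modules (m). The function names are hypothetical.

```python
import gzip
import os
from pathlib import Path

WANTED = ("CONFIG_PROFILING", "CONFIG_OPROFILE")


def parse_config(text: str) -> dict[str, str]:
    """Map option names to values ('y', 'm', ...) from kernel .config text.

    Lines like '# CONFIG_FOO is not set' are skipped, so unset options
    simply don't appear in the result.
    """
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def read_kernel_config() -> str:
    """Return the running kernel's config text.

    Tries /proc/config.gz first; otherwise assumes the distro installed the
    config as /boot/config-<uname -r>, as Fedora does.
    """
    proc_config = Path("/proc/config.gz")
    if proc_config.exists():
        with gzip.open(proc_config, "rt") as f:
            return f.read()
    return Path(f"/boot/config-{os.uname().release}").read_text()


def profiling_ready(options: dict[str, str]) -> bool:
    """True if every wanted option is built in (y) or modular (m)."""
    return all(options.get(name) in ("y", "m") for name in WANTED)


if __name__ == "__main__":
    try:
        opts = parse_config(read_kernel_config())
    except OSError as err:
        print(f"could not read the kernel config: {err}")
    else:
        for name in WANTED:
            print(f"{name}={opts.get(name, 'not set')}")
        print("profiling support present" if profiling_ready(opts)
              else "rebuild the kernel with profiling support")
```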

OProfile Quickstart

To make sure OProfile works, you can profile a standard workload in domain 0. (We chose the kernel compile because it's a familiar task to most sysadmins, although we're compiling it out of the Xen source tree.) Begin by telling OProfile to clear its sample buffers:

# opcontrol --reset

Now configure OProfile.

# opcontrol --setup --vmlinux=/usr/lib/debug/lib/modules/vmlinux --separate=library --event=CPU_CLK_UNHALTED:750000:0x1:1:1

The first three arguments are the command (setup for profiling), the kernel image, and an option to create separate output files for the libraries used. The final switch, --event, describes the event that we're instructing OProfile to monitor.

The precise event that you'll want to sample varies depending on your processor type (and on what you're trying to measure). For this run, to get an overall approximation of CPU usage, we used CPU_CLK_UNHALTED on an Intel Core 2 machine. On a Pentium 4, the equivalent measure would be GLOBAL_POWER_EVENTS. The remaining fields of the event specification give the count (the number of events between samples), the unit mask (in this case, 0x1), and that we want to profile both kernel and user-space code.
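Since the colon-separated event specification is easy to mistype, here is a small Python sketch that builds and splits it using the field order described above: event name, count, unit mask, then the kernel and user flags. The helper names event_spec and parse_event_spec are hypothetical, and the unit mask for your event must still come from ophelp for your CPU; the code only assembles the string and does not check that the event exists.

```python
def event_spec(name: str, count: int, unit_mask: int,
               kernel: bool = True, user: bool = True) -> str:
    """Build an opcontrol --event value of the form name:count:unitmask:kernel:user."""
    if count <= 0:
        raise ValueError("count (events between samples) must be positive")
    return f"{name}:{count}:{unit_mask:#x}:{int(kernel)}:{int(user)}"


def parse_event_spec(spec: str) -> dict:
    """Split a full five-field event specification back into its parts."""
    name, count, unit_mask, kernel, user = spec.split(":")
    return {
        "name": name,
        "count": int(count),
        "unit_mask": int(unit_mask, 0),  # base 0 accepts "0x1" as well as "1"
        "kernel": kernel == "1",
        "user": user == "1",
    }


if __name__ == "__main__":
    spec = event_spec("CPU_CLK_UNHALTED", 750000, 0x1)
    print(f"--event={spec}")
    print(parse_event_spec(spec))
```

The first call reproduces the --event=CPU_CLK_UNHALTED:750000:0x1:1:1 argument used above; passing user=False would restrict sampling to kernel code.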

INSTALLING AN UNCOMPRESSED KERNEL ON RED HAT-DERIVED DISTROS

One issue that you may run into with OProfile and kdump, as with any tool that digs into the kernel's innards, is that these tools expect to find an uncompressed kernel with debugging symbols for maximum benefit. This is simple to provide if you've built the kernel yourself, but with a distro kernel it can be more difficult.

Under Red Hat and others, these kernels (and other software built for debugging) are in special -debuginfo RPM packages. These packages aren't in the standard yum repositories, but you can get them from Red Hat's FTP site. For Red Hat Enterprise Linux 5, for example, that'd be ftp://ftp.redhat.com/pub/redhat/linux/enterprise/5Server/en/os/i386/Debuginfo.

For the default kernel, you'll want the packages:

kernel-debuginfo-common-`uname -r`.`uname -m`.rpm
kernel-PAE-debuginfo-`uname -r`.`uname -m`.rpm

Download these and install them using RPM.

# rpm -ivh *.rpm

To start collecting samples, run:

# opcontrol --start

Then run the experiment that you want to profile, in this case a kernel compile.

# /usr/bin/time -v make bzImage

Then stop the profiler.

# opcontrol --shutdown

Now that we have samples, we can extract meaningful and useful information from the mass of raw data via the standard postprofiling tools. The main analysis command is opreport. To get a basic overview of the processes that consumed CPU, we could run:

# opreport -t 2
CPU: Core 2, speed 2400.08 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x01 (Unhalted bus cycles) count 750000
CPU_CLK_UNHALT...
  samples       %
------------------
   370812 90.0945 cc1
          CPU_CLK_UNHALT...
            samples       %
          ------------------
             332713 89.7255 cc1
              37858 10.2095 libc-2.5.so
                241  0.0650 ld-2.5.so
    11364  2.7611 genksyms
          CPU_CLK_UNHALT...
            samples       %
          ------------------
               8159 71.7969 genksyms
               3178 27.9655 libc-2.5.so
                 27  0.2376 ld-2.5.so

This tells us which processes accounted for CPU usage during the compile, with a threshold of 2 percent (indicated by the -t 2 option). This isn't terribly interesting, however. We can get more granularity using the --symbols option with opreport, which gives a best guess as to what functions accounted for the CPU usage. Try it.
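Reading the nested report takes a little arithmetic: the top-level percentages are shares of all samples, while the indented ones are shares of that process's samples (the indented sample counts add up to the process total, which is how we can tell). The Python sketch below uses numbers transcribed from the run above, in a dictionary layout of our own, and multiplies the two percentages to get each image's share of the whole run; it also estimates the total sample count. The function names are hypothetical.

```python
# Transcribed from the opreport -t 2 output above:
# process -> (samples, percent of all samples, {image: (samples, percent of process)})
REPORT = {
    "cc1": (370812, 90.0945, {
        "cc1": (332713, 89.7255),
        "libc-2.5.so": (37858, 10.2095),
        "ld-2.5.so": (241, 0.0650),
    }),
    "genksyms": (11364, 2.7611, {
        "genksyms": (8159, 71.7969),
        "libc-2.5.so": (3178, 27.9655),
        "ld-2.5.so": (27, 0.2376),
    }),
}


def overall_percent(report: dict) -> dict:
    """Percent of all samples for each (process, image) pair."""
    return {
        (proc, image): proc_pct * image_pct / 100
        for proc, (_, proc_pct, images) in report.items()
        for image, (_, image_pct) in images.items()
    }


def counts_consistent(report: dict) -> bool:
    """Check that each process's per-image samples add up to its total."""
    return all(
        sum(samples for samples, _ in images.values()) == total
        for total, _, images in report.values()
    )


def estimated_total_samples(report: dict) -> float:
    """samples / (percent / 100) for the largest process estimates the run total."""
    samples, pct, _ = max(report.values(), key=lambda entry: entry[0])
    return samples * 100 / pct


if __name__ == "__main__":
    for (proc, image), pct in sorted(overall_percent(REPORT).items(), key=lambda kv: -kv[1]):
        print(f"{proc:10} {image:12} {pct:6.2f}%")
    print(f"estimated total samples: {estimated_total_samples(REPORT):,.0f}")
```

On this data, cc1's own code comes to about 80.8 percent of all samples, libc to just under 10 percent across both processes, and the whole run to roughly 411,600 samples.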

You might be interested in other events, such as cache misses. To get a list of possible counters customized for your hardware, issue:

# ophelp

Profiling Multiple Domains in Concert

So far, all this has covered standard use of OProfile, without touching on the Xen-specific features. But one of the most useful features of OProfile, in the Xen context, is the ability to profile entire domains against each other, analyzing how different scheduling parameters, disk allocations, drivers, and code paths interact to affect performance.

When profiling multiple domains, dom0 still coordinates the session. It's not currently possible to simply profile in a domU without dom0's involvement; domUs don't have direct access to the CPU performance counters.