The Book of Xen - Part 3


Selecting a Kernel

Traditionally, one boots a domU image using a kernel stored in the dom0 filesystem, as in the sample config file in the last section. In this case, it's common to use the same kernel for domUs and the dom0. However, this can lead to trouble: one distro's kernels may be too specialized to work properly with another distro. We recommend either using the proper distro kernel, copying it into the dom0 filesystem so the domain builder can find it, or compiling your own generic kernel.
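As a concrete illustration, the choice shows up as just a couple of lines in the domU config file. This is a sketch of ours; the kernel and initrd paths are placeholders for illustration, not paths the book prescribes:

```
# Option 1: boot a kernel stored in the dom0 filesystem
kernel  = "/boot/vmlinuz-2.6-xenU"
ramdisk = "/boot/initrd-2.6-xenU.img"

# Option 2: let PyGRUB pull the distro's own kernel from the domU
# filesystem instead (omit kernel= and ramdisk= above)
bootloader = "/usr/bin/pygrub"
```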

Another possible choice is to download Xen's binary distribution, which includes precompiled domU kernels, and extract an appropriate domU kernel from that.

Alternatively (and this is the option that we usually use when dealing with distros that ship Xen-aware kernels), you can bypass the entire problem of kernel selection and use PyGRUB to boot the distro's own kernel from within the domU filesystem. For more details on PyGRUB, see Chapter 7. PyGRUB also makes it more intuitive to match modules to kernels by keeping both the domU kernel and its corresponding modules in the domU.

Quick-and-Dirty Install via tar

Let's start by considering the most basic install method possible, just to get an idea of the principles involved. We'll generate a root filesystem by copying files out of the dom0 (or an entirely separate physical machine) and into the domU. This approach copies out a filesystem known to work, requires no special tools, and is easy to debug. However, it's also likely to pollute the domU with a lot of unnecessary stuff from the source system, and it's a fair amount of work.

A good set of commands for this "cowboy" approach might be:

# xm block-attach 0 duncan.img /dev/xvda1 w 0
# mke2fs -j /dev/xvda1
# mount /dev/xvda1 /mnt
# cd /
# tar -c -f - --exclude /home --exclude /mnt --exclude /tmp \
    --exclude /proc --exclude /sys --exclude /var | (cd /mnt/; tar xf -)
# mkdir /mnt/sys
# mkdir /mnt/proc

Note: Do all this as root.

These commands, in order, map the backing file to a virtual device in the dom0, create a filesystem on that device, mount the filesystem, and tar up the dom0 root directory while omitting /home, /mnt, /tmp, /proc, /sys, and /var. The output from this tar command then goes to a complementary tar used to extract the file in /mnt. Finally, we make some directories that the domU will need after it boots. At the end of this process, we have a self-contained domU in duncan.img.

Why This Is Not the Best Idea

The biggest problem with the cowboy approach, apart from its basic inelegance, is that it copies a lot of unnecessary stuff with no easy way to clear it out. When the domU is booted, you could use the package manager to remove things or just delete files by hand. But that's work, and we are all about avoiding work.

Stuff to Watch Out For

There are some things to note:

You must mkdir /sys and /proc or else things will not work properly.

The issue here is that the Linux startup process uses /sys and /proc to discover and configure hardware; if, say, /proc/mounts doesn't exist, the boot scripts will become extremely annoyed.

You may need to mknod /dev/xvda b 220 0.

/dev/xvd is the standard name for Xen virtual disks, by analogy with the hd and sd device nodes. The first virtual disk is /dev/xvda, which can be partitioned into /dev/xvda1, and so on. The command

# mknod /dev/xvda b 220 0

creates the node /dev/xvda as a block device (b) with major number 220 (the number reserved for Xen VBDs) and minor number 0 (because it's xvda, the first such device in the system).
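If you want to script device-node creation, the arithmetic is easy to capture. The helper below is our own sketch (not from the book); it assumes the conventional Xen VBD layout of 16 minor numbers per disk, with the partition number added on top:

```shell
#!/bin/sh
# xvd_minor: minor number for /dev/xvdXN, assuming 16 minors per
# disk (xvda=0, xvdb=16, ...) plus the partition number
# (xvda1=1, xvdb2=18, ...; 0 means the whole disk).
xvd_minor() {
  disk=$1       # single letter: a, b, c, ...
  part=${2:-0}  # partition number; default is the whole disk
  # convert the letter to a 0-based index via its character code
  idx=$(( $(printf '%d' "'$disk") - $(printf '%d' "'a") ))
  echo $(( idx * 16 + part ))
}

# Reconstructing the command from the text:
echo "mknod /dev/xvda b 220 $(xvd_minor a)"
```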

Note On most modern Linux systems, udev makes this unnecessary.

You may need to edit /etc/inittab and /etc/securetty so that /dev/xvc0 works as the console and has a proper getty.

We've noticed this problem only with Red Hat's kernels: for regular XenSource kernels (at least through 3.1), the default getty on tty0 should work without further action on your part. If it doesn't, read on!

The term console is something of a holdover from the days of giant time-sharing machines, when the system operator sat at a dedicated terminal called the system console. Nowadays, the console is a device that receives system administration messages, usually a graphics device, sometimes a serial console.

In the Xen case, all output goes to the Xen virtual console, xvc0. The xm console command attaches to this device with help from xenconsoled. To log in to it, Xen's virtual console must be added to /etc/inittab so that init knows to attach a getty.[17] Do this by adding a line like the following:

xvc:2345:respawn:/sbin/agetty -L xvc0

(As with all examples in books, don't take this construction too literally! If you have a differently named getty binary, for example, you will definitely want to use that instead.)

You might also, depending on your policy regarding root logins, want to add /dev/xvc0 to /etc/securetty so that root will be able to log in on it. Simply append a line containing the device name, xvc0, to the file.

[17] getty gives you a login prompt. What, you didn't think they showed up by magic, did you?
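Both edits are easy to script against a mounted domU image. This is a hypothetical helper of ours, not from the book; the file locations are parameters, and the getty line is the one from the example above:

```shell
#!/bin/sh
# enable_xvc0: give a domU a getty on the Xen virtual console and
# allow root logins on it. Idempotent: each line is added only once.
enable_xvc0() {
  inittab=$1    # e.g., /mnt/etc/inittab
  securetty=$2  # e.g., /mnt/etc/securetty
  grep -q '^xvc:' "$inittab" 2>/dev/null || \
    echo 'xvc:2345:respawn:/sbin/agetty -L xvc0' >> "$inittab"
  grep -qx 'xvc0' "$securetty" 2>/dev/null || \
    echo 'xvc0' >> "$securetty"
}

# Usage, with the domU filesystem mounted on /mnt:
# enable_xvc0 /mnt/etc/inittab /mnt/etc/securetty
```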

Using the Package Management System with an Alternate Root

Another way to obtain a domU image would be to just run the setup program for your distro of choice and instruct it to install to the mounted domU root. The disadvantage here is that most setup programs expect to be installed on a real machine, and they become surly and uncooperative when forced to deal with paravirtualization.

Nonetheless, this is a viable process for most installers, including both RPM and Debian-based distros. We'll describe installation using both Red Hat's and Debian's tools.

Red Hat, CentOS, and Other RPM-Based Distros

On Red Hat-derived systems, we treat this as a package installation, rather than a system installation. Thus, rather than using anaconda, the system installer, we use yum, which has an installation mode suitable for this sort of thing.

First, it's easiest to make sure that SELinux is disabled or nonenforcing because its extended permissions and policies don't work well with the installer.[18] The quickest way to do this is to issue echo 0 > /selinux/enforce. A more permanent solution would be to boot with selinux=0 on the kernel command line.

Note: Specify kernel parameters as a space-separated list on the "module" line that loads the Linux kernel, either in /boot/grub/menu.lst or by pushing e at the GRUB menu.

When that's done, mount your target domU image somewhere appropriate. Here we create the logical volume malcom in the volume group scotland and mount it on /mnt:

# lvcreate -L 4096 -n malcom scotland
# mount /dev/scotland/malcom /mnt/

Create some vital directories, just as in the tar example:

# cd /mnt
# mkdir proc sys etc

Make a basic fstab (you can just copy the one from dom0 and edit the root device as appropriate; with the sample config file mentioned earlier, you would use /dev/sda):

# cp /etc/fstab /mnt/etc
# vi /mnt/etc/fstab

Fix modprobe.conf so that the kernel knows where to find its device drivers. (This step isn't technically necessary, but it enables yum upgrade to properly build a new initrd when the kernel changes; handy if you're using PyGRUB.)

# echo "alias scsi_hostadapter xenblk
alias eth0 xennet" > /mnt/etc/modprobe.conf

At this point you'll need an RPM that describes the software release version and creates the yum configuration files; we installed CentOS 5, so we used centos-release-5.el5.centos.i386.rpm.

# wget http://mirrors.prgmr.com/os/centos/5/os/i386/CentOS/centos-release-5.el5.centos.i386.rpm
# rpm -ivh --nodeps --root /mnt centos-release-5.el5.centos.i386.rpm

Normally, the CentOS release RPM includes the minor version number, but it is hard to find old versions. See the README.prgmr file in the same directory for a full explanation.

Next we install yum under the new install tree. If we don't do this before installing other packages, yum will complain about transaction errors:

# yum --installroot=/mnt -y install yum

Now that the directory has been appropriately populated, we can use yum to finish the install.

# yum --installroot=/mnt -y groupinstall Base

And that's really all there is to it. Create a domU config file as normal.
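The whole RPM-based procedure condenses nicely into a script. The sketch below is ours, not the book's: rather than running anything, the function prints each command in order (review them, then pipe to sh as root); the volume group, logical volume, mount point, and release RPM are the examples from the text:

```shell
#!/bin/sh
# rpm_root_install: print the commands for a yum alternate-root
# install, in order. Nothing is executed, so it is safe to inspect.
rpm_root_install() {
  vg=$1 lv=$2 root=$3 release_rpm=$4
  echo "lvcreate -L 4096 -n $lv $vg"
  echo "mount /dev/$vg/$lv $root"
  echo "mkdir $root/proc $root/sys $root/etc"
  echo "cp /etc/fstab $root/etc"
  echo "rpm -ivh --nodeps --root $root $release_rpm"
  echo "yum --installroot=$root -y install yum"
  echo "yum --installroot=$root -y groupinstall Base"
}

# e.g.: rpm_root_install scotland malcom /mnt \
#           centos-release-5.el5.centos.i386.rpm | sh
rpm_root_install scotland malcom /mnt centos-release-5.el5.centos.i386.rpm
```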

Debootstrap with Debian and Ubuntu

Debootstrap is quite a bit easier. Create a target for the install (using LVM or a flat file), mount it, and then use debootstrap to install a base system into that directory. For example, to install Debian Etch on an x86_64 machine:

# mount /dev/scotland/banquo /mnt
# debootstrap --include=ssh,udev,linux-image-xen-amd64 etch /mnt http://mirrors.easynews.com/linux/debian

Note the --include= option. Because Xen's networking requires the hot-plug system, the domU must include a working install of udev with its support scripts. (We've also included SSH, just for convenience and to demonstrate the syntax for multiple items.) If you are on an i386 platform, add libc6-xen to the include list. Finally, to ensure that we have a compatible kernel and module set, we add a suitable kernel to the include= list. We use linux-image-xen-amd64. Pick an appropriate kernel for your hardware.

If you want to use PyGRUB, create /mnt/etc/modules before you run debootstrap, and put in that file:

xennet
xenblk

Also, create a /mnt/boot/grub/menu.lst file as for a physical machine.
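For reference, here is what a minimal menu.lst for PyGRUB might look like. The kernel version, initrd name, and root device are our assumptions for illustration; match them to whatever debootstrap actually installed:

```
default 0
timeout 5

title  Debian Etch (domU)
       root   (hd0,0)
       kernel /boot/vmlinuz-2.6.18-5-xen-amd64 root=/dev/xvda1 ro
       initrd /boot/initrd.img-2.6.18-5-xen-amd64
```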

If you're not planning to use PyGRUB, make sure that an appropriate Debian kernel and ramdisk are accessible from the dom0, or make sure that modules matching your planned kernel are available within the domU. In this case, we'll copy the dom0 kernel modules into the domU.

# cp -a /lib/modules/ /mnt/lib/modules

When that's done, copy over /etc/fstab to the new system, editing it if necessary:

# cp /etc/fstab /mnt/etc

Renaming Network Devices

Debian, like many systems, uses udev to tie eth0 and eth1 to consistent physical devices. It does this by assigning the device name (ethX) based on the MAC address of the Ethernet device. It will do this during debootstrap; this means that it ties eth0 to the MAC of the box you are running debootstrap on. In turn, the domU's Ethernet interface, which presumably has a different MAC address, will become eth1.[19] You can avoid this by removing /mnt/etc/udev/rules.d/z25_persistent-net.rules, which contains the stored mappings between MAC addresses and device names. That file will be recreated the next time you reboot. If you have only one interface, it might make sense to remove the file that generates it, /mnt/etc/udev/rules.d/z45_persistent-net-generator.rules.

# rm /mnt/etc/udev/rules.d/z25_persistent-net.rules

Finally, unmount the install root. Your system should then essentially work. You may want to change the hostname and edit /etc/inittab within the domU's filesystem, but these are purely optional steps.

# umount /mnt

Test the new install by creating a config file as previously described (say, /etc/xen/banquo) and issuing:

# xm create -c /etc/xen/banquo

[18] Although we don't really approve of the tendency to disable SELinux at the first hint of trouble, we decided to take the path of least resistance.

[19] Or another device, depending on how many Ethernet devices the original machine had.

QEMU Install

Our favorite way to create the domU image, the way that most closely simulates a real machine, is probably to install using QEMU and then take the installed filesystem and use that as your domU root filesystem. This allows you, the installer, to leverage your years of experience installing Linux. Because it's installing in a virtual machine as strongly partitioned as Xen's, the install program is very unlikely to do anything surprising and even more unlikely to interact badly with the existing system. QEMU also works equally well with all distros and even non-Linux operating systems.

QEMU does have the disadvantage of being slow. Because KQEMU (the kernel acceleration module) isn't compatible with Xen, you'll have to fall back to software-only full emulation. Of course, you can use this purely for an initial image-creation step and then copy the pristine disk images around as needed, in which case the speed penalty becomes less important.

QEMU'S RELATION TO XEN

You may already have noted that QEMU gets mentioned fairly often in connection with Xen. There's a good reason for this: The two projects complement each other. Although QEMU is a pure, or classic, full emulator, there's some overlap in QEMU's and Xen's requirements.
For example, Xen can use QCOW images for its disk emulation, and it uses QEMU fully virtualized drivers when running in hardware virtualization mode. QEMU also furnishes some code for the hardware virtualization built into the Linux kernel, KVM (kernel virtual machine),[20] and win4lin, on the theory that there's no benefit in reinventing the wheel.

Xen and QEMU aren't the same, but there's a general consensus that they complement each other well, with Xen more suited to high-performance production environments and QEMU aimed more at exact emulation. Xen's and QEMU's developers have begun sharing patches and working together. They're distinct projects, but Xen developers have acknowledged that QEMU "played a critical role in Xen's success."[21]

This technique works by running QEMU as a pure emulator for the duration of the install, using emulated devices. Begin by getting and installing QEMU. Then run:

# qemu -hda /dev/scotland/macbeth -cdrom slackware-11.0-install-dvd.iso -boot d

This command runs QEMU with the target device, a logical volume in this case, as its hard drive and the install medium as its virtual CD drive. (The Slackware ISO here, as always, is just an example; install whatever you like.) The -boot d option tells QEMU to boot from the emulated CD drive.

Now install to the virtual machine as usual. At the end, you should have a completely functional domU image.
Of course, you're still going to have to create an appropriate domU config file and handle the other necessary configuration from the dom0 side, but all of that is reasonably easy to automate.

One last caveat that bears repeating because it applies to many of these install methods: If the domU kernel isn't Xen-aware, then you will have to either use a kernel from the dom0 or mount the domU and replace its kernel.

[20] Although we don't cover KVM extensively, it's another interesting virtualization technology. More information is available at the KVM web page, http://kvm.sf.net/.

[21] Liguori, Anthony, "Merging QEMU-DM upstream," http://www.xen.org/files/xensummit_4/Liguori_XenSummit_Spring_2007.pdf.

virt-install: Red Hat's One-Step DomU Installer

Red Hat opted to support a generic virtualization concept rather than a specific technology. Their approach is to wrap the virtualization in an abstraction layer, libvirt. Red Hat then provides support software that uses this library to take the place of the virtualization package-specific control software.[22] (For information on the management end of libvirt, virt-manager, see Chapter 6.)

For example, Red Hat includes virsh, a command-line interface that controls virtual machines. xm and virsh do much the same thing, using very similar commands. The advantage of virsh and libvirt, however, is that the virsh interface will remain consistent if you decide to switch to another virtualization technology.
Right now, for example, it can control QEMU and KVM in addition to Xen using a consistent set of commands.

The installation component of this system is virt-install. Like virsh, it builds on libvirt, which provides a platform-independent wrapper around different virtualization packages. No matter which virtualization backend you're using, virt-install works by providing an environment for the standard network install method: First it asks the user for configuration information, then it writes an appropriate config file, makes a virtual machine, loads a kernel from the install medium, and finally bootstraps a network install using the standard Red Hat installer, anaconda. At this point anaconda takes over, and installation proceeds as normal.

Unfortunately, this means that virt-install only works with network-accessible Red Hat-style directory trees. (Other distros don't have the install layout that the installer expects.) If you're planning to standardize on Red Hat, CentOS, or Fedora, this is okay.
Otherwise, it could be a serious problem.

Although virt-install is usually called from within Red Hat's virt-manager GUI, it's also an independent executable that you can use manually in an interactive or scripted mode. Here's a sample virt-install session, with our inputs in bold.

# /usr/sbin/virt-install

Would you like a fully virtualized guest (yes or no)? This will allow you to run unmodified operating systems. no
What is the name of your virtual machine? donalbain
How much RAM should be allocated (in megabytes)? 512
What would you like to use as the disk (path)? /mnt/donalbain.img
How large would you like the disk (/mnt/donalbain.img) to be (in gigabytes)? 4
Would you like to enable graphics support? (yes or no) no
What is the install location? ftp://mirrors.easynews.com/linux/centos/4/os/i386/

Most of these inputs are self-explanatory. Note that the install location can be ftp://, http://, nfs:, or an SSH-style path (host:/path). All of these can be local if necessary; a local FTP or local HTTP server, for example, is a perfectly valid source. Graphics support indicates whether to use the virtual framebuffer; it tweaks the vfb= line in the config file.

Here's the config file generated from that input:

name = "donalbain"
memory = "512"
disk = ['tap:aio:/mnt/donalbain.img,xvda,w',]
vif = ['mac=00:16:3e:4b:af:c2,bridge=xenbr0',]
uuid = "162910c8-2a0c-0333-2349-049e8e32ba90"
bootloader = "/usr/bin/pygrub"
vcpus = 1
on_reboot = 'restart'
on_crash = 'restart'

There are some niceties about virt-install's config file that we'd like to mention. First, note that virt-install accesses the disk image using the tap driver for improved performance.
(For more details on the tap driver, see Chapter 4.) It also exports the disk as xvda to the guest operating system, rather than as a SCSI or IDE device. The generated config file also includes a randomly generated MAC for each vif, using the 00:16:3e prefix assigned to Xen. Finally, the image boots using PyGRUB, rather than specifying a kernel within the config file.

[22] There's nothing inherently Red Hat-specific about libvirt, but Red Hat is currently driving its adoption. See http://libvirt.org/ for more information.

Converting VMware Disk Images

One of the great things about virtualization is that it allows people to distribute virtual appliances: complete, ready-to-run, preconfigured OS images. VMware has been pushing most strongly in that direction, but with a little work, it's possible to use VMware's prebuilt virtual machines with Xen.

PYGRUB, PYPXEBOOT, AND FRIENDS

The principle behind PyGRUB, pypxeboot, and similar programs is that they allow Xen's domain builder to load a kernel that isn't directly accessible from the dom0 filesystem. This, in turn, improves Xen's simulation of a real machine. For example, an automated provisioning tool that uses PXE can provision Xen domains without modification.
This becomes especially important in the context of domU images because it allows the image to be a self-contained package: plop a generic config file on top, and it's ready to go.

Both PyGRUB and pypxeboot take the place of an analogous utility for physical machines: GRUB and PXEboot, respectively. Both are emulations written in Python, specialized to work with Xen. Both acquire the kernel from a place where the ordinary loader would be unable to find it. And both can help you, the hapless Xen administrator, in your day-to-day life.

For more notes on setting up PyGRUB, see Chapter 7. For more on pypxeboot, see Installing pypxeboot.

Other virtualization providers, by and large, use disk formats that do more than Xen's; for example, they include configuration or provide snapshots. Xen's approach is to leave that sort of feature to standard tools in the dom0. Because Xen uses open formats and standard tools whenever possible, its disk images are simply ... filesystems.[23]

Thus, the biggest part of converting a virtual appliance to work with Xen is in converting over the disk image. Fortunately, qemu-img supports most of the image formats you're likely to encounter, including VMware's .vmdk, or Virtual Machine Disk format.

The conversion process is pretty easy. First, get a VMware image to play with. There are some good ones at http://www.vmware.com/appliances/directory/.

Next, take the image and use qemu-img to convert it to a QCOW or raw image:

# qemu-img convert foo.vmdk -O qcow hecate.qcow

This command duplicates the contents of foo.vmdk in a QCOW image (hence the -O qcow, for output format) called hecate.qcow. (QCOW, by the way, is a disk image format that originates with the QEMU emulator.
It supports AES encryption and transparent decompression. It's also supported by Xen. More details on using QCOW images with Xen are in Chapter 4.)

At this point you can boot it as usual, loading the kernel via PyGRUB if it's Xen-aware or if you're using HVM, or using a standard domU kernel from within the dom0 otherwise.

Unfortunately, this won't generate a configuration suitable for booting the image with Xen. However, it should be easy to create a basic config file that uses the QCOW image as its root device. For example, here's a fairly minimal generic config that relies on the default values to the extent possible:

name = "hecate"
memory = 128
disk = ['tap:qcow:/mnt/hecate.img,xvda,w']
vif = ['']
kernel = "/boot/vmlinuz-2.6-xenU"

Note that we're using a kernel from the dom0 filesystem rather than loading the kernel from the VMware disk image with PyGRUB, as we ordinarily suggest. This is so we don't have to worry about whether or not that kernel works with Xen.

RPATH'S RBUILDER: A NEW APPROACH

RPath is kind of interesting. It probably doesn't merit extended discussion, but their approach to building virtual machines is cool. Neat. Elegant.

RPath starts by focusing on the application that the machine is meant to run and then uses software that determines precisely what the machine needs to run it by examining library dependencies, noticing which config files are read, and so on. The promise of this approach is that it delivers compact, tuned, refined virtual machine images with known characteristics, all while maintaining the high degree of automation necessary to manage large systems.

Their website is http://rpath.org/. They've got a good selection of prerolled VMs, aimed at both testing and deployment. (Note that although we think their approach is worth mentioning, we are not affiliated with rPath in any way. You may want to give them a shot, though.)