UbuntuKerrighedClusterGuide


Setting Up A Diskless-boot Kerrighed 2.3.0 Cluster in Ubuntu 8.04

Created by BigJimJams on 26/02/2009.

OK, here's the situation: you've got a number of old machines lying around and you're wondering what to do with them. Someone says 'Why don't you build a cluster?' and you think it's not a bad idea, but it sounds difficult. Actually, it's not that bad! :-D After a little searching I came across Kerrighed, which allows a number of machines to be seen as one large SMP machine. However, I couldn't find any documentation for an Ubuntu setup of Kerrighed, so I decided to piece together this guide from various other guides (Kerrighed installation, Kerrighed On NFS, Alternative Kerrighed on NFS, Kerrighed and DRBL, Ubuntu diskless configuration), which all seemed to do it slightly differently or missed out the occasional step.

Here is the setup used in this guide. We're using five machines: one is set up as the server, which can run Ubuntu 8.04 with any kernel since it doesn't use Kerrighed, and the remaining four machines use the Kerrighed kernel and boot diskless off the server. Our server has two network cards: one is set up for internet access, the other is connected to a switch, which connects the four nodes. The server network card connected to the switch is manually configured with the IP address 192.168.1.1 and subnet mask 255.255.255.0.

This guide is split into two parts: the first covers how to set up the server for diskless booting of the nodes using the current kernel; the second covers setting up Kerrighed 2.3.0 and incorporating it into the diskless boot configuration of part one.

Part 1: Setting up a diskless boot Ubuntu server

I know there are already many guides out there for doing this in Ubuntu, but I thought it would be easier to include what I did here so you can see the whole process. A working diskless boot server has four main components: a DHCP server, to assign IP addresses to each node; a TFTP server, to serve the kernel to each node over the network; an NFS server, to let the nodes share a filesystem; and a minimal Ubuntu 8.04 installation, which all the nodes share.

1.1: Setting up the DHCP server

  • Install the DHCP server package

$ sudo apt-get install dhcp3-server
  • Check that /etc/default/dhcp3-server names the Ethernet card that should listen for requests, i.e. the one connected to the switch. In our case that is eth0:

# /etc/default/dhcp3-server #
INTERFACES="eth0"
  • Edit /etc/dhcp3/dhcpd.conf so it looks like the following:

# /etc/dhcp3/dhcpd.conf #
# General options
option dhcp-max-message-size 2048;
use-host-decl-names on;
deny unknown-clients;
deny bootp;

# DNS settings
option domain-name "kerrighed";          # Just an example name, call it whatever you want to.
option domain-name-servers 192.168.1.1;  # The ip address of the dhcp/tftp/nfs server.

# Information about the network setup
subnet 192.168.1.0 netmask 255.255.255.0 {
  option routers 192.168.1.1;              # IP address of the dhcp/tftp/nfs server.
  option broadcast-address 192.168.1.255;  # Broadcast address for your network.
}

# Declaring IP addresses for nodes and PXE info
group {
  filename "pxelinux.0";                 # Location of the PXE bootloader, relative to tftpd's root (/srv/tftp/)
  option root-path "192.168.1.1:/nfsroot/kerrighed";  # Location of the bootable filesystem on NFS server

  host kerrighednode1 {
        fixed-address 192.168.1.101;          # IP address for kerrighednode1.
        hardware ethernet 01:2D:61:C7:17:86;  # MAC address of the kerrighednode1's ethernet adapter
  }

  host kerrighednode2 {
        fixed-address 192.168.1.102;          # IP address for kerrighednode2.
        hardware ethernet 01:2D:61:C7:17:87;  # MAC address of the kerrighednode2's ethernet adapter
  }

  host kerrighednode3 {
        fixed-address 192.168.1.103;          # IP address for kerrighednode3.
        hardware ethernet 01:2D:61:C7:17:88;  # MAC address of the kerrighednode3's ethernet adapter
  }
  host kerrighednode4 {
        fixed-address 192.168.1.104;          # IP address for kerrighednode4.
        hardware ethernet 01:2D:61:C7:17:89;  # MAC address of the kerrighednode4's ethernet adapter
  }

  server-name "kerrighedserver"; # Name of the PXE server
  next-server 192.168.1.1;       # The IP address of the dhcp/tftp/nfs server
}
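
Typing one host block per node gets tedious on larger clusters. As a rough sketch only, the host entries above could be generated with a small shell function and pasted into the group block. The function name gen_host_stanzas and the "name MAC IP" input format are made up for this illustration and are not part of dhcp3-server; the function only prints text and changes no files.

```shell
# gen_host_stanzas: read "name MAC IP" triples on stdin and print
# dhcpd.conf host blocks in the same layout as the listing above.
# Hypothetical helper for illustration; it does not touch dhcpd.conf.
gen_host_stanzas() {
    while read -r name mac ip; do
        # Skip blank lines and comment lines in the input.
        case "$name" in ''|'#'*) continue ;; esac
        printf '  host %s {\n' "$name"
        printf '        fixed-address %s;\n' "$ip"
        printf '        hardware ethernet %s;\n' "$mac"
        printf '  }\n'
    done
}

# Example input using the first two nodes from this guide.
gen_host_stanzas <<'EOF'
kerrighednode1 01:2D:61:C7:17:86 192.168.1.101
kerrighednode2 01:2D:61:C7:17:87 192.168.1.102
EOF
```

Replace the MAC addresses with the real ones from your own network cards before using the output.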

1.2: Setting up the TFTP Server and PXE bootloader

  • Install the packages for the TFTP server

$ sudo apt-get install tftpd-hpa
  • Open /etc/default/tftpd-hpa and make sure it looks like the following:

# /etc/default/tftpd-hpa #
#Defaults for tftpd-hpa
RUN_DAEMON="YES"
OPTIONS="-l -s /srv/tftp"
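  • Open /etc/inetd.conf and comment out the current tftp line, so that inetd does not compete with the standalone daemon enabled above. If you run xinetd rather than inetd, the equivalent service definition (which belongs in a file such as /etc/xinetd.d/tftp, not in /etc/inetd.conf) is the following: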

service tftp
{
        disable         = no
        id              = tftp-dgram
        socket_type     = dgram
        protocol        = udp
        user            = root
        wait            = yes
        server          = /usr/sbin/in.tftpd
        server_args     = -s /srv/tftp/
}
  • Install syslinux and copy the PXE bootloader code to the tftp server directory.

$ sudo apt-get install syslinux
$ sudo mkdir -p /srv/tftp
$ sudo cp /usr/lib/syslinux/pxelinux.0 /srv/tftp/
  • Create a directory to store the default configuration for all the nodes

$ sudo mkdir /srv/tftp/pxelinux.cfg
  • Copy your current kernel and initrd from /boot to /srv/tftp/ in order to test the diskless-boot system. Replace <KERNEL_VERSION> with whatever you are using.

$ sudo cp /boot/vmlinuz-<KERNEL_VERSION> /boot/initrd.img-<KERNEL_VERSION> /srv/tftp/
  • Create the file /srv/tftp/pxelinux.cfg/default and add the following default configuration for the nodes to boot

LABEL linux
KERNEL vmlinuz-<KERNEL_VERSION>
APPEND root=/dev/nfs initrd=initrd.img-<KERNEL_VERSION> nfsroot=192.168.1.1:/nfsroot/kerrighed ip=dhcp rw
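
The file named default is only the fallback. PXELINUX first looks in pxelinux.cfg/ for a file named after the client's MAC address (prefixed with 01- and written in lowercase with dashes), then for files named after the client's IP address in uppercase hexadecimal, and only then for default. This is handy if one node ever needs a different kernel from the rest. The sketch below computes those file names; the function names ip_to_pxe_hex and mac_to_pxe_name are invented for this example.

```shell
# ip_to_pxe_hex: convert a dotted-quad IPv4 address into the 8-digit
# uppercase hexadecimal file name that PXELINUX looks for.
ip_to_pxe_hex() {
    printf '%s\n' "$1" | {
        IFS=. read -r a b c d
        printf '%02X%02X%02X%02X\n' "$a" "$b" "$c" "$d"
    }
}

# mac_to_pxe_name: convert a colon-separated MAC address into the
# "01-" prefixed, lowercase, dash-separated file name.
mac_to_pxe_name() {
    printf '01-%s\n' "$1" | tr 'A-Z:' 'a-z-'
}

ip_to_pxe_hex 192.168.1.101           # prints C0A80165
mac_to_pxe_name 01:2D:61:C7:17:86     # prints 01-01-2d-61-c7-17-86
```

With the addresses used in this guide, a file /srv/tftp/pxelinux.cfg/C0A80165 would therefore apply to kerrighednode1 only.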

1.3: Setting up the NFS Server

  • Install the packages for the NFS server

$ sudo apt-get install nfs-kernel-server nfs-common
  • Make a directory to store the bootable filesystem

$ sudo mkdir -p /nfsroot/kerrighed
  • Edit /etc/exports by adding the following in order to export the clients' root filesystem:

# /etc/exports #
/nfsroot/kerrighed 192.168.1.0/255.255.255.0(rw,no_subtree_check,async,no_root_squash)
  • Re-export the file systems

$ sudo exportfs -avr

1.4: Setting up the bootable filesystem

  • Install the packages needed and install the base system to the bootable filesystem folder. In this case it's a minimal install of Ubuntu Hardy.

$ sudo apt-get install debootstrap
$ sudo debootstrap --arch i386 hardy /nfsroot/kerrighed http://archive.ubuntu.com/ubuntu/
  • Change the current root of the file system to the bootable filesystem directory (stay chrooted until the guide tells you otherwise)

$ sudo chroot /nfsroot/kerrighed
  • Set the root password.

$ passwd
  • Mount the /proc directory of the current machine

$ mount -t proc none /proc
  • Edit /etc/apt/sources.list in order to download the necessary packages

deb http://archive.canonical.com/ubuntu hardy partner
deb http://archive.ubuntu.com/ubuntu/ hardy main universe restricted multiverse
deb http://security.ubuntu.com/ubuntu/ hardy-security universe main multiverse restricted
deb http://archive.ubuntu.com/ubuntu/ hardy-updates universe main multiverse restricted
deb-src http://archive.ubuntu.com/ubuntu/ hardy main universe restricted multiverse
deb-src http://security.ubuntu.com/ubuntu/ hardy-security universe main multiverse restricted
deb-src http://archive.ubuntu.com/ubuntu/ hardy-updates universe main multiverse restricted
  • Update the current package listing

$ apt-get update
  • Install the packages that our nodes need for the dhcp/nfs

$ apt-get install dhcp3-common nfs-common nfsbooted openssh-server
  • Edit /etc/fstab of the bootable filesystem to look like this:

# /etc/fstab
#
# <file system> <mount point> <type> <options> <dump> <pass>
proc            /proc         proc   defaults       0      0
/dev/nfs        /             nfs    defaults       0      0
  • Edit /etc/hosts and add all cluster nodes and server to it. In our case it would look like the following:

# /etc/hosts #
127.0.0.1 localhost

192.168.1.1    kerrighedserver
192.168.1.101  kerrighednode1
192.168.1.102  kerrighednode2
192.168.1.103  kerrighednode3
192.168.1.104  kerrighednode4
  • Create a symlink so that the NFS shared filesystem is automounted at startup. The name should not collide with other existing services, e.g. /etc/rcS.d/S35xxxxxxx

$ ln -sf /etc/network/if-up.d/mountnfs /etc/rcS.d/S34mountnfs
  • Edit /etc/network/interfaces so that Network Manager does not manage the nodes' Ethernet cards, as it can cause issues with NFS. Ours looks like the following:

# Used by ifup(8) and ifdown(8). See the interfaces(5) manpage or
# /usr/share/doc/ifupdown/examples for more information.

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface, set to manual (no "auto" line) for NFS root
iface eth0 inet manual
  • Create a user for the bootable system. Replace <username> with whatever you want.

$ adduser <username>
  • Ensure the new user is in the /etc/sudoers file:

# /etc/sudoers #
#User privilege specification
root ALL=(ALL) ALL
<username> ALL=(ALL) ALL
  • Exit from the chrooted bootable filesystem

$ exit
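
The /etc/hosts entries written earlier in this section follow a regular pattern, so for a cluster with more nodes they could be generated rather than typed. The following is a minimal sketch that assumes the addressing scheme of this guide (server at 192.168.1.1, node K at 192.168.1.(100+K)); the function name gen_hosts is made up for this example, and it only prints the lines without writing any file.

```shell
# gen_hosts: print /etc/hosts lines for the server and the first N nodes,
# using the names and addresses chosen in this guide.
gen_hosts() {
    count=$1
    printf '127.0.0.1 localhost\n\n'
    printf '192.168.1.1    kerrighedserver\n'
    k=1
    while [ "$k" -le "$count" ]; do
        printf '192.168.1.%d  kerrighednode%d\n' "$((100 + k))" "$k"
        k=$((k + 1))
    done
}

# Print the entries for the four nodes used in this guide.
gen_hosts 4
```

Review the output before copying it into /nfsroot/kerrighed/etc/hosts.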

1.5: Testing the diskless boot system

  • Restart the servers

$ sudo /etc/init.d/tftpd-hpa restart
$ sudo /etc/init.d/dhcp3-server restart
$ sudo /etc/init.d/nfs-kernel-server restart
  • Configure the BIOS of each node to boot over the network. Remember also to disable "halt on all errors".
  • Boot each of the nodes to see if it works. If so, you should be presented with a login prompt, where you can log in using the username you defined earlier.

Part 2: Setting up Kerrighed

Now that we've got a diskless boot system set up, we only need to build the Kerrighed kernel for the nodes to use and configure the Kerrighed settings in order to have a working SSI (Single System Image) cluster.

2.1: Building the Kerrighed kernel

  • Shut down the nodes and, on the server, chroot into the bootable filesystem once again.

$ sudo chroot /nfsroot/kerrighed
  • Install the necessary packages into the bootable filesystem.

$ apt-get install automake autoconf libtool pkg-config gawk rsync bzip2 gcc-3.3 libncurses5 libncurses5-dev wget lsb-release xmlto patchutils xutils-dev build-essential
  • Get the latest Kerrighed sources from INRIA's GForge and the vanilla 2.6.20 kernel.

$ wget -O /usr/src/kerrighed-latest.tar.gz http://kerrighed.gforge.inria.fr/kerrighed-latest.tar.gz
$ wget -O /usr/src/linux-2.6.20.tar.bz2 http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.20.tar.bz2
  • Change to the /usr/src directory where the downloaded tarballs are and decompress them.

$ cd /usr/src
$ tar zxf kerrighed-latest.tar.gz
$ tar jxf linux-2.6.20.tar.bz2
  • Change to the kerrighed sources directory, configure the sources and patch the 2.6 kernel

$ cd kerrighed-x.x.x/
$ ./configure --with-kernel=/usr/src/linux-2.6.20 CC=gcc-3.3
$ make patch
  • Change to the kernel sources directory, create the default config for the kernel, and then configure the kernel. Make sure that NFS support (including root filesystem on NFS and kernel-level IP autoconfiguration via DHCP) and your network card driver are compiled into the kernel and not built as modules, since we will not use an initrd.

$ cd ../linux-2.6.20
$ make defconfig
$ make menuconfig
  • Change back to the kerrighed sources directory, and execute make for the kernel and the kerrighed tools.

$ cd ../kerrighed-x.x.x
$ make kernel
$ make
  • If everything goes well, Kerrighed will have built the modules, the kernel and the tools correctly. Now it's time to install them:

$ make kernel-install
$ make install
  • If everything has been correctly installed, you should have the following in the bootable filesystem:

/boot/vmlinuz-2.6.20-krg
/boot/System.map
/lib/modules/2.6.20-krg
/etc/init.d/kerrighed
/etc/default/kerrighed
/usr/local/share/man
/usr/local/bin/krgadm
/usr/local/bin/krgcapset
/usr/local/bin/migrate
/usr/local/lib/libkerrighed-*
/usr/local/include/kerrighed
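
Checking this list by hand is error-prone, so here is a small sketch that reports which of the paths above are missing. The function name check_krg_install is invented for this example. It takes the root of the bootable filesystem as an optional argument, so it can be run on the server outside the chroot, and it leaves out the man page directory and the libkerrighed-* wildcard entry for simplicity.

```shell
# check_krg_install: print every expected Kerrighed path that does not
# exist under the given root (default: /nfsroot/kerrighed) and return
# a non-zero status if anything is missing.
check_krg_install() {
    root=${1:-/nfsroot/kerrighed}
    missing=0
    for path in \
        /boot/vmlinuz-2.6.20-krg \
        /boot/System.map \
        /lib/modules/2.6.20-krg \
        /etc/init.d/kerrighed \
        /etc/default/kerrighed \
        /usr/local/bin/krgadm \
        /usr/local/bin/krgcapset \
        /usr/local/bin/migrate \
        /usr/local/include/kerrighed
    do
        if [ ! -e "$root$path" ]; then
            echo "missing: $path"
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}

# Example (on the server, after leaving the chroot):
#   check_krg_install /nfsroot/kerrighed && echo "all files present"
```

Inside the chroot, pass an empty string as the root instead (check_krg_install ""), since the paths are then already relative to /.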

2.2: Configuring Kerrighed

  • Edit the /etc/kerrighed_nodes file to define the session ID for all nodes of the cluster, and the number of nodes that have to be available before the cluster autostarts. It should look like the following:

# /etc/kerrighed_nodes #
session=1  #Value can be 1 - 254 
nbmin=0    #Number of nodes which load before kerrighed autostarts. Only safe value is 0 due to a bug.
  • Check /etc/default/kerrighed contains the following so the kerrighed service is loaded and started:

# /etc/default/kerrighed #
# If true, enable Kerrighed module loading
ENABLE=true
  • Exit the chrooted bootable filesystem

$ exit
  • Now we must reconfigure the TFTP boot configuration we set up earlier to use the kerrighed kernel. First, copy the new kerrighed kernel to the tftp directory.

$ sudo cp /nfsroot/kerrighed/boot/vmlinuz-2.6.20-krg /srv/tftp
  • Edit /srv/tftp/pxelinux.cfg/default so it boots the kerrighed kernel. It should look like:

LABEL linux
KERNEL vmlinuz-2.6.20-krg
APPEND console=tty1 root=/dev/nfs nfsroot=192.168.1.1:/nfsroot/kerrighed ip=dhcp rw
  • OK, now that all the configuring is done, it's time to restart the servers again.

$ sudo /etc/init.d/tftpd-hpa restart
$ sudo /etc/init.d/dhcp3-server restart
$ sudo /etc/init.d/nfs-kernel-server restart
  • Time to see if it works! Once again boot up all of the nodes, and if the login prompt appears, the new kernel has booted fine. Log in to one of the nodes (either by ssh or on the node itself).

2.3: Starting Kerrighed

  • You can check if all nodes in the cluster are up and running by typing:

$ sudo krgadm nodes
  • You should see a list of all nodes in the format node_id:session. In our case we should see the following:

101:1 102:1 103:1 104:1
  • To start the kerrighed cluster type

$ sudo krgadm cluster start
  • To see if the cluster is running type the following (currently in kerrighed 2.3.0 this always reports that no cluster is running, even when one actually is):

$ sudo krgadm cluster status
  • To list the process capabilities for the kerrighed cluster type:

$ sudo krgcapset -s
  • To allow process migration to take place between nodes in the cluster type the following:

$ sudo krgcapset -d +CAN_MIGRATE
  • Hopefully, by this point your kerrighed cluster is working nicely. To see if it is working, try running top from the command line and pressing 1 to list all the CPUs in the cluster. Don't worry if they have strange IDs, as it's to do with the Kerrighed autonumbering implementation. You can also check that process migration is working from top by starting a number of long-running, CPU-intensive processes and seeing if all the CPUs listed reach 100% usage.
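
If you want to script the node check from earlier in this section, the node_id:session listing is easy to parse. The sketch below counts how many entries belong to a given session. The function name count_krg_nodes is made up for this example, and it assumes the listing arrives on standard input in the whitespace-separated format shown above.

```shell
# count_krg_nodes: read a "node_id:session" listing on stdin and print
# how many entries belong to the session given as the first argument.
count_krg_nodes() {
    session=$1
    # Put each entry on its own line, then count the ones ending in ":<session>".
    tr -s ' \t' '\n\n' | grep -c ":$session\$"
}

# Using the listing from this guide as sample input:
echo "101:1 102:1 103:1 104:1" | count_krg_nodes 1    # prints 4
```

On a node you would pipe the real listing into it, for example sudo krgadm nodes | count_krg_nodes 1, and compare the result with the number of nodes you expect.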

I hope this guide has helped some of you. If you have any comments, suggestions or improvements, I'd like to hear from you. If you post them to the Easy Ubuntu Clustering forum thread at http://ubuntuforums.org/showthread.php?p=6495259 they could be useful to other people too.

EasyUbuntuClustering/UbuntuKerrighedClusterGuide (last edited 2012-01-21 10:02:23 by vpn-3091)