Setting Up A Diskless-boot Kerrighed 2.4.1 Cluster in Ubuntu 8.04

Created by BigJimJams on 26/02/2009. Updated/adapted by Alicia Mason (RINH) 22.09.09

OK, here's the situation - you've got a number of old machines lying around and you're wondering what to do with them. Someone says 'Why don't you build a cluster?' and you think, not a bad idea, but it sounds difficult! Actually, it's not that bad! :-D After a little searching I came across Kerrighed, which will allow a number of machines to be seen as one large SMP machine. However, I couldn't find any documentation for an Ubuntu setup of Kerrighed, so I decided to piece together this guide from various other guides (Kerrighed installation, Kerrighed On NFS, Alternative Kerrighed on NFS, Kerrighed and DRBL, Ubuntu diskless configuration), which all seemed to do it slightly differently or missed out the occasional step.

Here is the setup used in this guide. We're using seven machines: one is going to be set up as the server, which can run Ubuntu 8.04 with any kernel, as it doesn't need to run Kerrighed itself; the remaining six machines will use the Kerrighed kernel and diskless boot off the server. Our server has two network cards - one is set up for Internet access and the other is connected to a switch, which connects the six nodes. The server network card connected to the switch is manually configured with a static IP address and subnet mask.

This guide is split into two parts: the first covers how to set up the server for diskless booting the nodes using the current kernel; the second covers setting up Kerrighed 2.4.1 and incorporating it into the diskless boot configuration from part one.

Preliminary check

Before you start, make sure you're ready to start setting up the cluster: get your nodes and server built if necessary and make sure they have the correct hardware. Connect the cluster network up: each node is linked to a port on the switch, as is the server's secondary NIC. The server's primary NIC is connected to your general LAN, home router, wall port or whatever you use to access the Net.

The machine that will be your server needs an operating system, too. Any flavour of Ubuntu will do - at the RINH we've been using the NEBC's Bio-Linux 5, but you could use whatever spin you want. If you're new it's best to stick with main Ubuntu releases. Set this machine up with your chosen *buntu and make sure it can connect to the Internet - if you're at an institution, you'll probably need information about the institution's DNS nameservers, domains etc. before you can connect properly.

Once you're sure everything worked before you started messing with it, you're ready to start.

Part 1: Setting up a diskless boot Ubuntu server

This will be the basis of our cluster. In order to get a working diskless boot server, there are four main components you'll need to install: a DHCP server to assign IP addresses to each node, a TFTP server to boot the kernel for each node over the network, an NFS server to allow the nodes to share a filesystem, and a minimal Ubuntu 8.04 installation for them to share.

These server components will all run on one box; this box will be your 'head node' or the controller for the cluster. Let's get started!

1.1: Setting up the DHCP server

DHCP is what will allow the nodes to get IP addresses from the server. We want to set up and configure a DHCP daemon to run on the server and give IP addresses only to nodes it recognises, so we will tell the daemon their MAC addresses.

# aptitude install dhcp3-server

# /etc/default/dhcp3-server #

# /etc/dhcp3/dhcpd.conf #
# General options
option dhcp-max-message-size 2048;
use-host-decl-names on;
deny unknown-clients; # This will stop any non-node machines from appearing on the cluster network.
deny bootp;

# DNS settings
option domain-name "kerrighed";          # Just an example name - call it whatever you want.
option domain-name-servers;  # The server's IP address, manually configured earlier.

# Information about the network setup
subnet netmask {
  option routers;              # Server IP as above.
  option broadcast-address;  # Broadcast address for your network.
}

# Declaring IP addresses for nodes and PXE info
group {
  filename "pxelinux.0";                 # PXE bootloader. Path is relative to /var/lib/tftpboot
  option root-path "";  # Location of the bootable filesystem on NFS server

  host kerrighednode1 {
        fixed-address;          # IP address for the first node, kerrighednode1 for example.
        hardware ethernet 01:2D:61:C7:17:86;  # MAC address of the node's ethernet adapter
  }

  host kerrighednode2 {
        hardware ethernet 01:2D:61:C7:17:87;
  }

  host kerrighednode3 {
        hardware ethernet 01:2D:61:C7:17:88;
  }

  host kerrighednode4 {
        hardware ethernet 01:2D:61:C7:17:89;
  }

  host kerrighednode5 {
        hardware ethernet 01:2D:61:C7:17:90;
  }

  host kerrighednode6 {
        hardware ethernet 01:2D:61:C7:17:91;
  }

  server-name "kerrighedserver"; # Name of the server. Call it whatever you like.
  next-server;       # Server IP, as above.
}
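Typing six near-identical host entries by hand is an easy place to make a typo, so here is a minimal sketch of a shell helper that prints one entry per node. The function name host_stanza and the 192.168.1.101 address in the usage note are made up for illustration; substitute the addresses of your own cluster network.

```shell
#!/bin/sh
# Hypothetical helper: print one dhcpd.conf host entry.
#   $1 = host name, $2 = fixed IP address, $3 = MAC address of the node's NIC
host_stanza() {
    printf '  host %s {\n' "$1"
    printf '        fixed-address %s;\n' "$2"
    printf '        hardware ethernet %s;\n' "$3"
    printf '  }\n'
}
```

Running `host_stanza kerrighednode1 192.168.1.101 01:2D:61:C7:17:86` prints a block you can paste inside the group section of /etc/dhcp3/dhcpd.conf.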

Now you're done configuring DHCP, so your nodes will be able to get IPs. It's time to add the functionality that will allow the server to transfer a kernel to them afterwards.

1.2: Setting up the TFTP server and PXE bootloader

TFTP is the fileserver that will be used by the bootloader to transfer the kernel to your nodes during a PXE boot. We need to install a TFTP server and get a PXE bootloader as part of the syslinux package, so that our nodes will be able to get their operating systems via the cluster server.

# aptitude install tftpd-hpa

# /etc/default/tftpd-hpa #
#Defaults for tftpd-hpa
OPTIONS="-l -s /var/lib/tftpboot"

tftp           dgram   udp     wait    root  /usr/sbin/in.tftpd /usr/sbin/in.tftpd -s /var/lib/tftpboot

# aptitude install syslinux
# cp /usr/lib/syslinux/pxelinux.0 /var/lib/tftpboot

# mkdir /var/lib/tftpboot/pxelinux.cfg

# cp /boot/vmlinuz-<KERNEL_VERSION> /boot/initrd.img-<KERNEL_VERSION> /var/lib/tftpboot/

LABEL linux
KERNEL vmlinuz-<KERNEL_VERSION>
APPEND console=tty1 root=/dev/nfs initrd=initrd.img-<KERNEL_VERSION> nfsroot= ip=dhcp rw
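The same three-line boot entry is needed again in part two with a different kernel, so it can be handy to generate it. This is only a sketch: pxe_entry is a hypothetical helper name, and the nfsroot value in the usage note (192.168.1.1:/nfsroot/kerrighed) is an assumed example of the "server IP:path" form, not a value taken from this guide.

```shell
#!/bin/sh
# Hypothetical helper: print a PXELINUX boot entry.
#   $1 = kernel file name (relative to the TFTP root)
#   $2 = initrd file name, or "" for a kernel that needs none
#   $3 = nfsroot value, in the form server-ip:/exported/path
pxe_entry() {
    append="console=tty1 root=/dev/nfs"
    if [ -n "$2" ]; then
        append="$append initrd=$2"
    fi
    append="$append nfsroot=$3 ip=dhcp rw"
    printf 'LABEL linux\nKERNEL %s\nAPPEND %s\n' "$1" "$append"
}
```

For instance, `pxe_entry vmlinuz-2.6.20-krg "" 192.168.1.1:/nfsroot/kerrighed > /var/lib/tftpboot/pxelinux.cfg/default` would write the entry to the file PXELINUX falls back to when no node-specific configuration exists.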

You're done setting up the TFTP and PXE components of the cluster server! Your nodes will now be able to fetch a bootloader and kernel from the server after they're given IP addresses. Now you need to add NFS capability so that they can get a filesystem too.

1.3: Setting up the NFS server

NFS allows the bootable filesystem to be mounted and shared over the network once the nodes have fetched their kernel over TFTP, so that the whole cluster uses one filesystem. First we'll install and set up the server that will do this.

# apt-get install nfs-kernel-server nfs-common

# mkdir -p /nfsroot/kerrighed

# /etc/exports #

# exportfs -avr
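/etc/exports needs one line naming the directory to share and who may mount it. As a hedged sketch, the helper below prints such a line. The function name exports_line, the 192.168.1.0/255.255.255.0 network in the usage note and the exact set of options are assumptions for illustration, not necessarily what the original authors used; no_root_squash is included because the nodes use the export as their root filesystem.

```shell
#!/bin/sh
# Hypothetical helper: print one /etc/exports line.
#   $1 = directory to export, $2 = client network (address/netmask)
exports_line() {
    printf '%s %s(rw,no_root_squash,async,no_subtree_check)\n' "$1" "$2"
}
```

For example, `exports_line /nfsroot/kerrighed 192.168.1.0/255.255.255.0 >> /etc/exports`, followed by the exportfs command above to make the NFS server re-read the file.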

Your NFS server should be up and running. Now you can add a filesystem for this server to work with.

1.4: Setting up the bootable filesystem

# aptitude install debootstrap
# debootstrap --arch i386 hardy /nfsroot/kerrighed

# chroot /nfsroot/kerrighed

# passwd

# mount -t proc none /proc

deb hardy partner
deb hardy main universe restricted multiverse
deb hardy-security universe main multiverse restricted
deb hardy-updates universe main multiverse restricted
deb-src hardy main universe restricted multiverse
deb-src hardy-security universe main multiverse restricted
deb-src hardy-updates universe main multiverse restricted

# aptitude update

# apt-get install dhcp3-common nfs-common nfsbooted openssh-server

# /etc/fstab
# <file system> <mount point> <type> <options> <dump> <pass>
proc            /proc         proc   defaults       0      0
/dev/nfs        /             nfs    defaults       0      0

# /etc/hosts # localhost    kerrighedserver  kerrighednode1  kerrighednode2  kerrighednode3  kerrighednode4  kerrighednode5  kerrighednode6
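Each node needs a line in /etc/hosts pairing its address with its name. A minimal sketch for generating those lines follows; hosts_entries is a hypothetical helper, and the 192.168.1 prefix and starting address 101 in the usage note are assumptions, so use the addresses you gave the nodes in dhcpd.conf.

```shell
#!/bin/sh
# Hypothetical helper: print /etc/hosts lines for consecutively addressed nodes.
#   $1 = first three octets of the cluster network
#   $2 = last octet of the first node's address
#   $3 = number of nodes
hosts_entries() {
    prefix=$1
    first=$2
    count=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        printf '%s.%d\tkerrighednode%d\n' "$prefix" "$((first + i))" "$((i + 1))"
        i=$((i + 1))
    done
}
```

Inside the chroot, `hosts_entries 192.168.1 101 6 >> /etc/hosts` would append six tab-separated lines, one per node.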

# ln -sf /etc/network/if-up.d/mountnfs /etc/rcS.d/S34mountnfs

# ...
# The loopback interface:
auto lo
iface lo inet loopback

# The primary network interface, manually configured to protect NFS:
iface eth0 inet manual

# adduser <username>

# /etc/sudoers #
#User privilege specification
root ALL=(ALL) ALL
<username> ALL=(ALL) ALL

# exit
# exit

1.5: Testing the diskless boot system sans Kerrighed

# /etc/init.d/tftpd-hpa restart
# /etc/init.d/dhcp3-server restart
# /etc/init.d/nfs-kernel-server restart
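Before powering on a node, it can save a trip to the machine room to check that the TFTP root really contains everything from section 1.2. The sketch below is one way to do that; missing_boot_files is a made-up helper name, and the list of files you pass in is up to you.

```shell
#!/bin/sh
# Hypothetical helper: list which expected files are absent from a directory.
#   $1 = directory to check (e.g. the TFTP root)
#   remaining arguments = paths expected to exist beneath it
missing_boot_files() {
    root=$1
    shift
    for f in "$@"; do
        if [ ! -e "$root/$f" ]; then
            printf '%s\n' "$f"
        fi
    done
}
```

For example, `missing_boot_files /var/lib/tftpboot pxelinux.0 pxelinux.cfg/default` prints nothing when both files are in place, and otherwise prints the name of each missing one.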

Part 2: Setting up Kerrighed

Now that we've got a diskless boot system set up to use as a server, we only need to build the Kerrighed kernel for the nodes to use, put it in the bootable FS, and configure the Kerrighed settings properly in order to have a working SSI (Single System Image) cluster.

The first thing to do is build the new kernel itself.

2.1: Building the Kerrighed kernel

# chroot /nfsroot/kerrighed

# apt-get install automake autoconf libtool pkg-config gawk rsync bzip2 gcc-3.3 libncurses5 libncurses5-dev wget lsb-release xmlto patchutils xutils-dev build-essential openssh-server ntp

# wget -O /usr/src/kerrighed-2.4.1.tar.gz
# wget -O /usr/src/linux-2.6.20.tar.bz2

# cd /usr/src
# tar zxf kerrighed-2.4.1.tar.gz
# tar jxf linux-2.6.20.tar.bz2

# cd /usr/src/kerrighed-2.4.1/modules
# vi Makefile

# cd ..
# ./configure --with-kernel=/usr/src/linux-2.6.20 CC=gcc-3.3
# cd kernel
# make defconfig
# make menuconfig

# make kernel
# make
# make kernel-install
# make install
# ldconfig

/boot/vmlinuz-2.6.20-krg (Kerrighed kernel)
/boot/ (Kernel symbol table)
/lib/modules/2.6.20-krg (Kerrighed kernel modules)
/etc/init.d/kerrighed (Kerrighed service script)
/etc/default/kerrighed (Kerrighed service configuration file)
/usr/local/share/man/* (Look inside these subdirectories for Kerrighed man pages)
/usr/local/bin/krgadm (The cluster administration tool)
/usr/local/bin/krgcapset (Tool for setting capabilities of processes on the cluster)
/usr/local/bin/krgcr-run (Tool for checkpointing processes)
/usr/local/bin/migrate (Tool for migrating processes)
/usr/local/lib/libkerrighed-* (Libraries needed by Kerrighed)
/usr/local/include/kerrighed (Headers for Kerrighed libraries)

# mkdir /config

configfs        /config         configfs        defaults        0 0

title           Ubuntu 8.04.3 LTS, kernel 2.6.20 + Kerrighed 2.4.1
root            (hd0,0)
kernel          /boot/vmlinuz-2.6.20-krg root=/dev/sda1 ro quiet splash session_id=1 node_id=1

Whew. Now that Kerrighed is built and installed, you can configure it to work with your cluster.

2.2: Configuring Kerrighed

# /etc/kerrighed_nodes #
session=1  #Value can be 1 - 254 
nbmin=6    #Number of nodes which load before kerrighed autostarts.

# /etc/default/kerrighed #
# If true, enable Kerrighed module loading

# exit

# cp /nfsroot/kerrighed/boot/vmlinuz-2.6.20-krg /var/lib/tftpboot/

LABEL linux
KERNEL vmlinuz-2.6.20-krg
APPEND console=tty1 root=/dev/nfs nfsroot= ip=dhcp rw

# /etc/init.d/tftpd-hpa restart
# /etc/init.d/dhcp3-server restart
# /etc/init.d/nfs-kernel-server restart

2.3: Starting Kerrighed

# krgadm nodes

101:1 102:1 103:1 104:1 105:1 106:1
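Before starting the cluster it's worth confirming that every node has shown up. Assuming krgadm nodes prints whitespace-separated tokens like the sample above, this sketch simply counts them; count_nodes is a made-up helper name, and it makes no attempt to interpret what each token means.

```shell
#!/bin/sh
# Hypothetical helper: count the colon-separated tokens in
# `krgadm nodes`-style output read from standard input.
count_nodes() {
    # Put each whitespace-separated token on its own line,
    # then count the lines that contain a colon.
    tr -s '[:space:]' '\n' | grep -c ':'
}
```

Running `krgadm nodes | count_nodes` on a node should then print 6 for the six-node cluster in this guide, matching the nbmin value set in /etc/kerrighed_nodes.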

# krgadm cluster start

# krgadm cluster status

# krgcapset -s

# krgcapset -d +CAN_MIGRATE

# krgcapset -k $$ -d +CAN_MIGRATE

I hope this guide has helped some of you, anyway. If you have any comments, suggestions or improvements I'd like to hear from you. If you add it to the Easy Ubuntu Clustering forum it could be useful to other people too.

(AM) I hope everyone has fun with their clusters - gute Chance!

EasyUbuntuClustering/UbuntuKerrighedClusterGuide (last edited 2012-01-21 10:02:23 by jengelh)