Beowulf Clustering using Linux Debian 5.0 (updated)

Original guide published here on October 8, 2009 on

The updated guide was originally posted on March 27, 2010 on this site:

Just like the original guide, I decided to preserve it here as well.

May 24, 2011

This guide is an evolution of the original guide. Unless the Kerrighed team comes up with a substantially different version, this is the only update to this guide I will ever make, as the steps are pretty much the same for all svn versions I have tested.

On this version:

- Added changes for the latest Kerrighed svn 5586

- Fixed some steps to make them more readable and error free.

- Added simple MPI example to see how your program interacts with the cluster.

- Added troubleshooting section for some situations in which the nodes do not receive the image from the controller.

Thank you all for your previous comments and emails.

Rodrigo Sarpi

internet
   |
eth1 — controller: (given by router)
eth0 — controller: (manually set)
   |
   +--> eth0 — node1: (static IP address)
   +--> eth0 — node2: (static IP address)


Debian Lenny with default kernel 2.6.26-2-686

All steps done as root on the controller


Step 1:

- dhcp server will provide ip addresses to the nodes.

- tftpd-hpa will deliver the image to the nodes

- portmap converts RPC (Remote Procedure Call) program numbers into port numbers.

NFS uses that to make RPC calls.

- syslinux is a boot loader for Linux which simplifies first-time installs

- nfs will be used to export directory structures to the nodes

When installing these packages accept the default settings presented for dhcp3 and TFTP.

#apt-get install dhcp3-server tftpd-hpa portmap syslinux nfs-kernel-server nfs-common
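If you want to sanity-check the RPC registrations that portmap handles (optional, and assuming rpcinfo is available on your system), you can list them:

#rpcinfo -p

The output shows the registered RPC programs (portmapper, and later mountd and nfs once the NFS server is running) together with the ports they were mapped to.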

These packages are for MPI (see under TESTING below). You can install them on the controller to compile your MPI programs, then move the binaries to any of the nodes and start them from there; or you can create, compile, and execute your MPI programs on any of the nodes. Either way, you need these packages on the node to execute your MPI code, no matter which option you choose:

#apt-get install openmpi-bin openmpi-common libopenmpi1 libopenmpi-dev


Step 2:

Identify the ethernet interface that will be used by the DHCP server.

For this setup, "eth0" is the network card feeding the nodes of the internal network.

#nano /etc/default/dhcp3-server
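The setting that matters in this file is the INTERFACES line; assuming eth0 is the internal interface as described above, it should read:

INTERFACES="eth0"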



Step 3:

General configuration for the DHCP server.

Make a backup of the original configuration file in case you want to use it as a reference later on.

cat /etc/dhcp3/dhcpd.conf > /etc/dhcp3/dhcpd.conf.bkp

#nano /etc/dhcp3/dhcpd.conf

# General options

option dhcp-max-message-size 2048;

use-host-decl-names on;

deny unknown-clients;

deny bootp;

# DNS settings

option domain-name "nibiru_system"; # any name will do

option domain-name-servers; # server’s IP address: dhcp and tftp

# network

subnet netmask {

option routers; # server IP as above.

option broadcast-address; # broadcast address


# ip addresses for nodes

group {

filename "pxelinux.0"; # PXE bootloader in /var/lib/tftpboot

option root-path ""; # bootable system

#the other laptop

host node1 {

fixed-address; # first node

hardware ethernet 00:0B:DB:1B:E3:89;

}

host node2 {

fixed-address; # second node

hardware ethernet 00:16:76:C1:F7:D4;

}

server-name "nibiru_headnode"; # Any name will do

next-server; # Server IP where the image is. For this network it's the same machine

}

}



Step 4:

Configure the TFTP server.

#nano /etc/default/tftpd-hpa


OPTIONS="-l -s /var/lib/tftpboot"


Step 5:

Configure inetd for TFTP server.

nano /etc/inetd.conf

tftp dgram udp wait root /usr/sbin/in.tftpd /usr/sbin/in.tftpd -s /var/lib/tftpboot


Step 6:

This directory will hold the PXE boot configuration files for the nodes (the kernel image itself goes in /var/lib/tftpboot).

#mkdir /var/lib/tftpboot/pxelinux.cfg


Step 7:

Copy PXE bootloader to the TFTP server.

#cp -p /usr/lib/syslinux/pxelinux.0 /var/lib/tftpboot/


Step 8:

Fallback configuration. If the TFTP client cannot find a PXE bootloader configuration for a specific node, it will use this one.

#nano /var/lib/tftpboot/pxelinux.cfg/default

LABEL linux

KERNEL vmlinuz-2.6.20-krg

APPEND console=tty1 root=/dev/nfs nfsroot= ip=dhcp rw session_id=1


Step 9:

This step is optional but recommended.

In /var/lib/tftpboot/pxelinux.cfg create separate files for *each* node.

The filename should be the IP address of the node represented in HEX format.

Example: 10 –> 0A; 11 –> 0B; 12 –> 0C; 101 –> 65.

So for 10.11.12.101 it should be 0A0B0C65.
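If you would rather not convert by hand, a shell one-liner does it (using 10.11.12.101 as the example address; substitute your own node's address):

printf '%02X%02X%02X%02X\n' 10 11 12 101
# prints 0A0B0C65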

#nano /var/lib/tftpboot/pxelinux.cfg/0A0B0C65

LABEL linux

KERNEL vmlinuz-2.6.20-krg

APPEND console=tty1 root=/dev/nfs nfsroot= ip= rw session_id=1


Step 10:

Future node system. This directory will have the node’s bootable files, etc.

#mkdir /nfsroot/ && mkdir /nfsroot/kerrighed


Step 11:

Tell NFS what to export

#nano /etc/exports
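A typical entry for this layout, assuming the internal network is 10.11.12.0/24 (adjust to your own subnet) and using the usual options for an NFS-root export, looks like this:

/nfsroot/kerrighed 10.11.12.0/24(rw,no_subtree_check,async,no_root_squash)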



Step 12:

Tell NFS to export above file system

#exportfs -avr
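To double-check the export list once the NFS server is running, showmount (from nfs-common) can print the server's export table:

#showmount -e

You should see /nfsroot/kerrighed listed for your internal network.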


Step 13:

Create bootable system.

Some developers reported that they needed the trailing "/" after "kerrighed",

as in: debootstrap --arch i386 lenny /nfsroot/kerrighed/

#apt-get install debootstrap

debootstrap --arch i386 lenny /nfsroot/kerrighed

You should get this output:

I: Retrieving Release

I: Retrieving Packages

I: Validating Packages

I: Resolving dependencies of required packages…

I: Resolving dependencies of base packages…

I: Checking component main on…

I: Retrieving libacl1

I: Validating libacl1


I: Configuring tasksel-data…

I: Configuring tasksel…

I: Base system installed successfully.


Step 14:

Isolate our node system to configure Kerrighed.

#chroot /nfsroot/kerrighed


Step 15:

Set the root password for the isolated system.

passwd


Enter new UNIX password: (nibirucluster)

Retype new UNIX password: (nibirucluster)

passwd: password updated successfully


Step 16:

Use the /proc directory of the node’s image as the bootable system’s /proc directory

mount -t proc none /proc


Step 17:

You might get Perl-related errors when installing packages onto the node. To suppress them, make the setting permanent in .profile:

nano .profile

export LC_ALL=C

Then also set it in the current shell (just copy and paste into the console):

export LC_ALL=C


Step 18:

Add basic packages needed by the node to communicate with the controller

nano /etc/apt/sources.list

deb lenny main non-free contrib

deb-src lenny main non-free contrib

deb lenny/updates main

deb-src lenny/updates main

apt-get update

apt-get install automake autoconf libtool pkg-config gawk rsync bzip2 libncurses5 libncurses5-dev wget lsb-release xmlto patchutils xutils-dev build-essential subversion dhcp3-common nfs-common nfsbooted openssh-server

You need these packages on the node to compile and execute your MPI code (see under TESTING below).

apt-get install openmpi-bin openmpi-common libopenmpi1 libopenmpi-dev

libopenmpi-dev may not be required if you only want to execute your code on the node. However, it is needed if you want to compile your program on the node itself.


Step 19:

Preparing mount points

mkdir /config


Step 20:

Set mount points

nano /etc/fstab


proc /proc proc defaults 0 0

/dev/nfs / nfs defaults 0 0

configfs /config configfs defaults 0 0


Step 21:

Set hosts to lookup

nano /etc/hosts

localhost
nibiru_headnode
node1
node2
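For reference, a hosts file for this layout would look something like the following, assuming the controller's internal address is 10.11.12.1 and the nodes are 10.11.12.100 and 10.11.12.101 (use your own addresses):

127.0.0.1 localhost
10.11.12.1 nibiru_headnode    # controller, eth0 (example address)
10.11.12.100 node1            # example address
10.11.12.101 node2            # example address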


Step 22:

Create a symlink to automount the bootable filesystem.

ln -sf /etc/network/if-up.d/mountnfs /etc/rcS.d/S34mountnfs


Step 23:

Configure network interfaces

nano /etc/network/interfaces

auto lo

iface lo inet loopback

iface eth0 inet manual


Step 24:

The username you will be using to connect to the node.

adduser (clusteruser)

Adding user `clusteruser’ …

Adding new group `clusteruser’ (1000) …

Adding new user `clusteruser’ (1000) with group `clusteruser’ …

Creating home directory `/home/clusteruser’ …

Copying files from `/etc/skel’ …

Enter new UNIX password: (nodepasswd)

Retype new UNIX password: (nodepasswd)

passwd: password updated successfully

Changing the user information for clusteruser

Enter the new value, or press ENTER for the default

Full Name []:

Room Number []:

Work Phone []:

Home Phone []:

Other []:

Is the information correct? [Y/n] y


Step 25:

Get the latest svn revision, 5586 as of this writing.

svn checkout svn:// /usr/src/kerrighed -r 5586


A /usr/src/kerrighed/NEWS

A /usr/src/kerrighed/

U /usr/src/kerrighed

Checked out revision 5586.


Step 26:

Kerrighed is built against Linux 2.6.20. Kerrighed ignores any other kernel version.

wget -O /usr/src/linux-2.6.20.tar.bz2 && tar jxf /usr/src/linux-2.6.20.tar.bz2 && cd /usr/src/kerrighed && ./ && ./configure && cd kernel && make defconfig


Step 27:

Make sure these settings are in place. By default, b), c), and d) are enabled, but it wouldn't hurt to double-check. For a), you have to pick the network card drivers for your nodes and make sure they are compiled into the kernel, not as modules (* not M).

a. Device Drivers -> Network device support –> Ethernet (10 or 100Mbit)

b. File systems -> Network File Systems and enabling NFS file system support,

NFS server support, and Root file system on NFS. Make sure that the NFSv3 options

are also enabled, and again, make sure they are part of the kernel and not loadable

modules (asterisks and not Ms).

c. To enable the scheduler framework, select “Cluster support” –> “Kerrighed

support for global scheduling” –> “Run-time configurable scheduler framework”

(CONFIG_KRG_SCHED_CONFIG). You should also enable the “Compile components needed

to emulate the old hard-coded scheduler” option to mimic the legacy scheduler

(CONFIG_KRG_SCHED_COMPAT). This last option will compile scheduler components

(kernel modules) together with the main Kerrighed module, that can be used to

rebuild the legacy scheduler, as shown below.

d. To let the scheduler framework automatically load components’ modules,

select “Loadable module support” –> “Automatic kernel module loading”

(CONFIG_KMOD). Otherwise, components’ modules must be manually loaded

on each node before components that they provide can be configured.


make menuconfig
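If you want to double-check after saving, you can grep the generated .config from the same directory where you ran make menuconfig. CONFIG_ROOT_NFS, CONFIG_NFS_FS and CONFIG_NFS_V3 are the symbols behind the NFS options mentioned in b); the KRG_SCHED and KMOD symbols are the ones named above:

grep -E 'CONFIG_ROOT_NFS|CONFIG_NFS_FS|CONFIG_NFS_V3|CONFIG_KRG_SCHED|CONFIG_KMOD' .config

The NFS-related options should be =y (built in), not =m; the scheduler options simply need to be enabled.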


Step 28:

Kernel compilation with Kerrighed support

cd .. && make kernel && make && make kernel-install && make install && ldconfig


Step 29:

Configuring Kerrighed

nano /etc/kerrighed_nodes

session=1 #Value can be 1 – 254

nbmin=2 #2 nodes starting up with the Kerrighed kernel.

nano /etc/default/kerrighed

# Start kerrighed cluster



# Enable/Disable legacy scheduler behaviour
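Both switches should be enabled. Assuming the stock variable names (verify them in your own copy of the file), that means:

# variable names below are the usual ones; check your file
ENABLE=true
LEGACY_SCHED=true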




Step 30:

Exit the chrooted system.

exit



Step 31:

Once out of the chrooted system, copy the bootable kernel.

cp -p /nfsroot/kerrighed/boot/vmlinuz-2.6.20-krg /var/lib/tftpboot/


Step 32:

Configure the controller to use eth0 card.

eth0 will be used by the DHCP server to feed the nodes.

ifconfig eth0

/etc/init.d/tftpd-hpa start

/etc/init.d/dhcp3-server start

/etc/init.d/portmap start

/etc/init.d/nfs-kernel-server start
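Before booting the nodes, it does not hurt to confirm that the DHCP (UDP port 67) and TFTP (UDP port 69) servers are actually listening:

netstat -lunp | grep -E ':67 |:69 '

Both ports should show up, with the DHCP and TFTP daemons as the owning processes.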


Step 33:

Make sure nodes are connected to the router.

From the controller do:

ssh clusteruser@

Then from any connected node as “clusteruser”:

krgadm nodes




Double-check as root from the node:

tail -f /var/log/messages

node1 kernel: Proc initialisation: done

node1 kernel: EPM initialisation: start

node1 kernel: EPM initialisation: done

node1 kernel: Init Kerrighed distributed services: done

node1 kernel: scheduler initialization succeeded!

node1 kernel: Kerrighed… loaded!

These commands are helpful. Do these as a regular node user “clusteruser”.

krgcapset -d +CAN_MIGRATE

krgcapset -k $$ -d +CAN_MIGRATE

krgcapset -d +USE_REMOTE_MEMORY

krgcapset -k $$ --inheritable-effective +CAN_MIGRATE

To monitor your cluster:

top (toggle 1 to see cpus)


cat /proc/cpuinfo | grep "model name"

cat /proc/meminfo | grep "MemFree"

cat /proc/stat


Step 34:

This step is needed so you do not have to enter a password when triggering your MPI programs from the node.

If you do not generate a key, you will have to enter the node[n] password manually in order to migrate the processes.

You may not need to enter a password when generating the key. The assumption is that the controller is secure enough from the outside (no packets are routed in from eth1, the other network card).

Alternatively, if you feel paranoid, you may enter a password and then tell ssh-agent to remember it. The password will be remembered for that session only.

After you log on to one of the nodes via ssh

ssh-keygen -t dsa (don't enter password)

cp /home/clusteruser/.ssh/id_dsa.pub /home/clusteruser/.ssh/authorized_keys


ssh-keygen -t dsa (do enter password)

cp /home/clusteruser/.ssh/id_dsa.pub /home/clusteruser/.ssh/authorized_keys

eval `ssh-agent`

ssh-add /home/clusteruser/.ssh/id_dsa (type in password associated with keys)
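Either way, since /home/clusteruser lives on the shared NFS root, the same key works on every node. A quick test from node1 (node2 is one of the host names set up in Step 21):

ssh node2 hostname

It should print "node2" without asking for a password.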


Step 35 TESTING:

A simple 'hello world' program that calls the MPI library.

I will create a config file where MPI can look up information for running jobs on the cluster.

I am creating this config file in the home directory of the cluster user "clusteruser", which is the same account we created earlier. It is readable by the nodes, so you can create the file as your own user from the controller; or you can log on to any of the nodes where you will be triggering your programs from and create the file there using the "clusteruser" account:

In this situation, I opted for the first option: creating the file from the controller.

At the controller, as a regular user (your regular system username):

nano /nfsroot/kerrighed/home/clusteruser/mpi_file.conf

#Contents of mpi_file.conf. I'm listing the nodes of the cluster, one host name per line.

node1

node2

hello world

This “hello world” program does not deviate much from any other hello world program you have seen before. The only difference is that it has MPI calls.





#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>  /* gethostname() */
#include <mpi.h>

int main(int argc, char *argv[])
{
    char *boxname;
    int rank, processes;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);      /* rank of this process */
    MPI_Comm_size(MPI_COMM_WORLD, &processes); /* total number of processes */

    boxname = (char *)calloc(100, sizeof(char));
    gethostname(boxname, 100);                 /* fill in the name of the node we are running on */

    printf("\n\nProcess: %i\nMessage: hello cruel world!\nCluster Node: %s\n\n", rank, boxname);

    free(boxname);
    MPI_Finalize();
    return 0;
}




——–END CODE———

On the controller compile your program using the MPI library:

mpicc hello_world.c -o hello_world

Put the MPI program in the user’s home directory on one of the nodes.

In this example, I put it in /nfsroot/kerrighed/home/clusteruser:

cp hello_world /nfsroot/kerrighed/home/clusteruser/

Open another shell and ssh into any of the nodes. Here I log on to node1:

ssh clusteruser@

mpirun -np 2 --hostfile mpi_file.conf hello_world


Process: 1

Message: hello cruel world!

Cluster Node: node2

Process: 0

Message: hello cruel world!

Cluster Node: node1




TROUBLESHOOTING:

"PXE-E32: TFTP open timeout" error. Either the node's network card is not supported for PXE booting, or something is blocking the TFTP server from delivering the image.

Try booting your node from CD:

cd /tmp


bunzip2 gpxe-1.0.0.tar.bz2

tar xvpf gpxe-1.0.0.tar

cd /tmp/gpxe-1.0.0/src

make bin/gpxe.iso

Then burn gpxe.iso to a CD and boot the client off of it.

If you still have no joy, try the steps below. It might be that something is blocking the way to the TFTP server.

On the controller:

in.tftpd -l

tail -1 /var/log/syslog

recvfrom: Socket operation on non-socket

cannot bind to local socket: Address already in use

Solution: you can use the rcconf package to disable dhcp, portmap, the NFS server, and tftpd-hpa at boot time, then start each server manually when needed.

If the problem persists, try disabling your firewall settings (make a backup of the existing rules first: iptables-save > /root/firewall.rules):

iptables -F

iptables -X

iptables -t nat -F

iptables -t nat -X

iptables -t mangle -F

iptables -t mangle -X

iptables -P INPUT ACCEPT

iptables -P FORWARD ACCEPT

iptables -P OUTPUT ACCEPT



[To restore the rules after you find out what the problem is, use iptables-restore < /root/firewall.rules]

You can also try this:

netstat -anp | grep 69

udp6 0 0 :::69 :::*

note: this output looks suspicious: "udp6"

Connect with any TFTP client from the controller, and in a second shell do tail -f /var/log/syslog:


tftp> get pxelinux.0

Transfer timed out

tail -f /var/log/syslog

in.tftpd[2881]: received address was not AF_INET, please check your inetd config

inetd[2441]: /usr/sbin/in.tftpd: exit status 0x4c00

note: Check the inetd.conf file and disable IPv6

To disable IPv6 add these lines to /etc/modprobe.d/aliases

alias net-pf-10 off

alias ipv6 off

Also, in /etc/hosts comment out these lines:

#::1 localhost ip6-localhost ip6-loopback

#fe00::0 ip6-localnet

#ff00::0 ip6-mcastprefix

#ff02::1 ip6-allnodes

#ff02::2 ip6-allrouters

#ff02::3 ip6-allhosts

Reboot and try again from the head node.


tftp> get pxelinux.0

Received 15987 bytes in 0.0 seconds

All ok now, try booting your nodes.