Friday, January 2, 2009

Using pelicanHPC to create Centos Beowulf Cluster

*** Temporary notes until we update it again ***

Blog about this
http://blog.harisfazillah.info/2009/01/beowulf-high-performance-parallel.html

We want to create a Beowulf cluster on Centos using the PelicanHPC scripts.

http://pareto.uab.es/mcreel/PelicanHPC

MPI Toolbox http://atc.ugr.es/~javier/mpitb.html

The first step is to install Centos with a basic setup and a minimal set of applications.

This as the max of the range of IP around 2

Master Node.

Network IP range: 10.11.12.0/24 (netmask 255.255.255.0)

(1) Setup a DHCP Server

yum install dhcp

---

edit /etc/dhcpd.conf

# global settings
allow booting;
allow bootp;
default-lease-time 600;
max-lease-time 7200;
subnet 10.11.12.0 netmask 255.255.255.0 {
next-server 10.11.12.1;
filename "pxelinux.0";
option subnet-mask 255.255.255.0;
range 10.11.12.10 10.11.12.20;
}

---

chkconfig dhcpd on
service dhcpd start
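
The network prefix shows up in several places in dhcpd.conf, so it can help to generate the file from a single value. What follows is only a minimal sketch, assuming a /24 network with the master at .1; the function name gen_dhcpd_conf and its arguments are made up for illustration and are not part of PelicanHPC.

```shell
#!/bin/sh
# Hypothetical helper (not from PelicanHPC): print a dhcpd.conf for a /24
# cluster network.
#   $1 = network prefix (for example 10.11.12)
#   $2 = last octet of the first leased address
#   $3 = last octet of the last leased address
gen_dhcpd_conf() {
    prefix="$1"
    first="$2"
    last="$3"
    cat <<EOF
# global settings
allow booting;
allow bootp;
default-lease-time 600;
max-lease-time 7200;
subnet ${prefix}.0 netmask 255.255.255.0 {
    next-server ${prefix}.1;
    filename "pxelinux.0";
    option subnet-mask 255.255.255.0;
    range ${prefix}.${first} ${prefix}.${last};
}
EOF
}

# Print the configuration used in these notes.
# As root, the output could be redirected to /etc/dhcpd.conf.
gen_dhcpd_conf 10.11.12 10 20
```

Changing the first argument is then enough to move the whole cluster to another network, as long as the scripts further down are changed to match.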


(2) Configure atftp server

download from

http://dag.wieers.com/rpm/packages/atftp

Configure it

edit /etc/xinetd.d/tftp

service tftp
{
disable = no
socket_type = dgram
protocol = udp
wait = yes
user = root
server = /usr/sbin/in.tftpd
server_args = /tftpboot
per_source = 11
cps = 100 2
flags = IPv4
}

---

service xinetd start
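
Since dhcpd.conf tells the nodes to fetch "pxelinux.0", it is worth checking that the TFTP root really contains that file before booting a node. A small sketch of such a check; the function name check_tftp_root and the TFTP_ROOT variable are assumptions made for this example, and /tftpboot is taken from server_args above.

```shell
#!/bin/sh
# Hypothetical helper: check that a TFTP root directory contains the PXE
# boot loader named by the "filename" option in dhcpd.conf.
check_tftp_root() {
    root="$1"
    if [ ! -d "$root" ]; then
        echo "missing directory: $root"
        return 1
    fi
    if [ ! -f "$root/pxelinux.0" ]; then
        echo "missing boot loader: $root/pxelinux.0"
        return 1
    fi
    echo "ok: $root/pxelinux.0"
}

# Check the directory given in server_args (override with TFTP_ROOT=...).
check_tftp_root "${TFTP_ROOT:-/tftpboot}" || true
```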

(3) NFS setup

yum install nfs-utils

create directory /live/image

edit /etc/exports

--

/live/image *(ro,async,no_subtree_check,no_root_squash,fsid=12345)
/home 10.11.12.0/255.255.255.0(rw,root_squash,async,no_subtree_check)
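
After /etc/exports is edited, the NFS server still has to re-read it and be enabled at boot. A hedged sketch of those commands, assuming the stock Centos init script is named nfs; the RUN variable and the apply_exports function are made up for this example. By default it only prints the commands (dry run).

```shell
#!/bin/sh
# Dry run by default: every command is only printed.
# Run as root with RUN= (set but empty) to execute the commands for real.
RUN="${RUN-echo}"

# Hypothetical helper: create the export directory and (re)load the exports.
apply_exports() {
    $RUN mkdir -p /live/image   # export directory from these notes
    $RUN exportfs -ra           # re-read /etc/exports
    $RUN chkconfig nfs on       # assumption: the init script is called "nfs"
    $RUN service nfs start
}

apply_exports
```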

(4) Create a user named user

groupadd -g 1000 user

useradd -u 1000 -g user user

passwd user

--- Create this shell script under /usr/local/bin with name setup-cluster-bin.sh

#!/bin/bash
# Copyright 2007, 2008 Michael Creel 
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
# set this to the network you'd like to use for the cluster

# Based on make_pelican version 1.7.1
# See http://pareto.uab.es/mcreel/PelicanHPC for more information.

cd /home/user
HOME="/home/user"
PKTMP="$HOME/tmp"
echo "Creating temporary directory"
rm -R -f $PKTMP
mkdir $PKTMP
chown user:user $PKTMP
chmod 777 $PKTMP
# regenerate keys
echo "Generating new RSA keys"
rm -f $HOME/.ssh/id_rsa*
ssh-keygen -q -t rsa -N "" -f "$HOME/.ssh/id_rsa"
cp $HOME/.ssh/id_rsa.pub $HOME/.ssh/authorized_keys
chmod 600 $HOME/.ssh/authorized_keys
# make list of hosts to fping
echo "10.11.12.2" > $HOME/fpinghosts
i=2
while [ $i -lt 254 ]
do
i=`expr $i + 1`
echo 10.11.12.$i >> $HOME/fpinghosts
done
exit
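
The loop at the end of the script above writes the addresses 10.11.12.2 through 10.11.12.254 (253 candidates) to fpinghosts. The same list can be sketched more compactly with seq; the function name make_fpinghosts and the example output path are assumptions for illustration, not part of make_pelican.

```shell
#!/bin/sh
# Hypothetical helper: print every candidate compute-node address,
# <prefix>.2 through <prefix>.254, one per line.
make_fpinghosts() {
    prefix="$1"
    for i in $(seq 2 254); do
        echo "${prefix}.${i}"
    done
}

# Write an example list to a scratch file and show how many hosts it holds.
make_fpinghosts 10.11.12 > "${TMPDIR:-/tmp}/fpinghosts.example"
wc -l < "${TMPDIR:-/tmp}/fpinghosts.example"
```

The master (10.11.12.1) is left out on purpose, because the monitoring script adds it to the host list itself.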

*** copy /home/user from pelicanHPC

tar cjf

(5) Install fping

http://dag.wieers.com/rpm/packages/fping

(6) dialog

yum install dialog

(7) Set the fixed IP 10.11.12.1 on the master node and add it to /etc/hosts

(8) Copy /live and /var/lib/tftpboot from the PelicanHPC CD

(9) Edit /etc/ssh/ssh_config (ssh client)

StrictHostKeyChecking no

(10) Set up sudo for user

user ALL = (ALL) NOPASSWD: /usr/sbin/fping

(11) Put this script inside /usr/local/bin as monitor_hpc.sh

#!/bin/sh
# Copyright 2007, 2008 Michael Creel 
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
# set this to the network you'd like to use for the cluster

# Based on make_pelican version 1.7.1
# See http://pareto.uab.es/mcreel/PelicanHPC for more information.


PKTMP="/home/user/tmp"
PATH="/bin:/sbin:/usr/bin:/usr/sbin:/usr/lib/lam/bin"
export PATH
DIALOG="dialog"
bailout(){
exit 0
}
# check which nodes are up
checknodes(){
rm -f $PKTMP/bhosts
sudo /usr/sbin/fping -a -q -f /home/user/fpinghosts > $PKTMP/bhosts
}
retry(){
checknodes
NNODES="$(grep -c "" $PKTMP/bhosts)"
MESSAGENODES="\nGo turn on your compute nodes now. \n\nAt the moment $NNODES compute nodes (not counting this frontend node) are available. \n\nClick no to rescan the available nodes. Click yes when the desired number of nodes are available. You might want to wait a bit if some nodes are still finishing booting up."
$DIALOG --title "$TITLE" --defaultno --yesno "$MESSAGENODES" 20 50 || retry
}
trap bailout 1 2 3 15
TITLE="Centos HPC"
MESSAGE="\nWe now set up the cluster by finding which nodes are available. If you are doing initial setup, click on yes. If you are resizing a running cluster, be aware that continuing will interrupt any running MPI jobs. Click no to abort resizing."
$DIALOG --title "$TITLE" --yesno "$MESSAGE" 15 50 || bailout
retry
# master must be last in the list
echo "10.11.12.1" >> $PKTMP/bhosts
# lamboot 2X to generate known_hosts (and an error message) the first time
lamwipe
lamboot $PKTMP/bhosts
lamwipe
lamboot $PKTMP/bhosts
lamnodes
sleep 5
# display success message
NNODES="$(grep -c "" $PKTMP/bhosts)"
# final report
SUCCESS="\nYour cluster of $NNODES nodes is (probably) lambooted. If there was a problem, just re-run the script.\nThe nodes in the cluster are listed in the file ~/tmp/bhosts. If you add or remove compute nodes, re-run this script (/usr/local/bin/monitor_hpc.sh) whenever you like."
$DIALOG --title "$TITLE" --msgbox "$SUCCESS" 15 50
bailout
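
The script relies on one ordering rule: the master must be the last entry in bhosts. A small sketch of that rule in isolation, with a canned address list standing in for the fping output. The function name build_bhosts is an assumption for this example, and filtering out the master first is only a defensive extra, since fpinghosts as generated above never contains it.

```shell
#!/bin/sh
# Hypothetical helper: read reachable addresses on stdin and print a host
# list for lamboot with the master address as the last line.
build_bhosts() {
    master="$1"
    # Pass the compute nodes through, dropping the master if it is listed.
    # grep exits non-zero when nothing is left, which is fine here.
    grep -v -x -F "$master" || true
    # The master must be last in the list.
    echo "$master"
}

# Example with a canned list in place of real fping output.
printf '10.11.12.10\n10.11.12.1\n10.11.12.11\n' | build_bhosts 10.11.12.1
```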