SECS Compute Cluster
The SECS Compute Cluster is a CentOS-based ROCKS cluster suite running on Dell blade servers, which are connected over a 4 Gbps fiber link to a Dell AX4 SAN. There are currently 4 blades online, with 4 more coming online in the summer of 2012. Each blade has 16 GB of RAM and dual quad-core Xeon processors (the next 4 purchased will be Nehalem-based).
The ROCKS cluster suite bundles several open-source cluster components, such as Open MPI and Sun Grid Engine. It is scalable in the sense that adding hardware just works: any computer plugged into the cluster's switch will automatically become a cluster node.
The cluster uses the GFS filesystem (http://sources.redhat.com/cluster/gfs/) to share files between hosts in the cluster. The main reason we use GFS over protocols like NFS is that its locking mechanism supports concurrent write access by nodes in the cluster, so compute nodes can use shared files as a data transport mechanism if they wish.
Logging in
The cluster provides a single head node as the user-facing interface; the compute nodes are not intended to be logged into directly.
To log into the cluster, use either ssh or the NX client to connect to:
- head.clust.secs.oakland.edu
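For example, from a terminal (substituting your own SECS username):
- ssh <username>@head.clust.secs.oakland.edu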
Shared Storage
The cluster uses a GFS filesystem to store your cluster home directory, but your SECS home directory (the one that corresponds to your H: drive and to your home directory on login.secs.oakland.edu) is also accessible on the cluster head node. The path to your SECS home directory is:
/SECSHomes/<x>/<username>
Where <x> is the first letter of your username and <username> is your username itself. You can copy files to or from this directory, or you can create a symbolic link to it from your cluster home directory, like so:
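- ln -s /SECSHomes/<x>/<username> ~/secshomedir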
This will create a symbolic link called "secshomedir" in your cluster home directory that links to your SECS home directory.
It is not a good idea to store things like MPI programs and project files in your SECS home directory, since it is only accessible from the head node, not from the compute nodes. In other words, if you want a file to be visible to every node in the cluster, store it in your cluster home directory, not your SECS home directory.
MPI
The following languages are supported for MPI programs:
- C (mpicc)
- C++ (mpic++)
- Fortran 77 (mpif77)
- Fortran 90 (mpif90)
If you want to write an MPI program in Fortran 77, compile it as you would a normal F77 application, but use "mpif77" instead of "f77".
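For example (the source file name below is just an illustration):
- mpif77 -o myprog myprog.f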
Here is a short tutorial on creating an MPI program in C that will run in parallel on each node in the cluster.
Some sample code is listed here: http://www.secs.oakland.edu/~simon23/mpi-ring.c
This code creates a simple ring structure across the nodes in the cluster: each node has a "right" and a "left" partner node. The 4th node in the cluster, for instance, talks to node 1 as its "right" node and node 3 as its "left" node. Each node then sends and receives a megabyte of data with its partners and exits.
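As a rough sketch of what such a program looks like (a simplified illustration, not the exact code behind the link above), each process determines its rank, computes its left and right neighbors, and exchanges a megabyte buffer with them:

#include <stdio.h>
#include <mpi.h>

#define MSG_BYTES (1024 * 1024)   /* one megabyte per transfer */

int main(int argc, char *argv[])
{
    int rank, size, namelen, right, left;
    char name[MPI_MAX_PROCESSOR_NAME];
    static char sendbuf[MSG_BYTES], recvbuf[MSG_BYTES];
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's id */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */
    MPI_Get_processor_name(name, &namelen);
    printf("Process %d on %s\n", rank, name);

    right = (rank + 1) % size;              /* neighbor we send to */
    left  = (rank + size - 1) % size;       /* neighbor we receive from */

    /* Send to the right neighbor and receive from the left neighbor in
       one call, which avoids deadlock when every rank sends at once. */
    MPI_Sendrecv(sendbuf, MSG_BYTES, MPI_CHAR, right, 0,
                 recvbuf, MSG_BYTES, MPI_CHAR, left, 0,
                 MPI_COMM_WORLD, &status);

    printf("Process %d on %s:successfully sent (%d) bytes to id (%d)\n",
           rank, name, MSG_BYTES, right);
    printf("Process %d on %s:successfully received (%d) bytes from id (%d)\n",
           rank, name, MSG_BYTES, left);

    MPI_Finalize();
    return 0;
}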
To compile the code, use the following:
- mpicc -o mpi-ring mpi-ring.c
This calls "mpicc", a wrapper around gcc that compiles your code and links the executable against the MPI libraries.
To run the program, use the following:
- mpirun -n 4 -machinefile /etc/machines ./mpi-ring
This should produce output looking something like this:
Process 1 on compute-0-1.local
Process 0 on compute-0-0.local
Process 2 on compute-0-2.local
Process 3 on compute-0-3.local
Process 3 on compute-0-3.local:successfully sent (1048576) bytes to id (0)
Process 1 on compute-0-1.local:successfully sent (1048576) bytes to id (2)
Process 2 on compute-0-2.local:successfully sent (1048576) bytes to id (3)
Process 3 on compute-0-3.local:successfully received (1048576) bytes from id (2)
Process 2 on compute-0-2.local:successfully received (1048576) bytes from id (1)
Process 1 on compute-0-1.local:successfully received (1048576) bytes from id (0)
Process 0 on compute-0-0.local:successfully sent (1048576) bytes to id (1)
Process 0 on compute-0-0.local:successfully received (1048576) bytes from id (3)
Sun Grid Engine
Sun Grid Engine is also installed on the SECS cluster. It can be used to submit batch jobs that run applications like Fluent and Comsol.
Here is a sample script for submitting a parallel Fluent job to Sun Grid Engine:
#!/bin/bash
#$ -S /bin/bash
# Parallel Fluent Gridengine submit script
# Replace {...} by proper values
#$ -V
#$ -pe fluent_pe 16
#$ -cwd
#$ -o gridengine_output.txt -j y
#$ -e gridengine_error.txt
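# Set up the Fluent environment: architecture, install location, and license server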
export FLUENT_ARCH=lnamd64
export FLUENT_INC=/gfs/software/fluent-6.3.26/Fluent.Inc
export PATH=$PATH:/gfs/software/fluent-6.3.26/Fluent.Inc/bin
export LM_LICENSE_FILE=27004@atlas.secs.oakland.edu
cd ~/parallel_process
rm gridengine_output.txt
rm gridengine_error.txt
. $FLUENT_INC/setup.sh
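# Run Fluent in batch mode (no GUI) with $NSLOTS parallel processes under Grid Engine control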
fluent 3d -sge -t$NSLOTS -g -pgmpi -sgepe fluent_pe $NSLOTS -i elbow4.dat >>output
The "#$" lines near the top of the script are directives passed to the Grid Engine environment. In this case, we set the parallel environment to Fluent's parallel environment (fluent_pe) and the number of parallel slots to 16 (that is, 4 per cluster node across 4 cluster nodes).
The export lines that follow set up the variables needed to launch Fluent, and the last line is the actual Fluent command executed by Sun Grid Engine.
To launch the Grid Engine task, use the "qsub" command, providing the name of the script as an argument.
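For example, if the script above were saved as fluent_submit.sh (the name here is arbitrary):
- qsub fluent_submit.sh
You can then check the status of the job with the "qstat" command.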
For additional tutorials on using Sun Grid Engine, including submitting and managing jobs, see the Grid Engine documentation on the Rocks Clusters website.
The Machines File
Many cluster applications ask you for a machines file that contains the names of all the machines in the cluster. For your convenience, a file listing all the node names in the cluster has been placed at /etc/machines.
Cluster Applications
Applications such as Fluent and Comsol are installed on the cluster and are able to run in parallel mode. These and any future software packages we install are placed in the /gfs/software directory on the cluster.