More Information On Lucidor
Last change at GMT $Date: 2006/04/13 15:40:00 $.
News And Links
See the flash news for any information on planned and unplanned system down time.
The cluster at large is named Lucidor. To get access, you log in to blumino.pdc.kth.se, one of several names of Lucidor. This will bring you to the login node.
FAQ
The FAQ has been moved to the new web.
Logging On To Lucidor
The login node for users is called blumino.pdc.kth.se. You need Kerberos version 5 (Heimdal) for proper access. We strongly suggest that you obtain and install Kerberos 5, or get a travelkit from www.pdc.kth.se. If there is no travelkit for your kind of operating system, please inform us.
After you log on, make sure that you have valid tickets.
Most software is not automatically available, but is found through the module system. PATH and other environment variables are modified through the module command. Use module avail to list all available modules and module show [modulename] to see what a module does to your login environment. Use module list to see which modules you have added.
To get access to the Intel 64-bit compilers ifort (Fortran 90) and icc do:
$ module add i-compilers
$ man ifort
[..]
$
You should have valid tickets to use the Intel compilers. The recommended optimization option is -O3 for Fortran programs and -O2 for C programs. You may also need -fno-alias for C programs.
Several versions of gcc are also available.
There are several issues regarding linking. Dynamic linking, using shared libraries, is the default. There is also static linking. Both have pros and cons. A statically linked executable is larger but always executes exactly the same code, while dynamic linking produces a smaller executable that can incorporate library modifications made later. Choose with cc -static or cc -shared, where, as said, shared is the default.
To find out the paths of libraries you could, e.g. for the Myrinet and ARPACK libraries, type
$ module whatis gm arpack
gm : myrinet gm device-drivers, utilities &c
gm : Typical linking options: LDFLAGS="-L/pdc/vol/gm/2.0.1/lib -lgm"
arpack : ARPACK and PARPACK restarted Arnoldi method eigensolver
arpack : Typical use when compiling:
arpack : LDFLAGS="-L/pdc/vol/arpack/2003-05-22/lib -larpack -lmkl_lapack -lmkl_i2p -lguide"
$
To find out which shared libraries your shared executable depends on, type
$ ldd ./a.out
libgm.so.0 => /pdc/vol/gm/2.0.1/lib/libgm.so.0 (0x2000000000044000)
libpthread.so.0 => /lib/libpthread.so.0 (0x200000000007c000)
libc.so.6.1 => /lib/libc.so.6.1 (0x20000000000b0000)
/lib/ld-linux-ia64.so.2 => /lib/ld-linux-ia64.so.2 (0x2000000000000000)
$
If staff improve or break the libgm above, a.out will use the improved or broken libgm. If a.out is copied to a system (node) where /pdc/vol/gm/2.0.1/lib is not available, or cannot be found, a.out will not execute. A statically linked a.out would not depend on any shared libraries. All nodes within the cluster should, however, be identical in almost every aspect, especially with respect to shared libraries.
The Easy Scheduler
The login nodes are intended for editing and compiling. All other use should be carried out on nodes under batch-system control. Short jobs may be executed on the interactive nodes. Reserving dedicated nodes through the batch system gives you exclusive, dedicated access to the requested resource. To make a reservation for one node for 300 minutes you could:
$ module add easy
$ esubmit -n 1 -t 300
Once the reservation is effective you can log in to the reserved node, or use it by any distributed means. Use spq and spusage to see which node was reserved. spstatus will display a system summary.
You can also submit a single job with
$ esubmit -n1 -t 60 ./my_program
Use esubmit -hv to list options, e.g. how to make reservations at a specific time of day, or how to chain reservations.
Note that files under /scratch/ are removed when the node is released.
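A minimal sketch of saving results before the node is released (all paths here are illustrative stand-ins created with mktemp; on the real system you would copy from the node-local /scratch/ area to a location that survives release, such as your home directory):

```shell
# Illustrative only: temporary directories stand in for the node-local
# /scratch/ area and for a location that survives node release.
scratch=$(mktemp -d)   # stands in for /scratch on the reserved node
keep=$(mktemp -d)      # stands in for e.g. your home directory
echo "result data" > "$scratch/out.dat"
cp "$scratch/out.dat" "$keep/"   # copy results out before release
cat "$keep/out.dat"
```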
To see queue limits, use spq -l and spq -L. Use spq -w for wide output, i.e. to see more information. Add the -s option to spstatus (spstatus -s) to see reservations of certain runtimes for the near future.
- All nodes are available for jobs <= 4 wall-clock-hours, any day.
- Some nodes are available for jobs <= 15 wall-clock-hours, weekdays.
- All nodes are available for jobs <= 15 wall-clock-hours, weekday nights.
- All nodes are available for jobs <= 60 hours during weekends.
- Less than a handful of nodes are available for jobs <= 60 hours all days.
Other EASY commands include spwhen, sprelease, spsummary and spjobsummary. More information on the EASY scheduler can be found on the Guided Tours: EASY SP page. Note that there are some differences between the SP version and the Lucidor version of EASY; for instance, the submit command is called esubmit on Lucidor. Almost all EASY commands have a -h option for further help. In some cases -h -v extends the information.
There are two types of nodes, A-nodes and B-nodes. A-nodes are so-called graphic nodes. They have a slightly different memory configuration (although they have the same amount of memory as the others), which may translate into a slight performance penalty for some codes. You can make a reservation for either kind, and any number, of nodes; however, if you ask for resources that are not available, the request will not be met.
A common pitfall is that a modified .bashrc/.cshrc/.profile/.tcshrc/&c prints some kind of error or warning output when performing an rsh. There should be no extra output when performing an rsh. You can verify your modifications through:
$ rsh blumino date ; date
Mon Nov 3 17:07:23 CET 2003
Mon Nov 3 17:07:23 CET 2003
$
No output between the two dates.
There are considerations to make regarding static or shared linking for successful execution of parallel codes. Consult PDC staff if you have problems with this issue.
ifort and icc have the compiler option -openmp. There are two CPUs per node. Use the environment variable OMP_NUM_THREADS to set the number of threads.
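For example, to use both CPUs of a node (a sketch in bash syntax; use setenv in csh/tcsh):

```shell
# Set the number of OpenMP threads to match the two CPUs per node.
export OMP_NUM_THREADS=2
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```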
By default the MKL libraries run single-threaded. The libraries are thread safe and can be called simultaneously from more than one thread. If the environment variable OMP_NUM_THREADS is set to a number greater than 1, the library may attempt to thread internally onto that number of processors. For an OpenMP application program this might not be the optimal behavior. To force the MKL library to run each invocation serially, set the variable MKL_SERIAL to "yes".
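For example, in an OpenMP program where each thread should run its MKL calls serially (a sketch in bash syntax; use setenv in csh/tcsh):

```shell
# Keep each MKL invocation serial even though OMP_NUM_THREADS > 1,
# so MKL does not thread internally inside an OpenMP program.
export OMP_NUM_THREADS=2
export MKL_SERIAL=yes
echo "MKL_SERIAL=$MKL_SERIAL"
```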
LAPACK / BLAS
See the MKL section on the Software on Lucidor page.
Access to utility functions for Fortran such as getarg is achieved by linking with -Vaxlib. Many other non-standard Fortran support functions such as timef, gettimeofday and so on are available in Vaxlib.
For reasonably high resolution CPU time measurement we recommend the Fortran standard CPU_TIME call. This call has higher resolution than Linux standard getrusage based solutions. For wall clock timings, the routine wtime.f can be used.
Binary Data Formats
As the Intel line of CPUs has a different endianness than traditional Unix systems, you might get some help from the guided tour about binary files. Note the link to the Intel Fortran compiler.
Trace tools
A trace plot can be created using the MPE tool jumpshot. To use jumpshot, link with -mpilog. jumpshot wants the logfile in slog format. If you get a logfile in clog format, you can use the clog2slog program (available after module add mpich). jumpshot can be found in /pdc/vol/mpich/18.104.22.168/gm-2.0.5/intel/share/jumpshot-3/bin/. Start with jumpshot <my_prog>.slog. Push the Display button.
In order to get the logfile in slog format, you need to set
the environment variable MPE_LOG_FORMAT to SLOG in a
suitable .*rc file, for instance:
export MPE_LOG_FORMAT=SLOG in Public/.bashrc
or
setenv MPE_LOG_FORMAT SLOG in Public/.cshrc
Validate that it works with the command:
rsh lucidor printenv MPE_LOG_FORMAT
See also the perftools section on the Software on Lucidor page.
Sample Session
See the guided tour page for sample sessions.
The routine MPI_DIMS_CREATE fails if the module mpich/22.214.171.124-intel is used.
Bug in top. The memory usage per process reported in top is incorrect for codes using more than 4 GBytes of memory. ps v displays the correct value. This has been fixed, 2003-09-16.
You may get an error message containing this sentence:
You can increase the amount of memory by setting the environment variable P4_GLOBMEMSIZE (in bytes); the current size is 4194304
This means that internal buffers in MPICH are too small. The solution is to increase the value of the environment variable P4_GLOBMEMSIZE. This must be done by setting it in an rc file, e.g. .cshrc if you are using csh.
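For example, to quadruple the 4 MB default (the chosen size of 16 MB is illustrative; shown in bash syntax, use setenv in .cshrc):

```shell
# Increase the MPICH internal buffer from the 4 MB default to 16 MB.
export P4_GLOBMEMSIZE=$((16 * 1024 * 1024))
echo "P4_GLOBMEMSIZE=$P4_GLOBMEMSIZE"
```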
One message that could look alarming, but is mostly informational in character, is:
Sep 6 11:56:16 h06n05 kernel: a.out(23284): floating-point assist fault at ip 40000000010a0b91
This message informs you that a floating point operation with an IEEE denormalized number was performed through software emulation, which is much slower than in hardware. Typically this happens when a number gets so close to zero that the exponent cannot get small enough. See e.g. those floating-point assist fault messages. Note that the Intel compilers have a flag for this named '-ftz'.
A message of a similar kind (you don't need to be alarmed, but might want to take action to improve performance) is:
Sep 5 16:43:41 h06n05 kernel: b.out(14298): unaligned access to 0x60000fffffffb76c, ip=0x200000000018ab50
which means an unaligned access had to be emulated in software: correct, but inefficient. Read about the unaligned access messages. You can also find a few other pieces of information at technical tips.
There is the prctl command, which can help change the behavior when your code hits either of the two messages above; see man prctl. This should be considered a workaround.
On occasion it has happened that a job fails to execute, with messages similar to:
 Error: Unable to translate GM global node id (3712585607) to local node id for the MPI id 15 !
 Error: Unable to translate GM global node id (3712587729) to local node id for the MPI id 8 !
 Error: Unable to translate GM global node id (3712585607) to local node id for the MPI id 15
Errors similar to this should immediately be reported to PDC staff.
For assistance, e-mail email@example.com. If you experience any problem, do not hesitate to contact us immediately.
pdc-staff, $Date: 2006/04/13 15:40:00 $