
Guided Tours: EASY SP

In this section you will find information on how to start using the EASY SP Scheduler on the IBM SP system. (Most information, but not all, is valid on the other systems as well. The EASY scheduler on Lucidor is described on this page.)


PDC EASY - an abstract description


The EASY SP Scheduler is a system that originates from Argonne National Laboratory, USA. It has been modified here at PDC to better suit our needs. Most of these additions have been made so that the scheduler works within the framework of our particular SP-2 configuration and takes into account that we are using AFS and Kerberos. These changes should be largely transparent to the user, but to cope with an inhomogeneous machine resource such as the SP-2 Strindberg, some additional flags have been added to some of the commands. This document is intended as a supplement to the EASY User's Guide (260kB compressed PostScript) from Argonne and covers the PDC local modifications.

This document follows as closely as possible the printed section numbering of the EASY User's Guide found at Argonne; hence the somewhat odd numbering in the table of contents of the printed version.


PDC EASY - Rules


A job is started during one of the following periods: ``Day'', ``Night'' or ``NightWE'', or it is ``Running''. The period given to a job is decided on the basis of what kind of resources it requires.

However, please note that these limits are subject to operator control.

You can list the period limits using the command "spq -L".

Easy has a cycle time of one week, and during the cycle each period is scanned at least once. Time limits are given in GMT; for example, 7 AM GMT corresponds to eight o'clock local time in winter and nine o'clock in summer if you are in Sweden.

Jobs are only started if they are able to complete within the time remaining in the period. Please note that if the time you request fits the period limit exactly to the minute, Easy will not start your job if it is delayed by even a few seconds!

Jobs will only be able to write their results to disk if they have valid Kerberos tickets when the write operation occurs. Normally a ticket expires after 10 hours; for jobs that run longer than that, tickets with longer lifetimes must be obtained. See the Kerberos Guided Tour for details.

For interactive development we recommend the interactive nodes, see below.


PDC EASY - Commands


The following parts of the ANL Easy command set are supported at PDC. You get access to them by typing ``module add sp easy local'', which you ought to do in any case (see the example after the list).

  • getjid
  • spfree
  • sphelp
  • spq
  • sprelease
  • spstatus
  • spsubmit
  • spusage
  • spwhen
  • xspusage
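
For example, a minimal session could look as follows. This is only a sketch: both commands are described on this page, and any output has been left out.

    strindberg% module add sp easy local
    strindberg% spq -L
    ...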

The following commands are extensions to the ANL Easy command set and are described below in this text.

  • cac
  • spattach
  • spsummary
  • spjobsummary
  • spjobstatistic

The following commands of the ANL Easy command set are not available at PDC:

  • sppause
  • spwait


PDC EASY - Extensions


  • spattach


    Spattach is aimed at interactive users who have MPL, MPI, HPF or similar codes. However, there is nothing that forbids a batch or PVM user from making use of it. When using spattach you run your parallel or serial program either as a command or in a sub-shell.

    It is all very similar to running an ordinary Unix program.

    Spattach exports information, such as the number of nodes and the names of the nodes, to its sub-shell or sub-program. This makes it quite convenient to use.

    Spattach is completely silent unless you tell it otherwise. Among the switches are help, number of nodes, how long to run, whether to send mail, how verbose to be, and a few others.

    By adding the option `-i' you attach to the interactive pool. Do not ask for more interactive nodes than have already been set up; otherwise you might have to wait for quite some time.

    Be a good neighbor, don't use the interactive nodes in exclusive mode. You will have to share them with others.

  • I want a mix of nodes


    Since there are different brands of nodes, it is possible to request a certain mix of nodes for your job. This applies to both spsubmit and spattach.

    This might be useful if you have a code working in a master/slave fashion and need more memory or larger disks on one of your nodes.

    # I need one W node and eight T nodes
    strindberg%
    strindberg% spsubmit -p 1W8T -t 60 -j mpi ./a.out
    strindberg%
        

    The example will give you 9 nodes, with the `W' node being first. Specifying -p1W1T1Z1T will give you four nodes: the first of brand W, the second a T, the third a Z, and the fourth a T again. A corresponding spattach sketch follows below.
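
    Since the node mix applies to spattach as well, an interactive run could look roughly as follows. This is only a sketch: the `-t' flag and the trailing command are assumed to be handled by spattach in the same way as by spsubmit.

    # One W node and two T nodes for a 30 minute interactive run
    strindberg%
    strindberg% spattach -p 1W2T -t 30 ./a.out
    strindberg%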


PDC Supplement to ANL EASY


  • spsummary


    Spsummary displays a short user summary, which includes the number of jobs started, the number of dequeued jobs, and the aggregated allocation time in minutes.

  • spwhen


    Spwhen gives an estimate of when a job might start. The estimate is based on the current situation: how many nodes are considered up and running, the end times of the jobs currently running, and what the jobs waiting in line look like.

    The result of spwhen changes if a job that is ahead in line terminates sooner than expected. Consider spwhen to show the current best guess.

    Suppose, for example, that there are three jobs in the machine: a small one currently running, a big one second in line, and another small one third in line that could be back-filled. Spwhen might then estimate that the third, small job will be able to start immediately.

    However, if the job currently running terminates sooner than expected, it is now the big (as in many nodes) job's turn, not the third one, which is no longer back-filled.

    In other words, spwhen is fragile.

  • spwhen applet


    The Spwhen applet is a Java applet you run from your browser. It gives you a graphical representation of the job queue which you can use to get an estimate of when your job will start. To run the applet and for further details on how to use it, see the spwhen applet page.

  • spjobstatistic


    Spjobstatistic displays statistics about a job. The statistics include CPU utilization, memory usage, process history, disk usage and network usage, all of which are collected during execution of the job.

    The statistics should, as all statistics, be taken with a grain of salt. They are meant as a tool for locating errors and improving efficiency, not for evaluating different programs against each other.

    Spjobstatistic has three levels of verbosity.

    The first, when given only a job id, will display an overview of all aspects of the job.

    strindberg% spjobstatistic 1998120106311635
    ...
    

    The second, when given a job id and one or more categories (`cpu', `net', `unix' or `disk'), will display an overview of all aspects of the given categories.

    strindberg% spjobstatistic 1998113010004999 cpu net
    ...
    

    The third, when given a job id, a node name and a category, will display detailed statistics for that category on the indicated node.

    strindberg% spjobstatistic 1998112808473926 r01n05 cpu
    ...
    

Environment variables


The following variables can be used within a batch-script submitted using spsubmit:

SP_JID          Job ID given by easy.
SP_EASY_HOME    Home directory of easy.
SP_ARGS         From spsubmit's ``Command Line Arguments''.
SP_INITIALDIR   From spsubmit's ``Initial directory''.
SP_SUBMIT_HOST  The node from which the job was submitted.
SP_PROCS        Number of allocated nodes.
SP_NODES        Allocated nodes.
SP_HOSTFILE     The file that contains all allocated host names,
                one host on each row.
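
As a hedged illustration, a batch script submitted with spsubmit could use these variables roughly as follows. The choice of /bin/sh and the program name ./a.out are assumptions made only for the sake of the example.

    #!/bin/sh
    # Print some information about the allocation.
    echo "Job $SP_JID was submitted from $SP_SUBMIT_HOST"
    echo "Running on $SP_PROCS nodes:"
    cat $SP_HOSTFILE
    # Change to the initial directory given at submission and start the
    # program with the command line arguments given at submission.
    cd $SP_INITIALDIR
    ./a.out $SP_ARGS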