LSF scheduler information specific to the luna cluster
By default all jobs are scheduled on the sol queue. Jobs are scheduled based on their resource requests, estimated runtime, and user priorities. You can also specify the test queue, at your own risk.
LSF host groups:
- largeHG: t01-t02
- internetHG: s01-s02
- commonHG: s03-s33 & u01-u26
- testHG: u34-u35
LSF host partitions:
- largeHP: t01-t02
- commonHP: s03-s33 & u01-u26
- testHP: u34-u35
Long vs. Short Jobs
- If a job specifies an estimated run time, -We, or a hard run time, -W, of 60 minutes or less, it is considered ‘short’.
- Short jobs can run on all hosts.
- To run a short job you must specify -We HOURS:MINUTES or -W HOURS:MINUTES.
- All jobs that do not request a run time of less than 60 minutes are considered ‘long’.
- Long jobs can run on 30% of commonHG hosts, and on 50% of largeHG hosts if they request enough memory to be eligible.
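The short/long rule above can be sketched as a small shell helper. This is only an illustration, not part of LSF: the function names `to_minutes` and `classify_runtime` are hypothetical, and the boundary is taken as "60 minutes or less", which is an assumption about how the scheduler treats a request of exactly 60 minutes.

```shell
# Convert an LSF-style run time ([HOURS:]MINUTES) to minutes.
to_minutes() (
    case "$1" in
        *:*) h=${1%%:*}; m=${1#*:} ;;
        *)   h=0; m=$1 ;;
    esac
    # Strip leading zeros so values such as "08" are not parsed as octal.
    h=$(printf '%s' "$h" | sed 's/^0*//')
    m=$(printf '%s' "$m" | sed 's/^0*//')
    echo $(( ${h:-0} * 60 + ${m:-0} ))
)

# Print "short" or "long" for a requested run time.
# An empty argument stands for a job submitted without -We or -W.
classify_runtime() (
    if [ -z "$1" ]; then
        echo long
    elif [ "$(to_minutes "$1")" -le 60 ]; then   # assumed boundary: 60 minutes or less
        echo short
    else
        echo long
    fi
)

classify_runtime 0:30   # prints: short
classify_runtime 2:00   # prints: long
classify_runtime ""     # prints: long
```

A check like this can be run before submitting to see whether a job will be limited to the long-job share of the hosts.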
Large memory hosts
- largeHG hosts are for jobs that need a lot of memory.
- Short jobs, and long jobs that request more than 376gb of memory, can run on largeHG hosts.
- There is no internet access on largeHG hosts.
- Request the total amount of memory your job will use, not the fraction per job slot or thread.
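As a rough illustration of "request the total, not the per-thread fraction", the sketch below multiplies a per-thread estimate by the thread count and applies the 376 GB rule from the list above. The helper names `total_mem_gb` and `largehg_eligible` and the example numbers are hypothetical; nothing here is submitted to LSF.

```shell
# total_mem_gb THREADS GB_PER_THREAD -> total GB to request for the whole job
total_mem_gb() (
    echo $(( $1 * $2 ))
)

# largehg_eligible RUNTIME_CLASS TOTAL_GB -> "yes" or "no"
# Short jobs are always eligible; long jobs must request more than 376 GB.
largehg_eligible() (
    if [ "$1" = short ] || [ "$2" -gt 376 ]; then
        echo yes
    else
        echo no
    fi
)

total=$(total_mem_gb 8 50)                    # 8 threads at 50 GB each (example values)
printf '%s\n' "-R \"rusage[mem=${total}]\""   # prints: -R "rusage[mem=400]"
largehg_eligible long "$total"                # prints: yes
largehg_eligible long 376                     # prints: no
```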
Internet hosts
- internetHG hosts are for jobs that need internet access.
- Short jobs, as well as long jobs that request internet, can run on internetHG hosts.
- To request an internet host use -R "select[internet]".
Service Level Agreement Guarantees
- If you belong to an SLA group, you can specify it in your bsub command, as in bsub -sla Pipeline, to be guaranteed a certain amount of the cluster (so long as that portion isn’t in use by anyone else in the same SLA group).
- All SLAs have a loan policy, which lets anyone’s short jobs use idle hosts.
- Guaranteed resources are assigned by full hosts, not CPUs/slots. If a job with -sla Pipeline is using 1 CPU/slot on a host, the entire host is reserved for Pipeline until Pipeline reaches its SLA.
- Current SLA breakdown on luna:
  - Pipeline gets 40% of commonHG, 50% of internetHG, and 50% of largeHG.
  - Haystack gets 10% of commonHG.
  - Short (short jobs auto attach to this) gets 20% of commonHG if there are no priority jobs.
  - The remaining 30% of commonHG and 50% of largeHG is unallocated and available for long jobs.
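To get a feel for what the commonHG percentages mean in whole hosts, here is a back-of-the-envelope calculation. It assumes every number in the ranges s03-s33 and u01-u26 is a live host and that shares round down; how LSF actually rounds is not stated here, so treat the counts as approximate. The `share` helper is hypothetical.

```shell
# Assumption: the ranges s03-s33 and u01-u26 contain no gaps.
common_hosts=$(( (33 - 3 + 1) + (26 - 1 + 1) ))   # 31 + 26 = 57 hosts

# share PERCENT -> whole hosts, rounded down (rounding is an assumption)
share() (
    echo $(( $common_hosts * $1 / 100 ))
)

echo "Pipeline:    $(share 40) hosts"   # prints: Pipeline:    22 hosts
echo "Haystack:    $(share 10) hosts"   # prints: Haystack:    5 hosts
echo "Short:       $(share 20) hosts"   # prints: Short:       11 hosts
echo "Unallocated: $(share 30) hosts"   # prints: Unallocated: 17 hosts
```

The four commonHG shares (40%, 10%, 20%, 30%) add up to 100%, so the whole host group is accounted for.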
Current defaults for jobs:
- Soft memory limits -R "rusage[mem=GB]" and hard memory limits -M GB should be requested.
  - If none are requested, the default for soft is 8 GB and for hard is 16 GB.
  - If hard is requested but soft is not, soft = hard.
- -R "span[hosts=1]": Jobs that request multiple processors span a single host.
- -R "rusage[iounits=1]": The maximum iounits per host is 10. IOUNITS are an arbitrary measure of the amount of reading/writing that the job incurs.
- -We specifies the expected or “soft” runtime. -W specifies the hard run time. Jobs are considered long if neither -We nor -W is specified. Anything less than 60 minutes is considered a ‘short’ job, which can run on all nodes.
  - If soft runtime (-We hour:minute) is set and hard (-W hour:minute) is not, hard runtime = 2x soft runtime.
  - If hard runtime is set, soft runtime does not need to be.
- Long jobs are restricted to 30% of commonHG, or 50% of largeHG if you request more than 376gb of memory.
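The defaulting rules above can be written out as a small sketch. The helper names `effective_mem` and `effective_hard_runtime` are hypothetical and only mirror the cases the list states. The case where soft memory is requested but hard is not is not covered by the list, so the sketch reports it as unspecified rather than guessing.

```shell
# effective_mem SOFT_GB HARD_GB  (pass "" for a limit that was not requested)
effective_mem() (
    soft=$1; hard=$2
    if [ -z "$soft" ] && [ -z "$hard" ]; then
        soft=8; hard=16            # documented defaults
    elif [ -z "$soft" ]; then
        soft=$hard                 # hard only: soft = hard
    fi
    echo "soft=${soft} hard=${hard:-unspecified}"
)

# effective_hard_runtime SOFT_MIN HARD_MIN  (minutes; "" if not requested)
effective_hard_runtime() (
    if [ -n "$2" ]; then
        echo "$2"                  # hard runtime given explicitly
    elif [ -n "$1" ]; then
        echo $(( $1 * 2 ))         # soft only: hard = 2x soft
    else
        echo none                  # neither set: the job is treated as long
    fi
)

effective_mem "" ""            # prints: soft=8 hard=16
effective_mem "" 32            # prints: soft=32 hard=32
effective_hard_runtime 30 ""   # prints: 60
```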
- stdout normally goes to -o file. To redirect you must add quotes around the command to execute inside the bsub command. For example: bsub -We 1 -J jobName -o output_file.txt "ls -al 1> redirect_file.txt"
- bsub -w is the wait option, as in bsub -w "post_done($PREV_JOBNAME)"
- Auto-emailing is turned off, but can be enabled in the bsub command.
Use post_done to hold jobs, instead of done, which may start too quickly. If holding on multiple jobs with very similar names, -w "post_done($PREV_JOBNAME*)" should work, unless you have one. This will only let the job run if the $PREV_JOBNAME job completed with exit status 0, and completed its post_done processes.
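A two-step chain using post_done might look like the sketch below. It only prints the bsub commands instead of running them, so it can be tried anywhere. The job names, the scripts ./align.sh and ./merge.sh, and the helper `dep_expr` are all hypothetical.

```shell
# dep_expr JOBNAME [wildcard] -> dependency expression for bsub -w
dep_expr() (
    if [ "$2" = wildcard ]; then
        echo "post_done($1*)"      # match every job whose name starts with JOBNAME
    else
        echo "post_done($1)"
    fi
)

PREV_JOBNAME=align_step            # hypothetical job name

# Dry run: print the commands rather than submitting them.
echo "bsub -J $PREV_JOBNAME -We 0:30 ./align.sh"
echo "bsub -J merge_step -We 0:30 -w \"$(dep_expr "$PREV_JOBNAME")\" ./merge.sh"
# second line prints: bsub -J merge_step -We 0:30 -w "post_done(align_step)" ./merge.sh
```

Removing the echo wrappers would submit the jobs for real; the second job then waits until the first has finished successfully, including its post-processing.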
Examples
bsub sleep 30
This submits a basic sleep job (sleeps for 30 seconds).
bsub -J jobname -We 0:30 -R "select[internet]" myjob
Submits a job with job name “jobname” and an estimated runtime of 30 minutes, selecting for hosts with internet.
bsub -m commonHG -R "rusage[mem=20]" myjob
Submits jobs only to hosts in host group commonHG, with 20GB mem requested.
How to send an email at the end of a job:
First the user must `export LSB_JOB_REPORT_MAIL=Y` in the terminal from which they are going to submit their job.
Then they use bsub -u <emailaddress@site.com> -N
The -N means email the job report at the end of the job, separately from the job output file (people usually write output to a file using -o). This is what the e-mail will look like.
Job was submitted from host by user in cluster .
Job was executed on host(s) , in queue , as user in cluster .
was used as the home directory.
was used as the working directory.
Started at Tue May 24 11:14:37 2016
Results reported on Tue May 24 11:14:50 2016

Your job looked like:

------------------------------------------------------------
# LSBATCH: User input
sleep 13
------------------------------------------------------------

Successfully completed.

Resource usage summary:

CPU time : 0.07 sec.
Total Requested Memory : -
Delta Memory : -
Run time : 13 sec.
Turnaround time : 14 sec.

The output (if any) follows: