Why can’t my job run now?

Once you submit your job to LSF using bsub, it enters the PENDING state. You can see all pending jobs with

bjobs -p -u all


You can see the status of a particular job with its JobID. Look for PENDING REASONS: in the output.

bjobs -l <jobid>

....

PENDING REASONS:

Job dependency condition not satisfied;

.....
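
To pull just the pending reasons out of a long bjobs -l listing, a small filter can help. The sketch below is illustrative only: the function name pending_reasons is hypothetical, and it assumes that the section starts at the PENDING REASONS: line and ends at the next all-capitals heading line ending in a colon (for example SCHEDULING PARAMETERS:). Adjust the patterns if your LSF version lays out the output differently.

```shell
# Hypothetical helper: print only the PENDING REASONS lines from
# `bjobs -l` output supplied on standard input.
pending_reasons() {
  awk '
    # Start collecting after the section heading.
    /PENDING REASONS:/ { in_section = 1; next }
    # Stop at the next all-capitals heading (assumed layout).
    in_section && /^[[:space:]]*[A-Z][A-Z _]*:[[:space:]]*$/ { in_section = 0 }
    # Print non-blank lines of the section without leading blanks.
    in_section && NF { sub(/^[[:space:]]+/, ""); print }
  '
}

# Usage (replace <jobid> with a real job ID):
#   bjobs -l <jobid> | pending_reasons
```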

It can be difficult to interpret the PENDING REASONS that LSF reports in the bjobs output. The cluster may just be very busy. You can see the cluster activity at https://hpc-grafana.mskcc.org/

Other LSF commands such as bhosts, lshosts and lshosts -gpu give you current information about the available nodes and resources on the command line. You can also use RTM to view LSF details as the guest user at http://lila-rtm01.mskcc.org/cacti/index.php and http://juno-rtm01.mskcc.org/cacti/index.php
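
As a rough indication of how busy the cluster is, the free job slots reported by bhosts can be added up. This is a minimal sketch, not a site-provided tool: the function name free_slots is hypothetical, and it assumes the default bhosts column layout (HOST_NAME STATUS JL/U MAX NJOBS RUN SSUSP USUSP RSV) and counts only hosts whose status is ok.

```shell
# Hypothetical helper: total free job slots on hosts reported as "ok",
# computed from `bhosts` output supplied on standard input.
# Assumed columns: HOST_NAME STATUS JL/U MAX NJOBS RUN SSUSP USUSP RSV
free_slots() {
  awk '
    # Skip the header, closed/unavailable hosts, and hosts whose MAX is "-".
    $2 == "ok" && $4 ~ /^[0-9]+$/ && $5 ~ /^[0-9]+$/ {
      free += $4 - $5   # MAX minus NJOBS
    }
    END { print free + 0 }
  '
}

# Usage:
#   bhosts | free_slots
```

Free slots alone do not guarantee that a job can start, since memory, GPUs and queue policies also apply.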

Things to Check: 

The more resources you request, the longer it takes LSF to accumulate the resources needed to satisfy the request. Jobs that request resources the cluster does not possess remain in the pending state indefinitely. These are the maximum resources available:

Max Resource    Lilac     Juno
slots/node      80        80
total memory    750       750
GPUs            4         8
WallTime        7 days    31 days*

*Jobs shorter than 6 hours can run on any node. Jobs longer than 6 hours can run only on nodes with an SLA or on a subset of the shared nodes.
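
A request can be compared against these maxima before submission. The sketch below is only an illustration: fits_on_node is a hypothetical helper, the limits are the slots/node and total memory values from the table above (the same for Lilac and Juno), and the memory figure is assumed to be a per-node total in the same unit the table uses.

```shell
# Limits copied from the table above (identical for Lilac and Juno).
MAX_SLOTS_PER_NODE=80
MAX_TOTAL_MEM=750   # assumption: per-node total, same unit as the table

# Hypothetical helper: succeed only if a single-node request is within
# the limits. Arguments: requested slots, requested total memory.
fits_on_node() {
  slots=$1
  mem=$2
  if [ "$slots" -gt "$MAX_SLOTS_PER_NODE" ]; then
    echo "slots: $slots exceeds $MAX_SLOTS_PER_NODE per node"
    return 1
  fi
  if [ "$mem" -gt "$MAX_TOTAL_MEM" ]; then
    echo "memory: $mem exceeds $MAX_TOTAL_MEM"
    return 1
  fi
  echo "request fits on one node"
}

# Usage:
#   fits_on_node 16 256 && echo "OK to submit"
```

A request that fails this check would pend indefinitely, as described above.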


More examples forthcoming


When will my job start to run?

bjobs -l <jobid>

Check “ESTIMATION” in the output

Details forthcoming


Why did my job exit abnormally?

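Show the job's history:

bhist -l <jobid>

If the job is no longer in the current event log, search all event log files:

bhist -n 0 -l <jobid>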
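
If the job history reports a numeric exit code, the usual shell convention helps to interpret it: a code above 128 normally means the job was terminated by a signal, and the signal number is the code minus 128. The helper below is a hypothetical sketch of that arithmetic (exit_signal is not an LSF command), and the exit code 137 in the usage lines is an illustrative value only.

```shell
# Hypothetical helper: print the signal number implied by an exit code,
# or 0 when the code does not follow the 128 + signal convention.
exit_signal() {
  code=$1
  if [ "$code" -gt 128 ] && [ "$code" -lt 256 ]; then
    echo $((code - 128))
  else
    echo 0
  fi
}

# Usage, with an illustrative exit code of 137:
#   exit_signal 137                 # prints 9
#   kill -l "$(exit_signal 137)"    # prints the name of signal 9
```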

Details forthcoming