Why can’t my job run now?
Once you submit your job to LSF with bsub, it enters the pending state. You can see all pending jobs with
bjobs -p -u all
To see the status of a particular job, pass its job ID:
bjobs -l <jobid>
Check “PENDING REASONS” in the output
For example:
PENDING REASONS:
Job dependency condition not satisfied;
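The full `bjobs -l` output can be long, so it can help to print only the pending reasons. The following is a minimal sketch, not an LSF feature: `pending_reasons` is a hypothetical helper name, and it assumes the block starts at a line containing "PENDING REASONS:" and ends at the next blank line, which may differ between LSF versions.

```shell
# Print only the PENDING REASONS block from `bjobs -l` output read on stdin.
# Assumption: the block begins after a "PENDING REASONS:" line and ends at
# the first blank line that follows it.
pending_reasons() {
    awk '
        /PENDING REASONS:/ { show = 1; next }     # start after the header line
        show && /^[[:space:]]*$/ { exit }         # stop at the first blank line
        show { sub(/^[[:space:]]+/, ""); print }  # strip leading indentation
    '
}

# Usage on the cluster (requires LSF; <jobid> is a placeholder):
#   bjobs -l <jobid> | pending_reasons
```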
Jobs that request resources the cluster cannot fulfill will remain in the pending state indefinitely.
These are the maximum resources available:
| Max Resource | Lilac | Juno |
|---|---|---|
| slots/node | 80 | 80 |
| mem | 750 | 750 |
| GPUs | 4 | 8 |
| WallTime | 7 days | 31 days* |
*Jobs shorter than 6 hours can run on any node. Jobs longer than 6 hours can run only on nodes with an SLA or on a subset of the shared nodes.
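Because a request above these maxima pends indefinitely, it may be worth checking the numbers before calling bsub. The sketch below hard-codes the slots, GPU, and wall-time limits from the table; `check_request` is a hypothetical helper, not an LSF command, and memory is left out because the table does not state its units.

```shell
# Sketch: compare a planned request with the per-cluster maxima listed above.
# Arguments: cluster (lilac|juno), slots per node, GPUs, wall time in hours.
# check_request is a hypothetical helper; the limits are copied from the table.
check_request() {
    cluster=$1; slots=$2; gpus=$3; hours=$4
    max_slots=80                      # slots/node, same on both clusters
    case $cluster in
        lilac) max_gpus=4; max_hours=$((7 * 24)) ;;
        juno)  max_gpus=8; max_hours=$((31 * 24)) ;;
        *) echo "unknown cluster: $cluster"; return 2 ;;
    esac
    status=0
    if [ "$slots" -gt "$max_slots" ]; then
        echo "slots/node $slots exceeds $max_slots"; status=1
    fi
    if [ "$gpus" -gt "$max_gpus" ]; then
        echo "GPUs $gpus exceeds $max_gpus"; status=1
    fi
    if [ "$hours" -gt "$max_hours" ]; then
        echo "wall time ${hours}h exceeds ${max_hours}h"; status=1
    fi
    if [ "$status" -eq 0 ]; then echo "ok"; fi
    return "$status"
}
```

For example, `check_request lilac 16 8 24` reports that 8 GPUs exceeds the Lilac maximum of 4, while the same request on juno passes.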
Things to Check
The more resources you request, the longer it will take LSF to accumulate enough free resources to satisfy the request.
When will my job start to run?
bjobs -l <jobid>
Check “ESTIMATION” in the output
Details forthcoming
Why did my job exit abnormally?
bhist -l <jobid>
If the job is no longer in the current event log, search all event log files with -n 0:
bhist -n 0 -l <jobid>
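When the bhist output reports a numeric exit code, a common convention in LSF is that a value above 128 means the job was terminated by a signal, with the signal number equal to the code minus 128. The sketch below applies that convention; `decode_exit` is a hypothetical helper, and the exact wording of the exit line in bhist output is not assumed here, so the code is passed in by hand.

```shell
# Sketch: interpret a job exit code using the 128 + signal convention.
# decode_exit is a hypothetical helper; pass the exit code reported for the job.
decode_exit() {
    code=$1
    if [ "$code" -eq 0 ]; then
        echo "exited normally"
    elif [ "$code" -gt 128 ]; then
        # Codes above 128 usually mean termination by signal (code - 128).
        echo "terminated by signal $((code - 128))"
    else
        echo "application exited with code $code"
    fi
}
```

For instance, `decode_exit 137` points to signal 9 (SIGKILL), which is often worth checking against the memory and wall-time limits of the job.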
Details forthcoming