Changes made by Joanne Edington

Saved on May 12, 2021

...

Once you submit your job to LSF using bsub, it enters the PENDING state. You can see all pending jobs with:

bjobs -p -uall

You can see the status of a particular job by its JobID. Look for PENDING REASONS: in the output.

bjobs -l <jobid>


For example:

....

PENDING REASONS:

Job dependency condition not satisfied;
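The long listing from bjobs -l contains many sections, so it can help to print only the pending reasons. The following is a minimal sketch, not an LSF feature: pending_reasons is a hypothetical helper name, and it assumes the section starts at the "PENDING REASONS:" line and ends at the next all-capitals section header.

```shell
# Print only the PENDING REASONS section of `bjobs -l` output read from stdin.
# Assumption: the section ends at the next line that looks like a section
# header (capital letters, spaces or underscores, followed by a colon).
pending_reasons() {
    awk '
        /PENDING REASONS:/            { show = 1; next }   # start of the section
        show && /^[ \t]*[A-Z][A-Z _]*:/ { exit }           # next header: stop
        show && NF                    { sub(/^[ \t]+/, ""); print }
    '
}
# Usage on a cluster login node:
#   bjobs -l <jobid> | pending_reasons
```

If the output format on your cluster differs, the plain bjobs -l <jobid> listing remains the reference.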


It can be difficult to interpret the PENDING REASONS that LSF reports in the bjobs output. The cluster may just be very busy. You can see the cluster activity at https://hpc-grafana.mskcc.org/

Other LSF commands such as bhosts, lshosts and lshosts -gpu give you current information about the available nodes and resources on the command line. You can also use RTM to view LSF details as the guest user at http://lila-rtm01.mskcc.org/cacti/index.php and http://juno-rtm01.mskcc.org/cacti/index.php

Things to Check:

Typos in your bsub command.
Any requested GPU models are correct. lshosts -gpu will list them with the correct syntax.
The memory requirement (-R rusage[mem=4]) is in GB (gigabytes) and is PER SLOT (-n), not per job.
Make sure that you are in the SLA (Service Level Agreement) for any nodes that you specifically request.
Your job must be able to finish before any scheduled downtime reservation.
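Because the memory value is per slot, the total a job reserves is the slot count multiplied by the per-slot value. A small sketch of that arithmetic (total_mem_gb is a hypothetical helper name, not an LSF command):

```shell
# Total memory reserved = slots (-n) x per-slot memory in GB (rusage[mem=...]).
total_mem_gb() {
    slots=$1
    mem_per_slot_gb=$2
    echo $(( slots * mem_per_slot_gb ))
}

# Example: bsub -n 8 -R "rusage[mem=4]" reserves 8 x 4 GB in total.
total_mem_gb 8 4   # prints 32
```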

The more resources you request, the longer it will take LSF to accumulate enough resources to satisfy the request. Jobs which request resources that the cluster does not possess will remain in the pending state indefinitely. These are the maximum resources available:

Max Resource	Lilac	Juno
slots/node	80	80
total memory	750	750
GPUs	4	8
WallTime	7 days	31 days*

*Jobs shorter than 6 hours can run on any node, but jobs longer than 6 hours can run only on nodes with an SLA or on a subset of the shared nodes.
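A request can be compared against these maxima before submitting. The sketch below is illustrative only: check_request is a hypothetical helper, and it assumes that the memory figure is in GB per node, that the GPU figure is per node, and that the whole job must fit on a single node. None of these assumptions is stated in the table itself.

```shell
# Sketch: flag requests that exceed the maxima in the table above.
# Assumptions: memory is GB per node, GPUs are per node, job fits on one node.
check_request() {
    cluster=$1; slots=$2; mem_per_slot_gb=$3; gpus=$4
    max_slots=80
    max_mem_gb=750
    case "$cluster" in
        lilac) max_gpus=4 ;;
        juno)  max_gpus=8 ;;
        *) echo "unknown cluster: $cluster"; return 2 ;;
    esac
    rc=0
    if [ "$slots" -gt "$max_slots" ]; then
        echo "slots: $slots > $max_slots"; rc=1
    fi
    # Memory is requested per slot, so compare the total against the maximum.
    total=$(( slots * mem_per_slot_gb ))
    if [ "$total" -gt "$max_mem_gb" ]; then
        echo "memory: $total GB > $max_mem_gb GB"; rc=1
    fi
    if [ "$gpus" -gt "$max_gpus" ]; then
        echo "GPUs: $gpus > $max_gpus"; rc=1
    fi
    return $rc
}
# Usage: check_request lilac 8 4 2   (cluster, slots, GB per slot, GPUs)
```

The function prints one line per exceeded limit and returns non-zero, so a request that prints nothing is within the listed maxima under the assumptions above.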

More examples forthcoming

When will my job start to run?

...
