Requested CPUs are not available for the requested walltime. The cluster is busy.
The job has been submitted to host ls03, but ls03 does not have 30 free slots.
bjobs -p3 -l 1135202
Job <1135202>, User <sveta>, Project <default>, Application <default>, Service Class <lgSC>, Status <PEND>, Queue <cpuqueue>, Job Priority <12>, Command <sleep 200000>, Share group charged </sveta>, Esub <memlimit>
Fri May 14 15:32:55: Submitted from host <lilac-ln02>, CWD <$HOME>, 30 Task(s), Specified Hosts <ls03>;
Fri May 14 15:32:55: Reserved <30> job slots on host(s) <30*ls03> for <30> tasks;
Fri May 21 05:11:04: Job will start no sooner than indicated time stamp;
PENDING REASONS:
Candidate host pending reasons (1 of 120 hosts):
Affinity resource requirement cannot be met because there are not enough processor units to satisfy the job affinity request: ls03;
Non-candidate host pending reasons (119 of 120 hosts):
Not specified in job submission: lx10, lx11, lx12, lx13, lx14, boson, lt01, lt02, lt03, lt04, lt05, lt06, lt07, lt08, lt09, ld01, ld02, ld03, ld04, ld05, ld07, lg01, lg02, lg03, lg05, lg06, lt10, lt11, lt12, lt13, lt14, lt15, lt17, lt18, lt19, lp01, lp03, lp05, lp06, lp07, ls01, lp35, ls05, ls06, ls07, lv01, ls08, ls09, lt20, ly01, lt21, ly02, lt22, ly04, ly05, ly06, ly07, ly08, ly09, lp10, lp11, lp12, lp14, li01, lp16, lp17, lp18, ls11, ls12, lila-sched01, ls13, lila-sched02, ls14, ls15, ls16, ls17, ls18, lp20, lu01, lu02, lu03, lp23, lu04, lp24, lu05, lp25, lu06, lu07, lp27, lx01, lu08, lu09, lx03, lx04, lx05, lx07, lx08, lx09, lu10, lp30, lp31, lp33, lp34;
Load information unavailable: lw01, lw02, ld06, lt16, lp04, lp09, ly03, ls10, lp19, lp26, lx06, lp32;
Closed by LSF administrator: lg04, ls02, ls04, lx02;
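For context, a submission of roughly this shape (a sketch reconstructed from the bjobs output above, not necessarily the user's exact command) pins all 30 tasks to one named host, so it stays pending until ls03 has 30 free slots:

>bsub -q cpuqueue -n 30 -m ls03 sleep 200000

Before pinning a job to a specific host, check how many job slots are actually free on it; in the bhosts output the free count is MAX minus NJOBS:

>bhosts ls03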
Requested RAM (memory) is not available for the requested walltime.
The requested amount of RAM does not exist on any single host in the cluster. The bjobs output does not state this reason directly.
>bjobs -p3 -l …
Tue Sep 8 16:05:23: Submitted from host , CWD <$HOME>, 4 Task(s), Requested Resources <rusage[mem=200]>;
PENDING REASONS:
Candidate host pending reasons (99 of 123 hosts):
Resource limit defined on host(s) and/or host group has been reached (Resource: mem, Limit Name: limit11, Limit Value: 95): lt15, lt17, lt18, lt19, lx14...
Job's requirements for resource reservation not satisfied (Resource: mem): lx10, lx12, lx13, boson, lt05, lt08, lt09, lx08, lx07...
Host is reserved to honor SLA guarantees: lp34, lp32, lp01, lp03, lp05, lp06...
Non-candidate host pending reasons (24 of 123 hosts):
Not specified in job submission: li01, lv01...
Load information unavailable: lp21, ls18, lp26, ls10, lp09, lp08, lg05, ld06
Closed by LSF administrator: lu04, lu05, ls05, lx09, lw02, lw01;
MEMLIMIT 200 G
RESOURCE REQUIREMENT DETAILS:
Combined: select[(healthy=1) && (type == local)] order[!-slots:-maxslots] rusage[mem=200.00] span[hosts=1] same[model] affinity[thread(1)*1]
This job will not run on the Lilac cluster because it requested 4 x 200 GB = 800 GB of RAM on a single host (span[hosts=1]), and no host in cpuqueue has 800 GB of memory.
To check host resources:
>lshosts
HOST_NAME  type    model    cpuf  ncpus  maxmem  maxswp
ls01       X86_64  GTX1080  60.0  72     512G    ..
To check limits on resources:
>bresource
Begin Limit
NAME = limit11
QUEUES = cpuqueue
PER_HOST = ls-gpu/ lt-gpu/ lg-gpu/ lu-gpu/ lx-gpu/ lw-gpu/ ly-gpu/
SLOTS = 68
MEM = 95%
ngpus_physical = 0
End Limit
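One way to get a job like this scheduled (a sketch only; the right fix depends on what the application actually needs, and ./my_job is a placeholder script name) is either to lower the per-task reservation so the total fits on one host, or to drop the single-host requirement so the tasks can spread across hosts:

>bsub -q cpuqueue -n 4 -R 'rusage[mem=100] span[hosts=1]' ./my_job    # 4 x 100 GB = 400 GB on one host, fits a 512 GB node
>bsub -q cpuqueue -n 4 -R 'rusage[mem=200] span[ptile=1]' ./my_job    # keep 200 GB per task, but place one task per host

Note that the limit11 rule above also caps memory reservations from cpuqueue jobs at 95% of a host's memory, so the total reservation must stay below that threshold even on the largest node.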
Requested GPUs are not available for the requested walltime.
bjobs -p3 -l .....
#BSUB -n 1
#BSUB -gpu 'num=1'
#BSUB -R 'span[ptile=1] rusage[mem=30]'
#BSUB -q gpuqueue
PENDING REASONS:
Candidate host pending reasons (92 of 123 hosts):
Job's requirements for resource reservation not satisfied (Resource: ngpus_physical): lx12, lx14, lt01, lt02, lt03, lt04, lt05, lt07, lt08, lt09, lu10, lx05, lx04, lx03, lg02, lu08, lt12, lt19...
Affinity resource requirement cannot be met because there are not enough processor units to satisfy the job affinity request: lt10, lt11, lx10, lt13, lt14…
Host is reserved to honor SLA guarantees: lp31, lp01, lp03, lp04, lp05, lp06, lp07, lp27, lp33, lp30, lp34, lp25, lp24, lp35, lp10...
Non-candidate host pending reasons (32 of 123 hosts):
Job's resource requirements not satisfied: lu02, ls13, lv01, lg06, ld07, ld05;
Load information unavailable: ls18, lp21, ls10, lp26, lp09, lp08, lg05, ld06;
Closed by LSF administrator: lu04, lu05, ls05, lx09, lw02, lw01;
Not enough GPUs on the hosts: ly03, lx01, lx02, lx06, lx11;
ESTIMATION:
Tue Sep 8 17:00:21: Started simulation-based estimation;
Tue Sep 8 17:00:39: Simulated job start time on host(s) <1*lt03>;
This job is mainly waiting for GPU resources to become available on the Lilac cluster.
The estimated start time is:
Simulated job start time on host(s) <1*lt03>
>bhosts -l lt03
HOST  lt03
STATUS     CPUF   JL/U  MAX  NJOBS  RUN  SSUSP  USUSP  RSV  DISPATCH_WINDOW
ok         60.00  -     72   4      4    0      0      0    -

CURRENT LOAD USED FOR SCHEDULING:
          r15s  r1m  r15m  ut   pg   io  ls  it  tmp    swp  mem   slots  ngpus
Total     0.0   0.0  0.0   8%   0.0  19  1   76  48.7G  0G   269G  68     4.0
Reserved  0.0   0.0  0.0   0%   0.0  0   0   0   0G     0G   22G   -      0.0

          ngpus_physical  healthy  gpu_shared_avg_ut  gpu_shared_avg_mut
Total     0.0             1.0      44.0               1.0
Reserved  4.0             0.0      0.0                0.0
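In this output all four GPUs on lt03 appear to be reserved (ngpus is 4.0 and the Reserved row shows ngpus_physical 4.0, leaving 0.0 available), so the job can only dispatch once a GPU frees up. To scan GPU availability across hosts before submitting (a sketch; the lsload option requires a reasonably recent LSF release):

>lshosts -gpu
>lsload -gpuload lt03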
Requested GPU type doesn't exist on the cluster.
> bjobs -p3 -l ..
Tue Sep 8 16:01:28: Submitted from host , CWD <$HOME>, Requested Resources <select[gpu_model0=='geforcegtx1000']>, Requested GPU;
PENDING REASONS: Candidate host pending reasons (0 of 123 hosts).
Non-candidate host pending reasons (123 of 123 hosts):
Job's resource requirements not satisfied: lp35, lx10, lx11, lx12, lx13, lx14, boson, lt01, lt02, lt03, lt04, lt05, lt06, lt07, lt08 …..
Not specified in job submission: ld01, ld02, ld03, ld04, ld05, ld07, lv01, li01, lila-sched01, lila-sched02;
Load information unavailable: ld06, lg05, lp08, lp09, ls10, ls18, lp21, lp26
Closed by LSF administrator: lw01, lw02, ls05, lu04, lu05, lx09;
RUNLIMIT
10.0 min
This job will not run because the gpu_model0 value in the bsub request is incorrect, so no host on the Lilac cluster offers that resource (candidate hosts: 0).
The correct model name is GeForceGTX1080.
>lshosts -gpu
HOST_NAME  gpu_id  gpu_model       gpu_driver  gpu_factor  numa_id
ls01       0       GeForceGTX1080  440.33.01   6.1         0
           1       GeForceGTX1080  440.33.01   6.1         0
           2       GeForceGTX1080  440.33.01   6.1         1
           3       GeForceGTX1080  440.33.01   6.1         1
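A corrected request for this case (a sketch; ./my_gpu_job is a placeholder) spells the model name exactly as lshosts -gpu reports it:

>bsub -q gpuqueue -n 1 -gpu 'num=1' -R "select[gpu_model0=='GeForceGTX1080'] rusage[mem=30] span[ptile=1]" ./my_gpu_job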
Nodes are in a system-level reservation used for a rolling upgrade or for scheduled cluster-level downtime.
bjobs -p3 -l ...
Job <1117724>, User <sveta>, Project <default>, Application <default>, Queue <cpuqueue>, Job Priority <12>, Command <sleep 200000>, Esub <memlimit>
Wed May 12 16:36:49: Submitted from host <lilac-ln02>, CWD </etc/security/limits.d>, 2 Task(s), Specified Hosts <lx10>;
PENDING REASONS:
Candidate host pending reasons (1 of 120 hosts):
Not enough slots or resources for whole duration of the job: lx10;
Non-candidate host pending reasons (119 of 120 hosts):
Not specified in job submission: lp35, lx11, lx12, lx13, lx14, boson, lt01, lt02, lt03, lt04, lt05, lt06, lt07, lt08, lt09, ld01, ld02
........
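When hosts are held in a system-level reservation, the requested host typically shows the pending reason "Not enough slots or resources for whole duration of the job", as above. Active advance reservations can be listed with brsvs (which reservations are visible may depend on your permissions):

>brsvs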
Nodes are reserved under SLA.
...
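The corresponding pending reason in the bjobs output is "Host is reserved to honor SLA guarantees". The service classes configured on the cluster can be inspected with bsla (a standard LSF command; the exact output depends on the site configuration):

>bsla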