
The new Juno cluster is now available for general use.

If you already have an account on 'Luna' and need to activate your Juno account, please request "Access another cluster" at http://hpc.mskcc.org/compute-accounts/account-change-request/.

Juno currently has 1072 CPUs. The login node is 'Juno' and runs CentOS 7 with GPFS as the main file system.

Nodes jx01-jx20, jv01-jv04, and jw02 run CentOS 7 with GPFS.

Node ju14 runs CentOS 6 without GPFS.

All Juno nodes have access to the Solisi Isilon file systems. CentOS 7 nodes also have access to the "new" /juno GPFS storage.

Configuration change log

Slides from the Sep 27, 2018 User Group meeting (updated Nov 28): Juno_UG_092018_final.pdf

December 17, 2018: The new queue "control" has been added. Please check "Queues".

November 28, 2018: The default OS type is now CentOS 7. Please check "Job Submission".

Differences in LSF Configuration between Juno and luna

  1. We reserve ~12GB of RAM per host for the operating system and GPFS on Juno CentOS 7 hosts.
  2. On each jx## node (CentOS 7, GPFS installed), 240GB of RAM is available for LSF jobs.
  3. On each ju## node (CentOS 6, no GPFS), 250GB of RAM is available for LSF jobs.
  4. When specifying RAM for LSF jobs, specify GB of RAM per task (slot) on Juno (unlike luna, where RAM is specified per job).
  5. All jobs must have -W (maximum execution Walltime) specified on Juno. Please do not use -We on Juno.
  6. There is no /swap on CentOS 7 nodes. Memory usage is enforced by cgroups so jobs never swap. A job will be terminated if memory usage exceeds its LSF specification.
  7. To check jobs with status DONE or EXIT, use "bhist -l JobID" or "bhist -n 0 -l JobID"; bacct is also available. "bjobs -l JobID" shows only RUNNING and PEND jobs.
  8. There is no iounits resource on Juno.
  9. A CMOPI SLA is configured on Juno. The loan policies are: 100% of resources for non-SLA jobs of up to 90 minutes, and 75% of resources for non-SLA jobs of up to 240 minutes.
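Point 4 above matters when requesting multiple slots: the memory reservation is multiplied by the slot count. A quick sketch of the arithmetic (the job script name is a placeholder, not a real file on the cluster):

```shell
# On Juno, mem is GB per slot: -n 4 with mem=3 reserves 4 x 3 GB in total.
SLOTS=4
MEM_PER_SLOT=3   # GB, as passed via -R "rusage[mem=3]"
echo "total reservation: $((SLOTS * MEM_PER_SLOT)) GB"
# The corresponding submission (my_job.sh is hypothetical):
#   bsub -n ${SLOTS} -W 2:00 -R "rusage[mem=${MEM_PER_SLOT}]" ./my_job.sh
```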

Queues

The Juno cluster uses LSF (Load Sharing Facility) 10.1FP6 from IBM to schedule jobs. The Juno cluster has two queues: general and control. The default queue, ‘general’, includes Juno compute nodes.

The control queue has no wall-time limit and has one node with 144 oversubscribed slots. The control queue should be used only for monitoring or control jobs (jobs that do not use significant CPU or memory resources).

To submit a job to the control queue:

bsub -n 1 -q control -M 1


Job Resource Control Enforcement in LSF with cgroups

LSF 10.1 makes use of Linux control groups (cgroups) to limit the CPU cores and memory that a job can use. The goal is to isolate the jobs from each other and prevent them from consuming all the resources on a machine. All LSF job processes are controlled by the Linux cgroup system.  If a job's processes on a host use more memory than it requested, the job will be terminated by the Linux cgroup memory sub-system.
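If a job is terminated by the memory cgroup, LSF records the termination reason in the job's history. A sketch of how to confirm this after the fact (12345 is a hypothetical job ID; the exact wording of the message may vary by LSF version):

```shell
# Look for the LSF memory-limit termination reason in the job history.
bhist -l 12345 | grep -i "TERM_MEMLIMIT"
```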

LSF Configuration Notes

Memory (-M or -R "rusage[mem=**]") is a consumable resource, specified as GB per slot/task (-n).

LSF will terminate any job which exceeds its requested memory (-M or -R "rusage[mem=**]").

All jobs should specify a Walltime (-W); otherwise a default Walltime of 6 hours is used.

LSF will terminate any job which exceeds its Walltime.

The maximum Walltime for general queue is 744 hours (31 days). 
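As a sanity check on the limit above, and a sketch of what a maximum-length request would look like (the script name is a placeholder):

```shell
# 744 hours on the general queue equals exactly 31 days.
HOURS=744
echo "$((HOURS / 24)) days"
# A maximum-length submission (long_job.sh is hypothetical):
#   bsub -n 1 -W 744:00 -R "rusage[mem=2]" ./long_job.sh
```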

Job Default Parameters

Queue name: general

Operating System: CentOS 7

Number of slots (-n): 1

Maximum Walltime (job runtime): 6 hours

Memory (RAM): 2 GB

Job Submission 

By default, jobs submitted on Juno run only on CentOS 7 nodes (with GPFS). Users can specify CentOS 6 nodes (Isilon only), CentOS 7 nodes (with GPFS), or either type.

To submit a job to CentOS 7 nodes use either of these formats:

bsub -n 1 -W 1:00 -R "rusage[mem=2]"
bsub -n 1 -W 1:00 -app anyOS -R "select[type==CentOS7]" -R "rusage[mem=2]"


To submit a job to CentOS 6 nodes:

bsub -n 1 -W 1:00 -app anyOS -R "select[type==CentOS6]" -R "rusage[mem=2]"


To submit a job to any node, running either CentOS 6 or CentOS 7:
bsub -n 1 -W 1:00 -app anyOS -R "rusage[mem=2]"

To submit a job to nodes with NVMe /fscratch:

bsub -n 1 -R "fscratch" 
bsub -n 1 -R "pic"