The new Juno cluster is currently available to test users only. Please create a request if you want to participate. The cluster currently has 216 CPUs. The login node is juno, a CentOS 7.3 host. Nodes jx01-jx10 run CentOS 7.3, and node ju14 runs CentOS 6.9. All cluster nodes have access to the Isilon solisi file system. The CentOS 7.3 nodes also have access to the GPFS storage named juno.

Differences in LSF configuration between Juno and Luna

  1. We reserve ~12GB of RAM per host for the operating system and GPFS on Juno CentOS 7 hosts.
  2. On each jx## node (CentOS 7, GPFS installed), 240GB of RAM is available for LSF jobs.
  3. On each ju## node (CentOS 6, no GPFS), 250GB of RAM is available for LSF jobs.
  4. When specifying RAM for LSF jobs, specify GB of RAM per slot/task on Juno, or per job on Luna.
  5. All jobs must have -W (maximum Walltime) specified on Juno. Please do not use -We on Juno.
  6. To check jobs that are DONE or have status EXIT, use "bhist -l JobID" or "bhist -n 0 -l JobID". bacct is also available. "bjobs -l JobID" shows only RUNNING and PEND jobs.
  7. There is no /swap on CentOS 7 nodes. Memory usage is enforced by cgroups, so jobs never use /swap. A job will be terminated if its memory usage exceeds its LSF specification.
  8. There is no iounits resource on Juno.
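
Because memory on Juno is requested per slot/task (item 4), the total a job reserves is the slot count multiplied by the per-slot request. The sketch below illustrates that arithmetic. The function names are hypothetical, and the 240GB check assumes that all slots of the job land on a single jx## node, which the text above does not guarantee.

```shell
#!/bin/sh
# Hypothetical helper: total RAM (GB) reserved by a Juno job, given the
# slot count (-n) and the per-slot request (rusage[mem=...]).
juno_total_mem_gb() {
    slots=$1
    mem_per_slot_gb=$2
    echo $(( slots * mem_per_slot_gb ))
}

# Hypothetical helper: succeeds if the total fits within the 240GB that a
# jx## node offers to LSF jobs (assumes every slot runs on one host).
fits_on_jx_node() {
    total=$(juno_total_mem_gb "$1" "$2")
    [ "$total" -le 240 ]
}

juno_total_mem_gb 4 8   # prints 32: -n 4 with rusage[mem=8] reserves 32GB
```

On Luna the same rusage[mem=8] would describe the whole job, so requests copied from Luna may need to be divided by the slot count before use on Juno.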

Queues

The Juno cluster uses LSF (Load Sharing Facility) 10.1 FP6 from IBM to schedule jobs. The default LSF queue, ‘general’, includes all Juno compute nodes.

Job resource control enforcement in LSF with cgroups

LSF 10.1 uses Linux control groups (cgroups) to limit the CPU cores and memory that a job can use. The goal is to isolate jobs from each other and prevent any one of them from consuming all the resources on a machine. All LSF job processes are controlled by the Linux cgroup system. If a job's processes on a host use more memory than the job requested, the job will be terminated by the Linux cgroup memory sub-system.

New LSF cluster level resource configurations

Memory (-M or -R "rusage[mem=**]") is a consumable resource requested in GB per slot/task (-n).

LSF will terminate the job if it exceeds the requested memory (-M or -R "rusage[mem=**]").

All jobs should specify Walltime (-W); otherwise the default Walltime of 6 hours is used.

LSF will terminate the job if it exceeds the Walltime.

The maximum Walltime for general queue is 744 hours (31 days). 
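
The -W option of bsub takes the limit in [hour:]minute form. As a minimal sketch, the hypothetical helper below formats a Walltime for -W and rejects values above the 744-hour limit of the general queue. The function name and the error message are illustrative only and are not part of LSF.

```shell
#!/bin/sh
# Hypothetical helper: format hours (and optional minutes) as a bsub -W
# argument, refusing anything above the 744-hour limit of the general queue.
walltime_arg() {
    hours=$1
    minutes=${2:-0}
    total_min=$(( hours * 60 + minutes ))
    if [ "$total_min" -gt $(( 744 * 60 )) ]; then
        echo "walltime exceeds the 744-hour limit of the general queue" >&2
        return 1
    fi
    printf '%d:%02d\n' $(( total_min / 60 )) $(( total_min % 60 ))
}

walltime_arg 1      # prints 1:00
walltime_arg 2 30   # prints 2:30
```

The result can be passed directly, for example -W "$(walltime_arg 2 30)".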

Job Default parameters

Queue name: general

Operating System: CentOS 7.3

Number of slots (-n): 1

Walltime (job running time): 6 hours

Memory (RAM): 2GB
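
For illustration, the defaults listed above correspond to a command line with every option written out. The sketch below only prints that command line and does not submit anything. The function name and the job script ./my_job.sh are placeholders, not part of the Juno setup.

```shell
#!/bin/sh
# Print (do not run) a bsub command line that spells out the Juno defaults:
# queue general, 1 slot, 6 hours of Walltime, 2GB of RAM per slot.
# The job command passed as $1 is a placeholder.
default_bsub_cmd() {
    job=$1
    echo "bsub -q general -n 1 -W 6:00 -R \"rusage[mem=2]\" $job"
}

default_bsub_cmd ./my_job.sh
# prints: bsub -q general -n 1 -W 6:00 -R "rusage[mem=2]" ./my_job.sh
```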

Job Submission 

The default operating system for Juno jobs is CentOS 7.3. Users can request that a job run on one specific operating system or on any operating system.

To submit the job to the default (CentOS 7.3) operating system:

bsub -n 1 -W 1:00 -R "rusage[mem=2]"


To submit the job to the CentOS 6.9 operating system:

bsub -n 1 -app anyOS -W 1:00 -R "rusage[mem=2]" -R "select[type==CentOS6]"


To submit the job to any operating system:

bsub -n 1 -app anyOS -W 1:00 -R "rusage[mem=2]"

To submit the job to a node with NVMe scratch (/fscratch) available:

bsub -n 1 -R "fscratch" ....
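
The operating system choices above differ only in the extra options passed to bsub, so a small wrapper can select them by name. This is a sketch rather than a site-provided tool: the function name, the labels centos7, centos6 and any, and the job script ./my_job.sh are all hypothetical. The command line is printed for inspection, not submitted.

```shell
#!/bin/sh
# Hypothetical helper: print the OS-selection options for bsub on Juno.
#   centos7 (default) -> no extra options
#   centos6           -> -app anyOS plus the CentOS6 select string
#   any               -> -app anyOS only
juno_os_opts() {
    case "${1:-centos7}" in
        centos7) echo "" ;;
        centos6) echo '-app anyOS -R "select[type==CentOS6]"' ;;
        any)     echo '-app anyOS' ;;
        *)       echo "unknown OS choice: $1" >&2; return 1 ;;
    esac
}

# Print the full command line instead of submitting it.
echo "bsub -n 1 $(juno_os_opts centos6) -W 1:00 -R \"rusage[mem=2]\" ./my_job.sh"
```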