...

Nyx currently has 8 compute nodes, na01-08 (288 CPUs in total), with Gclisi Isilon as the main file system. The login node is 'nyx'.

...

5. To check jobs that are DONE or have status EXIT, use "bhist -l JobID" or "bhist -n 0 -l JobID"; bacct is also available. "bjobs -l JobID" shows only RUNNING and PEND jobs.
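For example, to inspect a job by ID (12345 is a hypothetical job ID):

```shell
bjobs -l 12345        # full details; shows RUNNING and PEND jobs only
bhist -l 12345        # full history of a DONE or EXIT job
bhist -n 0 -l 12345   # search all event log files, not just the current one
bacct 12345           # accounting statistics for the finished job
```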

Queue

The Nyx cluster uses LSF (Load Sharing Facility) 10.1FP6 from IBM to schedule jobs. The default queue, ‘general’, includes all Nyx compute nodes.

Job Resource Control Enforcement in LSF with cgroups

LSF 10.1 makes use of Linux control groups (cgroups) to limit the CPU cores and memory that a job can use. The goal is to isolate the jobs from each other and prevent them from consuming all the resources on a machine. All LSF job processes are controlled by the Linux cgroup system.  If a job's processes on a host use more memory than it requested, the job will be terminated by the Linux cgroup memory sub-system.

LSF Configuration Notes

Memory (-M or -R "rusage[mem=**]") is a consumable resource, specified in GB per slot/task (-n).

LSF will terminate any job which exceeds its requested memory (-M or -R "rusage[mem=**]").
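Because mem is specified per slot, the total reservation scales with -n: for example, 4 slots at 6 GB each reserve 24 GB for the whole job. A quick sketch of the arithmetic (the bsub line itself is hypothetical and requires an LSF cluster, so it is commented out):

```shell
# Example values: 4 slots, 6 GB per slot (per the per-slot convention above)
SLOTS=4
MEM_PER_SLOT_GB=6

# Total memory reserved for the job = slots x per-slot request
TOTAL_GB=$((SLOTS * MEM_PER_SLOT_GB))
echo "Reserving ${TOTAL_GB} GB in total"

# The corresponding submission (requires LSF; command name is a placeholder):
# bsub -n ${SLOTS} -R "rusage[mem=${MEM_PER_SLOT_GB}]" -J memtest <command>
```

Exceeding the total reservation triggers termination by the cgroup memory sub-system described above.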

Job Default Parameters

Queue name: general

Operating System: CentOS 7

Number of slots (-n): 1

Maximum Walltime (job runtime): N/A

Memory (RAM): 2 GB

Job Submission 

To submit a job with one slot and 6 GB of RAM to the Nyx nodes:

```bash
# <command> is a placeholder for your executable
bsub -n 1 -R "rusage[mem=6]" -J test <command>
```
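Equivalently, the same requests can be written as #BSUB directives in a job script and submitted via stdin redirection; the output file name and executable below are hypothetical:

```shell
#!/bin/bash
#BSUB -n 1                  # one slot
#BSUB -R "rusage[mem=6]"    # 6 GB of memory per slot
#BSUB -J test               # job name
#BSUB -o test.%J.out        # output file; %J expands to the job ID

./my_program                # placeholder for your executable
```

Submit it with `bsub < job.sh`; options given on the bsub command line override the corresponding #BSUB directives in the script.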