
This page documents common questions related to Lilac storage. Lilac storage is divided into four categories.

Lilac home storage:

    • Description: GPFS shared parallel filesystem, replicated, and not backed up.
    • Purpose: To store software-related code and scripts. The default quota is small and fixed.
    • Mount: /home/<user>
    • Access: All Lilac nodes, including compute and login nodes.
    • Default quota: 100GB
    • Snapshots: 7 days of snapshots (not backed up). Can be accessed in /home/.snapshots/<user>

Lilac compute storage:

    • Description: GPFS shared parallel filesystem, replicated, and not backed up.
    • Purpose: For jobs to read and write compute data from login and compute nodes. The default quota is larger, with the flexibility to request an increase.
    • Mount: /data/<lab group>
    • Access: All Lilac nodes, including compute and login nodes.
    • Default quota: 5TB (increased or decreased on request)
    • Snapshots: 7 days of snapshots (not backed up). Can be accessed in /data/.snapshots/<date>/<lab group>; see the example below for restoring a file from a snapshot.
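
    To restore a file from a snapshot, copy it back out of the snapshot directory. A minimal sketch (the <date> and file paths are placeholders; the same pattern applies to warm storage):

        Command line
        ls /data/.snapshots
        cp /data/.snapshots/<date>/<lab group>/path/to/file /data/<lab group>/path/to/file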

Lilac warm storage:

    • Description: GPFS shared parallel filesystem, not replicated and not backed up. Slower than Lilac compute storage.
    • Purpose: To store long-term data. Accessible only from login nodes; it cannot be accessed from compute nodes.
    • Mount: /warm/<lab group>
    • Access: Lilac and Luna login nodes only.
    • Default quota: 5TB (increased or decreased on request)
    • Snapshots: 7 days of snapshots (not backed up). Can be accessed in /warm/.snapshots/<date>/<lab group>

Lilac local scratch storage:

    • Description: XFS filesystem, local to each node; not replicated and not backed up. Not a shared filesystem, and slower than GPFS.
    • Purpose: To store local temporary data for compute jobs. Since this is not a shared filesystem, temporary data must be copied back to a shared filesystem and cleaned up after job completion (see the sketch after this list).
    • Mount: /scratch/
    • Access: Lilac compute nodes only.
    • Default quota: No quota; limited only by the free disk space in /scratch.
    • Snapshots: No snapshots.
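
    A minimal job-script sketch of this pattern (assumes the LSF scheduler's $LSB_JOBID variable and placeholder paths; adapt both to your setup):

        Command line
        # create a per-job scratch directory (LSB_JOBID is set by LSF)
        SCRATCHDIR=/scratch/$USER/$LSB_JOBID
        mkdir -p $SCRATCHDIR
        # ... run the job, writing temporary files to $SCRATCHDIR ...
        # copy results back to shared storage, then clean up
        cp -r $SCRATCHDIR/results /data/<lab group>/$USER/
        rm -rf $SCRATCHDIR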

How to:

1. Check quota for GPFS filesystems:

    Since blocks on Lilac GPFS home and compute storage are replicated, quota usage is double the apparent size of the data; for example, a 1GB file consumes 2GB of quota.

        • Lilac home storage:

          Command line
          mmlsquota lila:home --block-size auto
        • Lilac compute storage:

          Command line
          mmlsquota lila:data_<lab group name> --block-size auto
          
          
          Command line
          df -h /data/<lab group name>
          df -ih /data/<lab group name>
        • Lilac warm storage (oscar):

          Command line
          mmlsquota oscar:warm_<lab group name> --block-size auto
          
          
          Command line
          df -h /warm/<lab group name>
          df -ih /warm/<lab group name>
          
        • mmlsquota also reports the quota on the number of files (inodes), along with the block quota; the df -ih commands above show inode usage as well.

2. Copy files from other clusters:

The HAL cluster is outside the firewall, so Lilac cannot be accessed directly from HAL.

      • SABA/LUNA/LUX: 
        To copy files from other clusters, first ssh -A into the other cluster so that your SSH agent (and the keys it holds) is forwarded.

        Command line
        ssh -A $USERNAME@$CLUSTER 
        

        We recommend rsync -av to copy files and directories.

        Make note of the source files or directories and the destination path on Lilac, then copy them as below:

        Command line
        rsync -av --progress $SOURCEPATH lilac:$DESTPATH
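
        For example, to copy a results directory from your home directory on another cluster into your lab's Lilac data directory (hypothetical paths; substitute your own):

        Command line
        rsync -av --progress ~/results lilac:/data/<lab group>/$USERNAME/
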
      • HAL:
        Remember that the HAL cluster is outside the MSKCC network and does not have access to Lilac.

        First, make note of the source files or directories on HAL and the destination path on Lilac.

        To transfer data, ssh into Lilac as below:

        Command line
        ssh -A $USERNAME@lilac.mskcc.org

        Then pull files from HAL:

        Command line
        rsync -av --progress hal:$SOURCEPATH $DESTPATH
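
        For example, to pull a directory named results from your HAL home directory into your Lilac data directory (hypothetical paths; remote paths without a leading / are relative to your HAL home directory):

        Command line
        rsync -av --progress hal:results /data/<lab group>/$USERNAME/
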
        • Make sure you calculate the size of the data you will copy to Lilac and that you have enough space to avoid hitting your hard quota. Lilac uses data replication for safety, so a file containing 1G of data consumes 2G of quota on Lilac.
        • You can see the size of files and directories with du, which will show 2G for 1G of file data due to replication. To see file sizes without the replication overhead, use du --apparent-size instead:
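
          A minimal example (the path is a placeholder):

          Command line
          du -sh /data/<lab group>/mydir                    # disk usage, including replication
          du -sh --apparent-size /data/<lab group>/mydir    # apparent size of the data itself
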
        • Depending on the size and number of files to copy, you may run multiple rsync commands simultaneously to copy different directories.
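
          For example, a sketch with placeholder directories dir1 and dir2:

          Command line
          rsync -av --progress dir1 lilac:$DESTPATH/ &
          rsync -av --progress dir2 lilac:$DESTPATH/ &
          wait    # wait for both background transfers to finish
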
        • The HPC private network is faster than the MSKCC campus network, so using short names (lilac, saba, luna, selene, etc.) will often make transfers faster than using fully qualified domain names such as luna.mskcc.org. This does not apply to hal, though.