Hardware

From csml-wiki.northwestern.edu

Desktop machines

All desktop machines run OpenSuSE; see Installation instructions for OpenSuSE 13.1.

Clusters

Minotaur

  • 38 nodes, each containing two 4-core processors (304 cores total). 8 GB memory per node.
    Processor type: Intel Xeon E5472, 3.0 GHz.
  • Jobs are scheduled via Torque/Maui. Notes on Torque.

Hydra

  • 60 nodes, each containing two 6-core processors (720 cores total). 12 GB memory per node.
    8 nodes (queue "fast", nodes h001-h008) have Intel Xeon X5690 3.47 GHz processors.
    52 nodes (queue "default", nodes h009-h060) have Intel Xeon E5645 2.40 GHz processors.
  • Jobs are scheduled via Torque/Maui. Notes on Torque.
  • General Usage of Hydra
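For reference, a minimal Torque job script for Hydra might look like the sketch below. The #PBS lines are standard qsub options; the job name, the walltime and the commented-out program are placeholders, not settings taken from this cluster. ppn=12 requests all cores of one Hydra node (two 6-core processors); on Minotaur the corresponding value would be ppn=8.

```shell
#!/bin/bash
#PBS -N example_job
#PBS -q fast
#PBS -l nodes=1:ppn=12
#PBS -l walltime=01:00:00
#PBS -j oe

# Torque starts jobs in $HOME; move to the directory qsub was called from.
cd "${PBS_O_WORKDIR:-.}" || exit 1

# PBS_NODEFILE lists each allocated node once per requested core.
ncores=$(wc -l < "${PBS_NODEFILE:-/dev/null}")
echo "Job ${PBS_JOBID:-local} on $(uname -n), $ncores core(s) allocated"

# Replace with the actual program (hypothetical binary name):
# mpirun -np "$ncores" ./my_simulation
```

Submit it with qsub, for example qsub job.sh. Use -q default instead of -q fast to run on nodes h009-h060.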

Quest

Disk space, backups, and RAID storage

Disk space allocations and nightly backups

Each user has a home directory on ariadne. This home directory is exported to all desktop machines, so you see the same home filesystem on each machine. The drive is protected against hardware failure by a RAID-1 setup. In addition, every night all new or modified files in /home are written to tape (the tape drive is in ariadne). For this reason, do not store temporary data in your home directory: it would quickly fill up the tape. Since users tend to forget this, a quota has been enabled on ariadne that restricts each user to 15 GB. To check how much space you are using, log in to ariadne and run the command

quota -s
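When quota -s shows that you are close to the 15 GB limit, the sketch below can help find what is using the space. It is not part of the original setup: largest_entries is a hypothetical helper built only on the standard du, sort and head tools, and it reports sizes in kilobytes.

```shell
#!/bin/bash
# Hypothetical helper: list the largest entries (files and directories)
# directly inside a directory, biggest first. Sizes are in kilobytes.
largest_entries() {
    local dir=${1:-$HOME} count=${2:-10}
    # Include hidden entries (.cache, .local, ...), which are easy to
    # overlook; the pattern .[!.]* skips "." and "..".
    # Errors (e.g. an unmatched glob) are discarded.
    du -sk -- "$dir"/* "$dir"/.[!.]* 2>/dev/null | sort -rn | head -n "$count"
}

# Example: largest_entries "$HOME" 10
```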

In addition, each user has a large amount of extra storage on the scratch partitions. These drives are located in the individual desktop machines and protected by RAID-1, but they are not backed up: backups are your own responsibility. Note that each scratch partition is generally mounted only on the desktop machine that contains its drives. If you need a partition to be exported to a different machine, please ask.
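Since scratch data is not backed up, one simple option is a dated, compressed tar archive written to a different disk. This is a sketch under assumptions, not an established procedure here: backup_dir is a hypothetical helper, and the paths in the example are placeholders because the scratch mount points are not listed on this page.

```shell
#!/bin/bash
# Hypothetical helper: archive SRC into DEST_DIR as NAME-YYYYmmdd-HHMMSS.tar.gz
# and print the path of the new archive.
backup_dir() {
    local src=${1%/} dest=$2
    local name stamp archive
    name=$(basename -- "$src")
    stamp=$(date +%Y%m%d-%H%M%S)
    archive="$dest/$name-$stamp.tar.gz"
    mkdir -p -- "$dest" || return 1
    # -C stores paths relative to the parent of SRC, so the archive
    # can be extracted anywhere.
    tar czf "$archive" -C "$(dirname -- "$src")" "$name" || return 1
    echo "$archive"
}

# Example (placeholder paths): backup_dir /path/to/scratch/myproject /path/to/other/disk
```

Keep the archive on a different disk or machine than the data itself: RAID-1 protects against a failed drive, not against accidental deletion or overwriting.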

Changing the nightly backup tape

  1. Press the eject button on the tape drive in ariadne.
  2. Take the tape cartridge out of the drive and put it in its box (which should be on top of ariadne). Label the box and give it to Erik.
  3. Insert the cleaning tape (on top of ariadne). It runs for less than a minute and then ejects automatically.
  4. Put the cleaning tape back in its box on top of ariadne.
  5. Insert a new DDS tape (from the cabinet). Leave its empty box on top of ariadne.
  6. Erik: update the position and tapenumber settings in /usr/local/lib/backup, and update the logfile.
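After inserting the new tape (step 5), you may want to confirm that the drive has loaded it before updating the settings. A minimal sketch, assuming the mt-st tool is available as mtst (as in the recovery commands on this page) and that its status output contains the word ONLINE once a tape is loaded. wait_for_tape and its 120-second default timeout are hypothetical.

```shell
#!/bin/bash
# Device and tape tool as used on this page; override via the environment.
TAPE=${TAPE:-/dev/nst0}
MT=${MT:-mtst}

# Hypothetical helper: poll the drive every 5 seconds until its status
# reports ONLINE, giving up after TIMEOUT seconds (default 120).
wait_for_tape() {
    local timeout=${1:-120} waited=0
    until "$MT" -f "$TAPE" status 2>/dev/null | grep -qw ONLINE; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "wait_for_tape: no tape online after ${timeout}s" >&2
            return 1
        fi
        sleep 5
        waited=$((waited + 5))
    done
    echo "Tape in $TAPE is online."
}
```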

Recovering data from the nightly backup tape

Log files of all nightly backup tapes are located on ariadne, in /usr/local/lib/backup. For privacy reasons, these logfiles are only accessible to root. Once you have identified the file to recover and the tape that holds it, insert that tape into the drive on ariadne and follow these steps (all executed as root):

  1. cd /
    (recovered files are placed relative to the current directory; if you change to a different directory, they end up under that directory instead)
  2. /usr/local/bin/tape-rewind
    (or mtst -f /dev/nst0 rewind)
  3. mtst -f /dev/nst0 fsf <position>
    (see the contents file in /usr/local/lib/backup for the position number)
  4. tar xzvf /dev/nst0 <full_file_name_without_leading_slash>
    Omit the leading slash from the file name, otherwise tar will not find the file. You can specify multiple files, separated by spaces. The 'z' option is necessary because all nightly backups are compressed. To use wildcards, add --wildcards and escape '*' and '?' so the shell does not expand them, for example: tar -x --wildcards -zvf /dev/nst0 \*datafiles\*
  5. /usr/local/bin/tape-rewoffl
    (or mtst -f /dev/nst0 rewoffl)
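Steps 1-5 can be combined into a single function. This is a sketch, not an existing script on ariadne: restore_from_tape is a hypothetical name, and the TAPE and MT variables only default to the device and tool used above. It rejects file names with a leading slash (see step 4) and rewinds and ejects the tape even if the extraction fails.

```shell
#!/bin/bash
# Device and tape tool as used above; override via the environment.
TAPE=${TAPE:-/dev/nst0}
MT=${MT:-mtst}

# restore_from_tape POSITION DEST FILE...
#   POSITION: file number from the contents file in /usr/local/lib/backup
#   DEST:     directory files are restored relative to ("/" in step 1)
#   FILE:     paths as stored on tape, without the leading slash
restore_from_tape() {
    local position=$1 dest=$2 f
    shift 2
    for f in "$@"; do
        case $f in
            /*) echo "restore_from_tape: remove the leading slash from $f" >&2
                return 1 ;;
        esac
    done
    # Run in a subshell so the cd does not affect the caller.
    (
        cd "$dest" || exit 1
        "$MT" -f "$TAPE" rewind || exit 1
        # After a rewind the tape is already at file 0.
        if [ "$position" -gt 0 ]; then
            "$MT" -f "$TAPE" fsf "$position" || exit 1
        fi
        tar xzvf "$TAPE" "$@"
        rc=$?
        # Rewind and eject even if tar failed.
        "$MT" -f "$TAPE" rewoffl
        exit "$rc"
    )
}
```

For example, restore_from_tape 3 / home/alice/notes.txt would restore that file to its original location from tape position 3 (position and path are made up for illustration).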

Archiving data using the LTO tape drive

Checking RAID status

  • Hydra
  • Minotaur
    Web interface: log in to the head node as root and open it in the Opera browser.
  • Ariadne
    RAID-5 controller with 4 drives. Status can be checked by interrogating the controller:
    /opt/MegaRAID/MegaCli/MegaCli64 -AdpAllInfo -aALL | less
    

    The 'Device Present' section reports whether any drives are critical or have failed, and the state of the RAID. More detailed information is available via

    /opt/MegaRAID/MegaCli/MegaCli64 -LDPDInfo -aAll | less
    
    At the very beginning of the output (under 'Adapter #0') it should report 'State: Optimal'.
  • Desktop machines, except pelops
    Hardware RAID-1. The RAID status is reported when a machine reboots. Press Ctrl-C (when prompted) to enter the configuration utility. From within Linux, use (as root):
    mpt-status -i 0
    mpt-status -i 2
    

    The second command applies only to machines with a second set of hard drives (achilles, agamemnon, nestor, poseidon).
    So that regular users can verify the RAID status, mpt-status has been added to sudo:

    sudo mpt-status -i 0
    sudo mpt-status -i 2
    
  • Pelops: Software RAID (for OS and scratch partitions). See Hydra.
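To check the machines non-interactively (for example from a cron job), the output of the commands above can be parsed. The sketch below rests on assumptions about typical output formats: MegaCli reporting "State : Optimal" for each virtual drive, mpt-status printing "state OPTIMAL" on its vol_id lines, and pelops using Linux md software RAID, where /proc/mdstat marks a missing member with "_" in the [UU] map. The function names are hypothetical; compare against the real output on each machine before relying on them.

```shell
#!/bin/bash
# Each function reads a tool's output on stdin and returns 0 only if it
# found at least one array/volume and all of them look healthy.

# Ariadne: "State : ..." lines from MegaCli64 -LDPDInfo -aAll.
megaraid_ok() {
    awk -F: '
        /^State[ \t]*:/ {
            seen = 1
            state = $2
            gsub(/^[ \t]+|[ \t]+$/, "", state)
            if (state != "Optimal") { print "MegaRAID virtual drive state: " state; bad = 1 }
        }
        END { if (!seen || bad) exit 1; exit 0 }'
}

# Desktops: mpt-status volume lines should contain "state OPTIMAL".
mpt_ok() {
    awk '
        /vol_id/ {
            seen = 1
            if ($0 !~ /state OPTIMAL/) { print "mpt volume not optimal: " $0; bad = 1 }
        }
        END { if (!seen || bad) exit 1; exit 0 }'
}

# Pelops: an "_" inside the [UU] member map means a missing or failed disk.
mdstat_ok() {
    awk '
        /\[[U_]+]/ {
            seen = 1
            if ($0 ~ /\[[U_]*_[U_]*]/) { print "degraded md array: " $0; bad = 1 }
        }
        END { if (!seen || bad) exit 1; exit 0 }'
}

# Example usage, on the matching machine:
# /opt/MegaRAID/MegaCli/MegaCli64 -LDPDInfo -aAll | megaraid_ok
# sudo mpt-status -i 0 | mpt_ok
# mdstat_ok < /proc/mdstat
```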

Printers

Scanner

UPS

All our UPS units are manufactured by APC, and supported via apcupsd. Installation & configuration instructions:

  • Make sure the apcupsd package is installed; see Installation instructions for OpenSuSE 13.1.
  • Connect the UPS unit to a USB port of the corresponding machine.
  • In /etc/apcupsd/apcupsd.conf edit these lines:
    UPSCABLE usb
    UPSTYPE usb
    

    Also, comment out the DEVICE line.

  • Enable the daemon at boot; from the command line, run:
    chkconfig apcupsd on
    
  • Start the daemon manually:
    apcupsd
    
  • Test it:
    apcaccess
    

    This should print detailed status information about the UPS unit.
    (This command also works for regular users, but they need to give the full path /usr/sbin/apcaccess, since /usr/sbin is usually not in their PATH.)
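The configuration edit and the final test can be scripted. A sketch under these assumptions: GNU sed (as on openSUSE), an apcupsd.conf whose UPSCABLE, UPSTYPE and DEVICE lines start at the beginning of the line, and apcaccess printing a "STATUS : ONLINE" line when the UPS is on mains power. The function names are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: apply the USB settings above to an apcupsd.conf
# (default path from this page), keeping the old file as apcupsd.conf.bak.
configure_apcupsd_usb() {
    local conf=${1:-/etc/apcupsd/apcupsd.conf}
    sed -i.bak \
        -e 's/^UPSCABLE[[:space:]].*/UPSCABLE usb/' \
        -e 's/^UPSTYPE[[:space:]].*/UPSTYPE usb/' \
        -e 's/^DEVICE[[:space:]]/#&/' \
        -e 's/^DEVICE$/#DEVICE/' \
        "$conf"
}

# Hypothetical helper: read apcaccess output on stdin and succeed only
# if the STATUS field is exactly ONLINE.
ups_online() {
    awk -F: '
        /^STATUS[ \t]*:/ { status = $2; gsub(/[ \t]/, "", status); found = 1 }
        END { if (found && status == "ONLINE") exit 0; exit 1 }'
}

# Example (as root): configure_apcupsd_usb && chkconfig apcupsd on && apcupsd
# Then, as any user: /usr/sbin/apcaccess | ups_online && echo "UPS on mains power"
```

A status such as ONBATT (running on battery) makes ups_online fail, which is convenient in a monitoring script.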