General Usage of Quest

From csml-wiki.northwestern.edu
  • Login
    ssh [netid]@quest.it.northwestern.edu    
    

    where [netid] is your NETID. The first time you log in, you will be asked for a file in which to save the key and then, twice, for a passphrase. Press Enter at each of these three prompts and you should be able to log in successfully.

    Our group folder is located in /projects/b1011/luijten-group

  • Example of job.mbs file
    # ### AUTOMATICALLY GENERATED BATCH FILE
    #MOAB -q [queue_name]
    #MOAB -A b1011
    #MOAB -l walltime=[dd:hh:mm:ss]
    
    # ###name of job
    #MOAB -N [name_of_job]
    
    # ### mail on end (e) and abort (a); add b for mail when the job begins
    #MOAB -m ea
    #MOAB -M [email_address]                                                                                                      
    
    # ### number of nodes and processors per node
    #MOAB -l nodes=2:ppn=6
    
    # ### indicates that job should not rerun if it fails
    #MOAB -r n
    
    # ### stdout and stderr merged as stderr
    #MOAB -j eo
    
    # ### write stderr to file
    #MOAB -e log.err
    
    # ### the shell that interprets the job script
    #MOAB -S /bin/bash
    
    module load [module]
    cd /projects/b1011/luijten-group/[job_location]
    time mpirun -np 12  [directory_name]/[lammps_version] -in input.dat
    
    if [ $? -eq 0 ] ; then
        touch COMPLETED
    fi
    

    [queue_name] There are two options for the queue name: collab and collab-preempt. Both have a startup priority of 5000. collab allows at most 262 cores and a maximum walltime of 7 days. collab-preempt has no resource restrictions, but note that jobs in queues ending in ‘-preempt’ can be interrupted and re-queued by jobs from a higher-priority queue.

    [dd:hh:mm:ss] This is the maximum allowed running time for your job. dd: days; hh: hours; mm: minutes; ss: seconds.

    [name_of_job] This is the name of your job as it will be shown in the queue.

    [email_address] This is the email address at which you receive system notices about the job. With -m ea, mail is sent when the job ends or aborts; with -m bea, mail is also sent when it begins.

    [module] The module to load. For mpirun this is the mpi module. For a full list of available modules, run module avail from the command line.

    [job_location] This is the path of the directory where your input file is located.

    [directory_name] This is the path of the directory that contains the LAMMPS executable.

    [lammps_version] This is the LAMMPS build you want to run. It must be located in [directory_name].
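
    Two of the placeholders above involve a little arithmetic: the walltime must be written as dd:hh:mm:ss, and the -np value passed to mpirun should match nodes × ppn (2 × 6 = 12 in the example). The following is a minimal sketch of helpers for both; the function names to_walltime and procs_from_request are hypothetical and are not part of Moab or Quest.

```shell
# Hypothetical helpers for filling in the job.mbs placeholders.

# Convert a number of seconds to the dd:hh:mm:ss walltime format.
to_walltime() {
    local total=$1
    printf '%02d:%02d:%02d:%02d\n' \
        $(( total / 86400 )) \
        $(( total % 86400 / 3600 )) \
        $(( total % 3600 / 60 )) \
        $(( total % 60 ))
}

# Total number of MPI processes for a request such as nodes=2:ppn=6.
procs_from_request() {
    local req=$1
    local nodes=${req#nodes=}   # strip the leading "nodes="
    nodes=${nodes%%:*}          # keep only the node count
    local ppn=${req##*ppn=}     # keep only the ppn count
    echo $(( nodes * ppn ))
}

to_walltime 604800                 # 7 days, the collab maximum
procs_from_request nodes=2:ppn=6   # value to pass to mpirun -np
```

    With these inputs the two calls print 07:00:00:00 and 12, which correspond to the 7-day collab limit and the -np 12 used in the example script.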

  • Submit jobs
    msub job.mbs
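
    The example job.mbs creates a file named COMPLETED when mpirun exits with status 0, so the absence of that file marks a run that has not finished successfully. Before submitting or resubmitting, it can help to list such runs. The sketch below assumes a hypothetical layout with one subdirectory per run under a common parent directory; list_unfinished is an illustrative name, not an existing tool.

```shell
# Hypothetical helper: print every run directory under $1 that has no
# COMPLETED marker file (assumes one subdirectory per run).
list_unfinished() {
    for dir in "$1"/*/; do
        [ -d "$dir" ] || continue                 # skip if nothing matched
        [ -e "${dir}COMPLETED" ] || echo "${dir%/}"
    done
}

# Usage (the path is an example only):
#   list_unfinished /projects/b1011/luijten-group/my_runs
```

    Each printed directory is a candidate for msub job.mbs; check the log.err file there first to see why the previous run did not complete.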
    
  • Cancel jobs
    canceljob [job_number]
    canceljob `seq [first_job_number] [last_job_number]`
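
    Because the seq form cancels every job number in the range, it is worth checking the expansion before running it. The following sketch only prints the commands that would be issued; preview_cancel is a hypothetical name and the job numbers are made up.

```shell
# Hypothetical dry run: print one canceljob command per job number in
# the inclusive range $1..$2 without cancelling anything.
preview_cancel() {
    for id in $(seq "$1" "$2"); do
        echo "canceljob $id"
    done
}

preview_cancel 1001 1003
```

    If the printed list contains only your own jobs, run the actual canceljob command with the same range.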
    
  • Check job status
    showq
    

    show all jobs

    showq -r
    

    show running jobs

    showq -i
    

    show idle jobs

    showq -w user=[netid]
    

    show jobs belonging to the user specified, where [netid] is your NETID.

    showq -w acct=[account number]
    

    show jobs belonging to the account specified. Grail allocation account number: b1011; CCTSM allocation account number: b1023; ESAM allocation account number: b1020.

    qstat 
    

    show your own jobs

    checkjob [job_ID] 
    

    This command displays detailed status and diagnostic information about a submitted job, which can be useful for troubleshooting submission issues. It can also be used to obtain information about completed jobs, such as the allocated nodes, resources used, and exit codes. NUIT recommends using the flag ‘-vvv’ or ‘-v -v -v’ to gather additional diagnostic information.

    mjobctl -m partition=<partition name> <job number>
    

    This command assigns a partition to a job that is already in the queue. This is useful if you forgot to specify a partition in the batch file or want to change it (for example, from quest3 to quest4), because you do not have to delete and resubmit the job.
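
    When several queued jobs need the same partition change, the mjobctl command can be generated once per job. The sketch below is a dry run that prints the commands for review; preview_repartition is a hypothetical name and the job numbers are made up.

```shell
# Hypothetical dry run: print one mjobctl command per job number.
# $1 is the target partition; the remaining arguments are job numbers.
preview_repartition() {
    local partition=$1
    shift
    for id in "$@"; do
        echo "mjobctl -m partition=$partition $id"
    done
}

preview_repartition quest4 2001 2005
```

    The printed lines can be copied to the command line once they look right.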