General Usage of Quest

From csml-wiki.northwestern.edu

Latest revision as of 09:53, 14 May 2016

  • Login
    ssh [netid]@quest.it.northwestern.edu    
    

    where [netid] is your NETID. The first time you log in, you will be asked for a file in which to save the key and then, twice, for a passphrase. Just press "enter" at all three prompts and you should be able to log in successfully.

    Our group folder is located in /projects/b1011/luijten-group

  • Example of job.mbs file
    # ### AUTOMATICALLY GENERATED BATCH FILE
    #MOAB -q [queue_name]
    #MOAB -A b1011
    #MOAB -l walltime=[dd:hh:mm:ss]
    
    # ###name of job
    #MOAB -N [name_of_job]
    
    # ### mail on end/abort (use -m bea to also get mail when the job begins)
    #MOAB -m ea
    #MOAB -M [email_address]                                                                                                      
    
    # ### number of nodes and processors per node
    #MOAB -l nodes=2:ppn=6
    
    # ### indicates that job should not rerun if it fails
    #MOAB -r n
    
    # ### stdout and stderr merged as stderr
    #MOAB -j eo
    
    # ### write stderr to file
    #MOAB -e log.err
    
    # ### the shell that interprets the job script
    #MOAB -S /bin/bash
    
    module load [module]
    cd /projects/b1011/luijten-group/[job_location]
    time mpirun -np 12 [directory_name]/[lammps_version] -in input.dat
    
    if [ $? -eq 0 ] ; then
    touch COMPLETED
    fi   
    

    [queue_name] There are two options for the queue name: collab or collab-preempt. Both have a startup priority of 5000. collab allows at most 262 cores and a maximum walltime of 7 days. collab-preempt has no resource restrictions, but note that jobs in queues ending in ‘-preempt’ can be interrupted and re-queued by jobs from a higher-priority queue.

    [dd:hh:mm:ss] This is the maximum allowed running time for your job. dd: days; hh: hours; mm: minutes; ss: seconds.

    [name_of_job] This is the name of your job as it will be shown in the queue.

    [email_address] This is the email address to which the system sends notices about your job (when it begins, ends, or aborts, depending on the -m option).

    [module] The module to load. For mpirun this would be the module mpi. For a full list of available modules, run module available from the command line.

    [job_location] This is the path, relative to the group folder, of the folder where your input file is located.

    [directory_name] This is the path of the directory that contains the LAMMPS executable.

    [lammps_version] This is the build of LAMMPS you want to run. It must be located in [directory_name].
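
    As a minimal sketch of how the [dd:hh:mm:ss] and -np values can be worked out before editing job.mbs: the helper names walltime_from_seconds and total_procs below are hypothetical (they are not part of Moab or Quest). The second helper assumes you want one MPI rank per requested processor, which matches the example above (nodes=2:ppn=6 and -np 12).

```shell
#!/bin/bash
# Hypothetical helpers for filling in job.mbs; not part of Moab or Quest.

# Convert a duration in seconds into the dd:hh:mm:ss walltime format.
walltime_from_seconds() {
    local total=$1
    local days=$(( total / 86400 ))
    local hours=$(( (total % 86400) / 3600 ))
    local minutes=$(( (total % 3600) / 60 ))
    local seconds=$(( total % 60 ))
    printf '%02d:%02d:%02d:%02d\n' "$days" "$hours" "$minutes" "$seconds"
}

# Number of MPI ranks that uses every requested processor
# (assumption: one rank per processor, i.e. nodes * ppn).
total_procs() {
    local nodes=$1 ppn=$2
    echo $(( nodes * ppn ))
}

walltime_from_seconds 604800   # 7 days, the collab maximum: 07:00:00:00
total_procs 2 6                # nodes=2:ppn=6 gives 12
```

    The two printed values would go into the "#MOAB -l walltime=" line and the "-np" argument of mpirun, respectively.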

  • Submit jobs
    msub job.mbs
    
  • Cancel jobs
    canceljob [job_number]
    canceljob `seq [first_job_number] [last_job_number]`
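
    The second form works because seq prints every integer from the first to the last job number, and the backticks pass them all to canceljob as separate arguments. A small sketch for previewing that expansion before cancelling anything is shown here; the function name preview_cancel_range and the job numbers are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: print the canceljob command for a range of job
# numbers without running it, so the range can be checked first.
preview_cancel_range() {
    local first=$1 last=$2
    # seq prints one number per line; the unquoted substitution splits
    # them into separate arguments, just as the backticks do.
    echo canceljob $(seq "$first" "$last")
}

preview_cancel_range 1001 1004
# prints: canceljob 1001 1002 1003 1004
```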
    
  • Check job status
    showq
    

    show all jobs

    showq -r
    

    show running jobs

    showq -i
    

    show idle jobs

    showq -w user=[netid]
    

    show jobs belonging to the user specified, where [netid] is your NETID.

    showq -w acct=[account number]
    

    show jobs belonging to the account specified. Grail allocation account number: b1011; CCTSM allocation account number: b1023; ESAM allocation account number: b1020.

    qstat 
    

    show your own jobs

    checkjob [job_ID] 
    

    This command displays detailed status and diagnostic information about a submitted job, which can be useful for troubleshooting submission issues. It can also be used to obtain useful information about completed jobs, such as the allocated nodes, resources used, and exit codes. NUIT recommends using the flag ‘-vvv’ or ‘-v -v -v’ to gather additional diagnostic information.

    mjobctl -m partition=<partition name> <job number>
    

    This command sets the partition for a job that is already in the queue. This can be useful if you forgot to specify a partition in the batch file, or if you want to change the partition (for example from quest3 to quest4), because it lets you do so without having to delete and resubmit the job.
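
    When several consecutively numbered jobs need the same change, the command can be repeated in a loop. The sketch below only prints the commands rather than running them; the function name preview_set_partition and the job numbers are hypothetical, and it assumes one mjobctl call per job.

```shell
#!/bin/bash
# Hypothetical helper: print one mjobctl command per job in a range.
# Remove the "echo" to actually run the commands on the cluster.
preview_set_partition() {
    local partition=$1 first=$2 last=$3
    local job
    for job in $(seq "$first" "$last"); do
        echo mjobctl -m "partition=$partition" "$job"
    done
}

preview_set_partition quest4 2001 2003
```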