Notes on Torque

From csml-wiki.northwestern.edu

Revision as of 14:30, 28 January 2015

Overview

General usage

Special usage notes

  • If qb indicates that a node is down, check Ganglia for Minotaur or Hydra to identify the node (down nodes are shown in red at the top of the list). To find the jobs running on a down node, grep the output of qstat with
    qstat -f | grep -B 3 [node_ID]
    

    where [node_ID] is the name of the down node, e.g., h036 on Hydra. Give the owners of all jobs on the node a chance to note which of their jobs went down, then restart the down node using Microway control.

  • Change the total CPU time allotted to a job via
    qalter -l cput=[new_cput] [job_id]
    

    where [new_cput] is the desired amount of CPU time and [job_id] is the job's ID number. To change several jobs at once, use `seq [min_job_id] [max_job_id]` in place of [job_id]; this changes all jobs from [min_job_id] through [max_job_id]. Lowering a job's CPU time requirement can shorten its wait in the queue, because the scheduler is more likely to be able to use it for backfilling. Raising the CPU time requirement can ensure that a job is able to finish properly, and can be done even while the job is running, but raising it requires root permission.

  • Move a queued (i.e., waiting) job to a different queue via
    qmove [destination] [job_id]
    

    where [destination] is the new queue (either 'fast' or 'default' for our system) and [job_id] is the job ID.

  • Delete a sequence of jobs via
    qdel `seq [job_id1] [job_id2]`
    

    where [job_id1] is the first job ID in the sequence of jobs you want to delete and [job_id2] is the last one. Every job ID between the two is deleted as well.
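
The `seq` idiom used in the qalter and qdel notes above can be checked before it touches any jobs. The following is a minimal sketch and not part of Torque: job_range and dry_run are hypothetical helper names, and the job IDs and the cput value 48:00:00 are placeholder examples. It prints the commands that would be run instead of running them.

```shell
# Hypothetical helper: expand an inclusive range of job IDs onto one line.
# Unquoted command substitution turns seq's newlines into single spaces.
job_range() {
    echo $(seq "$1" "$2")
}

# Hypothetical helper: print a command instead of executing it,
# so the expanded job list can be inspected first.
dry_run() {
    echo "$@"
}

# Placeholder job IDs and CPU time; substitute real values.
dry_run qalter -l cput=48:00:00 $(job_range 1201 1204)
dry_run qdel $(job_range 1201 1204)
```

Once the printed job list looks right, running the same command without dry_run applies it. This is worth doing for qdel in particular, since the range includes every job ID between the two endpoints.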