TORQUE notes
TORQUE is a resource manager that provides processes an organized way to access a system's resources. In conjunction with Maui, a scheduler for the queues, they are the main tools installed in an HPC system. This page will show you the basics of TORQUE installation and configuration.
There are proprietary alternatives like PBS Professional, which combines a resource manager and a scheduler, and Moab from Adaptive Computing, which is another HPC suite. Those are not covered here.
Overview
(2014-02-21)
You can get an overview of TORQUE in the TORQUE architecture page of the TORQUE Administration Guide, but in summary it is as follows:
For a successful TORQUE installation, four daemons should be running:
- trqauthd
  The daemon that clients connect to (using TORQUE commands like qsub, qdel, qstat, etc.). It authorizes connections to pbs_server. See more information on the "Configuring trqauthd for client commands" page linked below. It opens a UNIX domain socket in /tmp/trqauthd-unix.
- pbs_server
  The daemon that gets in touch with pbs_mom (on the nodes) to run new jobs. Listens on port 15001 by default.
- pbs_mom
  Responsible for running jobs on the nodes. Communicates with pbs_server. Listens on ports 15002 and 15003 by default.
- pbs_sched
  Basic scheduler that is called by pbs_server periodically; see the scheduler_iteration setting in the pbs_server_attributes(7) man page. pbs_sched is a very simple scheduler and is usually replaced by Maui. The default scheduler configuration listens on port 15004.
Configuring trqauthd for client commands
The following diagram is a summary of the communication between daemons and user programs::
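(rough sketch)

user commands: qsub / qstat / qdel
      |
      v
  trqauthd ---authorizes---> pbs_server <---> pbs_sched (or Maui)
                                 |
                                 v
                          pbs_mom (on the nodes)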
Here, the "master node" is the computer that has the TORQUE administration tools installed and that does all the work of enqueuing jobs, running the scheduler, allocating resources, etc. Normally it is not used for processing jobs. The "slave nodes" are the nodes used to run processing jobs. Each slave node runs an instance of pbs_mom. Both kinds of node can run on a single computer (therefore running all daemons on just one machine).
Installation in a supercomputer
At the time of this writing, TORQUE 4.2.6 was the newest version, so let's work with that. I'm installing it not on a traditional Beowulf cluster (see link below), but on a supercomputer with 136 processors (we will use just 134) and more than 260 GB of RAM.
After downloading and unpacking the tarball, let's configure the system. As the root user do::
# export TORQUE_HOME=/opt/torque-4.2.6
# ./configure --prefix=$TORQUE_HOME
# make
# make install
We should not forget the init scripts. Since our system is SUSE Linux Enterprise Server, the commands are::
# cp contrib/init.d/suse.trqauthd /etc/init.d/trqauthd
# chkconfig --add trqauthd
# service trqauthd start
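The tarball also ships SUSE init scripts for the other daemons; assuming the same contrib/init.d layout, pbs_server and pbs_mom can be set up the same way once we get to them::

# cp contrib/init.d/suse.pbs_server /etc/init.d/pbs_server
# cp contrib/init.d/suse.pbs_mom /etc/init.d/pbs_mom
# chkconfig --add pbs_server
# chkconfig --add pbs_mom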
I prefer not to install it among other things in /usr, so I used the --prefix option above. We will refer to that directory as $TORQUE_HOME.
After that, it is important to tell our system where we just installed TORQUE, if it is not a standard location::
# export PATH=$PATH:$TORQUE_HOME/bin:$TORQUE_HOME/sbin
It is a good idea to add it to your rc scripts.
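A minimal sketch of doing that system-wide, assuming a profile.d-style setup (the path and values just mirror the ones above)::

# cat > /etc/profile.d/torque.sh <<'EOF'
export TORQUE_HOME=/opt/torque-4.2.6
export PATH=$PATH:$TORQUE_HOME/bin:$TORQUE_HOME/sbin
EOF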
Then, let's create the server database with::
# pbs_server -t create
*Note:* In the "Installing TORQUE" section of the TORQUE Administration Guide, we are told to use the ./torque.setup root script, which already creates a basic qmgr setup for us, but we ran into a problem (see "Error qmgr obj= svr=default: Bad ACL entry in host list MSG=First bad host" below).
After that, we should start pbs_server and call the qmgr program to set up the queues::
# qterm
# pbs_server
# qmgr
In the qmgr console, let's configure a basic queue called batch with the following commands::
create queue batch
set queue batch queue_type = Execution
set queue batch resources_max.mem = 100gb
set queue batch resources_max.procs = 100
set queue batch resources_max.walltime = 01:00:00
set queue batch enabled = True
set queue batch started = True
set server scheduling = True
set server managers = root@hostname
set server default_queue = batch
set server log_events = 511
set server mail_from = root
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 6
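To check what we just did, the whole configuration can be printed back at any time::

# qmgr -c 'print server'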
We already have a queue set up. Now we need to tell TORQUE to use the computer we installed it on. Configuration and daemons for ordinary nodes differ from the master node's, but since we are using a standalone supercomputer, we run it as both the server and the client.
The configuration that should be on the nodes lives in $TORQUE_HOME/mom_priv/config. It should just be::
$pbs_server hostname
*Note:* Avoid using `localhost` as the hostname. Use the output of the `hostname` command. I've had some problems with that (see the Troubleshooting_ section), probably due to a misconfiguration I didn't track down.
And, on the server side, let's just specify the nodes in the $TORQUE_HOME/server_priv/nodes file. In our case::
hostname np=134
Where "hostname" is the output of the hostname command. Note that we specify the number of processors and could also specify other settings.
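For comparison, on a cluster this file would list one line per node, optionally with extra properties; the node names and properties below are purely illustrative::

node01 np=8 bigmem
node02 np=8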
After that, let's restart the daemons and also start client-side daemons::
# pkill pbs_server
# qterm
# pbs_server
# pbs_mom
Let's see the output of pbsnodes::
# pbsnodes
hostname
state = job-exclusive
np = 134
ntype = cluster
jobs = 0/12.hostname.domain
status = rectime=1390849225,varattr=,jobs=12.hostname.domain,state=free,netload=135590624,gres=pbs_server:= hostname,loadave=16.08,ncpus=134,physmem=269558192kb,availmem=273994304kb,totmem=280048592kb,idletime=91,nusers=1,nsessions=4,sessions=147029 163085 164261 173479,uname=Linux hostname 2.6.16.60-0.42.10-default #1 SMP Tue Apr 27 05:11:27 UTC 2010 ia64,opsys=linux
mom_service_port = 15002
mom_manager_port = 15003
And let's see if we see our queues::
# qstat -q
server: hostname
Queue Memory CPU Time Walltime Node Run Que Lm State
---------------- ------ -------- -------- ---- --- --- -- -----
batch 100gb -- -- -- 1 0 -- E R
----- -----
1 0
In this example we already submitted a job, which is running.
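A quick way to submit such a test job is, for example, the following (run it from a regular user account, since TORQUE refuses submissions from root by default)::

$ echo "sleep 60" | qsub -q batch
$ qstat -a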
A complex queue setup
This basic installation works fine for one queue, but normally TORQUE users use it on a cluster with many nodes. The jobs must be scheduled so no user gets more priority than others. Another, more complex setup is the one that follows (the output of qmgr -c 'p s')::
create queue routing
set queue routing queue_type = Route
set queue routing route_destinations = P16
set queue routing route_destinations += P8
set queue routing route_destinations += P4
set queue routing route_destinations += P1-2
set queue routing route_destinations += serial
set queue routing enabled = True
set queue routing started = True
create queue serial
set queue serial queue_type = Execution
set queue serial resources_max.mem = 20gb
set queue serial resources_max.ncpus = 1
set queue serial resources_max.nodes = 1
set queue serial resources_max.procs = 1
set queue serial resources_max.walltime = 720:00:00
set queue serial resources_min.procs = 1
set queue serial max_user_run = 4
set queue serial enabled = True
set queue serial started = True
create queue P1-2
set queue P1-2 queue_type = Execution
set queue P1-2 resources_max.mem = 20gb
set queue P1-2 resources_max.procs = 2
set queue P1-2 resources_max.walltime = 24:00:00
set queue P1-2 resources_min.procs = 1
set queue P1-2 enabled = True
set queue P1-2 started = True
create queue P4
set queue P4 queue_type = Execution
set queue P4 resources_max.mem = 12gb
set queue P4 resources_max.procs = 4
set queue P4 resources_max.walltime = 24:00:00
set queue P4 enabled = True
set queue P4 started = True
create queue P8
set queue P8 queue_type = Execution
set queue P8 resources_max.mem = 24gb
set queue P8 resources_max.procs = 8
set queue P8 resources_max.walltime = 24:00:00
set queue P8 max_user_run = 8
set queue P8 enabled = True
set queue P8 started = True
create queue P16
set queue P16 queue_type = Execution
set queue P16 resources_max.mem = 24gb
set queue P16 resources_max.procs = 16
set queue P16 resources_max.walltime = 24:00:00
set queue P16 max_user_run = 1
set queue P16 enabled = True
set queue P16 started = True
set server scheduling = True
(... rest of server configuration)
We now have a more complex setup, with different queues that have different attributes (number of processors, walltime, available memory, etc.). More information about the attributes can be found in the TORQUE Administration Guide. We also have a "routing" queue to route jobs to the right queues. TORQUE alone (with its very simple scheduler, pbs_sched) cannot do that for us.
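To illustrate the routing queue: a job submitted to it is moved to the first destination queue whose limits accept it. For example (resource values chosen just to show the idea), a serial job asking for a 300-hour walltime exceeds the 24-hour limit of the P* queues and ends up in the serial queue::

$ qsub -q routing -l procs=1,walltime=300:00:00 job.sh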
In that case, we need to use another scheduler. We are going to use Maui_. Take a look at our page about Maui (linked below) for Maui installation and setup with TORQUE.
Troubleshooting
Error qmgr obj= svr=default: Bad ACL entry in host list MSG=First bad host
For some reason I could not figure out (Google didn't help), I got this error when running the ./torque.setup root command as recommended by the TORQUE Administration Guide. So I ran pbs_server -t create and configured the queues manually.
pbsnodes showing down host
If the output of pbsnodes is::
# pbsnodes
localhost
state = down
np = 134
ntype = cluster
mom_service_port = 15002
mom_manager_port = 15003
Check the content of the $TORQUE_HOME/server_priv/nodes and $TORQUE_HOME/mom_priv/config files, as well as the hostname of the host on which you are running the TORQUE server and clients.
Only one job runs at a time, even if there are free resources
(2014-02-18)
There can be different reasons for this problem, like a misconfigured scheduler or queue. In our case, we were trying to configure TORQUE in a supercomputer environment, with lots of CPUs and memory.
We know that in a cluster environment, pbs_server is executed on a "master node" and pbs_mom on the others. A supercomputer environment is a single computer that runs both pbs_server and pbs_mom. There is no problem with that. When we execute the pbsnodes command we will see one single node with lots of processors::
# pbsnodes
localhost
state = down
np = 134
ntype = cluster
mom_service_port = 15002
mom_manager_port = 15003
After running a job, it just puts the whole computer in a job-exclusive state, preventing other jobs from running::
# pbsnodes
bachianas
state = job-exclusive
np = 134
ntype = cluster
jobs = 0/36.bachianas
status = rectime=1391694740,varattr=,jobs=36.bachianas,state=free,netload=17322218165,gres=,loadave=6.02,ncpus=136,physmem=269558192kb,availmem=271858224kb,totmem=280048592kb,idletime=420,nusers=1,nsessions=4,sessions=50921 55356 55540 192338,uname=Linux bachianas 2.6.16.60-0.42.10-default #1 SMP Tue Apr 27 05:11:27 UTC 2010 ia64,opsys=linux
mom_service_port = 15002
mom_manager_port = 15003
We should not treat this machine as a cluster: TORQUE sees the whole machine as a single node and locks it for one job, no matter whether the job uses 1 or 134 CPUs.
In this very specific case, the machine supports the NUMA_ architecture, so we can compile TORQUE with NUMA support to divide the CPUs into logical units. Check the "TORQUE on NUMA systems" section (linked below) of the TORQUE Administration Guide.
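A rough sketch of what that involves, following the NUMA section of the guide (the mom.layout contents below are only illustrative, since the actual layout depends on the machine)::

# ./configure --prefix=$TORQUE_HOME --enable-numa-support
# make && make install
# cat $TORQUE_HOME/mom_priv/mom.layout
nodes=0
nodes=1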
qrun: Unknown node-attribute
This error can have different causes. One is that your scheduler is not running or cannot communicate with pbs_server. The log of my Maui setup showed something like::
02/14 12:00:13 MRMClusterQuery()
02/14 12:00:13 WARNING: no resources detected
02/14 12:00:13 MRMWorkloadQuery()
02/14 12:00:13 WARNING: no workload detected
In my case it was a problem with Maui, specifically with the RMCFG[HOSTNAME] setting.
job falling into the wrong queue
(2014-02-28)
A job falling into the wrong queue can have several causes. The most common one, IMO, is wrong queue configuration, not a problem with the job itself. But another common cause is wrong PBS directives. The following directive::
#PBS nodes=2:ppn=4
is *wrong*. The right one would be::
#PBS -l nodes=2:ppn=4
See the -l argument? That is the right way to request resources.
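For context, a minimal job script sketch using that directive (the job name, queue, and command are just examples)::

#!/bin/bash
#PBS -N example-job
#PBS -q batch
#PBS -l nodes=2:ppn=4
#PBS -l walltime=01:00:00

cd $PBS_O_WORKDIR
./my_program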
job in 'R' state, but Time Use is always 00:00:00
(2014-06-03)
After queueing a job, I executed qrun on it, because Maui_'s scheduler was stopped for testing purposes. After running qrun, the job changed to the R state in qstat, but the Time Use column didn't change. It stayed at 00:00:00.
07/02/2014 15:53:53;0001;PBS_Server;Svr;PBS_Server;LOG_ERROR::is_request, bad attempt to connect from 200.XXX.XXX.XXX:1023 (address not trusted - check entry in server_priv/nodes)
Solution 1: In one case, the origin of this problem was that $TORQUE_HOME/server_name correctly contained the hostname of the machine, but in /etc/hosts the hostname was associated with an external IP (200.XXX...) that the nodes could not access. The solution was to change the IP entry in /etc/hosts to the internal IP address the other nodes can access.
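A sketch of the fixed entry (the address and hostname are placeholders)::

192.168.0.10   hostname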
Solution 2: In another case, pbs_mom was just having problems executing on some nodes. Why? The log filesystem (usually /var) was full. I needed to delete some stuff.
Job being completed just after submit, with no further information
(2014-07-17)
This error can have several causes. It is best to check the logs in $TORQUE_HOME/server_logs.
There is an error like::
PBS_Server;Req;req_reject;Reject reply code=15001(Unknown Job Id), aux=0, type=LocateJob, from user@node
where user is the user login name and node is the name of the node.
The server was OK, and the scheduler was OK too. It was probably some problem in the node. After logging into it, I realized that users had problems with the NFS partitions, since we had made changes to the NFS server and the local firewall. Running umount and mount again was not enough, so we had to reboot all nodes. Then it worked.
Job never enters state R (Run)
(2014-11-06)
This is a *very common* problem that can have many different reasons.
Reason 1: Filesystem that has logs is full
A problem I had was the following: there were some jobs running on the system, but newer jobs weren't running. If we just check the status with Maui's checkjob command, it tells us::
# checkjob 3038
(...)
job is deferred. Reason: NoResources (exceeds available partition procs)
Holds: Batch Defer (hold reason: NoResources)
PE: 16.00 StartPriority: 31
cannot select job 3038 for partition DEFAULT (job hold active)
So we have a "Maui hold" on it. If we just try to release it with releasehold and wait for the scheduler cycle, we see that the hold is put back again. The Maui log doesn't help and just tells us that there aren't available resources, just like checkjob did.
pbsnodes tells us the nodes are free. But if you investigate carefully, you will see a different message::
bachianas-1
state = free
np = 4
ntype = cluster
status = rectime=1413550150,varattr=,jobs=,state=free,netload=? 0,gres=,message=ERROR: torque spool filesystem full,loadave=0.00,ncpus=4,physmem=8077312kb,availmem=7872128kb,totmem=8077312kb,idletime=11,nusers=0,nsessions=0,uname=Linux bachianas 2.6.16.60-0.42.10-default #1 SMP Tue Apr 27 05:11:27
UTC 2010 ia64,opsys=linux
mom_service_port = 15002
mom_manager_port = 15003
There is a message field with the following content: "ERROR: torque spool filesystem full". Some time ago the filesystem really was full and we had to delete some files, but we didn't restart pbs_mom.
So, in this case, a quick restart of the pbs_mom daemon solved the problem.
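Restarting it can be as simple as the approach used earlier on this page::

# pkill pbs_mom
# pbs_mom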
Reason 2: DNS problems
If we use the checkjob command to investigate, we see::
job is deferred. Reason: RMFailure (cannot start job - RM failure, rc: 15085, msg: 'Time out MSG=connection to mom timed out')
This means that the server cannot connect to the mom daemon and vice-versa.
In mom_logs we find the line::
11/05/2014 18:30:47;0008;PBS_Server.23876;Job;3080.bachianas.ufabc.edu.br;unable to run job, send to MOM '3364214663' failed
And a call to the qrun command to force the execution of the job returns::
qrun: Time out MSG=connection to mom timed out 3083.bachianas.ufabc.edu.br
There are also various messages in server_logs saying it is not possible to communicate with the mom.
First, check if both daemons, pbs_server and pbs_mom, are running. If so, there is likely a problem with DNS. In my case, there was an entry for an invalid DNS server in /etc/resolv.conf and it was necessary to remove it.
After that, I had to free the jobs from hold with the releasehold command.
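A sketch of that last step, with an example job id::

# releasehold 3083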