TORQUE is a resource manager that gives processes an organized way to access a system's resources. Together with Maui, a scheduler for its queues, it is one of the main tools installed on an HPC system. This page covers the basics of TORQUE installation and configuration.
There are proprietary alternatives, such as PBS Professional, which combines a resource manager and a scheduler, and Moab from Adaptive Computing, another HPC suite. Those are not covered here.
You can get an overview of TORQUE in the TORQUE architecture page of the TORQUE Administration Guide, but in summary it is as follows:
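For a successful TORQUE installation, four daemons should be running: pbs_server (the server, which manages queues and jobs), pbs_sched (the scheduler), pbs_mom (which runs jobs on each execution node) and trqauthd (which authorizes the connections that user commands make to pbs_server).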
The following diagram is a summary of the communication between daemons and user programs:
master node
........................................
:                                      :
:   +--------------------+             :
:   |   user commands    |             :
:   | (qsub, qdel, etc.) |             :
:   +--------------------+             :
:             ^                        :
:             |                        :
:             v                        :
:        +----------+                  :
:        | trqauthd |                  :
:        +----------+                  :
:             ^                        :
:             |                        :
:             v                        :
:       +------------+   +-----------+ :
:       | pbs_server |<->| pbs_sched | :
:       +------------+   +-----------+ :
:             ^                        :
:             |                        :
:......................................:
              |
              |
slave nodes   |
..............................
:             |              :
:             v              :
:       +------------+       :
:       |  pbs_mom   |       :
:       +------------+       :
:                            :
:............................:
Here, "master node" is the computer that has the TORQUE administration tools installed and does all the work of enqueuing jobs, running the scheduler, allocating resources, etc. Normally it is not used for processing jobs. "Slave nodes" are the nodes that run the jobs; each one runs an instance of pbs_mom. Both kinds of node can run on one computer, in which case all daemons run on that single machine. See Installation in a supercomputer for details.
At the time of this writing, TORQUE 4.2.6 was the newest version, so that is the one used here. I'm installing it not on a traditional Beowulf cluster but on a supercomputer with 136 processors (we will use just 134) and more than 260 GB of RAM.
After downloading and unpacking the tarball, configure, build and install TORQUE. As root, run:
# export TORQUE_HOME=/opt/torque-4.2.6
# ./configure --prefix=$TORQUE_HOME
# make
# make install
We should not forget the init scripts. Our system is SUSE Linux Enterprise Server, so the commands are:
# cp contrib/init.d/suse.trqauthd /etc/init.d/trqauthd
# chkconfig --add trqauthd
# service trqauthd start
I prefer not to mix it with everything else in /usr, so I used the --prefix option above. We will refer to that directory as $TORQUE_HOME.
After that, it is important to tell the system where TORQUE was installed, since it is not a standard location:
# export PATH=$PATH:$TORQUE_HOME/bin:$TORQUE_HOME/sbin
It is a good idea to add this line (and the TORQUE_HOME export) to your shell's rc file, such as ~/.bashrc.
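A minimal sketch of making this persistent, assuming a shell that reads an rc file such as ~/.bashrc. The function name add_torque_path and the marker comment are hypothetical; the marker lets the function skip the edit when it has already been made, so running it twice does not duplicate the lines.

```shell
# Hypothetical helper: append the TORQUE environment settings to an rc file once.
# Usage: add_torque_path /opt/torque-4.2.6 ~/.bashrc
add_torque_path() {
    torque_home=$1
    rc_file=$2
    marker="# TORQUE environment"
    # Skip if the marker line is already present (keeps the edit idempotent).
    if [ -f "$rc_file" ] && grep -qxF "$marker" "$rc_file"; then
        return 0
    fi
    {
        echo "$marker"
        echo "export TORQUE_HOME=$torque_home"
        # Single quotes: $PATH and $TORQUE_HOME are expanded when the rc file is read.
        echo 'export PATH=$PATH:$TORQUE_HOME/bin:$TORQUE_HOME/sbin'
    } >> "$rc_file"
}
```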
Then, create the server database with:
# pbs_server -t create
Note
The Installing TORQUE section of the TORQUE Administration Guide tells us to run the ./torque.setup root script, which already creates a basic qmgr setup for us, but it failed in our case (see Error qmgr obj= svr=default: Bad ACL entry in host list MSG=First bad host above).
Next, restart pbs_server and call the qmgr program to set up the queues:
# qterm
# pbs_server
# qmgr
In the qmgr console, let's configure a basic queue called batch with the following commands:
create queue batch
set queue batch queue_type = Execution
set queue batch resources_max.mem = 100gb
set queue batch resources_max.procs = 100
set queue batch resources_max.walltime = 01:00:00
set queue batch enabled = True
set queue batch started = True
set server scheduling = True
set server managers = root@hostname
set server default_queue = batch
set server log_events = 511
set server mail_from = root
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 6
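The same commands can be kept in a script and fed to qmgr on standard input, which makes the setup repeatable. A sketch, assuming pbs_server is already running; the function name batch_queue_cmds is hypothetical, and it fills in root@hostname from its argument.

```shell
# Hypothetical helper: print the qmgr commands for the basic "batch" queue.
# Usage (as root, with pbs_server running): batch_queue_cmds "$(hostname)" | qmgr
batch_queue_cmds() {
    host=$1
    cat <<EOF
create queue batch
set queue batch queue_type = Execution
set queue batch resources_max.mem = 100gb
set queue batch resources_max.procs = 100
set queue batch resources_max.walltime = 01:00:00
set queue batch enabled = True
set queue batch started = True
set server scheduling = True
set server managers = root@$host
set server default_queue = batch
set server log_events = 511
set server mail_from = root
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 6
EOF
}
```

Keeping the commands in a function (or a plain file) also documents the configuration next to the output of qmgr -c 'p s'.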
We now have a queue. Next we need to tell TORQUE to use the computer it is installed on. Ordinary nodes have different configuration and daemons from the master node but, since we are using a standalone supercomputer, it runs as both the server and the client.
The node-side configuration goes in $TORQUE_HOME/mom_priv/config. It should contain just:
$pbsserver hostname
Note
Avoid using localhost as the hostname; use the output of the hostname command instead. I had some problems with localhost (see the Troubleshooting section), probably caused by a misconfiguration I did not notice.
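A sketch that writes this file from the output of hostname, as the note recommends. Note that TORQUE spells the MOM parameter $pbsserver (no underscore). The function name write_mom_config is hypothetical, and it overwrites any existing config file.

```shell
# Hypothetical helper: write mom_priv/config pointing pbs_mom at the server.
# Overwrites any existing file.
# Usage: write_mom_config "$TORQUE_HOME" "$(hostname)"
write_mom_config() {
    torque_home=$1
    server_host=$2
    mkdir -p "$torque_home/mom_priv"
    # $pbsserver is a literal TORQUE parameter name, hence the single quotes.
    printf '%s %s\n' '$pbsserver' "$server_host" > "$torque_home/mom_priv/config"
}
```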
On the server side, list the nodes in the $TORQUE_HOME/server_priv/nodes file. In our case:
hostname np=134
Where "hostname" is the output of the hostname command. Note that np sets the number of processors; other settings, such as node properties, can also be given on this line.
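On this machine, 2 of the 136 processors are left out of np. A small sketch that builds the line from the total CPU count and the number of processors to reserve; nodes_line is a hypothetical name, and using getconf _NPROCESSORS_ONLN to get the CPU count is an assumption about a Linux/glibc platform.

```shell
# Hypothetical helper: build a server_priv/nodes entry that leaves some CPUs out.
# Usage:
#   nodes_line "$(hostname)" "$(getconf _NPROCESSORS_ONLN)" 2 > "$TORQUE_HOME/server_priv/nodes"
nodes_line() {
    host=$1
    total=$2
    reserved=$3
    np=$((total - reserved))
    if [ "$np" -lt 1 ]; then
        echo "nodes_line: nothing left after reserving $reserved of $total CPUs" >&2
        return 1
    fi
    echo "$host np=$np"
}
```

The redirection in the usage line replaces the whole nodes file, which is what we want for a single-node setup.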
After that, restart the server and start the client-side daemon, pbs_mom:
# pkill pbs_server
# qterm
# pbs_server
# pbs_mom
Let's see the output of pbsnodes:
# pbsnodes
hostname
     state = job-exclusive
     np = 134
     ntype = cluster
     jobs = 0/12.hostname.domain
     status = rectime=1390849225,varattr=,jobs=12.hostname.domain,state=free,netload=135590624,gres=pbs_server:= hostname,loadave=16.08,ncpus=134,physmem=269558192kb,availmem=273994304kb,totmem=280048592kb,idletime=91,nusers=1,nsessions=4,sessions=147029 163085 164261 173479,uname=Linux hostname 2.6.16.60-0.42.10-default #1 SMP Tue Apr 27 05:11:27 UTC 2010 ia64,opsys=linux
     mom_service_port = 15002
     mom_manager_port = 15003
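When only the node state matters, it can be pulled out of this long output. A minimal sketch, assuming the layout shown above (an unindented node name followed by indented "attribute = value" lines); node_states is a hypothetical name.

```shell
# Hypothetical helper: print "node state" pairs from pbsnodes output on stdin.
# Usage: pbsnodes | node_states
node_states() {
    awk '
        /^[^ \t]/ { node = $1 }                 # unindented line: a node name
        $1 == "state" && $2 == "=" { print node, $3 }
    '
}
```

For the output above this would print "hostname job-exclusive".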
And let's check that our queue is listed:
# qstat -q

server: hostname

Queue            Memory CPU Time Walltime Node  Run Que Lm  State
---------------- ------ -------- -------- ----  --- --- --  -----
batch             100gb    --       --     --     1   0 --   E R
                                               ----- -----
                                                   1     0
In this example we already submitted a job, which is running.
This basic installation works fine for one queue, but TORQUE is normally used on a cluster with many nodes, where jobs must be scheduled so that no user gets more priority than others. A different and more complex setup follows (the output of qmgr -c 'p s'):
create queue routing
set queue routing queue_type = Route
set queue routing route_destinations = P16
set queue routing route_destinations += P8
set queue routing route_destinations += P4
set queue routing route_destinations += P1-2
set queue routing route_destinations += serial
set queue routing enabled = True
set queue routing started = True
create queue serial
set queue serial queue_type = Execution
set queue serial resources_max.mem = 20gb
set queue serial resources_max.ncpus = 1
set queue serial resources_max.nodes = 1
set queue serial resources_max.procs = 1
set queue serial resources_max.walltime = 720:00:00
set queue serial resources_min.procs = 1
set queue serial max_user_run = 4
set queue serial enabled = True
set queue serial started = True
create queue P1-2
set queue P1-2 queue_type = Execution
set queue P1-2 resources_max.mem = 20gb
set queue P1-2 resources_max.procs = 2
set queue P1-2 resources_max.walltime = 24:00:00
set queue P1-2 resources_min.procs = 1
set queue P1-2 enabled = True
set queue P1-2 started = True
create queue P4
set queue P4 queue_type = Execution
set queue P4 resources_max.mem = 12gb
set queue P4 resources_max.procs = 4
set queue P4 resources_max.walltime = 24:00:00
set queue P4 enabled = True
set queue P4 started = True
create queue P8
set queue P8 queue_type = Execution
set queue P8 resources_max.mem = 24gb
set queue P8 resources_max.procs = 8
set queue P8 resources_max.walltime = 24:00:00
set queue P8 max_user_run = 8
set queue P8 enabled = True
set queue P8 started = True
create queue P16
set queue P16 queue_type = Execution
set queue P16 resources_max.mem = 24gb
set queue P16 resources_max.procs = 16
set queue P16 resources_max.walltime = 24:00:00
set queue P16 max_user_run = 1
set queue P16 enabled = True
set queue P16 started = True
set server scheduling = True
(... rest of server configuration)
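The execution queues above all follow one pattern, so their qmgr commands can be generated instead of typed. A sketch with a hypothetical exec_queue function that prints the shared attributes (memory, processors, walltime and an optional max_user_run); attributes used by only some queues, such as resources_min.procs, would still be added by hand.

```shell
# Hypothetical helper: print qmgr commands for one execution queue.
# Arguments: name max_mem max_procs max_walltime [max_user_run]
# Usage: { exec_queue P8 24gb 8 24:00:00 8; exec_queue P4 12gb 4 24:00:00; } | qmgr
exec_queue() {
    q=$1
    echo "create queue $q"
    echo "set queue $q queue_type = Execution"
    echo "set queue $q resources_max.mem = $2"
    echo "set queue $q resources_max.procs = $3"
    echo "set queue $q resources_max.walltime = $4"
    # max_user_run is optional: only some queues limit running jobs per user.
    if [ -n "${5:-}" ]; then
        echo "set queue $q max_user_run = $5"
    fi
    echo "set queue $q enabled = True"
    echo "set queue $q started = True"
}
```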
We now have a more complex setup, with queues that have different attributes (number of processors, walltime, available memory, etc.); see the TORQUE Administration Guide for more about these attributes. We also have a "routing" queue that sends jobs to the right execution queue. TORQUE alone, with its very simple scheduler pbs_sched, cannot do that for us.
In that case, we need another scheduler. We are going to use Maui; see our page about Maui for its installation and setup with TORQUE.
For a reason I could not find (Google didn't help), I got the error "qmgr obj= svr=default: Bad ACL entry in host list MSG=First bad host above" when running the ./torque.setup root command recommended by the TORQUE Administration Guide. So I ran pbs_server -t create and configured the queues manually.
If the output of pbsnodes is:
# pbsnodes
localhost
     state = down
     np = 134
     ntype = cluster
     mom_service_port = 15002
     mom_manager_port = 15003
Check the contents of the $TORQUE_HOME/server_priv/nodes and $TORQUE_HOME/mom_priv/config files, as well as the hostname of the host running the TORQUE server and clients.
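A quick way to compare these names, sketched under the assumption of the single-node setup described here: the hypothetical check_torque_names function only reads the first node name in server_priv/nodes and the $pbsserver line of mom_priv/config (the spelling TORQUE uses for that parameter).

```shell
# Hypothetical helper: check that the nodes file and the MOM config name this host.
# Usage: check_torque_names "$TORQUE_HOME" "$(hostname)"
check_torque_names() {
    torque_home=$1
    host=$2
    rc=0
    # First non-empty line of the nodes file: "name np=..."
    node=$(awk 'NF { print $1; exit }' "$torque_home/server_priv/nodes")
    # The "$pbsserver name" line of the MOM config.
    server=$(awk '$1 == "$pbsserver" { print $2; exit }' "$torque_home/mom_priv/config")
    if [ "$node" != "$host" ]; then
        echo "server_priv/nodes lists '$node', expected '$host'"
        rc=1
    fi
    if [ "$server" != "$host" ]; then
        echo "mom_priv/config points to '$server', expected '$host'"
        rc=1
    fi
    return $rc
}
```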
There can be different reasons for this problem, such as a misconfigured scheduler or queue. In our case, we were trying to configure TORQUE on a supercomputer, with lots of CPUs and memory.
In a cluster environment, pbs_server runs on a "master node" and pbs_mom on the others. A supercomputer is a single computer that runs both pbs_server and pbs_mom, which is fine. When we run the pbsnodes command, we see a single node with lots of processors:
# pbsnodes
localhost
     state = down
     np = 134
     ntype = cluster
     mom_service_port = 15002
     mom_manager_port = 15003
After a job started, the whole computer was put in the job-exclusive state, preventing other jobs from running:
# pbsnodes
bachianas
     state = job-exclusive
     np = 134
     ntype = cluster
     jobs = 0/36.bachianas
     status = rectime=1391694740,varattr=,jobs=36.bachianas,state=free,netload=17322218165,gres=,loadave=6.02,ncpus=136,physmem=269558192kb,availmem=271858224kb,totmem=280048592kb,idletime=420,nusers=1,nsessions=4,sessions=50921 55356 55540 192338,uname=Linux bachianas 2.6.16.60-0.42.10-default #1 SMP Tue Apr 27 05:11:27 UTC 2010 ia64,opsys=linux
     mom_service_port = 15002
     mom_manager_port = 15003
We should not treat this machine as a cluster: TORQUE sees the whole machine as a single node and locks it for one job, regardless of whether the job uses 1 or 134 CPUs.
In this specific case, the machine has a NUMA architecture, so TORQUE can be compiled with NUMA support to divide the CPUs into logical units. See the TORQUE on NUMA systems section of the TORQUE Administration Guide for more information.
After configuring the
This error can have different causes. One is that the scheduler is not running or cannot communicate with pbs_server. My Maui log showed something like:
02/14 12:00:13 MRMClusterQuery()
02/14 12:00:13 WARNING: no resources detected
02/14 12:00:13 MRMWorkloadQuery()
02/14 12:00:13 WARNING: no workload detected
In my case, it was a problem with Maui's RMCFG[HOSTNAME] setting. See our page about Maui for more details.
A job ending up in the wrong queue can have several causes. The most common one, in my opinion, is a wrong queue configuration rather than a problem with the job itself. Another common cause is wrong PBS directives. The following directive:
#PBS nodes=2:ppn=4
Is wrong. The right one would be:
#PBS -l nodes=2:ppn=4
Note the -l option: resource requests such as nodes and ppn must be passed with it.
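A directive that is missing its option, like the first one above, can be caught before submission. A sketch of a hypothetical check_pbs_directives function that flags #PBS lines whose first word does not start with "-"; it is a simple heuristic, not a full parser of qsub options.

```shell
# Hypothetical helper: report #PBS directives that do not begin with an option.
# Returns non-zero if any suspicious directive is found.
# Usage: check_pbs_directives job.sh
check_pbs_directives() {
    awk '
        $1 == "#PBS" && $2 !~ /^-/ {
            printf "%s:%d: directive without an option: %s\n", FILENAME, NR, $0
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}
```

For a job script containing "#PBS nodes=2:ppn=4", the function prints that line with its line number; "#PBS -l nodes=2:ppn=4" passes.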
After queueing a job, I ran qrun(8) on it for testing purposes, because the Maui scheduler was stopped. After qrun, the job changed to the R state in qstat(1B), but its Time Use column stayed at 00:00:00.
07/02/2014 15:53:53;0001;PBS_Server;Svr;PBS_Server;LOG_ERROR::is_request, bad attempt to connect from 200.XXX.XXX.XXX:1023 (address not trusted - check entry in server_priv/nodes)
Solution 1: In one case, $TORQUE_HOME/server_name correctly contained the hostname of the machine, but /etc/hosts associated that hostname with an external IP (200.XXX...) that the nodes could not reach. The solution was to change the entry in /etc/hosts to the internal IP address that the other nodes can access.
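To see which address the server's hostname resolves to on a node, getent hosts can be used. The sketch below adds a hypothetical is_private_ipv4 check; it assumes the cluster's internal network uses RFC 1918 private addresses (10/8, 172.16/12, 192.168/16), which is not necessarily true for every site.

```shell
# Hypothetical helper: succeed if an IPv4 address is in an RFC 1918 private range.
is_private_ipv4() {
    case $1 in
        10.*|192.168.*) return 0 ;;
        172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage on a compute node (server name taken from $TORQUE_HOME/server_name):
#   ip=$(getent hosts "$(cat "$TORQUE_HOME/server_name")" | awk '{ print $1; exit }')
#   is_private_ipv4 "$ip" || echo "server name resolves to non-private address $ip"
```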
Solution 2: In another case, pbs_mom(8) was failing to run on some nodes because the log directory (usually under /var) was full. I had to delete some files.
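A full filesystem is easy to miss. A sketch that prints how full the filesystem holding a directory is, using the portable df -P output format; the function name fs_usage_percent and the 90% threshold in the usage line are arbitrary choices for illustration.

```shell
# Hypothetical helper: print the usage percentage (without "%") of the
# filesystem that contains the given directory.
fs_usage_percent() {
    # df -P prints a header line, then one line whose 5th field is e.g. "42%".
    df -P "$1" | awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# Usage: warn when /var (or the TORQUE spool directory) is nearly full.
#   [ "$(fs_usage_percent /var)" -ge 90 ] && echo "/var is almost full"
```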
This error can have several causes; it is best to check the logs in $TORQUE_HOME/server_logs.
They contain an error like:
PBS_Server;Req;req_reject;Reject reply code=15001(Unknown Job Id), aux=0, type=LocateJob, from user@node
where user is the user login name and node is the name of the node.
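When these rejections are frequent, it helps to know which users and nodes they come from. A sketch that scans server log files for req_reject lines with code 15001 and counts the user@node values after "from"; it assumes the log format shown above, and rejected_sources is a hypothetical name.

```shell
# Hypothetical helper: count "Unknown Job Id" rejections per user@node.
# Usage: rejected_sources "$TORQUE_HOME"/server_logs/*
rejected_sources() {
    grep -h 'req_reject;Reject reply code=15001' "$@" |
        sed -n 's/.* from \([^ ]*\).*/\1/p' |   # keep only the user@node field
        sort | uniq -c | sort -rn               # most frequent sources first
}
```

A node that dominates the list is the first one worth logging into.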
The server and the scheduler were fine, so the problem was probably on the node. After logging into it, I realized that users had problems with the NFS partitions, since we had made changes to the NFS server and the local firewall. Unmounting and mounting them again was not enough, so we had to reboot all nodes, and that solved it.
This is a very common problem that can have many different causes.
One case I had: some jobs were running on the system, but newer jobs were not starting. Checking their status with Maui's checkjob command tells us:
# checkjob 3038
(...)
job is deferred.  Reason:  NoResources  (exceeds available partition procs)
Holds:    Batch Defer  (hold reason:  NoResources)
PE:  16.00  StartPriority:  31
cannot select job 3038 for partition DEFAULT (job hold active)
So Maui has put a hold on the job. If we release it with releasehold and wait for the next scheduler cycle, the hold is put back. The Maui log doesn't help; like checkjob, it just tells us that there are no available resources.
pbsnodes tells us the nodes are free, but a careful look reveals a different message:
bachianas-1
     state = free
     np = 4
     ntype = cluster
     status = rectime=1413550150,varattr=,jobs=,state=free,netload=? 0,gres=,message=ERROR: torque spool filesystem full,loadave=0.00,ncpus=4,physmem=8077312kb,availmem=7872128kb,totmem=8077312kb,idletime=11,nusers=0,nsessions=0,uname=Linux bachianas 2.6.16.60-0.42.10-default #1 SMP Tue Apr 27 05:11:27 UTC 2010 ia64,opsys=linux
     mom_service_port = 15002
     mom_manager_port = 15003
There is a message field with the content ERROR: torque spool filesystem full. Some time earlier the filesystem really had been full; we deleted some files but did not restart pbs_mom.
So, in this case, a quick restart of pbs_mom daemon solved the problem.
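A message field like this is easy to overlook in long pbsnodes output. A sketch that prints every node whose status carries a message= entry; it assumes the status attribute is one comma-separated line as shown above (so a message containing a comma would be cut short), and node_messages is a hypothetical name.

```shell
# Hypothetical helper: print "node: message" for nodes whose status has message=.
# Usage: pbsnodes | node_messages
node_messages() {
    awk '
        /^[^ \t]/ { node = $1 }                 # unindented line: a node name
        $1 == "status" && $2 == "=" {
            n = split($0, fields, ",")
            for (i = 1; i <= n; i++)
                if (fields[i] ~ /^message=/)
                    print node ": " substr(fields[i], 9)   # drop "message="
        }
    '
}
```

For the output above it would print "bachianas-1: ERROR: torque spool filesystem full".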
If we investigate with the checkjob command, we see:
job is deferred. Reason: RMFailure (cannot start job - RM failure, rc: 15085, msg: 'Time out MSG=connection to mom timed out')
This means that the server cannot connect to the MOM daemon, or vice versa.
In server_logs we find the line:
11/05/2014 18:30:47;0008;PBS_Server.23876;Job;3080.bachianas.ufabc.edu.br;unable to run job, send to MOM '3364214663' failed
Forcing the job to run with the qrun command returns:
qrun: Time out MSG=connection to mom timed out 3083.bachianas.ufabc.edu.br
There are also various messages in server_logs saying that it is not possible to communicate with the MOM.
First, check that both pbs_server and pbs_mom are running. If they are, there is likely a DNS problem. In my case, /etc/resolv.conf had an entry for an invalid DNS server, which I had to remove. After that, I had to release the jobs' holds with the releasehold command.
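To find a bad entry like this, each nameserver listed in /etc/resolv.conf can be tried in turn. A sketch; the function list_nameservers is hypothetical, and the usage lines assume nslookup is installed on the node.

```shell
# Hypothetical helper: print the nameserver addresses from a resolv.conf file.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Usage: check that every nameserver can resolve this host's name.
#   for ns in $(list_nameservers); do
#       nslookup "$(hostname)" "$ns" >/dev/null 2>&1 || echo "nameserver $ns failed"
#   done
```

A server that times out or fails here is a candidate for removal from /etc/resolv.conf.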