CONDOR at the AIfA

CONDOR basic information

Would you like to be able to run CPU-intensive jobs on multiple machines? This page gives some useful information on why and how to use CONDOR.

First things first, what is CONDOR? Taken from its homepage, the definition is:

HTCondor is a specialized workload management system for compute-intensive jobs. HTCondor provides a job queueing mechanism, scheduling policy, priority scheme, resource monitoring, and resource management. Users submit their serial or parallel jobs to HTCondor, HTCondor places them into a queue, chooses when and where to run the jobs based upon a policy, carefully monitors their progress, and ultimately informs the user upon completion. 

In simple terms, CONDOR pools together a set of computers that belong to regular users, such as the desktop machines in our institute. Since the CPU usage of an individual machine typically does not exceed 10%, a lot of computing power goes to waste. CONDOR puts that power to use: a user submits a job to the computer pool, and CONDOR runs it on whichever machines are available.
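The basic idea of handing queued jobs to otherwise idle machines can be illustrated with a toy model in Python. This is a deliberate simplification and not how CONDOR actually decides where a job runs (real CONDOR matches job and machine descriptions called ClassAds and applies configurable policies). The `Machine` class, the `schedule` function and the 10% idle threshold are hypothetical, chosen only for illustration.

```python
from collections import deque
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Machine:
    name: str
    owner_cpu_usage: float          # fraction of CPU used by the owner (0.0 to 1.0)
    running: Optional[str] = None   # name of the pool job assigned to this machine


def schedule(jobs: List[str], machines: List[Machine],
             idle_threshold: float = 0.10) -> List[str]:
    """Toy model: hand queued jobs, in order, to free machines whose owner
    load is below idle_threshold. Returns the jobs still waiting in the queue."""
    queue = deque(jobs)
    for machine in machines:
        if not queue:
            break
        # Only use machines that are free and whose owner is barely using them
        if machine.running is None and machine.owner_cpu_usage < idle_threshold:
            machine.running = queue.popleft()
    return list(queue)


machines = [Machine("pc01", 0.05), Machine("pc02", 0.60), Machine("pc03", 0.02)]
waiting = schedule(["sim_0", "sim_1", "sim_2"], machines)
# pc02 is skipped because its owner is busy, so sim_2 stays in the queue
print([(m.name, m.running) for m in machines])
print(waiting)
```

In this toy model a job that finds no idle machine simply stays in the queue until one becomes available, which mirrors the behaviour described above without claiming to reproduce CONDOR's actual policy.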

The homepage of the CONDOR project is: http://research.cs.wisc.edu/htcondor/

It was developed at the Center for High Throughput Computing at UW-Madison.


If you want to join the CONDOR network, please contact the computer group.

For other questions, such as how CONDOR works, how to submit jobs to the pool, and how CONDOR would affect your machine's performance, please see the links on these pages.

PROS:

  • can run many jobs simultaneously, which is good for simulations that need to be run many times with different parameters
  • requires no knowledge of additional programming languages, since CONDOR can run programs written in many different languages
  • can checkpoint a job that was interrupted and pick up where it left off
  • ideal for processor-intensive jobs
  • running jobs in the pool does not make any computer unusable for its regular user

CONS:

  • jobs must be able to run without user interaction: once a job is submitted, it cannot wait for user input
  • not ideal for jobs that write lots of data to disk (e.g. gigabytes)
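To make the parameter-sweep use case and the no-user-input rule more concrete, here is a minimal sketch in Python that builds the text of an HTCondor submit description file. It assumes the vanilla universe and a hypothetical program `run_sim` that receives its job index as its only command-line argument instead of asking for input; the file names and the helper functions `make_submit_description` and `parameter_for` are likewise hypothetical. Please check the submit commands against the HTCondor manual for the version installed here.

```python
def make_submit_description(executable: str, n_jobs: int) -> str:
    """Return the text of a submit description file that queues n_jobs copies
    of `executable`, passing each copy its job index as the only argument."""
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    lines = [
        "universe   = vanilla",
        f"executable = {executable}",
        # HTCondor replaces $(Process) with 0, 1, ..., n_jobs - 1,
        # so each job gets its parameter from the command line, not from a user
        "arguments  = $(Process)",
        # Separate output and error files per job so results do not overwrite each other
        "output     = job_$(Cluster)_$(Process).out",
        "error      = job_$(Cluster)_$(Process).err",
        "log        = job_$(Cluster).log",
        f"queue {n_jobs}",
    ]
    return "\n".join(lines) + "\n"


def parameter_for(index: int, start: float, step: float) -> float:
    """Map a job index (the $(Process) argument) to a simulation parameter."""
    return start + index * step


print(make_submit_description("run_sim", 10))
```

The printed text could be saved to a file, for example `sweep.sub`, and submitted with `condor_submit sweep.sub`. Inside each job, the program would read its index from its first command-line argument (in Python, `int(sys.argv[1])`) and pass it to something like `parameter_for` to pick its own parameter value.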