Difference between revisions of "HPC - High Performance Computing"

From PHYSpedia
Revision as of 10:17, 5 April 2012

Using a Scheduler

Motivation

When developing code, it is usually sufficient to test that it works by running "small" configurations one at a time and checking the output. Once the code has been shown to work, we typically want to run several "large" configurations. From the terminal, this consists of configuring the code and running the executable. After the first configuration finishes, we run the executable with a different configuration, and we repeat this until we have run all configurations. The problem here is that we are only running one instance of our code at a time, and we have to manually start the next configuration after the last one completes. This means that if a configuration completes at 2:00 AM, the next configuration will not be started until we check on the progress later in the morning.

We could open multiple terminals and run multiple instances of the executable for different configurations. We would need to be careful not to run too many instances simultaneously though, because we could end up requiring more resources than are available. For example, if running 1 instance of our code requires 1 GB of RAM, then running 5 instances will require 5 GB. If our computer only has 4 GB of RAM, running 5 instances at once will exhaust the available memory and the runs can fail. What we would like is a way to automatically run multiple jobs, one after the other, and to run multiple instances of our code at once without running too many. Enter the scheduler.
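The memory arithmetic in the example above can be sketched in a few lines of Python. This is only an illustration and not part of the original page; the helper name max_instances and its parameters are hypothetical.

```python
def max_instances(total_ram_gb: float, ram_per_instance_gb: float) -> int:
    """Return how many instances fit in memory at once (hypothetical helper)."""
    if ram_per_instance_gb <= 0:
        raise ValueError("ram_per_instance_gb must be positive")
    # Floor division: a partially fitting instance does not count.
    return int(total_ram_gb // ram_per_instance_gb)


# The example from the text: 4 GB of RAM, 1 GB per instance.
limit = max_instances(4, 1)
print(limit)  # 4, so a fifth simultaneous instance would not fit
```

In practice the operating system and other programs also use memory, so the real limit is likely lower than this simple estimate suggests.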

A scheduler is a system that manages the running of simulations. With a scheduler, we just tell it what we want to run (for example, 10 different configurations of the same code), and the scheduler does the rest. The scheduler manages our simulations along with any simulations that other users would like to run, running as many at once as possible and starting new instances when old instances finish. All large-scale high performance computing clusters used for numerical simulation use some sort of scheduler. Therefore, in order to use these clusters, you must be able to use a scheduler.

Terminology

A user "submits" "jobs" to the scheduler.