
Running computationally intensive jobs

Batch queues

Batch queues are generally not necessary when using UNIX. This is quite a different situation from the usual mainframe way of doing things.

Suppose that you have a program called model which you want to run "in the background" (i.e., have the computer run the program without tying up your terminal). All you have to do is type

model&                         ! see note below on the "nice" command

The & at the end of the line instructs UNIX to run the program detached from your terminal. In other words, the program will run concurrently with your login session, and will even continue running after you log out. As a general rule you should reduce the priority of all the background jobs that you start unless you specifically require interactive priority. The correct command to run a background job is then

nice model&

This will run model at a priority that will not affect interactive users. (By the way, the command is called nice because when you use it you are being nice to other users.) If you do not nice your background jobs, the operating system detects this and automatically renices them to a lower priority based on the cumulative amount of CPU time that the job has used. It is in your interests to nice the jobs yourself, since the operating system gives preferential treatment to those jobs over ones that it has had to renice.

You can have lots of jobs running in the background; however, the operating system will give all but one of them the lowest possible priority. In general, therefore, it is in your interests to run jobs sequentially.

There is no advantage to you in trying to run a computationally intensive job interactively (i.e., in the foreground). The operating system detects that you are using lots of CPU time and then treats your interactive job just like a batch job. When your job is finished your priority will be very low and you may notice slow response from the computer.

The priority adjustment mechanism is potentially very flexible. It will have to be modified in the future as we accumulate experience with the types of jobs that people want to run. If you have specific problems with the way that priorities have been assigned to your jobs, see Michael Ashley. The system has been designed in an attempt to be fair to all users, and attempts to circumvent it will be regarded darkly. In extreme cases your jobs may be terminated without warning. Tricks such as running your jobs under multiple usernames (e.g., a staff member using the account of an Honours student, as well as his/her own) are not in the spirit of fair play.

Once you have placed a job in the background, you can return it to the foreground (i.e., attach it to your interactive terminal session) by using

fg %n ! fg stands for foreground

where the number n is taken from the list of your background jobs given by the command jobs. The number can be omitted if you want to attach the last job you backgrounded. (Note: jobs lists only those jobs that you have started in the current session; to see all your jobs, use the ps -x command.)

Sometimes it happens that the program you want to run in the background requires some input from the user before the computations get under way. To do this you start the program in the foreground, enter the data, suspend the program, and then restart it in the background. An example follows:

model                             ! start the program
input number of iterations> 10    ! here is some user input
input value of pi> 3.14159        ! more input
^Z                                ! this is a control-Z it stops model
bg                                ! this puts model in the background

Note that if your batch job sends output to your terminal (stdout) it will do so even if running in the background. If you then log out, the batch job will die since it has nowhere to write its output. The solution is to redirect the output to a file, or to /dev/null if you want to throw it away, e.g.,

nice model >& /dev/null &

/dev/null is a special file which will accept any amount of data, and simply discards it. For more information about foreground and background jobs and setting job priorities, see man csh.
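
Bourne-style shells (sh, bash) use a different redirection syntax from csh, so a rough sketch of the same idea for those shells may be useful. Here the output is kept in a log file rather than thrown away. The script standing in for model, the scratch directory, and the log file name are all assumptions made for illustration.

```shell
# Work in a scratch directory so the demonstration does not clutter the current one.
demo_dir=$(mktemp -d)

# Hypothetical stand-in for "model": writes one line to stdout and one to stderr.
cat > "$demo_dir/model" <<'EOF'
echo "iteration 1 done"
echo "warning: step size reduced" 1>&2
EOF

# Reduced priority, stdout and stderr both captured in model.log, detached with &.
nice sh "$demo_dir/model" > "$demo_dir/model.log" 2>&1 &
model_pid=$!

# Waiting is only needed here so that the log can be inspected straight away.
wait "$model_pid"
cat "$demo_dir/model.log"
```

In real use you would leave out the wait and simply look at the log file whenever you want to see how far the job has got.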

UNIX also provides the at and batch commands for submitting command files for later execution. For example, suppose that you create the following command file called run.model

#!/bin/csh                        ! use csh as the shell
/bin/rm -f fort.1 fort.2          ! delete some files
ln -s mydata fort.1               ! link mydata to fortran unit 1
ln -s modeloutput fort.2          ! and link modeloutput to fortran unit 2
model                             ! start the program, called model as before
exit

Then to submit this job to start at 4am on July 4, you use

at 4am jul 4 run.model

To submit the job as soon as possible based on the load of the system, use

batch run.model

For more information on at and batch, see man at.

Finally, if you want to stop a job, use one of the following kill commands,

                                                               
kill %n                  ! where n is the job number (from the "jobs" command)
kill n                   ! where n is the process id (from the "ps -x" command)
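
Inside a script, job numbers are not normally available, so the process id form is the one to use. As a small sketch (Bourne-style shell syntax, with sleep as a hypothetical stand-in for a real job), the process id of the most recent background job can be picked up from $! at the moment the job is started:

```shell
sleep 300 &            # stand-in for a long-running background job
job_pid=$!             # process id of the job just started

kill "$job_pid"        # ask the job to terminate

# Collect the job's exit status; it is non-zero because the job was killed.
job_status=0
wait "$job_pid" || job_status=$?
echo "job $job_pid ended with status $job_status"
```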

If you wish to place limits on the cpu time that your program can use, try the limit command (documented in man csh). This is a good idea to stop potential infinite loops.
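
The limit command is specific to csh. In Bourne-style shells the rough equivalent is ulimit -t, which most sh implementations provide although it is not guaranteed everywhere. The sketch below is only an illustration: the one-second limit is deliberately tiny, and the endless loop is a hypothetical stand-in for a program that never terminates.

```shell
job_status=0

# The parentheses start a subshell, so the limit does not apply to your login shell.
(
  ulimit -t 1              # allow at most 1 second of CPU time
  while :; do :; done      # stand-in for a program stuck in an infinite loop
) || job_status=$?

echo "runaway job was stopped, status $job_status"
```

For a real job you would choose a limit comfortably above the expected run time, so that only a genuinely runaway program is stopped.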

Writing computationally intensive programs

Since the computer assigns job priorities partly based on the amount of CPU time that a job has used, you will find that if you start using many hours of CPU time, your jobs will get the lowest priority. So for fast turnaround it is essential to spend some time making sure that your program is as efficient as possible. If programming in FORTRAN it is well worth reading a guide on optimizing FORTRAN programs. A common mistake that people make is to choose the wrong order in which to increment array indices. The following code fragment is the wrong way to add two arrays:

                                                               
      do i = 1, 10000
        do j = 1, 5000
           a(i,j) = b(i,j) + c(i,j)
        end do
      end do
It should have been coded like this:
                                                               
      do j = 1, 5000
        do i = 1, 10000
           a(i,j) = b(i,j) + c(i,j)
        end do
      end do
Since FORTRAN assigns arrays in memory with the first index varying most rapidly, the second example results in contiguous areas of memory being addressed whereas the first example jumps all over the place. Depending on the size of the arrays the difference in execution time could be a factor of 1000.
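
To make the memory layout concrete, the arithmetic can be worked through in a few lines of shell. This is only a sketch: it assumes the arrays are declared with dimensions (10000,5000), as the loop bounds suggest, and the offset function is a hypothetical helper that counts array elements, not bytes.

```shell
# Leading dimension of the arrays, assumed to be declared as a(10000,5000).
nrows=10000

# Zero-based element offset of a(i,j) when the first index varies most rapidly.
offset() {
  echo $(( ($1 - 1) + ($2 - 1) * nrows ))
}

# Distance in memory between successive iterations of the inner loop.
stride_j_inner=$(( $(offset 1 2) - $(offset 1 1) ))   # first fragment: j is the inner loop
stride_i_inner=$(( $(offset 2 1) - $(offset 1 1) ))   # second fragment: i is the inner loop

echo "inner loop over j: stride of $stride_j_inner elements"
echo "inner loop over i: stride of $stride_i_inner elements"
```

With j as the inner loop each step lands 10000 elements further on, whereas with i as the inner loop each step moves to the very next element.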

There are numerous other tricks of the trade for speeding up programs. It is a rare program that cannot be sped up by a factor of between two and ten. If your program still requires hours of CPU time then you should do the following:

To maximize the performance of a computationally intensive program you should keep any disk or terminal I/O to an absolute minimum. Any disk activity will result in the program coming to a standstill while it waits for the disk to be available, and the priority of the job will plummet.

Checking the progress of your job

Use ps -x to obtain a list of all the processes that you have running (see man ps for a list of options). Use jobs to find the processes that you have started in your current login session. Use ps -aux to get a complete list of all the processes on the system. Use top to see the fraction of CPU time that your job is using (although please don't leave top running for hours since it uses up CPU time itself). Please kill any jobs that you no longer need.

Programs requiring huge memory allocation

UNIX is a virtual memory operating system, and you can run programs which use up to a maximum of 1 GByte of memory. How efficiently such a program will run depends crucially on how you access the memory. Sequential access is always the fastest since the computer has high speed caches that anticipate you will use memory sequentially. newt has about 168 Mbytes of semiconductor memory (soon to be increased by an additional 128 Mbytes), but you shouldn't expect to be able to run a program efficiently that uses more than about 40 MBytes.

Programs requiring huge time allocation

If you expect that your program will run for more than a day, you must write out intermediate results so that you can continue where you left off if the computer crashes or if it has to be halted for administrative reasons. Normally a day's warning will be given of computer downtime, and any batch jobs which are still running at the time of the shutdown will quietly die.
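
One way of writing out intermediate results is sketched below as a shell script. The file names, the run_model function, and the use of a bare iteration counter as the saved state are all assumptions for illustration. A real program would save whatever it needs to resume, and would probably do so from within the program itself.

```shell
work_dir=$(mktemp -d)
checkpoint="$work_dir/model.checkpoint"
work_log="$work_dir/work.log"

# Run iterations up to $1, resuming from the checkpoint file if one exists.
run_model() {
  last=$1
  i=0
  if [ -f "$checkpoint" ]; then
    i=$(cat "$checkpoint")
  fi
  while [ "$i" -lt "$last" ]; do
    i=$(( i + 1 ))
    echo "iteration $i" >> "$work_log"    # stand-in for one unit of real work
    # Write to a temporary name and then rename, so that a crash in the
    # middle of the write cannot leave a truncated checkpoint behind.
    echo "$i" > "$checkpoint.tmp"
    mv "$checkpoint.tmp" "$checkpoint"
  done
  echo "$i"
}

first_run=$(run_model 3)     # pretend the machine was halted after 3 iterations
second_run=$(run_model 10)   # restart: carries on from iteration 4
echo "stopped at $first_run, finished at $second_run"
```

The work log ends up with each iteration recorded exactly once, which shows that the second run did not repeat the work already done by the first.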



Michael C. B. Ashley
Fri Jun 28 13:34:23 EST 1996