LA-CC 03-070
Executive Summary
PROCMON is a tool that lets users gather run-time information about
their program (e.g. memory use vs. time). No recompiling or relinking is required.
The user simply sets environment variables and runs the executable as
they normally would.
The file procmon.pdf is an example output file from a 4-process MPI
executable in which one of the processes has a leak.
Limitations and Compatibilities
- Overhead
Information is gathered by periodically sending signals to the process
and catching them with a signal handler that records information about the process.
If the sampling frequency is too high, the process will not get any work
done. Experiments show that setting this frequency to less than one per
second is intrusive.
- Results
Output is one text file per process. Each sample produces on the
order of 50 lines of output. For longer runs the sampling frequency can still
be high, but the environment variable PROCMON_INTERVAL should be used to
limit the number of samples printed to the file.
- Child Processes
All child processes are monitored. The environment variable PROCMON_MIN
can be set so that final output is kept only for processes that have at
least a minimum number of samples.
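The signal-driven sampling described under Overhead can be illustrated with a
minimal sketch. This is not PROCMON's implementation, which is a preloaded
shared library; it only shows the general technique of arming an interval
timer and recording a snapshot in the signal handler. The names
run_with_sampling and busy_work are hypothetical, and the sketch assumes a
Unix system and that it runs in the main thread.

```python
import resource
import signal
import time


def run_with_sampling(work, period_s):
    """Call work() while a timer signal triggers periodic resource samples."""
    samples = []

    def catcher(signum, frame):
        # Plays the role of the "signal catcher": record one snapshot.
        usage = resource.getrusage(resource.RUSAGE_SELF)
        samples.append((time.monotonic(), usage.ru_maxrss))

    previous = signal.signal(signal.SIGALRM, catcher)
    # Deliver SIGALRM every period_s seconds of wall-clock time.
    signal.setitimer(signal.ITIMER_REAL, period_s, period_s)
    try:
        work()
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # disarm the timer
        # Restore the earlier handler (fall back to the default action).
        signal.signal(signal.SIGALRM,
                      previous if previous is not None else signal.SIG_DFL)
    return samples


def busy_work():
    # Stand-in workload (an assumption): spin for about a quarter second.
    deadline = time.monotonic() + 0.25
    while time.monotonic() < deadline:
        pass


if __name__ == "__main__":
    collected = run_with_sampling(busy_work, period_s=0.05)
    print(f"{len(collected)} samples taken")
```

Every sample interrupts the workload, so shortening period_s shifts more of
the run time from the work to the handler, which is the overhead trade-off
noted above.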
How to Use
There are two sets of environment variables that can be set:
- Location of PROCMON shared object library
The name and value of this variable differ from machine to machine. The
variable tells the operating system to load the PROCMON library at runtime.
The settings for each architecture are:
- OSF
setenv _RLD_LIST < full path >/libprocmon.so:DEFAULT
- SGI (64-bit executables)
setenv _RLD_LIST < full path >/libprocmon.so:/usr/lib64/libmalloc.so:DEFAULT
- LINUX
setenv LD_PRELOAD < full path >/libprocmon.so
- Settings for PROCMON
- setenv PROCMON_DIR < dir >
Create output files in < dir >. File names have the form
< dir >/procmon_< machine name >_< pid >.txt. The default is to create files
in the current working directory. The program will continue to run even
if the files cannot be created.
- setenv PROCMON_INTERVAL < interval >
Print sampling output once every < interval > samples. Because the
output file can become very large, you may wish to sample every
500 milliseconds but print sampling information only once a minute. In this
case, you would set < interval > to 60*(1/0.5) = 120. To print
every sample, set < interval > to 1. The default is to print
only the final summary (< interval > = 0).
- setenv PROCMON_MIN < min samples needed to keep output file >
While a process is running, it might create other subprocesses that
will also be sampled. Over the course of a run, this could create _many_
output files. Files that have fewer than this number of total samples will
be deleted. The default is to keep all files.
- setenv PROCMON_TIME < number of milliseconds >
Sets the amount of time between samples. This is done by calling setitimer().
Do not set this too low, or your process will spend all of its time
sampling and make no progress. The default is to sample only at
the beginning and end of the process.
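As an illustration of how the settings above fit together, the sketch below
builds a Linux environment for a monitored run and derives PROCMON_INTERVAL
from the sampling period and the desired print period, following the
60*(1/.5)=120 arithmetic given for PROCMON_INTERVAL. The function name
procmon_environment and its parameters are hypothetical helpers, not part of
PROCMON, and the library path in the usage note is a placeholder.

```python
import os


def procmon_environment(library_path, sample_ms, print_every_s=None,
                        output_dir=None, min_samples=None):
    """Return a copy of os.environ with the Linux PROCMON variables set."""
    env = dict(os.environ)
    # Linux preload setting; this replaces any existing LD_PRELOAD value.
    env["LD_PRELOAD"] = library_path
    env["PROCMON_TIME"] = str(sample_ms)
    if print_every_s is not None:
        # Samples per printed record = print period / sampling period.
        interval = max(1, round(print_every_s * 1000 / sample_ms))
        env["PROCMON_INTERVAL"] = str(interval)
    if output_dir is not None:
        env["PROCMON_DIR"] = output_dir
    if min_samples is not None:
        env["PROCMON_MIN"] = str(min_samples)
    return env


if __name__ == "__main__":
    # Sample every 500 ms, print once a minute.
    env = procmon_environment("/placeholder/path/libprocmon.so",
                              sample_ms=500, print_every_s=60)
    print(env["PROCMON_INTERVAL"])
```

The resulting dictionary could be passed as the env argument of
subprocess.run when launching the executable, which is equivalent to issuing
the setenv commands before running it by hand.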
PROCMON is installed on the main open compute platforms (OSF: qsc, SGI: theta/bluemountain,
and Linux: lambda) at:
/usr/projects/ccn8_repo/installed/PROCMON
Complete Documentation
Documentation can be found at:
/usr/projects/ccn8_repo/installed/PROCMON/< version >/< arch >/doc