LA-CC 03-070, C-03,066
PROCMON - Process Monitor
Source
Documentation
Executive Summary
This software provides a relatively(?) easy way to get process
information (currently memory and CPU usage) that is
independent(?) of the UNIX architecture.
Information is printed much like that for a ps or top command (e.g., process
size and overall/instantaneous CPU usage are printed at a specified time
interval).
Manual instrumentation can also be done to get information about specific code
blocks. This is done by adding "start" and "stop" calls (with a tag name)
in the source code (C/C++ or Fortran).
The tool procmon_post.pl can be used to process this raw data output and
create graphs/charts.
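The two CPU figures can be illustrated with a small sketch. It assumes that
overall usage is (user+system)/wall time for the life of the process and that
instantaneous usage is the same ratio taken over the interval since the
previous sampling. The Sample class, the function names, and the sample values
are hypothetical and only illustrate the arithmetic; they are not PROCMON's
file format or API.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    wall: float    # wall-clock seconds since the process started
    user: float    # cumulative user CPU seconds
    system: float  # cumulative system CPU seconds


def overall_cpu(sample: Sample) -> float:
    """(user + system) / wall over the whole life of the process."""
    if sample.wall <= 0:
        return 0.0
    return (sample.user + sample.system) / sample.wall


def instantaneous_cpu(prev: Sample, cur: Sample) -> float:
    """CPU time used since the previous sample divided by the elapsed wall time."""
    dt = cur.wall - prev.wall
    if dt <= 0:
        return 0.0
    return ((cur.user + cur.system) - (prev.user + prev.system)) / dt


# Hypothetical samples taken every 2 seconds: busy at first, then mostly waiting.
samples = [
    Sample(wall=2.0, user=1.5, system=0.5),
    Sample(wall=4.0, user=3.0, system=1.0),
    Sample(wall=6.0, user=3.25, system=1.25),
]

for prev, cur in zip(samples, samples[1:]):
    print(f"t={cur.wall:4.1f}s overall={overall_cpu(cur):.2f} "
          f"instantaneous={instantaneous_cpu(prev, cur):.2f}")
```

In the last interval the instantaneous figure falls to 0.25 while the overall
figure is still 0.75, which is why the two are reported separately.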
Output File Formats/Discussion
Each of the example runs below has a link to a directory of output files.
- procmon_MACHINE_PID.txt files - Created by PROCMON sampling
These are the raw output files generated by PROCMON.
Each process creates 1 file (with machine name and PID forming the filename).
These files are used as input to procmon_post.pl to create graphs/tables.
A comment block at the top of the file contains a brief description of
the columns.
- procmon_MACHINE_PID.txt.dat files, procmon.cmd - Created by procmon_post.pl
The .dat files contain the data that gnuplot will use to create the
graphs. They are basically massaged data from the .txt files. Data
from all samplings is listed first, followed by information about each
block.
The actual gnuplot commands are stored in procmon.cmd.
- procmon.out - Created by procmon_post.pl
Text results printed to the screen after running procmon_post.pl.
Information about each block is printed.
- Delta Mem - Change in memory (with average and max per block)
- Delta Time - Time spent in code block (again with average and max)
- # Blocks - Number of times that block was seen
- Block - the block name
- .pdf/.ps graphs - Created by procmon_post.pl
- Page 1
- Overall CPU usage ((user+system)/wall time for the life of process)
- Instantaneous CPU usage (usage since the last sampling)
- Percent free memory on the machine
- Process size
- Pages 2-4
These pages have information about other stats (like stack size and page
faults).
- Instrumentation Pages
You can manually instrument code blocks by placing "start"
and "stop" calls (along with a tag name) around sections of code.
PROCMON will get process information (e.g., delta time and process size
increase) about these blocks.
Each page gives information about each code block.
The final Instrumentation Page is named Final Res (Final Results). It
treats the span between the first and last sampling as one code block.
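The start/stop idea and the per-block summary (Delta Time, Delta Mem,
# Blocks) can be sketched as follows. This is a minimal illustration, not the
PROCMON library: the BlockMonitor class, its method names, and the dictionary
keys are hypothetical, and the clock and process-size reader are passed in so
the example is deterministic. A real reader could, for instance, return
resource.getrusage(resource.RUSAGE_SELF).ru_maxrss on UNIX, which reports peak
resident size rather than current size.

```python
import time
from collections import defaultdict


class BlockMonitor:
    """Hypothetical start/stop recorder; not the PROCMON API."""

    def __init__(self, clock=time.perf_counter, mem_reader=lambda: 0):
        self._clock = clock                # returns seconds
        self._mem_reader = mem_reader      # returns process size (e.g. kB)
        self._open = {}                    # tag -> (start time, start size)
        self._records = defaultdict(list)  # tag -> [(delta time, delta mem)]

    def start(self, tag):
        self._open[tag] = (self._clock(), self._mem_reader())

    def stop(self, tag):
        t0, m0 = self._open.pop(tag)
        self._records[tag].append((self._clock() - t0, self._mem_reader() - m0))

    def summary(self):
        """Per-tag block count plus average and maximum delta time and memory."""
        rows = {}
        for tag, recs in self._records.items():
            times = [dt for dt, _ in recs]
            mems = [dm for _, dm in recs]
            rows[tag] = {
                "blocks": len(recs),
                "time_avg": sum(times) / len(times),
                "time_max": max(times),
                "mem_avg": sum(mems) / len(mems),
                "mem_max": max(mems),
            }
        return rows


# Deterministic demo: a fake clock and a fake process-size reader.
ticks = iter([0.0, 1.0, 1.0, 4.0])
sizes = iter([100, 150, 150, 160])
mon = BlockMonitor(clock=lambda: next(ticks), mem_reader=lambda: next(sizes))

for _ in range(2):
    mon.start("solve")
    # ... the code block being measured would run here ...
    mon.stop("solve")

print(mon.summary())
```

The tag "solve" is seen twice here, so its row reports 2 blocks, with the
average and maximum taken over those two start/stop pairs.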
Example Runs
- Parallel Leak
Directory: procmon_parallel_leak
This is a 4-process MPI run in which every process loops, allocating space.
All processes except process 0 free their memory (process 0 has a leak).
Note how the CPU usage becomes erratic once the machine's memory is
exhausted. The executable continued to run until the memory allocation failed.
On some systems, memory can be exhausted (with terrible performance) well
before you run out of memory (the point at which memory allocation fails).
The trick is to detect the point where memory is exhausted, stop your code
there, and terminate gracefully. PROCMON can be called from within C/Fortran
code to determine this point. This run shows that memory was exhausted
at 20% machine free memory, and memory ran out at 10% machine free memory.
- Serial IO
Directory: procmon_io
Serial run where processes loop through cycles of computing and doing IO
(writing to a file). You can see the CPU usage drop while IO is being done.
- 4 Methods
Directory: procmon_methods
Four different serial runs of code that loops, allocating memory.
The 4 lines in each plot correspond to the 4 methods by which PROCMON
can be used to get process information:
- Command Line Tool (like top)
- Run Time Linker Environment Variable
- Re-Link of the Executable
- Manual Instrumentation via libprocmon_info.a
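Returning to the Parallel Leak run: the idea of stopping at the point where
memory is exhausted, rather than waiting for allocation to fail, can be
sketched as below. This is an illustration under stated assumptions, not
PROCMON's interface. The function names are hypothetical, percent_free_memory
relies on sysconf values that are available on Linux, and the 20% default is
simply the figure observed in that run; it would need tuning per machine.

```python
import os


def percent_free_memory():
    """Percent of physical memory currently free (assumes Linux sysconf names)."""
    total = os.sysconf("SC_PHYS_PAGES")
    avail = os.sysconf("SC_AVPHYS_PAGES")
    return 100.0 * avail / total


def allocate_until_low(chunk_bytes, max_chunks, threshold_pct=20.0,
                       probe=percent_free_memory):
    """Allocate chunks, stopping once free memory drops below the threshold."""
    held = []
    for _ in range(max_chunks):
        if probe() < threshold_pct:
            break  # stop gracefully instead of running until allocation fails
        held.append(bytearray(chunk_bytes))
    return held


# Deterministic demo: a fake probe stands in for the real machine reading.
readings = iter([60.0, 45.0, 30.0, 19.0, 5.0])
held = allocate_until_low(chunk_bytes=1024, max_chunks=10,
                          probe=lambda: next(readings))
print(len(held))  # 3 chunks are allocated before the 19% reading stops the loop
```

Checking before each allocation keeps the decision in one place; a real code
would likely check less often, since probing has a cost of its own.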