FILERPT 2.94
by
SRS & SMC
Allegro Consultants, Inc.
Inspired by a program from
Chuck Storla of HP, 1980
OVERVIEW
========
Disc I/O on the HP3000 is affected by many things: blocking factors,
the number of buffers available, the type of file (variable or fixed),
the spread of the disc files across the available devices, the type of
calls made against the file as well as the sequence of those calls and
their frequency. For an individual file, all of these considerations
must be taken into account when we attempt to optimize our system. We
must also consider how each file is used by the application, and if it
is a structured file, we must analyze our selection of keys and file
relationships.
The benefits involved in handling these tradeoffs correctly are many,
as are the costs involved with incorrect file choices. For example, in
a file which will be accessed sequentially, the blocking factor
determines how many physical I/O's are necessary to process a given
number of logical records. If a file contains 10,000 logical records
which must be sequentially read by a program then the formula for the
number of physical reads necessary is as follows:
physical'reads = [ logical'records / blocking'factor ]
where the expression [ ] is rounded up to the nearest integer
Thus, in our ten-thousand-record file, a blocking factor of two results
in 10000/2, or 5,000 physical reads against the file. A blocking
factor of twenty results in 10000/20, or 500 reads, an improvement of a
factor of ten in the number of physical transfers necessary. Logical
record accesses cost in terms of CPU as well as the amount of memory
required during the transfer. Physical accesses cost in terms of
several additional resources - the disc movement required, the
controller time, channel time, etc. Assuming our application does
require the logical accesses, we cannot reduce them (although NOBUF
may reduce their impact).
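To make this arithmetic concrete, the following is a minimal Python
sketch of the formula above (illustrative only, not part of FILERPT);
it reproduces the ten-thousand-record example:

    import math

    def physical_reads(logical_records, blocking_factor):
        # physical'reads = [ logical'records / blocking'factor ],
        # rounded up to the nearest integer
        return math.ceil(logical_records / blocking_factor)

    for bf in (2, 20):
        # prints 5000 for a blocking factor of 2, 500 for 20
        print(bf, physical_reads(10000, bf))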
However, by varying the blocking factor, we can reduce the number of
physical accesses required (at a cost of increasing the amount of
memory our buffers are using). The end result of this change can be
significant. One test showed that altering a blocking factor from two
to twenty resulted in a thirty percent decrease in the execution time
of a sequential-read application. Similar optimizations are possible by
varying some of the other file characteristics or by rewriting portions
of the file handling applications.
Such optimizations can provide significant improvements in the
performance of a system. However, each such optimization has an
implicit cost associated with it. It may be simply the time the
programmer must take to redo the file equations or :BUILD commands of a
batch job. On the other hand, the optimization may involve extensive
recoding effort as well as testing. Most optimizations involve
exploiting relatively harmless tradeoffs. In the case of the blocking
factor change mentioned above, the decrease in the number of physical
I/O's more than compensated for the additional burden the larger
buffers placed on memory. This is not always the case, though, and
each modification must be tested to determine whether it actually
improves or degrades performance.
Since these file optimizations have costs associated with them, we
would like to pay the cost only when we can expect a reasonable
return on our investment. There is a theory of programming
which says that eighty percent of the time a program is executing it is
in twenty percent of the code. This 80/20 rule can also be applied to
files, or "eighty percent of all file activity is against twenty
percent of the files." To receive a maximum return on our optimization
investment, we should obviously focus our attention on this "top
twenty" percent, and the topic which this paper addresses is the
identification of that twenty percent.
METHOD
======
This top 20 might also be referred to as the "busy" or "heavily used"
files on the system. Before we can identify the top 20, we must
establish our criteria for deciding which files are busier than others.
Three criteria come immediately to mind when we speak of heavy file
usage:
1) the number of logical accesses to the file,
2) the number of physical accesses,
and 3) the number of times a file is opened.
This information (with a few exceptions noted later) is available to us
through the type 5 record maintained by MPE in its log files. The
format of the type 5 (FCLOSE) record is given on pages 6-123 and 6-124
of the old SYSTEM MANAGER/SYSTEM SUPERVISOR Reference Manual. The
information which will be of interest to us includes:
file name
logical device # of file label & first extent
number of records processed
number of blocks processed
where:
file name
The fully qualified formal designator associated with the file,
fname.group.account; some program temporary files may have a
blank name.
logical device
The logical device number of the file label. This ldev may not
contain the entire file; only each individual extent is required
to reside entirely on one device, so other extents may be elsewhere.
number of records
The number of logical records which have been read or written
since the file was opened; this value gives us a measure of the
application's activity.
number of blocks
The number of blocks which have been transferred to/from the
file. This value is a measure of the physical I/O against the
file in all instances except the following two cases:
(a) a rewind (FCONTROL (5)) against a variable record file
resets this value to zero;
(b) for files which are accessed with MULTI-REC (bit 11 of
AOPTIONS or MR in a file equation) and whose block size is an
integer multiple of 128 words. In this case the value is the
number of blocks processed, which may be greater than the
actual number of physical I/O's.
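For readers who find a structure easier to follow than a field list,
the fields of interest can be modelled as below. This is only a
sketch: the names are ours, and the actual binary layout of the type 5
record is the one given in the reference manual cited above.

    from dataclasses import dataclass

    @dataclass
    class FcloseRecord:
        # Fields of interest from an MPE type 5 (FCLOSE) log record.
        file_name: str          # formal designator, file.group.account
        ldev: int               # logical device of the file label
        records_processed: int  # logical records read/written since FOPEN
        blocks_processed: int   # blocks transferred (note the two
                                # exceptions described above)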
The information contained in these log records is synthesized into one
of several reports which help users determine what their busy files
are. The specific algorithm used will be covered later, but briefly our
method is to gather all of the type 5 records for each unique file on
the system, and total the number of records processed, the number of
blocks processed as well as the count of type 5 records encountered.
This gives us a measure of:
the application activity - number of records processed,
the physical I/O - number of blocks processed (refer to note above)
the number of FOPEN's - type 5 record count.
Each of these measures gives us a slightly different view of the
relative use of the files, so our method allows us to choose the top
twenty percent of the files judged to be busiest on any of these
values. For that matter, we may choose any percentage from one to one
hundred. Once these busy files have been identified, we can begin to
optimize these files knowing that any improvements we make will have
the maximum effect on the overall system performance.
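A minimal sketch of this summarization, assuming the FCLOSE records
have already been decoded into (file name, ldev, records, blocks)
tuples (the decoding itself is not shown):

    def summarize(fclose_records):
        # Build one summary entry per unique formal designator.
        summary = {}
        for name, ldev, records, blocks in fclose_records:
            entry = summary.setdefault(name, {"ldev": ldev, "records": 0,
                                              "blocks": 0, "fcloses": 0,
                                              "moved": False})
            entry["records"] += records   # application activity
            entry["blocks"] += blocks     # "physical" I/O (see note above)
            entry["fcloses"] += 1         # one type 5 record per FCLOSE
            if entry["ldev"] != ldev:     # file label seen on another ldev
                entry["moved"] = True
        return summary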
ADVANTAGES / DISADVANTAGES
==========================
There are several other methods which could have been used in an
attempt to determine which files are our busiest. These include direct
monitoring of the I/O on the system, embedded measuring tools and
alternative reporting schemes using log records. The method used here
has the following advantages:
- the MPE logging facility is universal across the 3000 product line
and does not require a specific machine or MPE level
- no special programming or capabilities are required
- since the method analyzes log files which have been closed, it may be
run during "off hours" or on a separate machine, thus there is no
effect on the system caused by the tool itself (other than the
negligible overhead of enabling logging itself)
- we have three distinct measures of usage: logical & "physical" I/O
and opens
- this method is not restricted to a particular file type or structure,
although its usefulness may not be as great with some
- the only necessary modification to the system configuration is that
logging be enabled for FCLOSE's (type 5)
The disadvantages to this approach are as follows:
- although it does give you physical I/O, these figures cannot be tied
to specific operations such as individual IMAGE calls
- this tool works well for summary reporting of disc activity, however,
in some cases the real concern may center on "burst" I/O activity;
i.e., the total number of I/O's spread across the day is small, but
within 2 seconds after ENTER is hit, the activity is concentrated
- there is no mechanism for determining the I/O rate for a period of
time
- these records cannot be used to summarize disc activity per device
since the LDEV is only for the first extent; not all extents of a
file are required to reside on a single device
- the number of blocks processed does not equal the number of physical
I/O's if one of two cases is true:
1) a rewind on a variable record file sets the number to zero
2) files opened with multi-rec whose blocks end on a sector
boundary may transfer multiple blocks in a single I/O, so the
number is higher than the actual number of physical I/O's
COMMANDS
========
The program FILERPT has several commands which, when executed in a
particular sequence, produce a summary file that can then be
re-sorted and listed as several different reports. The commands and
the functions they perform are listed here.
1. CREATE
---------
CREATE produces a summary file of the FCLOSE records from one or more
log files. The log file(s) used are read sequentially and all type 5
(FCLOSE) records for disc files (subtype 0) are extracted. These
records are then sorted by the file formal designator
(file.group.account) to group all records for the same file. EDITOR
work files of the form Knnnnnnn, where nnnnnnn is a valid seven-digit
number, are renamed to the form K####### to group all k-files for
each group/account into one record. This
temporary sort file is then read sequentially and a summary file is
built containing one record for each unique formal designator. This
record contains the device number, total number of records processed,
total number of blocks processed, the FCLOSE count, and an indicator of
whether the logical device number differed between FCLOSE's. If this
indicator is set to TRUE, then there was at least one record which
contained a logical device different from the other records for that
file. This indicates that the file has moved, possibly due to a purge
and re-create.
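The EDITOR work-file folding mentioned above amounts to a simple
rename before the records are grouped. A sketch of the idea (ours,
not FILERPT's own code):

    import re

    K_FILE = re.compile(r"^K[0-9]{7}$")   # EDITOR work file: K + 7 digits

    def fold_k_file(formal_designator):
        # Map K1234567.GROUP.ACCT to K#######.GROUP.ACCT so that all
        # EDITOR work files in a group/account share one summary record.
        name, dot, rest = formal_designator.partition(".")
        if K_FILE.match(name):
            name = "K#######"
        return name + dot + rest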
Since the log files may have been stored from the test system and
restored to an account or group other than PUB.SYS, the CREATE command
allows the user to override the default group and account (PUB.SYS) for
the log file(s) to be analyzed. Once the group and account have been
established, the four-digit number of the first log file is entered and
then the four-digit number of the ending log file is entered if
different from the first. Once these numbers are in, FILERPT requests
the name of a summary file which it will attempt to create to hold the
summary records for each FCLOSE'd file.
After the log file range has been specified, FILERPT reads through each
log file whose number is in the desired range and extracts all the file
close records, storing them in the summary file.
Note: to let you know that FILERPT is alive and well, it will print a
dot (.) for every 1000 records read from a log file. Additionally, it
will print an asterisk (*) for every 1000 file close records found.
2. LIST
-------
The LIST command will sort and report on the records found in a summary
file, whose name can be directly entered or a return can be used to
indicate that the same summary file will be used again. The sort key
chosen for this command determines whether the report produced will
show the "busy" files based on the number of records processed, blocks
processed, or the number of FCLOSE's. One of these measures is used as
the key
in a descending sort. When this sorted file is then listed and
totalled, the files are in an order such that the "busiest" files are
listed first. The user can thus choose to list just the busiest ten
percent of the system's files. If the chosen sort key stays the same
between two LISTings of the same summary file, the sort is not
executed again, to save time.
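The descending sort and top-percentage cut performed by LIST can be
sketched as follows, using the summary layout from the earlier
example; the measure names are our assumptions:

    def busiest(summary, measure="records", percent=20):
        # measure is one of "records", "blocks" or "fcloses"
        ranked = sorted(summary.items(),
                        key=lambda item: item[1][measure], reverse=True)
        cutoff = max(1, round(len(ranked) * percent / 100))
        return ranked[:cutoff]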
3. LP
-----
The LP command opens a file named "PRINTER", which defaults to
the line printer. All reports are sent to this file until the TERM or
EXIT commands are used. A file equation may be used to redirect this
file.
4. TERMinal
-----------
The TERM command closes the previous PRINTER file and directs the
output from subsequent LIST commands to $STDLIST. This is the default
case when the program is first run.
5. EXIT
-------
The EXIT command closes all open files and ends the program execution.
A :EOD entered at any prompt will also terminate FILERPT.
6. SET / RESET
--------------
Syntax:
SET/RESET