Chapter 6. Detecting Discrepancies

The next step in using dmaudit is to generate a report that summarizes discrepancies between the file systems and the DMF daemon database. As was mentioned earlier, this is also known as taking a snapshot. The DMF daemon (dmdaemon(8)) must be running before you can attempt this step.

This chapter shows you how to generate the report both interactively and in batch mode. You should read the entire chapter before generating a report on your own machine so that you can choose the method that best fits your needs.

Snapshot Resource Requirements

Taking a snapshot with dmaudit can require a large amount of both wall-clock time and CPU resources, depending on the size and number of files being scanned. During the snapshot, dmaudit must sort and merge several very large data files.

The first time you take a snapshot, you should do it during a time period in which you know dmaudit is able to execute continuously for several hours. Keep track of how long it takes to generate the report so that you can determine how much time to allot in future runs. If the program is stopped for any reason before the report generation is complete, the entire snapshot is discarded and you must start over.
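One way to keep that record is to wrap the batch command (see “Taking a Snapshot in Batch Mode”) in a small shell function that appends the start time, elapsed seconds, and exit status of each run to a log file. This is a minimal sketch, not part of dmaudit itself. The function name timed_run and the log file path in the usage example are hypothetical, and the sketch assumes your date(1) command supports the %s (seconds since the epoch) format, which GNU and BSD date do but some older systems may not.

```shell
# Hypothetical helper (not part of DMF): run a command, then append one line
# "START_EPOCH ELAPSED_SECONDS EXIT_STATUS" to LOGFILE.
# Assumes date(1) understands the %s format.

# Usage: timed_run LOGFILE COMMAND [ARGS...]
timed_run() {
    logfile=$1
    shift
    start=$(date +%s)
    "$@"                      # run the command with its original arguments
    status=$?
    end=$(date +%s)
    echo "$start $((end - start)) $status" >> "$logfile"
    return $status            # pass the command's exit status through
}
```

For example, timed_run /var/tmp/dmaudit.times dmaudit snapshot >rpt 2>err leaves one line per run in the log that you can consult when scheduling the next snapshot. Because the function returns the command's own exit status, it can be used in scripts exactly where dmaudit snapshot would be.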

Taking a Snapshot Interactively

This section describes the various options available to you when you take a dmaudit snapshot interactively.

If you just finished the initial configuration of dmaudit, you should already be positioned at the Main menu. If you are not currently running dmaudit, enter the dmaudit command.

If you are running dmaudit but are positioned at some other menu, enter up as many times as necessary until you arrive at the Main menu. Your screen should look like the following:

MAIN MENU
---------

Select:
   <snapshot>  Take a snapshot and report status of file systems and databases
   <config>    Examine or modify configuration information
   <quit>      Quit

Please enter your selection:

The snapshot Option

To generate a report of all discrepancies in your file systems and daemon database, enter snapshot from the Main menu. A display similar to the one shown in the following example eventually appears:

DATA MIGRATION CONFIGURATION
----------------------------
Data migration home directory:          /dmf/home
Data migration binaries directory:      /etc/dmf/dmbase/etc

Server name:                            daemon
Server home directory:                  /dmf/home/daemon
Server spool directory:                 /dmf/spool/daemon
Data migration daemon process ID:       68565

MSP name   MSP type     MSP home directory    MSP spool directory
--------   --------     ------------------    -------------------
red        dmatmsp      /dmf/home/red         /dmf/spool/red
dsk        dmdskmsp                           /dmf/spool/dsk
dlt7000m   dmatmsp      /dmf/home/dlt7000m    /dmf/spool/dlt7000m

LIST OF FILESYSTEMS SCANNED
---------------------------
/admin             xfs
/cloudy/ccn        xfs
/cloudy/mktg       xfs
/cloudy/sdiv/comp  xfs
/cloudy/sdiv/lib   xfs
/cloudy/sdiv/net   xfs
Enter <CR> to continue:

The beginning of the report shows your current DMF daemon and MSP configuration, followed by the list of file systems that were searched for migrated files. The type of each file system is shown next to its path name. If, as in this example, the list is continued on the next screen, press ENTER to view the remainder of the report:

MAIN MENU
---------

Select:
   <inspect>    Inspect and correct file system and database errors
   <report>     Reprint status report for the current snapshot
   <verifymsp>  Check the dmatmsp tape msp databases against the daemon
                databases
   <snapshot>   Take a snapshot and report status of file systems and databases
   <free>       Release all file space used by the current snapshot
   <config>     Examine or modify configuration information
   <quit>       Quit

Please enter your selection:

The second screen shows the remainder of the file system list together with the database error report. In the example, no errors were detected in either the file systems or the daemon database.
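If you keep a copy of the report in a file (for example, by running dmaudit report >rpt in batch mode, as described in “Taking a Snapshot in Batch Mode”), you can extract the list of scanned file systems and compare it with the file systems you expect DMF to manage. The following is a minimal sketch that assumes the saved report uses the layout shown above: the LIST OF FILESYSTEMS SCANNED heading and its underline, followed by one path and type per line. The function name scanned_filesystems is hypothetical.

```shell
# Hypothetical helper: print the path of each file system listed under the
# "LIST OF FILESYSTEMS SCANNED" heading of a saved dmaudit report.
# Assumes each entry is a "path type" pair on its own line; the first line
# that is not such a pair (a blank line or a prompt) ends the list.

# Usage: scanned_filesystems REPORTFILE
scanned_filesystems() {
    awk '
        /^LIST OF FILESYSTEMS SCANNED/ { inlist = 1; next }
        inlist && /^-+$/               { next }        # skip the underline
        inlist && NF != 2              { inlist = 0 }  # end of the list
        inlist                         { print $1 }    # path name only
    ' "$1"
}
```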

After the report, dmaudit automatically returns to the Main menu, which now offers additional options. The inspect, report, verifymsp, and free options are described in the following sections.

The inspect Option

The inspect option is normally used only in cases in which discrepancies have been detected. It allows you to examine each error in detail and to correct those errors. Even if there are no discrepancies, however, inspect can be used to examine individual bfid sets in the daemon database.

The report Option

The report option allows you to reprint the report from your most recent snapshot. Because all information contained in the report is determined during the snapshot phase, the report option can reissue the report instantaneously. The report looks the same as the one shown in “The snapshot Option”.

This option is useful if you took a snapshot earlier and want to view its report again. It is also handy if you normally take snapshots in batch mode. If a batch run indicates that there are errors, you can then execute dmaudit interactively and read the report directly.

The verifymsp Option

The verifymsp option allows you to have dmaudit run the dmatvfy(8) command against the data captured during the snapshot and add the results to the dmaudit report. A display similar to the following eventually appears:

Please enter your selection: verifymsp
10/10/97 10:06:42 : Extracting dmf records for msps: silo.
10/10/97 10:06:43 : Verifying msp "silo"...

MSP silo ERROR REPORT
---------------------

  Bitfile ID 344789f2000000000000006c - missing from daemon db.
  Bitfile ID 344789f200000000000000d5 - missing from daemon db.
  Bitfile ID 344789f2000000000000028e - missing from daemon db.
3 Errors found in CAT database.
0 Errors found in VOL database.

MAIN MENU
---------  

You can view this output again later by using the report option, which is described in “The report Option”.
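If you save the report to a file after running verifymsp (assuming, as described above, that the verifymsp results are included in the report), the bitfile IDs that an MSP reports as missing from the daemon database can be listed with a short awk filter. This is a minimal sketch that assumes the error lines have exactly the form shown in the example; the function name missing_bfids is hypothetical.

```shell
# Hypothetical helper: print each bitfile ID that an MSP error report lists
# as "missing from daemon db", one per line.
# Assumes lines of the form:
#   Bitfile ID 344789f2000000000000006c - missing from daemon db.

# Usage: missing_bfids FILE
missing_bfids() {
    awk '$1 == "Bitfile" && $2 == "ID" && /- missing from daemon db/ { print $3 }' "$1"
}
```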

The free Option

After you take a snapshot, the files in the working directory that hold all the information dmaudit needs to generate its report are kept, and they can occupy a large amount of disk space. This lets you examine the snapshot at your leisure. The disk space normally remains reserved until you take your next snapshot.

If no errors were detected in the snapshot, however, you might want to enter the free option to release that disk space so that it can be used for other purposes until the next time you take a snapshot. The free option discards all snapshot files while retaining all configuration information.
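Before deciding whether to free the space, you can check how much of it the current snapshot actually holds with du(1). This is a minimal sketch; the function name snapshot_space_kb is hypothetical, and you must supply the working directory from your own dmaudit configuration (the path is site-specific).

```shell
# Hypothetical helper: print the disk space, in kilobytes, used by everything
# under DIRECTORY (for example, the dmaudit working directory).
# du -sk reports a single total in 1024-byte units.

# Usage: snapshot_space_kb DIRECTORY
snapshot_space_kb() {
    du -sk "$1" | awk '{ print $1 }'
}
```

Running snapshot_space_kb on the working directory before and after dmaudit free shows how much space the free option released.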

After the snapshot files have been removed, the Main menu once again looks like the following:

MAIN MENU
---------

Select:
   <snapshot>   Take a snapshot and report status of file systems and databases
   <config>     Examine or modify configuration information
   <quit>       Quit

Please enter your selection:

You do not have to manually select free each time before taking a new snapshot. The snapshot option automatically releases all space from a previous snapshot before taking a new one.

Example of a Report with Discrepancies

If discrepancies are detected between the file systems and the daemon database, dmaudit attempts to summarize them in the last section of its report.

The following is an excerpt from a report that detected several kinds of errors:

DAEMON DATABASE ERROR REPORT
----------------------------

There are 14 bitfile IDs in use by more than one file that cannot be
corrected without additional information from you.

There are 3 bitfile IDs in use by more than one file that can be
automatically corrected.

There are 2 user files whose data cannot be recovered.

There are 231 user files that have correctable errors.

There are 18 bitfile IDs in the daemon database for which no user files
can be found.

There are 43 bitfile IDs whose errors cannot be corrected until a new
snapshot is taken.

There are 12 bitfile IDs with files that cannot be recalled or migrated.

There are 9 bitfile IDs with files that are internally inconsistent.

dmaudit summarizes the errors that it finds into eight general classes (described in Table 7-1). dmaudit allows you to examine each of the error classes separately.
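When a report is saved to a file (the report option can also be run in batch mode, as described in “Taking a Snapshot in Batch Mode”), the counts in this section can be extracted for logging or for comparison between runs. The following minimal sketch assumes every error class appears as a line beginning "There are N", as in the excerpt above; a line phrased differently (for example, a singular form, if dmaudit uses one) would not be counted. The function name report_error_counts is hypothetical.

```shell
# Hypothetical helper: print the count from each "There are N ..." line of a
# saved dmaudit report, followed by a line "total N" giving their sum.
# Continuation lines of a wrapped sentence do not start with "There are",
# so each error class is counted once.

# Usage: report_error_counts REPORTFILE
report_error_counts() {
    awk '
        $1 == "There" && $2 == "are" && $3 ~ /^[0-9]+$/ {
            print $3
            total += $3
        }
        END { print "total", total + 0 }
    ' "$1"
}
```

For the excerpt above, the function prints the eight counts followed by total 332.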

Taking a Snapshot in Batch Mode

To avoid tying up your terminal for long periods of time, dmaudit also allows you to specify the snapshot option in batch mode. To take a snapshot in batch mode, specify snapshot as a parameter on the command line. For example, a Bourne shell user might enter the following:

nohup dmaudit snapshot >rpt 2>err &

(The nohup(1) command allows dmaudit to continue processing even if you log out of the system.) When the snapshot completes, the report is written to standard output, in this case to the file rpt. If dmaudit encounters fatal errors, it issues messages to standard error (in the example, this is the file err).
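Because nohup lets the snapshot outlive your login session, the shell that started it may no longer be around to report the exit status described below. One way to keep that status is to have the background job write it to a file when it finishes. This is a minimal sketch; the function name snapshot_in_background, the file names rpt, err, and status, and the directory argument are hypothetical, and the directory must already exist.

```shell
# Hypothetical helper: run COMMAND in the background under nohup, with
# standard output in DIR/rpt, standard error in DIR/err, and the command's
# exit status written to DIR/status once it finishes.
# The existence of DIR/status therefore means the run has ended.

# Usage: snapshot_in_background DIR COMMAND [ARGS...]
snapshot_in_background() {
    dir=$1
    shift
    # $1 inside the quoted script is DIR; the remaining arguments are COMMAND.
    nohup sh -c '
        d=$1; shift
        "$@" >"$d/rpt" 2>"$d/err"
        echo $? >"$d/status"
    ' sh "$dir" "$@" >/dev/null 2>&1 &
}
```

For example, snapshot_in_background /var/tmp/audit dmaudit snapshot writes the report and error messages into /var/tmp/audit; after the run ends, cat /var/tmp/audit/status shows the exit status.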

The exit status from a snapshot run in batch mode can be used in shell scripts to determine whether errors were detected, as follows:

  • An exit status of 0 indicates that dmaudit completed the snapshot and that no file system or database errors were detected.

  • An exit status of 1 indicates that file system or database errors were discovered by dmaudit.

  • Any other exit status indicates that dmaudit aborted with a fatal error.
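In a shell script, a case statement maps these exit statuses directly to the three outcomes. This is a minimal sketch; the function name classify_status and the words it prints (clean, errors, fatal) are hypothetical.

```shell
# Hypothetical helper: translate a dmaudit snapshot exit status into one word.
#   0     -> clean   (snapshot completed, no errors detected)
#   1     -> errors  (file system or database errors discovered)
#   other -> fatal   (dmaudit aborted with a fatal error)

# Usage: classify_status STATUS
classify_status() {
    case $1 in
        0) echo clean ;;
        1) echo errors ;;
        *) echo fatal ;;
    esac
}
```

For example: dmaudit snapshot >rpt 2>err; status=$?; result=$(classify_status "$status"). Saving $? in a variable first keeps it from being overwritten by any later command.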

The report and free options can also be used in batch mode. The following is an example script that could be run from cron to take snapshots on a periodic basis:

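#!/bin/sh
#       script to take a snapshot and report errors by mail
#

TMPDIR=/tmp
MAILLIST="root"         # space-separated list of recipients

# The report itself is discarded here; "dmaudit report" can reprint it.
dmaudit snapshot 2>"$TMPDIR/errmsg" 1>/dev/null
STATUS=$?

if [ $STATUS -eq 0 ]
then
        dmaudit free        # no errors: remove unneeded snapshot files
        exit 0
fi

if [ $STATUS -eq 1 ]
then
        # errors detected: mail the full report
        dmaudit report | mailx -s "dmaudit error report" $MAILLIST
else
        # fatal error: mail the messages dmaudit wrote to standard error
        mailx -s "dmaudit failed with this msg" $MAILLIST <"$TMPDIR/errmsg"
fi
exit 1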

If the snapshot completes successfully with no errors detected, the script uses the free option to release all working directory space. If errors are detected or if dmaudit fails, mail is instead sent to the administrator.