This manual tells you how to use the dmaudit(8) command to detect and report every known type of discrepancy between your file systems and the DMF daemon database, including the following:
Migrated files for which there are no database entries
Database entries for which there are no migrated files
Duplicate bit file identifiers (bfids) in either the file systems or the database (the bfid is the ID assigned to each file during the migration process; it links a migrated file to its data on alternate media)
The dmaudit command is intended primarily for interactive use. It uses a series of scrolling menus to display information and to solicit option selections. However, after you complete the initial configuration, dmaudit can also accept the snapshot and report operations from the command line. This allows dmaudit to be run as a background process or in batch mode.
Some of the advantages of using dmaudit are as follows:
dmaudit executes while DMF is active. Users can continue to access their migrated files while you simultaneously search for and correct errors.
dmaudit is accurate. It automatically adjusts for any user activity occurring in the file systems it searches, so it neither reports false errors nor misses real ones.
Discrepancies can be examined interactively. dmaudit performs most of its analysis in batch mode and saves the output in indexed files. You can then interactively examine any discrepancy quickly and easily.
You can decide which errors you want to fix, and you have some control over how a discrepancy is to be fixed.
You do not need to immediately fix discrepancies. Because dmaudit saves all information it needs to fix each error, the errors can be fixed hours or days after they are detected.
Note: The dmaudit command is capable of showing you what discrepancies exist, but is not able to tell you why they happened. Discovering why an error occurred requires some detective work and a considerable knowledge of the internal workings of DMF.
If you do want to determine why a discrepancy occurred, there is information available in some of the dmaudit menus to help you narrow down exactly when an error occurred. Examination of daemon logs and journal files may fill in the remaining blanks. Some tips are given in later sections of this manual to help you determine why certain discrepancies occurred.
During normal system operation, the daemon database and the file systems stay synchronized with each other. Each migrated user file in your file systems has a unique bfid. Each bfid has one or more active daemon database entries. If an entry in the daemon database is not in use, it is soft-deleted. (A database entry is soft-deleted when the MSP copy of the data is no longer current. Data remains on the alternate media until the database entry is deleted.)
However, things can get out of synchronization. Examples of some of the discrepancies that might occur are as follows:
Migrated files that have bfids for which no database entries exist
Active database entries for which no migrated files exist
Multiple user files that have the same bfid
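The three discrepancy types listed above amount to a comparison between two sets of bfids. The following Python sketch illustrates that comparison only; it is not how dmaudit is implemented. The function name, the (path, bfid) and (bfid, soft_deleted) record shapes, and the result keys are all hypothetical, and a real audit would also have to account for user activity during the scan.

```python
from collections import defaultdict


def find_discrepancies(migrated_files, db_entries):
    """Classify mismatches between file systems and a database.

    migrated_files: iterable of (path, bfid) pairs (hypothetical shape).
    db_entries: iterable of (bfid, soft_deleted) pairs (hypothetical shape).
    """
    # Group file paths by the bfid stored in each migrated file.
    paths_by_bfid = defaultdict(list)
    for path, bfid in migrated_files:
        paths_by_bfid[bfid].append(path)

    entries = list(db_entries)
    all_bfids = {bfid for bfid, _ in entries}
    active_bfids = {bfid for bfid, soft_deleted in entries if not soft_deleted}

    return {
        # Migrated files whose bfid has no database entry at all.
        # Simplification: a file whose only entries are soft-deleted
        # is not flagged here.
        "missing_db_entry": sorted(
            path
            for bfid, paths in paths_by_bfid.items()
            if bfid not in all_bfids
            for path in paths
        ),
        # Active database entries that no migrated file refers to.
        "orphan_db_entry": sorted(active_bfids - set(paths_by_bfid)),
        # bfids shared by more than one user file.
        "duplicate_bfid": {
            bfid: sorted(paths)
            for bfid, paths in paths_by_bfid.items()
            if len(paths) > 1
        },
    }
```

For example, if two paths carry the same bfid, that bfid appears under "duplicate_bfid" with both paths, which is the situation a repeated restore of one migrated file can produce.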
System crashes are a major source of discrepancies because I/O operations in progress at the time of the crash are not guaranteed to complete successfully. For example, a migrating file might receive a new bfid, but the rewrite of its inode to disk might not succeed. Or perhaps the inode update does complete, but the corresponding database entries are not successfully made.
Inconsistencies also arise if users are allowed to modify or remove migrated files during periods when the daemon is not running, because the kernel is then unable to tell the daemon to soft-delete the corresponding database entries. The unused, or orphan, database entries then accumulate in the database, wasting space on the alternate media.
Use of the xfsdump(1m) and xfsrestore(1m) commands can also create inconsistencies.
For example, files that have been removed or modified can be restored to their previous state. If the same migrated file is restored multiple times, there will be more than one inode containing the same bfid.
Sometimes the inconsistencies are harmless, or only result in wasted space on the alternate media. In other cases, the discrepancies can prevent a user from accessing one or more files, or can result in the loss of files. dmaudit allows the administrator to quickly detect and correct such inconsistencies when they occur, possibly before any data loss becomes permanent.
After dmaudit has been initially configured, most sites use dmaudit in batch mode on a periodic basis, perhaps once a week. The easiest way to do this is through the use of a cron script. Sites may also want to use dmaudit after a known failure such as a major system crash. Instructions on how to generate a report both interactively and through a cron script are provided in Chapter 6, “Detecting Discrepancies”.
When errors are detected, you should interactively examine and correct them.
Running dmaudit on a periodic basis will help you determine why discrepancies have appeared. For example, if your system crashes and a subsequent dmaudit run shows discrepancies that were not previously present, you can be reasonably sure that they occurred because of events at the time of the crash.
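The reasoning above is a comparison between two audit runs: anything in the later run that was absent from the earlier one appeared in the interval between them. The following Python sketch shows that comparison under the assumption that each discrepancy has been reduced to a (kind, bfid) tuple. That record shape and the function name are hypothetical and do not reflect the format of dmaudit reports.

```python
def new_discrepancies(previous, current):
    """Return discrepancies present in the current run but not the previous.

    previous, current: iterables of (kind, bfid) tuples (hypothetical shape).
    """
    return sorted(set(current) - set(previous))


# Usage: a run made before a crash and a run made after it.
before_crash = [("orphan_db_entry", "b4")]
after_crash = [("orphan_db_entry", "b4"), ("duplicate_bfid", "b2")]
appeared = new_discrepancies(before_crash, after_crash)
```

Here `appeared` holds only the duplicate bfid, which narrows the search for its cause to the period between the two runs.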