Chapter 7. Examining and Correcting Discrepancies

This chapter provides an overview of how to use dmaudit to examine and correct errors and a walk-through for each menu, using as an example some of the inconsistencies introduced when a file system that uses DMF is restored. It also includes additional information on each of the error classes.

Overview of the Correction Process

Correcting errors with dmaudit is a two-step process. The first step is to interactively select which errors you want to correct; the second step is to correct those errors in either interactive or batch mode.

Errors are divided into eight major classes. Each of these classes has its own correction menu, although the contents of these menus are largely the same. The error classes are described in Table 7-1:

Table 7-1. dmaudit error classes

Class  Description

1      More than one file has the same bfid, and dmaudit cannot determine by itself which database entries go with which migrated file. User data may or may not be lost.

2      More than one file has the same bfid, but dmaudit has enough information to determine which database entries go with which migrated file. User data may or may not be lost. This class implies database errors.

3      Migrated files for which data cannot be found.

4      Files with minor errors that can be repaired without loss of user data. These are usually caused by events such as the restoration of a file system. No user data is at risk in a recoverable error, so they are usually corrected without detailed examination.

5      Active daemon database entries for which no migrated files can be found. No user data is affected by this type of error. The only negative effect of errors in this class is the space wasted by the unused file copies on the migration media. This type of error usually occurs when there is user activity while the daemon is not running.

6      Bfid sets with errors which indicate that file states or daemon database entries were actively changing while the snapshot was in progress, or that have changed since the snapshot completed. Their status cannot be evaluated and therefore they cannot be fixed without taking another snapshot. You can, however, look at the most recent state of the bfid sets.

7      Bfid sets with files that are internally inconsistent.

8      Bfids found in file systems not configured for DMF and for which no database entries were found. User data has been lost.


Selecting Which Errors to Correct

To select errors to be corrected, enter the number of the error class you want to correct, which displays the appropriate error-class menu. Each error class must be corrected separately. Within each error class menu, choose which errors you want to correct by accepting the modifications dmaudit needs to correct them. Each error class except class 6 has an accept option for this purpose. If you change your mind, you can cancel your acceptance by using the cancel option.
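
As a quick reference, the classes in Table 7-1 and the accept rule described above can be captured in a small lookup table. This is only an illustrative Python sketch: the names ERROR_CLASSES and has_accept_option are invented for this example and are not part of dmaudit, and the one-line summaries are abbreviations of the table text.

```python
# Illustrative only: short summaries of the eight dmaudit error classes
# from Table 7-1. The dictionary and function names are hypothetical.
ERROR_CLASSES = {
    1: "Duplicate bfid; dmaudit cannot tell which entries belong to which file",
    2: "Duplicate bfid; dmaudit can tell which entries belong to which file",
    3: "Migrated files for which data cannot be found",
    4: "Files with minor, correctable errors",
    5: "Active daemon database entries with no migrated files",
    6: "Bfid sets that changed during or after the snapshot",
    7: "Bfid sets with internally inconsistent files",
    8: "Bfids in file systems not configured for DMF, with no database entries",
}


def has_accept_option(error_class):
    """Every class except class 6 offers an accept option."""
    if error_class not in ERROR_CLASSES:
        raise ValueError(f"unknown error class: {error_class}")
    return error_class != 6
```

For example, has_accept_option(4) is true, while has_accept_option(6) is false because class 6 errors cannot be fixed without taking another snapshot.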

Correcting the Errors

When you are satisfied with the list of error corrections you have accepted, the next step is to return to the Main menu and apply your changes using the apply option. There are two ways to do this:

  • If the corrective actions presented by dmaudit imply that very little tape and disk activity is required, you may want to fix the errors immediately. In that case, select apply on the Main menu and wait. When the Main menu next appears, all errors are corrected.

  • If the corrective actions imply that a lot of tape and disk activity is required, you may want to wait until off-peak hours to actually correct the errors. You can do it interactively, or you can do it in batch mode by specifying apply as a parameter on the dmaudit command line.
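
One way to defer the batch-mode correction to off-peak hours is a small wrapper that starts dmaudit only inside a chosen time window. The following Python sketch rests on several assumptions: the 22:00 to 06:00 window is hypothetical, the function names are invented, and the invocation "dmaudit apply" is taken from the description above of passing apply as a command-line parameter, so check the exact syntax on your system before relying on it.

```python
import datetime
import subprocess

# Hypothetical off-peak window (22:00 to 06:00); adjust for your site.
OFF_PEAK_START = 22
OFF_PEAK_END = 6


def is_off_peak(hour):
    """Return True if the hour (0-23) falls inside the assumed window."""
    return hour >= OFF_PEAK_START or hour < OFF_PEAK_END


def apply_if_off_peak(now=None, run=subprocess.run):
    """Run 'dmaudit apply' only during the off-peak window.

    Returns True if the command was started, False if it was skipped.
    """
    now = now or datetime.datetime.now()
    if not is_off_peak(now.hour):
        return False
    # 'apply' as a command-line parameter follows the text above; the
    # exact invocation on your system is an assumption to verify.
    run(["dmaudit", "apply"], check=True)
    return True
```

The run parameter exists only so that the wrapper can be exercised without actually starting dmaudit.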

The inspect Menu

To introduce the most common correction options available within dmaudit, the next few sections show an example in which the xfsdump(1m) and xfsrestore(1m) commands were used to cause errors to appear. Restoring a file system can cause errors because the file system is being returned to a previous state, but the daemon database remains in its current state.

An attempt was made in these examples to produce as many different kinds of errors as possible. If you have to restore a file system, your next dmaudit report will probably only contain a subset of the errors shown here.

Errors were produced first by using a dump command to dump a DMF file system. User activity was then simulated by migrating, recalling, and removing files in the file system. The file system was then restored using a restore command, and dmaudit was run to produce an error report.

For purposes of the example, it is assumed that you have already taken a snapshot and that errors were detected. The error summary portion of the dmaudit report would show something like the following:

  DAEMON DATABASE ERROR REPORT
  ----------------------------
  There are 8 user files that have correctable errors.

  There are 2 bfids in the daemon database for which no user files can be found.
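
If you have captured the report text in a file or a string, the two summary counts shown above can be extracted with a short script. This Python sketch assumes only the wording of the sample report; the function and key names are invented, and any report lines with different wording are simply ignored.

```python
import re

# Patterns follow the wording of the sample report; other report lines
# are ignored.
PATTERNS = {
    "correctable_files": re.compile(
        r"There are (\d+) user files that have correctable errors"),
    "orphan_bfids": re.compile(
        r"There are (\d+) bfids in the daemon database for which no user "
        r"files can be found"),
}


def summarize(report_text):
    """Return the counts found in the error summary of a report."""
    # Collapse whitespace so that wrapped lines still match.
    flat = " ".join(report_text.split())
    counts = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(flat)
        if match:
            counts[name] = int(match.group(1))
    return counts
```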

Assume you decide to correct the errors. You must always start from the Main menu. If you are not currently running dmaudit, issue the dmaudit command.

If you are running dmaudit but are in some other menu, enter up as many times as necessary until you arrive at the Main menu.

Your screen will look like the following:

  MAIN MENU
  ---------

  Select:
     <inspect>    Inspect and correct file system and database errors
     <report>     Reprint status report for the current snapshot
     <verifymsp>  Check the dmatmsp tape msp databases against the daemon
                  databases
     <snapshot>   Take a snapshot and report status of file systems and databases
     <free>       Release all file space used by the current snapshot
     <config>     Examine or modify configuration information
     <quit>       Quit

  Please enter your selection:

To examine and correct errors, enter inspect, which causes the following menu to appear:

  INSPECT FILE SYSTEM AND DATABASE ERRORS
  ---------------------------------------

  Errors are divided into several major classes. Only those classes that pertain
  to this snapshot are displayed.

  Select:
     <4>        Examine files that have correctable errors
     <5>        Examine bfids in the daemon database for which no user files
                can be found
     <bfid>     Examine all files and database entries for any bfid you specify
     <search>   Scan file systems for names of all files with errors (very slow)
     <up>       Return to the previous menu

  Please enter your selection:

There is a direct correspondence between the two error classes shown in this menu and those mentioned in the error report. You must examine and correct each of these classes separately. To examine a particular class, enter its class number at the prompt. See “Fixing Bfid Sets for Which No User Files Exist”.

The following sections describe the bfid and search options; the up option returns you to the Main menu. The error-class options (4 and 5) are described in “Fixing Files with Correctable Errors” and “Fixing Bfid Sets for Which No User Files Exist”, respectively.

The bfid Option

The bfid option is used to examine everything that is known about a particular bfid set. You can enter a particular bfid set value or select the default value displayed at the prompt:

Select:
   <bfid>     Examine all files and database entries for any bitfile ID you
              specify
   <search>   Scan file systems for names of all files with errors (very slow)
   <up>       Return to the previous menu
Please enter your selection: bfid
(34745a2b0000000000000444): <enter an alternate bfid here or return>

BITFILE ID = 34745a2b0000000000000444


ENTRIES IN USE:

01. dual-state user file - dev 50331653 (daemon), fhandle
    010000000000001888c5f6086f39b65d000e0000000000000000000000401f8c, uid 285,
    size 175538
02. daemon MSP <ftp> database entry - not soft deleted, size 175538, key
    <abc/34745a2b0000000000000444>


  Select:
     <next>   Examine the next bfid in the snapshot
     <prev>   Examine the previous bfid in the snapshot
     <mode>   Switch to full display
     <dump>   Append all information about this bfid to a text file in a format
              suitable for machine processing
     <up>     Return to the previous menu

  Please enter your selection:

This is called an abbreviated display because it shows only some of the information known about this bfid set. It shows only those pieces of information most commonly examined and has the advantage of taking up less screen space. There are two objects associated with bfid 34745a2b0000000000000444: a user file and an MSP daemon database entry. The user file has a state of dual-state, a device number of 50331653, a user identification (UID) of 285, and a size in bytes of 175538. The database entry is for an MSP named ftp, and the key the MSP uses to retrieve the file is abc/34745a2b0000000000000444. Furthermore, the soft-delete field is 0, indicating that the entry is still active.

You can see a more detailed description of the bfid set by selecting the mode option, which redisplays the bfid set in full-display mode. This display shows all information known by dmaudit about the bfid set. The mode option is a toggle: each time you select it, the display switches to the display type not currently shown. If you enter mode for this bfid set, you see the following:

BITFILE ID = 34745a2b0000000000000444

ENTRIES IN USE:

01. dual-state user file - dev=50331653 (daemon)
    fhandle=010000000000001888c5f6086f39b65d000e0000000000000000000000401f8c
    size=175538 uid=285 nlinks=1
02. daemon MSP database entry - dev=50331653 ino=4202380 size=175538 uid=285
    otime=Nov 20 13:08:36 1997 utime=Nov 20 13:08:36 1997 ctime=Nov 20 13:08:36
    1997 dtime=0 name=dmf_tst.00061 msp=ftp key=<abc/34745a2b0000000000000444>


Select:
   <next>   Examine the next bfid in the snapshot
   <mode>   Switch to abbreviated display
   <dump>   Append all information about this bfid to a text file in a format
            suitable for machine processing
   <up>     Return to the previous menu

Please enter your selection: 

Several new pieces of information appear. The number of hard links for the user file is now shown, as is every field in the daemon's MSP database entry. In both cases, information that does not fit on the current line is continued on subsequent lines without line numbers.

To conserve screen space, the names of the fields in a database entry in this display are somewhat shorter than the names actually used by the daemon in its dbrec structure (see “DMF Daemon Database Contents” in Chapter 2 for a description of the structure). Table 7-2 shows the correspondence between names for the fields in the full display and in the daemon dbrec structure.

Table 7-2. Database field descriptions

Display name   dbrec structure   Description

dev            origdv            Specifies the device number of the file
ino            origino           Specifies the inode number of the file
size           origsz            Specifies the file size in bytes
uid            userid            Specifies the user ID of the file owner
otime          otime             Specifies the origination time of the entry
utime          utime             Specifies the last update time of the entry
ctime          ctime             Specifies the last check time of the entry
dtime          delflag           Specifies the soft-delete time of the entry
name           ofilenm           Specifies the base name (if known) of the file
msp            proc              Specifies the MSP name
key            path              Specifies the MSP key or path name
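
The correspondence in Table 7-2 can be used to turn a database entry copied from the full display into a record keyed by dbrec field name. The following Python sketch is an illustration only: the names DISPLAY_TO_DBREC and parse_entry are invented, and the parser assumes that fields appear as name=value pairs exactly as in the sample full display, with wrapped lines joined by whitespace.

```python
import re

# Display name -> dbrec structure field, as listed in Table 7-2.
DISPLAY_TO_DBREC = {
    "dev": "origdv", "ino": "origino", "size": "origsz", "uid": "userid",
    "otime": "otime", "utime": "utime", "ctime": "ctime", "dtime": "delflag",
    "name": "ofilenm", "msp": "proc", "key": "path",
}

# Split on "name=" markers so that values containing spaces (such as the
# time stamps) stay in one piece.
_FIELD = re.compile(r"\b(" + "|".join(DISPLAY_TO_DBREC) + r")=")


def parse_entry(text):
    """Parse one database entry from the full display into a dict."""
    flat = " ".join(text.split())      # undo line wrapping
    parts = _FIELD.split(flat)         # [prefix, name, value, name, value, ...]
    fields = {}
    for name, value in zip(parts[1::2], parts[2::2]):
        fields[DISPLAY_TO_DBREC[name]] = value.strip()
    return fields
```

A file name that itself contains text such as "key=" would confuse this simple splitter, so treat it as a convenience for inspection rather than a robust parser.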

Several other options are available on the bfid display. Because dmaudit maintains its data in bfid-set order, the next and prev options allow you to examine bfid sets in the database adjacent to the current one.

Entering next shows you the next higher bfid set in use; entering prev shows you consecutively lower bfid sets. This can sometimes be useful if you want to see the state of bfid sets whose files were migrated at about the same time as the current file.
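
The idea of stepping through bfid sets in order can be illustrated with a sorted list. This Python sketch is not how dmaudit stores its data; it assumes bfids are fixed-width lowercase hexadecimal strings (as in the sample displays), so that string order matches numeric order, and the function name neighbors is invented.

```python
import bisect


def neighbors(sorted_bfids, current):
    """Return (previous, next) bfids around current, or None at either end."""
    i = bisect.bisect_left(sorted_bfids, current)
    prev_bfid = sorted_bfids[i - 1] if i > 0 else None
    j = bisect.bisect_right(sorted_bfids, current)
    next_bfid = sorted_bfids[j] if j < len(sorted_bfids) else None
    return prev_bfid, next_bfid
```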

The dump option is not commonly used. It allows you to dump all information contained in the full display into a file in machine-readable form. If you select the dump option, it prompts you for the name of a file:

File to which you want the dump appended?

Enter the path name of a file to which you want the information to go. Either full or relative path names can be used. If the file does not already exist, dmaudit creates it. If it does exist, dmaudit appends the information to the end of the file. If you instead press ENTER, dmaudit assumes that you have changed your mind and returns to the bfid menu.

The format of the dump information is complex; for a description, see Appendix A, “dump Option Output”.

When the dump completes, dmaudit redisplays the bfid menu.

From this display, enter up to return to the Inspect menu.

The search Option

The search option only appears on the Inspect menu when there are bfid sets with errors that have user files associated with them. For example, if there were only class-5 errors in the following menu, search would not appear because no user files would be involved:

  INSPECT FILE SYSTEM AND DATABASE ERRORS
  ---------------------------------------
  Errors are divided into several major classes. Only those classes that pertain
  to this snapshot are displayed.

  Select:
     <4>        Examine files that have correctable errors
     <5>        Examine bfids in the daemon database for which no user files
                can be found
     <bfid>     Examine all files and database entries for any bfid you specify
     <search>   Scan file systems for names of all files with errors (very slow)
     <up>       Return to the previous menu

  Please enter your selection:

The purpose of search is to find the path names for each user file whose bfid set has errors associated with it. dmaudit does this by recursively scanning through the directories of each file system that contains bfid sets with errors.


Note: This process can take a long time, perhaps hours on very large file systems. You can interrupt the process by pressing Control-c, but then you must start the search again from the beginning. It is better to examine the errors first to see if you really need to know the names of the files. If you do later decide you need the names, you can return here and enter search.

A search from the Inspect menu finds the names of all user files whose bfid sets have errors for all applicable classes. The individual error class menus also have a search option, but their search is restricted to those user files that are in their error class. If you need names for multiple classes, it is usually more efficient to get all the names at once from the Inspect menu. If you enter search, dmaudit pauses for an indefinite period of time as the file systems are scanned. When the Inspect menu next reappears, all the names have been collected, and the search option disappears from the menu.
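
To see why a search can be slow, consider what a recursive scan for path names involves. The following Python sketch is an illustration of the general technique, not dmaudit's implementation: it walks a directory tree and reports every path whose device and inode numbers match a target, which means every file under the root has to be examined. The function name find_paths is invented.

```python
import os


def find_paths(root, targets):
    """Map each (st_dev, st_ino) pair in targets to the paths found under root."""
    found = {target: [] for target in targets}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            key = (st.st_dev, st.st_ino)
            if key in found:
                found[key].append(path)
    return found
```

Because the scan uses lstat, hard links to the same file are all reported, while symbolic links are not followed.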

Fixing Bfid Sets for Which No User Files Exist

To continue with the example, you are now ready to examine each of the error classes to see what kinds of errors occurred. Assume that you entered 5 to look at those bfid sets whose database entries are not soft-deleted and for which no migrated user files can be found.

The following menu appears:

DAEMON DATABASE BITFILE IDS FOR
WHICH NO FILES CAN BE FOUND
---------------------------

There are 64 bitfile IDs in this error class.


The following errors were detected in these bitfile IDs.

01.  There are 64 MSP daemon database entries that should be soft deleted.


The following actions will be taken to correct these errors.

02.  64 MSP daemon database entry soft-delete flags will be set.


Select:
   <accept>    Accept the recommended actions.  Once accepted, these actions
               will be taken when you select 'apply' on the main menu.
   <examine>   Examine or modify the actions to be taken for individual files.
               You may enter the line number of any one of the above subsets of
               files; the default is to examine all the files.
   <bfid>      Examine all files and database entries for any bitfile ID you
               specify.
   <dump>      Append all information available on each of the bitfile IDs
               listed above to a text file in a format suitable for machine
               processing.
   <up>        Return to the previous menu.

Please enter your selection:

For each of the error class menus, dmaudit always shows the following:

  • Number of bfid sets with errors in this error class

  • Summaries of all the individual errors found in the bfid sets in this class

  • What actions dmaudit needs to take to fix the individual errors

The messages displayed in these menus depend upon what errors were found on your system. There are five options available on this display. The up option returns you to the Inspect menu, and the bfid option functions as described in “The bfid Option”.

The accept Option

The most important option on an Error Class menu is accept. By entering accept, you tell dmaudit that you are willing to let it take the listed actions to correct the discrepancies in these bfid sets. In this particular case, it means that dmaudit will soft-delete 64 daemon database entries. Action is not taken until you enter apply at the Main menu.

After you have entered accept, the format of the menu changes slightly. The accept option disappears because it is no longer needed, and in its place you will instead see the following:

<cancel>    Cancel your acceptance of the above recommendations.

The cancel option allows you to change your mind. If you enter cancel, it disappears and accept reappears, returning the screen to its original state. You can change your mind as many times as you like up until the point at which you enter apply on the Main menu. At that time, dmaudit fixes all bfid sets whose accept flag is currently set.
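
The relationship between accept, cancel, and apply can be pictured as a set of flags that is consulted only when apply is entered. The following Python sketch is a toy model with invented names (Session, accept, cancel, apply); it does not reflect dmaudit's internal data structures.

```python
class Session:
    """Toy model of accept/cancel/apply; not dmaudit's actual data structures."""

    def __init__(self, bfid_sets):
        # One accept flag per bfid set, initially cleared.
        self.accepted = {bfid: False for bfid in bfid_sets}
        self.applied = []

    def accept(self, bfid):
        self.accepted[bfid] = True

    def cancel(self, bfid):
        self.accepted[bfid] = False

    def apply(self):
        """Fix only the bfid sets whose accept flag is set at this moment."""
        self.applied = sorted(b for b, ok in self.accepted.items() if ok)
        return self.applied
```

Accepting and then canceling a bfid set before apply leaves it untouched, which matches the behavior described above.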

Most of the time, you do not need to examine these bfid sets in more detail and can type accept followed by up to move on to other error classes. If you do want to go into additional detail, there are additional options available to help you.

The examine Option

The examine option allows you to look at individual bfid sets within this error class. The format of its display is very similar to the bfid display. If you enter examine, you are prompted for a line number:

Line number (<CR> for all)?

Each of the errors and each of the actions reported on the Error Class menu have line numbers on them. If you enter a line number at the examine prompt, it examines only information about bfid sets that have the specified error or action associated with them. If you want to examine all bfid sets in this error class, press ENTER. Because there are only two bfid sets in this class, assume that you press ENTER.

You then see the following:

BITFILE ID = 34745a2b000000000000c6c1

ENTRIES IN USE:

01. no user file -
02. daemon MSP database entry - dev=50331653 ino=3445388 size=692637 uid=11414
    otime=Nov 26 06:04:20 1997 utime=Nov 26 06:04:20 1997 ctime=Nov 26 06:04:20
    1997 dtime=0 name=/NONAME msp=ftp key=<dinesh/34745a2b000000000000c6c1>
       This database entry should be soft deleted.
       The soft delete flag will be set in this entry.


Select:
   <next>     Examine the next bfid in the list
   <last>     Examine the last bfid in the list
   <mode>     Switch to abbreviated display
   <edit>     Edit this bfid
   <dump>     Append all information about this bfid to a text file in a format
              suitable for machine processing
   <accept>   Accept the changes to this bitfile ID
   <up>       Return to the previous menu
Enter <CR> to continue:

Please enter your selection:

This display shows an example of a bfid set with an error. Each object in a bfid set that has an error associated with it also has error and action descriptions following it on the screen. In this case, line 2 is followed by the error description and action required to correct the database entry given. It explains that no user file was found with this bfid.

Table 7-3 describes the other options on this menu:

Table 7-3. examine menu options

Option   Description

next     Displays the next bfid set within the list of bfid sets you chose to look at. After that bfid is displayed, the prev option appears, allowing you to see previous bfid sets.

last     Goes directly to the last bfid set in the list instead of having to enter next multiple times. The first option, when it appears, allows you to return to the first bfid set in the list.

mode     Toggles back and forth between abbreviated and full versions of the display.

edit     Allows you to make changes to the bfid set to force dmaudit to correct it in some other way. This option is only needed in several very specific cases; it should normally not be used. For more information, see “The edit Option”.

dump     Dumps all information about the bfid set to a file in machine-readable form. You are prompted for the name of a file to which data should be appended.

accept   Accepts the actions required to correct the bfid set without affecting other bfid sets in the same error class. This option should seldom be needed; normally you will accept all errors in the class at the same time.


The dump Option

The dump option is similar to the one on the bfid and examine displays, but this dump outputs all information known about any of the bfid sets in this error class rather than just a single bfid set.

If you enter dump, you are prompted for a line number:

Line number (<CR> for all)?

Enter a line number if you only want to dump bfid sets that have a specified error or action associated with them. If you want to dump all bfid sets in this error class, press ENTER. The next prompt asks for a file name:

File to which you want the dump appended?

Enter the full or relative path name of a file, or press ENTER to cancel.

Returning to the Inspect Menu

Enter accept to accept the changes to these two bfid sets, and then enter up again to move on to the next error class.

You then see the following:

  INSPECT FILE SYSTEM AND DATABASE ERRORS
  ---------------------------------------
  Errors are divided into several major classes. Only those classes that pertain
  to this snapshot are displayed.

  Select:
     <4>        Examine files that have correctable errors
     <5>        Examine bfids in the daemon database for which no user files
                can be found
     <bfid>     Examine all files and database entries for any bfid you specify
     <search>   Scan file systems for names of all files with errors (very slow)
     <up>       Return to the previous menu

  Please enter your selection:

Fixing Files with Correctable Errors

The next step in the example is to look at correctable errors. The correctable error class introduces several new options that were not available in the previous Error Class menu.

If you enter 4, you see the following:

FILES WHICH HAVE CORRECTABLE ERRORS
-----------------------------------
There are 9 bitfile IDs in this error class.


The following errors were detected in these bitfile IDs.

01.  There are 9 MSP daemon database entries with an unknown MSP name.
02.  There are 9 migrated user files that have no MSP daemon database entries.


The following actions will be taken to correct these errors.

03.  9 MSP daemon database entries will be removed.
04.  9 migrated user files will have incomplete MSP daemon database entries
     created for their default MSPs.
05.  9 migrated user files will be remigrated to all MSPs for which they have
     incomplete database entries.


Select:
   <accept>     Accept the recommended actions.  Once accepted, these actions
                will be taken when you select 'apply' on the main menu.
   <examine>    Examine or modify the actions to be taken for individual files.
                You may enter the line number of any one of the above subsets
                of files; the default is to examine all the files.
   <bfid>       Examine all files and database entries for any bitfile ID you
                specify.
   <opt_part>   Recompute all corrective actions assuming that daemon database
                entries with errors can be removed as long as one good copy of
                the file exists elsewhere.
   <search>     Scan file systems for the names of all files listed above (very
                slow).
   <dump>       Append all information available on each of the bitfile IDs
                listed above to a text file in a format suitable for machine
                processing.
   <nlist>      Append all file names for all the files listed above to a text
                file after first scanning the file systems for their names
                (very slow).
   <up>         Return to the previous menu.

Please enter your selection: 

While this display is certainly more complex than the display for the previous error class, its format is essentially the same. The first section on the screen shows those errors that were detected in the bfid sets in this class, and the second section shows what actions dmaudit will take to correct those errors.

Recoverable errors tend to be more complex than errors you have seen previously. The following is an example of what a bfid set with recoverable errors looks like:

BITFILE ID = 34745a2b000000000000954b


ENTRIES IN USE:

01. dual-state user file - dev 50331652 (daemon), fhandle
    010000000000001888c5f6086f39b65c000e00000000000200000000007040f6, uid
    15948, size 3863854
       No MSP daemon database entries exist for this user file.
       Incomplete MSP daemon database entries will be created for the default
       MSPs associated with this user file.
       This file will be remigrated to those MSPs for which it has incomplete
       database entries.

The online file is the only available copy of the user file's data. Even though the file is dual-state, no daemon database entries exist for the file, and therefore no backup copies are available. To repair this bfid set, dmaudit has to remigrate the user file to all the MSPs that have the incomplete entry. Several new options that are available on the Recoverable Error Class menu are described in the following sections.

The opt_full and opt_part Options

The following bfid set has no MSP database entry, but the dual-state user file indicates that the online copy of the file is valid. The display is as follows:

BITFILE ID = 34745a2b000000000000954b

ENTRIES IN USE:

01. dual-state user file - dev 50331652 (daemon), fhandle
    010000000000001888c5f6086f39b65c000e00000000000200000000007040f6, uid
    15948, size 3863854
       No MSP daemon database entries exist for this user file.
       Incomplete MSP daemon database entries will be created for the default
       MSPs associated with this user file.
       This file will be remigrated to those MSPs for which it has incomplete
       database entries.

There are two ways to fix this problem:

  • Recreate the MSP database entries and remigrate the file to all MSPs specified in the current configuration.

  • Remove the bitfile ID, making the file a regular online file.

As released, dmaudit uses the first approach. You can change this in two ways. If you enter config from the Main menu, one of the displayed options allows you to specify which method dmaudit uses as its default; dmaudit remembers the new default from run to run. For more information, see “The invalid Option” in Chapter 8. You can also see the results of using the second approach by entering the opt_part option on the current display. When you select it, dmaudit redisplays the Error Class menu, replacing the following actions with ones that reflect the second approach (removing the bitfile ID from the affected user files):

04.  9 migrated user files will have incomplete MSP daemon database entries
     created for their default MSPs.
05.  9 migrated user files will be remigrated to all MSPs for which they have
     incomplete database entries.

The other actions on the display remain the same (except that some lines have different line numbers). If you then enter examine to look at the same bfid set previously shown, you see the same change in actions.

After you have entered opt_part, it disappears from the menu and is replaced by the opt_full option, which toggles back to the first recovery approach. You can toggle back and forth as much as you like. When you enter accept, dmaudit uses the recovery method currently in force. If you toggle the method after entering accept, dmaudit performs a cancel on those bfid sets affected by the toggle. You then must accept the changes again.
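
The interaction between the toggle and accept can be summarized in a few lines. This Python sketch is a toy model only: the class RecoveryChoice and the labels "full" and "part" are invented to stand for the first and second recovery approaches.

```python
class RecoveryChoice:
    """Toy model of the opt_part/opt_full toggle; names are hypothetical."""

    def __init__(self):
        # As released, the first approach (recreate entries and remigrate)
        # is the default.
        self.method = "full"
        self.accepted = False

    def accept(self):
        self.accepted = True

    def toggle(self):
        # Switching methods cancels any earlier acceptance, so the
        # changes must be accepted again.
        self.method = "part" if self.method == "full" else "full"
        self.accepted = False
```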

The search Option

The search option allows you to find the names of all user files associated with bfid sets in this error class. The names of the files are found by recursively scanning the directories of the file systems in which the files reside. This can take a long time, so select this option only if you truly need to know the file names. If all the names of all the user files in this error class are already known, the search option does not appear.

After the names have been found, they appear in several places, such as on the examine display. The following screen shows the beginning of the display for a bfid used in the previous example:

BITFILE ID = 34745a2b000000000000954b

ENTRIES IN USE:

01. dual-state user file - dev=50331652 (daemon)
    fhandle=010000000000001888c5f6086f39b65c000e00000000000200000000007040f6
    size=3863854 uid=15948 nlinks=0
    /dmi1/stim/dir1/dmf_tst.04266
       No MSP daemon database entries exist for this user file.
       The bitfile ID will be removed from this user file.

In this display, the path name of the user file appears below its attributes, and the action shown is removal of the bfid (bitfile ID) from the user file. The path names are also output when using the dump option and when using nlist (described in the next section). If you need to know the names of user files in more than one error class, it is more efficient to use the search option that appears on the Inspect menu instead of using the search option repeatedly in each of the Error Class menus.

The nlist Option

The nlist option is useful when you want to create a list of the names of user files in a particular error class. nlist saves the list in a file whose name you specify. Normally, the nlist option looks like the following:

  <nlist>      Append all file names for all the files listed above to a text
               file after first scanning the file systems for their names
               (very slow).

If you have previously used the search option to retrieve the names of the files in this error class, the nlist option will instead look like the following:

  <nlist>      Append all file names for all the files listed above to a text
               file.

You are prompted for a line number:

Line number (<CR> for all)?

You can either enter one of the line numbers appearing on the Error Class menu to create a list containing a subset of the files in this error class, or you can press ENTER to get the names of all user files in this error class. You are then prompted for the name of a file to which the file names should be appended:

File to which you want the file names appended?

Enter the full or relative path name of a file, or press ENTER to cancel the operation. The following display shows a sample file created using the nlist option:

  /dmf1/example/cmp_to_inc
  /dmf1/example/back\e134slash
  /dmf1/example/pipe\e174char
  /dmf1/example/link1
  /dmf1/example/link2
  /dmf1/example/link3
  /dmf1/example/inc_to_cmp
  /dmf1/example/lost_inc_pfile
  /dmf1/example/rmv_inc_pfile

The example shows several things that you must be aware of when you use the output from nlist, especially when using it as input to some other program:

  • If a file has multiple path names, nlist lists each of the path names on consecutive lines. This is done for hard links only; symbolic links to a file are not shown. In the previous example, the path names for link1, link2, and link3 are all hard links to the same file.

  • Because nothing prevents users from creating file names containing line feeds, vertical tabs, and other unprintable characters, nlist converts such characters into printable form before adding the path name to the list. For each backslash, vertical bar, or unprintable character in a file name, nlist replaces that character with the string \e000, where the 000 field is a 3-digit octal number that is the ASCII value for that character. In the previous example, file back\slash contains a backslash (octal 134) and file pipe|char contains a vertical bar (octal 174).
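If the nlist output is fed to another program, the escaped characters usually need to be converted back first. The following is a minimal sketch of such a decoder, assuming the escape form seen in the sample list (a backslash, the letter e, then three octal digits). The function names decode_nlist_name and read_nlist_file are hypothetical and are not part of dmaudit.

```python
import re

# Assumed escape form, inferred from the sample list: a backslash, the letter
# "e", then exactly three octal digits (for example \e134 for a backslash).
_ESCAPE = re.compile(r"\\e([0-7]{3})")


def decode_nlist_name(line: str) -> str:
    """Turn one line of nlist output back into the real path name."""
    # Strip only the line terminator; other whitespace may belong to the name.
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), line.rstrip("\n"))


def read_nlist_file(path: str) -> list[str]:
    """Read a file written by nlist and return the decoded path names."""
    with open(path, encoding="ascii", errors="surrogateescape") as handle:
        return [decode_nlist_name(line) for line in handle if line.strip()]
```

Because nlist escapes every real backslash, any backslash left in the list starts an escape sequence, so the decoding is unambiguous. The list itself does not mark which consecutive lines are hard links to the same file.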

Returning to the Inspect Menu

Assuming that you are done looking at this error class, you would enter accept to accept the changes to the bfid sets, and then enter up again to return to the Inspect menu. You then see the following:

  INSPECT FILE SYSTEM AND DATABASE ERRORS
  ---------------------------------------

  Errors are divided into several major classes. Only those classes that pertain
  to this snapshot are displayed.

  Select:
     <4>        Examine files that have correctable errors
     <5>        Examine bfids in the daemon database for which no user files
                can be found
     <bfid>     Examine all files and database entries for any bfid you specify
     <search>   Scan file systems for names of all files with errors (very slow)
     <up>       Return to the previous menu

  Please enter your selection:

Because you have now accepted the corrections in both error classes, enter up to return to the Main menu where you apply the changes that you have accepted. For more information, see “The apply Option”.

An xfsdump(1m) followed by an xfsrestore(1m) can produce only class 4 and class 5 errors. The next few sections show examples of other error classes you might see.

Cleaning up Files with Unrecoverable Errors

A bfid set contains an unrecoverable error if a user inode exists but no valid copy of the user's data can be found. You should never see a bfid set with unrecoverable errors in normal operation. If one does occur, it means that an administrative error was made, a file system error occurred, or DMF failed in some way. You should always examine errors in this class to determine why they happened, in order to prevent recurrences.

To give an example of an unrecoverable error, a test file is migrated offline to alternate media. The dmdadm(8) command is then used to manually remove the database entry for the associated bfid to induce the error. When dmaudit is next run, it reports an unrecoverable error. The resulting Inspect menu looks like the following:

  INSPECT FILE SYSTEM AND DATABASE ERRORS
  ---------------------------------------

  Errors are divided into several major classes. Only those classes that pertain
  to this snapshot are displayed.

  Select:
     <3>        Examine files whose data cannot be recovered
     <bfid>     Examine all files and database entries for any bfid you specify
     <search>   Scan file systems for names of all files with errors (very slow)
     <up>       Return to the previous menu

  Please enter your selection:

Entering 3 advances you to the following unrecoverable error class menu:

FILES WHOSE DATA CANNOT BE RECOVERED
------------------------------------

There are 1 bitfile IDs in this error class.


The following errors were detected in these bitfile IDs.

01.  There are 1 migrated user files for which no good data copies can be
     found.


The following actions will be taken to correct these errors.

(No actions will be taken.)


Select:
   <accept>    Accept the recommended actions.  Once accepted, these actions
               will be taken when you select 'apply' on the main menu.
   <examine>   Examine or modify the actions to be taken for individual files.
               You may enter the line number of any one of the above subsets of
               files; the default is to examine all the files.
   <bfid>      Examine all files and database entries for any bitfile ID you
               specify.
   <search>    Scan file systems for the names of all files listed above (very
               slow).
   <dump>      Append all information available on each of the bitfile IDs
               listed above to a text file in a format suitable for machine
               processing.
   <nlist>     Append all file names for all the files listed above to a text
               file after first scanning the file systems for their names (very
               slow).
   <remove>    After first scanning the file systems for their names, remove
               files that cannot be recovered (very slow).
   <up>        Return to the previous menu.

Please enter your selection:

Notice that removing the lost user file is not among the actions listed on this display. To remove lost files, you must use the remove option as described in “The remove Option”.

The following is an excerpt from the abbreviated display for the bfid set with an unrecoverable error:

BITFILE ID = 20000000000000001


ENTRIES IN USE:

01. off-line user file - dev 33554464 (daemon), fhandle
    010000000000001885beb4c83ff373e0000e0000000000420000000000000043, uid 0,
    size 4474272
       No good data copy can be found for this user file.


Select:
   <mode>     Switch to full display
   <edit>     Edit this bfid
   <dump>     Append all information about this bfid to a text file in a format
              suitable for machine processing
   <accept>   Accept the changes to this bitfile ID
   <up>       Return to the previous menu

Please enter your selection: 

The remove Option

The dmaudit command does not automatically remove unrecoverable user files when you enter accept and apply. To actually delete the files, you must select the remove option. If you have not used the search option, the remove option looks like the following:

  <remove>    After first scanning the file systems for their names, remove
              files that cannot be recovered (very slow).

If you have already used search to find the files' path names, the remove option will instead look like the following:

  <remove>    Remove files that cannot be recovered.

After you enter the remove option, the error class display shows the additional action that the files will be removed:

FILES WHOSE DATA CANNOT BE RECOVERED
------------------------------------
There are 1 bitfile IDs in this error class.


The following errors were detected in these bitfile IDs.

01.  There are 1 migrated user files for which no good data copies can be
     found.


The following actions will be taken to correct these errors.

02.  1 migrated user files will be removed.

You should then enter accept to accept the removal of the files. The following is an excerpt from the abbreviated display of the bfid set that contained the unrecoverable error:

BITFILE ID = 20000000000000001

ENTRIES IN USE:

01. off-line user file - dev 33554464 (daemon), fhandle
    010000000000001885beb4c83ff373e0000e0000000000420000000000000043, uid 0,
    size 4474272
    /mig/foo
       No good data copy can be found for this user file.
       This file will be removed.

Cleaning up Files with the Same Bfid but Different Sizes

There are two error classes in which more than one user file can have the same bfid. This section describes those cases in which dmaudit is able to use the sizes of the various files to determine which database entries go with which user files.

When the daemon is started, it determines the next available bfid to be allocated by looking in the database to find the highest file number in use for the current database ID. The daemon increments that file number by one and uses it the next time a bfid must be assigned to a user file.

Errors in this class usually happen when an event occurs, just before the daemon stops, that causes the last few database updates to be lost. An example is a system crash in which the last updates to the database never reach the disk.

The database entries for those bfids were not successfully placed in the database. Therefore, when the daemon is restarted, it will not know that those bfids are in use and will reallocate them to new user files.
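The reuse of bfids described above can be illustrated with a small model. This is only a sketch of the allocation rule as stated in this section (highest file number in the database plus one). It ignores the database ID, and the names next_file_number, on_disk, and in_use_by_files are hypothetical.

```python
def next_file_number(database: set[int]) -> int:
    """Return the file number the daemon would hand out next after a restart."""
    return max(database, default=0) + 1


# File numbers whose database entries reached disk before the interruption.
on_disk = {1, 2, 3}
# File numbers already stamped on user files, including two (4 and 5) whose
# database updates were lost.
in_use_by_files = {1, 2, 3, 4, 5}

# After the restart, the daemon sees only what is on disk.
reallocated = next_file_number(on_disk)
# True here: the number handed out is already carried by an existing user file.
collision = reallocated in in_use_by_files
```

In this model the restarted daemon hands out a file number that a user file already carries, which is the situation dmaudit reports as more than one file having the same bfid.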

If such an event did occur, you would see the following line as part of the dmaudit error report:

There are 1 bfids in use by more than one file that can be automatically corrected.

Entering inspect from the Main menu would show the following:

INSPECT FILE SYSTEM AND DATABASE ERRORS
---------------------------------------
Errors are divided into several major classes.  Only those classes that pertain
to this snapshot are displayed.

Select:
   <2>        Examine files with nonunique bitfile IDs that can be
              automatically corrected
   <5>        Examine bitfile IDs in the daemon database for which no user
              files can be found
   <bfid>     Examine all files and database entries for any bitfile ID you
              specify
   <search>   Scan file systems for names of all files with errors (very slow)
   <up>       Return to the previous menu

Please enter your selection: 

Enter 2 to see the following:

FILES WITH NONUNIQUE BITFILE IDS THAT
CAN BE AUTOMATICALLY CORRECTED
---------------------------------
There are 1 bitfile IDs in this error class.


The following errors were detected in these bitfile IDs.

01.  There are 1 migrated user files for which no good data copies can be
     found.


The following actions will be taken to correct these errors.

02.  1 migrated user files and their database entries will all be given new
     bitfile IDs.


Select:
   <accept>    Accept the recommended actions.  Once accepted, these actions
               will be taken when you select 'apply' on the main menu.
   <examine>   Examine or modify the actions to be taken for individual files.
               You may enter the line number of any one of the above subsets of
               files; the default is to examine all the files.
   <bfid>      Examine all files and database entries for any bitfile ID you
               specify.
   <search>    Scan file systems for the names of all files listed above (very
               slow).
   <dump>      Append all information available on each of the bitfile IDs
               listed above to a text file in a format suitable for machine
               processing.
   <nlist>     Append all file names for all the files listed above to a text
               file after first scanning the file systems for their names (very
               slow).
   <remove>    After first scanning the file systems for their names, remove
               files that cannot be recovered (very slow).
   <up>        Return to the previous menu.

Please enter your selection: 

All of the menu items on this display have already been explained in the previous sections. What is new in this class is the complexity of some of the errors and the actions needed to resolve them.

The following is an example of an abbreviated display for a bfid set in this error class:

BITFILE ID = 345612ca0000000000000001

ENTRIES IN USE:

01. off-line user file - dev 33554464 (daemon), fhandle
    010000000000001885beb4c83ff373e0000e00000000001a0000000000000048, uid 0,
    size 29604
       No good data copy can be found for this user file.

02. off-line user file - dev 33554464 (daemon), fhandle
    010000000000001885beb4c83ff373e0000e0000000000430000000000000043, uid 0,
    size 4474272
       This file and its database entries will all be given a new bitfile ID.
03. daemon MSP <ftp1> database entry - not soft deleted, size 4474272, key
    <root/345612ca0000000000000001>


Select:
   <mode>     Switch to full display
   <edit>     Edit this bfid
   <dump>     Append all information about this bfid to a text file in a format
              suitable for machine processing
   <accept>   Accept the changes to this bitfile ID
   <up>       Return to the previous menu

Please enter your selection: 

There is one new, subtle piece of information on this display: the blank line between items 1 and 2. It indicates that more than one file has this bfid, and that the database entry in item 3 goes with the user file in item 2 (because they have identical sizes), not the user file in item 1. Each group of lines separated from the rest by a blank line will be recovered separately by dmaudit.
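The size-based grouping can be sketched as follows. This is an illustration of the idea only, not dmaudit's actual algorithm. The function name match_entries_by_size and the labels used for files and entries are hypothetical.

```python
from collections import defaultdict


def match_entries_by_size(file_sizes: dict[str, int], entry_sizes: dict[str, int]):
    """Pair database entries with user files that share a bfid, using size only.

    Returns (groups, ambiguous). groups maps each user file to the entries
    matched to it; ambiguous is True when two user files have the same size,
    in which case size alone cannot decide ownership and nothing is matched.
    """
    files_by_size = defaultdict(list)
    for name, size in file_sizes.items():
        files_by_size[size].append(name)

    ambiguous = any(len(names) > 1 for names in files_by_size.values())
    groups = {name: [] for name in file_sizes}
    if not ambiguous:
        for key, size in entry_sizes.items():
            owners = files_by_size.get(size, [])
            if owners:
                groups[owners[0]].append(key)
    return groups, ambiguous


# Sizes taken from the display above.
groups, ambiguous = match_entries_by_size(
    {"item 1": 29604, "item 2": 4474272},
    {"ftp1 entry": 4474272},
)
```

With the sizes from the display, the ftp1 entry is grouped with item 2 and item 1 is left with no data copy. When two user files have the same size, the sketch reports the set as ambiguous instead of guessing.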

Because two files have the same bfid, dmaudit gives a new bfid to at least one of the files so that there is no overlap of entries in the daemon database. In the previous example, dmaudit has decided to leave item 1 as it is. It instead assigns a new bfid to the user file in item 2. It then creates incomplete database entries for the default MSPs for UID 0 and migrates the file to those MSPs.

The opt_part option offers much simpler correction of these errors by not requiring dmaudit to leave a file with the same number of MSP copies as it had before recovery. In this particular case, selecting opt_part tells dmaudit that it does not need to create missing MSP entries.

Cleaning up Multiple Files with the Same Bfid and Size

dmaudit cannot by itself resolve errors in which more than one user file has the same bfid and the same size in bytes. There are two main ways this type of error can occur:

  • The most common way is when a user file is restored using an xfsrestore(1m) command. For example, assume that a user owns a migrated file that has multiple path names (hard links). Assume further that the user uses the rm(1) command to remove one of the path names, then asks the administrator to restore the file. The restore command creates a migrated inode using the path name supplied by the user even though the original migrated inode is still in the file system, pointed to by the remaining hard links. The result is that two inodes now exist in the file system with the same bfid and the same size.

  • It is also possible that the assignment of the same bfid to more than one inode is an error such as those described in the previous section, and that it is only a coincidence that the two files have the same size.

In the first case, the two inodes are really meant to contain the same data. In the second case they are not. Furthermore, in the second case, dmaudit has no way of knowing which database entry should go with which file.

To resolve these errors, dmaudit must rely upon information supplied by you. There are options that you can use to tell dmaudit which of the above two cases occurred. You also have the ability to tell dmaudit which database entry goes with which file when necessary.

If such an error occurs, the dmaudit error report will contain something like the following:

  DAEMON DATABASE ERROR REPORT
  ----------------------------
  There are 1 bfids in use by more than one file that cannot be corrected without
  additional information from you.

If you enter the inspect option from the Main menu, you see the following:

INSPECT FILE SYSTEM AND DATABASE ERRORS
---------------------------------------
Errors are divided into several major classes.  Only those classes that pertain
to this snapshot are displayed.

Select:
   <1>        Examine files with nonunique bitfile IDs that cannot be corrected
              without additional information from you
   <bfid>     Examine all files and database entries for any bitfile ID you
              specify
   <search>   Scan file systems for names of all files with errors (very slow)
   <up>       Return to the previous menu

Please enter your selection: 

Entering 1 places you at the Error Class menu:

FILES WITH NONUNIQUE BITFILE IDS THAT CANNOT BE
CORRECTED WITHOUT ADDITIONAL INFORMATION FROM YOU
-------------------------------------------------

There are 1 bitfile IDs in this error class.


The following errors were detected in these bitfile IDs.

01.  There are 2 migrated user files that cannot be cleaned up without
     additional information from you.


The following actions will be taken to correct these errors.

(No actions will be taken.)


Select:
   <examine>   Examine or modify the actions to be taken for individual files.
               You may enter the line number of any one of the above subsets of
               files; the default is to examine all the files.
   <bfid>      Examine all files and database entries for any bitfile ID you
               specify.
   <search>    Scan file systems for the names of all files listed above (very
               slow).
   <dump>      Append all information available on each of the bitfile IDs
               listed above to a text file in a format suitable for machine
               processing.
   <nlist>     Append all file names for all the files listed above to a text
               file after first scanning the file systems for their names (very
               slow).
   <up>        Return to the previous menu.

Please enter your selection: 

Because the names of the files are often a good clue as to whether the duplicates were created by your restore command, you may want to use the search option on this menu to collect the names of the files before continuing. When the names have been determined, select the examine option, and press ENTER when you are prompted for a line number. You will want to examine each bfid set in this error class, one at a time.

The following is an example of such a bfid set:

BITFILE ID = 20000000000000001

ENTRIES IN USE:

01. off-line user file - dev 33554464 (daemon), fhandle
    010000000000001885beb4c83ff373e0000e00000000001b0000000000000048, uid 0,
    size 4474272
    /mig/bar
       You must specify whether this file is a duplicate created by the
       'restore' command.

02. off-line user file - dev 33554464 (daemon), fhandle
    010000000000001885beb4c83ff373e0000e0000000000430000000000000043, uid 0,
    size 4474272
    /mig/foo
       You must specify whether this file is a duplicate created by the
       'restore' command.
03. daemon MSP <ftp1> database entry - soft deleted, size 4474272, key
    <root/345612ca0000000000000001>


Select:
   <mode>   Switch to full display
   <edit>   Edit this bfid
   <dump>   Append all information about this bfid to a text file in a format
            suitable for machine processing
   <up>     Return to the previous menu

Please enter your selection: 

The first step is to decide whether one of the files was created by a restore command. Often the easiest way to determine this is to use the files' UIDs to determine the owner of the files, and then ask the owner if the files were meant to contain the same data.
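The UID shown in the display can be translated into a login name with the standard password database. The following is a minimal sketch for UNIX-like systems using Python's pwd module. The helper name owner_name and its placeholder text are hypothetical.

```python
import pwd


def owner_name(uid: int) -> str:
    """Return the login name for a UID, or a placeholder if none is known."""
    try:
        return pwd.getpwuid(uid).pw_name
    except (KeyError, OverflowError):
        # No password entry exists for this UID on this host.
        return f"<unknown uid {uid}>"
```

For the example display, owner_name(0) would be called for both files, because both entries show uid 0.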

The edit Option

When you have determined whether the files should contain the same data, you must enter the edit option to pass that information on to dmaudit. The following display appears:

EDIT ONE BITFILE ID
-------------------
BITFILE ID = 20000000000000001


ENTRIES IN USE:

01. off-line user file - dev 33554464 (daemon), fhandle
    010000000000001885beb4c83ff373e0000e00000000001b0000000000000048, uid 0,
    size 4474272
    /mig/bar
       You must specify whether this file is a duplicate created by the
       'restore' command.

02. off-line user file - dev 33554464 (daemon), fhandle
    010000000000001885beb4c83ff373e0000e0000000000430000000000000043, uid 0,
    size 4474272
    /mig/foo
       You must specify whether this file is a duplicate created by the
       'restore' command.
03. daemon MSP <ftp1> database entry - soft deleted, size 4474272, key
    <root/345612ca0000000000000001>


Select:
   <transfer>    Transfer a database entry to the ownership of a different user
                 file
   <remove>      Remove objects that you do not want to keep
   <nondup>      Indicate that a file is NOT a duplicate created by the
                 'restore' command
   <duplicate>   Indicate that a file is a duplicate created by the 'restore'
                 command
   <mode>        Switch to full display
   <up>          Return to the previous menu

Please enter your selection: 

This is the menu from which you give directions to dmaudit on how to resolve the error. When you start making changes to the bfid set, you cannot leave this menu until you either accept the changes you have made so far or until you cancel them. (accept and cancel will appear when you make your first change.)

When you have given enough information for dmaudit to resolve an error, it moves the bfid set to its appropriate new error class. For example, assume that using the edit display you told dmaudit that the file on line 1 is a restore duplicate of the file on line 2. Using that new information, dmaudit could reclassify the bfid set as having only a recoverable error and would want to move the bfid set to the recoverable error class. When you start to make changes to a bfid set, the up option disappears from the menu, keeping you on the edit display until you enter either accept or cancel. Entering accept from this menu means that you accept any reclassification that will take place as a result of your changes. As soon as you enter accept, the up option appears again, allowing you to leave the display. You can still enter cancel at this point if you change your mind. When you select up and leave the display, dmaudit reclassifies the bfid set and moves it to its appropriate new error class.
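The way the up, accept, and cancel options come and go can be summarized as a small state model. This is a sketch of the behavior described above, not dmaudit code. The class name EditSession is hypothetical, and it is an assumption that accept stays listed after the changes have been accepted.

```python
class EditSession:
    """Tracks which navigation options the edit display would offer."""

    def __init__(self) -> None:
        self.changed = False   # at least one change has been made
        self.accepted = False  # the current changes have been accepted

    def change(self) -> None:
        # Any new change must be accepted (or canceled) before leaving.
        self.changed = True
        self.accepted = False

    def accept(self) -> None:
        if self.changed:
            self.accepted = True

    def cancel(self) -> None:
        # cancel discards all changes made to this bfid set.
        self.changed = False
        self.accepted = False

    def navigation_options(self) -> list[str]:
        if not self.changed:
            return ["up"]
        options = ["accept", "cancel"]  # assumption: accept remains listed
        if self.accepted:
            options.append("up")
        return options
```

In this model, up is unavailable from the first change until accept is entered, and cancel always returns the display to its initial state.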

If you make some changes to the bfid set, but those changes are insufficient to tell dmaudit how to resolve all ambiguities, dmaudit will leave the bfid set in the current error class.

When dmaudit can solve all the bfid sets in this error class, the error class becomes empty and all the bfid sets are moved to other error classes.

When the bfid sets have been moved, you must go to those Error Class menus and accept the changes before using apply on the Main menu. It is therefore a good idea to resolve bfid sets in this error class first before proceeding to other error classes.

The duplicate Option

Use the duplicate option if you want to indicate to dmaudit that a file is a duplicate created by an xfsrestore command.

You are prompted for the line number of the file that is the duplicate:

Line number of duplicate file created by 'restore'?  

Enter the line number at the prompt. In the example, line 1 is the restored file. You are prompted for the line number of the original file:

Line number of file it is a duplicate of?

In the example, the original file is on line 2. dmaudit then redisplays the bfid including errors and actions based upon the new information that you provided.

The example would look like the following:

EDIT ONE BITFILE ID
-------------------
BITFILE ID = 20000000000000001


ENTRIES IN USE:

01. off-line user file - dev 33554464 (daemon), fhandle
    010000000000001885beb4c83ff373e0000e0000000000430000000000000043, uid 0,
    size 4474272
    /mig/foo
02. off-line user file - dev 33554464 (daemon), fhandle
    010000000000001885beb4c83ff373e0000e00000000001b0000000000000048, uid 0,
    size 4474272
    /mig/bar
       You have specified that this user file is a duplicate created by the
       'restore' command.
       This file will be recalled, and will then be remigrated to the same MSPs
       using a new bitfile ID.
03. daemon MSP <ftp1> database entry - soft deleted, size 4474272, key
    <root/345612ca0000000000000001>
       This database entry should not be soft deleted.
       The soft delete flag will be cleared from this entry.


Select:
   <accept>   Accept the changes you have made
   <remove>   Remove objects that you do not want to keep
   <nondup>   Indicate that a file is NOT a duplicate created by the 'restore'
              command
   <cancel>   Cancel all changes you have ever made to this bitfile ID
   <mode>     Switch to full display

Please enter your selection: 

dmaudit now has all the information it needs to resolve the errors in this bfid. If you enter accept and then enter up to get back to the Inspect menu, you see that the bfid set has moved to the recoverable error class.

The Inspect menu would look like the following:

INSPECT FILE SYSTEM AND DATABASE ERRORS
---------------------------------------
Errors are divided into several major classes.  Only those classes that pertain
to this snapshot are displayed.

Select:
   <4>      Examine files that have correctable errors
   <bfid>   Examine all files and database entries for any bitfile ID you
            specify
   <up>     Return to the previous menu

Please enter your selection: 

To finish the cleanup of the bfid set, enter 4, enter accept, then return to the Main menu (by using the up option), and enter apply. dmaudit then recalls the duplicate file, removes its bfid, and remigrates the file to the same MSPs used by the original file. Each file then has its own separate MSP copies.

The transfer, remove, and nondup Options

The three other options available on the edit display let you make other changes to a bfid set that influence how it is corrected. Although they are not needed very frequently, they are extremely powerful, allowing you to completely control how dmaudit resolves its errors.

The remove option allows you to specify objects in the bfid set that you want dmaudit to discard. The transfer option allows you to specify to which user file a particular database entry belongs. The nondup option allows you to state that a particular user file is not a duplicate created by a restore command.

Consider the following example:

EDIT ONE BITFILE ID
-------------------
BITFILE ID = 20000000000000001


ENTRIES IN USE:

01. off-line user file - dev 33554464 (daemon), fhandle
    010000000000001885beb4c83ff373e0000e00000000001b0000000000000048, uid 0,
    size 4474272
    /dmf1/example/restore_dir/myfile
       You must specify whether this file is a duplicate created by the
       'restore' command.

02. off-line user file - dev 33554464 (daemon), fhandle
    010000000000001885beb4c83ff373e0000e0000000000430000000000000043, uid 0,
    size 4474272
    /dmf1/example/orig_dir/myfile
       You must specify whether this file is a duplicate created by the
       'restore' command.
03. daemon MSP <ftp1> database entry - not soft deleted, size 4474272, key
    <root/345612ca0000000000000001>


Select:
   <transfer>    Transfer a database entry to the ownership of a different user
                 file
   <remove>      Remove objects that you do not want to keep
   <nondup>      Indicate that a file is NOT a duplicate created by the
                 'restore' command
   <duplicate>   Indicate that a file is a duplicate created by the 'restore'
                 command
   <mode>        Switch to full display
   <up>          Return to the previous menu

Please enter your selection: 

Assume that the file on line 1 is not a duplicate of the file on line 2, but is a different file. Assume further that the MSP database entry is a copy of the file on line 1.

The first step is to indicate that the file on line 1 is not a duplicate. To do this, select the nondup option. You are prompted for the line number of the user file:

Line number of user file?

After you enter 1, the edit display looks like the following:

EDIT ONE BITFILE ID
-------------------
BITFILE ID = 20000000000000001


ENTRIES IN USE:
01. off-line user file - dev 33554464 (daemon), fhandle
    010000000000001885beb4c83ff373e0000e00000000001b0000000000000048, uid 0,
    size 4474272
    /dmf1/example/restore_dir/myfile
       No good data copy can be found for this user file.
       You have labeled this migrated user file as being unique.

02. off-line user file - dev 33554464 (daemon), fhandle
    010000000000001885beb4c83ff373e0000e0000000000430000000000000043, uid 0,
    size 4474272
    /dmf1/example/orig_dir/myfile
       This file and its database entries will all be given a new bitfile ID.
03. daemon MSP <ftp1> database entry - not soft deleted, size 4474272, key
    <root/345612ca0000000000000001>


Select:
   <accept>      Accept the changes you have made
   <transfer>    Transfer a database entry to the ownership of a different user
                 file
   <remove>      Remove objects that you do not want to keep
   <duplicate>   Indicate that a file is a duplicate created by the 'restore'
                 command
   <cancel>      Cancel all changes you have ever made to this bitfile ID
   <mode>        Switch to full display

Please enter your selection: 

dmaudit now knows that the file on line 1 is not a duplicate. It realizes that two different files have the same bfid, and so one of its actions is to assign a new bfid to one of the files and its database entries so that there is no longer any overlap.

dmaudit still believes that the MSP database entry belongs to the file on line 2, so you must now enter transfer to tell dmaudit that the database entry really belongs to the file on line 1. You are asked for the line number of the object to reassign:

Line number of object to be transferred?

In this case, the database entry you want to transfer is on line 3, so you enter 3. Next you are asked to which user file the database entry is to be assigned:

User file to transfer it to?

Enter 1. The edit display then looks like the following:

EDIT ONE BITFILE ID
-------------------
BITFILE ID = 20000000000000001


ENTRIES IN USE:

01. off-line user file - dev 33554464 (daemon), fhandle
    010000000000001885beb4c83ff373e0000e00000000001b0000000000000048, uid 0,
    size 4474272
    /dmf1/example/restore_dir/myfile
       You have labeled this migrated user file as being unique.
       This file and its database entries will all be given a new bitfile ID.
02. daemon MSP <ftp1> database entry - not soft deleted, size 4474272, key
    <root/345612ca0000000000000001>

03. off-line user file - dev 33554464 (daemon), fhandle
    010000000000001885beb4c83ff373e0000e0000000000430000000000000043, uid 0,
    size 4474272
    /dmf1/example/orig_dir/myfile
       No good data copy can be found for this user file.


Select:
   <accept>      Accept the changes you have made
   <transfer>    Transfer a database entry to the ownership of a different user
                 file
   <remove>      Remove objects that you do not want to keep
   <duplicate>   Indicate that a file is a duplicate created by the 'restore'
                 command
   <cancel>      Cancel all changes you have ever made to this bitfile ID
   <mode>        Switch to full display

Please enter your selection: 


Note: A side effect of moving objects around is that many objects may end up with different line numbers after the transfer. Be aware of this when making subsequent requests.
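The renumbering can be reproduced with a small model in which each user file is listed first, followed by the database entries currently assigned to it. This ordering rule is an assumption drawn from the sample displays, and the function name number_lines and the short labels are hypothetical.

```python
def number_lines(files: list[str], owner_of: dict[str, str]) -> dict[int, str]:
    """Number objects as in the sample displays: each user file is followed
    by the database entries currently assigned to it."""
    lines = []
    for user_file in files:
        lines.append(user_file)
        lines.extend(entry for entry, owner in owner_of.items() if owner == user_file)
    return dict(enumerate(lines, start=1))


files = ["restore_dir/myfile", "orig_dir/myfile"]
# Before the transfer, the ftp1 entry belongs to the file in orig_dir.
before = number_lines(files, {"ftp1 entry": "orig_dir/myfile"})
# After the transfer, it belongs to the file in restore_dir.
after = number_lines(files, {"ftp1 entry": "restore_dir/myfile"})
```

In this model the database entry moves from line 3 to line 2 and the file in orig_dir moves from line 2 to line 3, which matches the change between the two displays above.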

The final step is to remove the unwanted user file that has no data. To remove this or any other unwanted entry, enter remove, which prompts you for the line number of the object to remove:

Line number of object to be removed?

Enter the line number for the user file with no data. The edit display would then look like the following:

EDIT ONE BITFILE ID
-------------------
BITFILE ID = 20000000000000001


ENTRIES IN USE:

01. off-line user file - dev 33554464 (daemon), fhandle
    010000000000001885beb4c83ff373e0000e00000000001b0000000000000048, uid 0,
    size 4474272
    /dmf1/example/restore_dir/myfile
       You have labeled this migrated user file as being unique.
02. daemon MSP <ftp1> database entry - not soft deleted, size 4474272, key
    <root/345612ca0000000000000001>


DISCARDED ENTRIES:

03. off-line user file - dev 33554464 (daemon), fhandle
    010000000000001885beb4c83ff373e0000e0000000000430000000000000043, uid 0,
    size 4474272
    /dmf1/example/orig_dir/myfile
       You have chosen to remove this migrated user file.
       This file will be removed.

Select:
   <accept>      Accept the changes you have made
   <transfer>    Transfer a database entry to the ownership of a different user
                 file
   <remove>      Remove objects that you do not want to keep
   <nondup>      Indicate that a file is NOT a duplicate created by the
                 'restore' command
   <duplicate>   Indicate that a file is a duplicate created by the 'restore'
                 command
   <cancel>      Cancel all changes you have ever made to this bitfile ID
   <mode>        Switch to full display

Please enter your selection: 

Bfid Sets That Cannot Be Immediately Corrected

Occasionally, dmaudit encounters a bfid set containing errors that is being migrated or unmigrated by someone at the time of the snapshot. In this case, dmaudit is unable to correct the errors because it knows its image of the bfid set will be inaccurate once the migrate or unmigrate request completes.

Sometimes a bfid set with errors is not active at the time of the snapshot, but when dmaudit searches for the path names of files in the bfid set, it finds that their inodes have changed since the snapshot was taken. In this case, dmaudit knows that its information is out-of-date.

In such cases, dmaudit still reports the errors that were detected, but the bfid set is placed in a special class in which it can be examined but cannot be corrected. The daemon error report shows something like the following:

There are 1 bfids whose errors cannot be corrected until a new snapshot is taken.

The Inspect menu shows the following:

  INSPECT FILE SYSTEM AND DATABASE ERRORS
  ---------------------------------------
  Errors are divided into several major classes. Only those classes that pertain
  to this snapshot are displayed.

  Select:
     <6>        Examine bfids that have changed since the snapshot was taken
     <bfid>     Examine all files and database entries for any bfid you specify
     <search>   Scan file systems for names of all files with errors (very slow)
     <up>       Return to the previous menu

  Please enter your selection:

If you enter the Error Class menu number (6 in this example), you see the following:

BITFILE IDS THAT HAVE CHANGED
SINCE THE SNAPSHOT WAS TAKEN
----------------------------
There are 1 bitfile IDs in this error class.


The following errors were detected in these bitfile IDs.

01.  There are 1 migrated user files for which no good data copies can be
     found.
02.  There are 1 user files in an unexpected DMF state.


The following actions will be taken to correct these errors.

03.  1 migrated user files and their database entries will all be given new
     bitfile IDs.


Select:
   <examine>   Examine or modify the actions to be taken for individual files.
               You may enter the line number of any one of the above subsets of
               files; the default is to examine all the files.
   <bfid>      Examine all files and database entries for any bitfile ID you
               specify.
   <search>    Scan file systems for the names of all files listed above (very
               slow).
   <dump>      Append all information available on each of the bitfile IDs
               listed above to a text file in a format suitable for machine
               processing.
   <nlist>     Append all file names for all the files listed above to a text
               file after first scanning the file systems for their names (very
               slow).
   <up>        Return to the previous menu.

Please enter your selection: 

All the normal options are available with the notable exception of accept, which does not appear because dmaudit knows its information is out-of-date.

To correct such errors, you must wait until the daemon activity for these bfid sets has completed and then take a new snapshot. Usually, you can use the dmdidle(8) command to force outstanding migrate requests to be completed.
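If a copy of the daemon error report has been saved to a text file, the message shown earlier can be checked from a script to decide whether a new snapshot is needed. The following is a minimal sketch under that assumption: how the report is saved is left to the administrator (no dmaudit option for this is implied), the message is assumed to appear on a single line, and the function name stale_bfid_count is hypothetical.

```shell
#!/bin/sh
# Sketch only: stale_bfid_count is a hypothetical helper, not a DMF command.
# Reads saved report text from the file named by $1 and prints the number of
# bfids whose errors cannot be corrected until a new snapshot is taken.
# Prints 0 if the message is absent. Assumes the message is on one line.
stale_bfid_count() {
    n=$(sed -n 's/^ *There are \([0-9][0-9]*\) bfids whose errors cannot be corrected until a new snapshot is taken\..*$/\1/p' "$1" | head -n 1)
    echo "${n:-0}"
}
```

A nonzero result indicates that at least one bfid set must wait for a new snapshot before its errors can be corrected.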

The apply Option

When you have used the accept option on each of the Error Class menus to select the errors you want to correct, enter up repeatedly until you arrive at the Main menu and the apply option:

  MAIN MENU
  ---------
  Select:
     <apply>      Apply all the changes you have accepted
     <inspect>    Inspect and correct file system and database errors
     <report>     Reprint status report for the current snapshot
     <verifymsp>  Check the dmatmsp tape msp databases against the daemon
                  databases
     <snapshot>   Take a snapshot and report status of file systems and databases
     <free>       Release all file space used by the current snapshot
     <config>     Examine or modify configuration information
     <quit>       Quit

  Please enter your selection:

When you enter apply, dmaudit corrects all errors that you have selected with the accept options. You can do this in one of two ways:

  • If the actions listed in the error class displays indicate that relatively little daemon, tape, and disk activity is required to correct the errors, you can enter apply at the prompt and wait. If dmaudit has difficulty correcting any of the errors, it issues messages to your screen. When the Main menu next appears, the errors that can be corrected will have been resolved.

  • If you anticipate that significant activity will occur while correcting the errors, you may want to wait until off-peak hours. At that time you can select apply interactively, or you can run dmaudit in batch mode. For example, entering the following (using the Bourne shell) lets you log off while dmaudit continues to correct errors:

    nohup /etc/dmf/dmbase/etc/dmaudit apply 2>errs & 
    

When dmaudit finishes executing, any problems that occurred are listed in the errs file.
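The batch run above can be wrapped in a small Bourne shell sketch that starts dmaudit apply in the background and later summarizes the errs file. Only the dmaudit path and the apply argument come from the example above; the DMAUDIT variable and both function names are illustrative assumptions. Some nohup implementations may write their own notice, prefixed with "nohup:", to standard error, so the sketch ignores such lines when counting.

```shell
#!/bin/sh
# Sketch only: the helper names and the DMAUDIT variable are hypothetical.
# Path taken from the example above; override it if your installation differs.
DMAUDIT=${DMAUDIT:-/etc/dmf/dmbase/etc/dmaudit}

# Start "dmaudit apply" in the background, immune to hangups, with
# standard error collected in the file named by $1 (default: errs).
start_batch_apply() {
    errfile=${1:-errs}
    nohup "$DMAUDIT" apply 2>"$errfile" &
}

# Summarize the error file after dmaudit has finished.
# Prints "missing" if the file does not exist, "clean" if it holds no
# lines other than nohup's own notice, otherwise "problems: N lines".
summarize_errs() {
    errfile=${1:-errs}
    if [ ! -f "$errfile" ]; then
        echo "missing"
        return 2
    fi
    # Count lines that do not start with "nohup:".
    count=$(grep -vc '^nohup:' "$errfile" || true)
    if [ "$count" -gt 0 ]; then
        echo "problems: $count lines"
        return 1
    fi
    echo "clean"
}
```

For example, run start_batch_apply before logging off, then run summarize_errs in the same directory after dmaudit has finished.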

dmaudit usually has no difficulty resolving errors. Problems occur only if the bfid set has changed since the snapshot was taken, which could happen if the owner of a file has recently tried to recall or remove it. Before beginning to correct the errors in a bfid set, dmaudit always verifies that the set has not changed since the snapshot was taken. If the bfid set has changed, dmaudit reports that fact and continues with the next bfid set. All errors that cannot be corrected are left as is; you must take a fresh snapshot to resolve the bfid sets that changed.

The dmaudit error report is updated dynamically as errors are corrected. For example, if you enter report from the Main menu after all errors have been corrected, you see the following:

No errors were discovered comparing the file systems against the daemon database.

If you see that files will be recalled, ensure that there is sufficient room in the file systems.
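One way to check for sufficient room before applying corrections that recall files is to compare the expected recall size against the space that df reports as available. The following is a minimal sketch, not a DMF facility: the function name has_room is hypothetical, and the size to test must be worked out by the administrator from the sizes shown in the dmaudit displays.

```shell
#!/bin/sh
# Sketch only: has_room is a hypothetical helper, not part of DMF.
# Succeeds if the file system holding directory $1 has at least $2 bytes
# available, using the POSIX "df -Pk" output format (one line per file
# system, column 4 = available space in 1024-byte blocks).
has_room() {
    dir=$1
    need_bytes=$2
    # Round the requirement up to whole 1024-byte blocks.
    need_kb=$(( (need_bytes + 1023) / 1024 ))
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ -n "$avail_kb" ] && [ "$avail_kb" -ge "$need_kb" ]
}
```

For example, has_room /dmf1 4474272 || echo "not enough space" tests whether a file of the size shown in the earlier display would fit in the file system mounted under /dmf1.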