Chapter 7. DMF Maintenance and Recovery

This chapter contains information for the administrative maintenance of DMF.

Retaining Old DMF Daemon Log Files

The daemon generates the SPOOL_DIR/daemon_name/dmdlog.yyyymmdd log file, which contains a record of DMF activity and can be useful for problem solving for several months after creation. Each MSP generates a SPOOL_DIR/msp_name/msplog.yyyymmdd log file, which records information about that MSP's activity and is sometimes useful as well. Retain these log files for several months; log files more than a year old are probably not very useful.

Do not use DMF to manage the SPOOL_DIR file system.

The dmfsmon(8) automated space management daemon generates a log file named SPOOL_DIR/daemon_name/autolog.yyyymmdd, which is useful for analyzing problems related to space management.

To manage the log files, configure the run_remove_logs.sh task, which automatically deletes old log files according to a policy you set. See “Configuring Daemon Maintenance Tasks” in Chapter 2, for more information.
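The retention policy that run_remove_logs.sh applies can be pictured as an age test on the date embedded in each log file name. The following Python sketch is illustrative only: the function log_files_to_remove and its keep_days parameter are hypothetical and are not part of DMF, and it assumes the autolog files follow the same name.yyyymmdd pattern as dmdlog and msplog.

```python
import re
from datetime import date, timedelta

# Illustrative only: log_files_to_remove and keep_days are hypothetical names, not DMF APIs.
# Matches dated log names such as dmdlog.19970128, msplog.19970128 and autolog.19970128.
LOG_NAME = re.compile(r"^(dmdlog|msplog|autolog)\.(\d{4})(\d{2})(\d{2})$")


def log_files_to_remove(names, today, keep_days):
    """Return the dated log file names older than keep_days before today."""
    cutoff = today - timedelta(days=keep_days)
    old = []
    for name in names:
        match = LOG_NAME.match(name)
        if match is None:
            continue  # not a dated DMF log file; leave it alone
        year, month, day = (int(g) for g in match.group(2, 3, 4))
        if date(year, month, day) < cutoff:
            old.append(name)
    return sorted(old)


print(log_files_to_remove(["dmdlog.19960101", "dmdlog.19970125", "notes.txt"],
                          date(1997, 2, 1), keep_days=180))
# prints ['dmdlog.19960101']
```

Files whose names do not match the dated pattern are skipped rather than deleted, which is the conservative choice for a cleanup job.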

Retaining Old DMF Daemon Journal Files

Both the daemon and the tape MSP generate journal files that are needed to recover databases in the event of file system damage or loss. You can also configure DMF to generate backup copies of those databases periodically. You need to retain only those journal files that contain records created since the oldest database backup that you keep. In theory, one database backup copy is sufficient, but most sites will feel safer keeping more than one generation of database backups.

For example, if you configure DMF to generate daily database backups and retain the three most recent backup copies, then at the end of 18 July there would be backups from the 18th, 17th, and 16th. Only the journal files for those dates need to be kept for recovery purposes.
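The arithmetic in this example can be sketched in a few lines of Python: keep every journal from the oldest retained backup onward. The helper journal_dates_to_keep is hypothetical, and the year 1997 is an arbitrary choice because the example gives only a day and month.

```python
from datetime import date, timedelta


def journal_dates_to_keep(backup_dates, keep_backups, today):
    """Return every journal date from the oldest retained backup through today.

    Hypothetical helper: backup_dates are the days on which backups were taken,
    and keep_backups is how many of the newest backups the site retains.
    """
    retained = sorted(backup_dates)[-keep_backups:]
    oldest = retained[0]
    span = (today - oldest).days
    return [oldest + timedelta(days=offset) for offset in range(span + 1)]


# The 18 July example; the year 1997 is an arbitrary choice for illustration.
backups = [date(1997, 7, day) for day in range(10, 19)]
for day in journal_dates_to_keep(backups, keep_backups=3, today=date(1997, 7, 18)):
    print(day.isoformat())
```

With daily backups and three retained copies this yields the 16th, 17th, and 18th, matching the text; journals older than the oldest retained backup are never needed to roll that backup forward.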

To manage the journal files and the backups, configure the run_remove_journals.sh and run_copy_databases.sh tasks. These tasks automatically delete old journal files and generate backups of the databases according to a policy you set. See “Configuring Daemon Maintenance Tasks” in Chapter 2, for more information.

Soft- and Hard-deletes

When a file is first migrated, a bit-file identifier (bfid), which is the key into the daemon database, is placed in the file's inode. When a migrated file is removed, its bfid is no longer needed in the daemon database.

Initially, it would seem that you could delete daemon database entries when their files are modified or removed. However, if you actually delete the daemon database entries and then the associated file system is damaged, the files will be irretrievable after you restore the file system.

For example, assume that migrated files were located in the /x file system, and you configured DMF to generate a full backup of /x on Sunday as part of your site's weekly administrative procedures (the run_full_dump.sh task). Next, suppose that you removed the migrated files in /x on Monday morning and removed the corresponding daemon database entries. If a disk hardware failure occurs on Monday afternoon, you must restore the /x file system to as recent a state as possible. If you restore the file system to its state as of Sunday, the migrated files are also returned to their state as of Sunday. As migrated files, they contain the old bfid from Sunday in their inodes, and, because you removed their bfids from the daemon database, you cannot recall these files.

Because a file system may later be restored to an earlier state, a daemon database entry is not removed when a migrated file is modified or removed. Instead, a deleted date and time field is set in the database entry. This field indicates when you were finished with the database entry, except for recovery purposes; it does not prevent the daemon from using the database entry to recall a file. When the /x file system is restored in the preceding example, the migrated files have bfids in their inodes that point to valid database entries. If the files are later modified or removed again, the delete field is updated with the later date and time.

The term soft-deleted refers to a database entry that has the delete date and time set. The term hard-deleted refers to a database entry that has been removed completely from the daemon database and the MSPs. You should periodically hard-delete older soft-deleted entries; otherwise, the daemon database grows without limit as old, unnecessary entries accumulate. Configure the run_hard_deletes.sh task to perform hard-deletes automatically. See “Configuring Daemon Maintenance Tasks” in Chapter 2, for more information.
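The soft-delete and hard-delete life cycle can be modeled as a small state machine. The following Python sketch is a toy model under stated assumptions: the classes Entry and DaemonDB and their methods are hypothetical and bear no relation to the real DMF database format. It shows that a soft-deleted entry can still satisfy a recall, and that only entries soft-deleted before a cutoff are hard-deleted.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Optional


@dataclass
class Entry:
    bfid: str
    deleted_at: Optional[datetime] = None  # None means the entry is not soft-deleted


class DaemonDB:
    """Hypothetical toy model of daemon database entries keyed by bfid."""

    def __init__(self):
        self.entries: Dict[str, Entry] = {}

    def migrate(self, bfid):
        self.entries[bfid] = Entry(bfid)

    def soft_delete(self, bfid, when):
        # The entry stays; only its delete date and time is set (or updated).
        self.entries[bfid].deleted_at = when

    def can_recall(self, bfid):
        # A soft-deleted entry can still be used to recall a restored file.
        return bfid in self.entries

    def hard_delete_older_than(self, cutoff):
        # Remove only entries soft-deleted before the cutoff; return their bfids.
        doomed = [b for b, e in self.entries.items()
                  if e.deleted_at is not None and e.deleted_at < cutoff]
        for bfid in doomed:
            del self.entries[bfid]
        return sorted(doomed)
```

Setting the cutoff older than the oldest file system dump you might restore is what keeps the recovery scenario described above safe.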

If you look at all of the tapes before and after a hard-delete operation, you will see that the amount of space used on some (or all) of the tapes has been reduced.

Using xfsdump and xfsrestore with Migrated Files

File system backup is a vital operational procedure, and DMF-managed file systems should be backed up regularly. Although DMF affords a high degree of protection for user data, it migrates only file data, not inodes, directories, or other file system structures, so you must still back up file systems that hold important data.

The xfsdump(1m) and xfsrestore(1m) commands back up file systems. These utilities are designed to perform the backup function quickly and with minimal system overhead. They operate with DMF in two ways:

  • When xfsdump encounters an offline file, it does not cause the associated data to be recalled. This distinguishes the utility from tar(1) and cpio(1), both of which cause the file to be recalled when they reference an offline file.

  • Because DMF provides safe, reliable management of offline data, it can be viewed as a data backup service. The dmmigrate(8) command lets you implement a 100% migration policy that does not interfere with customary management of space thresholds. The -a option of the xfsdump command causes xfsdump to skip the data associated with any dual-state file. Whenever xfsdump detects a file whose data is backed up by DMF, it dumps only the inode for that file, because DMF already holds a copy of the data itself.

    When you run xfsdump -a in concert with dmmigrate, the volume of backup data produced by xfsdump can be significantly reduced, thereby reducing the amount of time spent performing backups.
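To see why the combination of dmmigrate and xfsdump -a shrinks backups, the following Python sketch estimates how much file data a dump writes for files in each DMF state. It is a rough model under stated assumptions: the function data_bytes_dumped and the state labels are hypothetical, and inode and directory overhead is ignored.

```python
from typing import Iterable, Tuple

# Hypothetical state labels used only in this sketch.
REGULAR, DUAL_STATE, OFFLINE = "regular", "dual-state", "offline"


def data_bytes_dumped(files: Iterable[Tuple[str, int]], skip_dual_state: bool) -> int:
    """Estimate the file data written by a dump; inode overhead is ignored.

    files holds (state, size_in_bytes) pairs; skip_dual_state models xfsdump -a.
    """
    total = 0
    for state, size in files:
        if state == OFFLINE:
            continue  # no online data to dump; only the inode is written
        if state == DUAL_STATE and skip_dual_state:
            continue  # DMF already holds a copy, so only the inode is written
        total += size
    return total


files = [(REGULAR, 100), (DUAL_STATE, 900), (OFFLINE, 5000)]
print(data_bytes_dumped(files, skip_dual_state=False))  # prints 1000
print(data_bytes_dumped(files, skip_dual_state=True))   # prints 100
```

The more files dmmigrate has made dual-state, the larger the share of data that -a leaves out of the dump.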

Most installations periodically do a full (level 0) dump of file systems, with incremental dumps (levels 1 through 9) between full dumps; incremental dumps may happen once or several times per day. You can continue this practice after DMF is enabled. When a file is migrated (or recalled), its inode change time is updated, which ensures that the file is included in the next incremental dump.
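The selection principle for incremental dumps can be sketched in Python: a file qualifies when its inode change time (st_ctime on Unix) is newer than the previous dump. This is not how xfsdump is implemented; changed_since is a hypothetical helper that only illustrates why a migrate or recall, which updates the change time, pulls the file into the next incremental dump.

```python
import os


def changed_since(root, last_dump_time):
    """Return paths under root whose inode change time is newer than last_dump_time.

    Hypothetical helper; last_dump_time is seconds since the epoch.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # os.lstat does not follow symbolic links; on Unix st_ctime is the
            # inode change time, which DMF updates when a file is migrated or recalled.
            if os.lstat(path).st_ctime > last_dump_time:
                changed.append(path)
    return sorted(changed)
```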

You can configure tasks in the dump_tasks object to automatically do full and incremental dumps of the DMF-managed file systems. See “Configuring Daemon Maintenance Tasks” in Chapter 2, for more information.

The dump_tasks object employs scripts that call the xfsdump(1m) command in conjunction with the dmtape DMF support program. This mechanism gives you flexible and efficient use of a predetermined set of backup volumes, which are allocated to xfsdump automatically as needed during the backup. To restore files backed up by the dump_tasks object with equal flexibility and efficiency, use the dmxfsrestore(8) command whenever you need to restore a dump_tasks-managed file system. See the dmxfsrestore(8) man page for more information on running the command.

Dumping and Restoring Files without the dump_tasks Object

If you choose to dump and restore DMF file systems without using the provided dump_tasks object, there are several items that you must remember:

  • The dump_tasks object uses xfsdump with the -a option to dump only data not backed up by DMF. You may also wish to consider using the -a option on xfsdump when dumping DMF file systems manually.

  • Do not use the -A option on either xfsdump or xfsrestore. The -A option avoids dumping or restoring extended attribute information. DMF information is stored within files as extended attributes, so if you do use -A, migrated files restored from those dump tapes will not be recallable by DMF.

  • When restoring migrated files using xfsrestore, you must specify the -D option in order to guarantee that restored files will be recallable by DMF.

  • If you use the Tape Management Facility (TMF) to mount tapes for use by xfsdump, be aware that xfsdump will not detect the fact that the device is a tape, and will behave as if the dump is instead being written to a regular disk file. This means that xfsdump will not be able to append new dumps to the end of an existing tape. It also means that if xfsdump encounters end-of-tape, it will abort the backup rather than prompting for additional volumes. You must ensure that you specify enough volumes using the tmmnt -v option before beginning the dump in order to guarantee that xfsdump will not encounter end-of-tape.
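The rules in the list above lend themselves to a pre-flight check before a manual dump or restore. The following Python sketch is hypothetical (check_dump_args, check_restore_args, and volumes_needed are not DMF tools). It inspects only separate option tokens, so combined options such as -aD are not recognized, and the spare-volume margin is an assumption rather than a DMF recommendation.

```python
import math


def check_dump_args(args):
    """Return warnings for an xfsdump argument list used on a DMF file system."""
    problems = []
    if "-A" in args:
        problems.append("-A skips extended attributes; migrated files would not be recallable")
    if "-a" not in args:
        problems.append("consider -a so that data already held by DMF is not dumped again")
    return problems


def check_restore_args(args):
    """Return warnings for an xfsrestore argument list used on a DMF file system."""
    problems = []
    if "-A" in args:
        problems.append("-A skips extended attributes; migrated files would not be recallable")
    if "-D" not in args:
        problems.append("-D is required so that restored migrated files remain recallable")
    return problems


def volumes_needed(dump_bytes, volume_bytes, spare=1):
    """Volumes to request with tmmnt -v: enough for the dump plus spare volumes.

    Under TMF, xfsdump aborts at end-of-tape, so over-allocating is the safe side.
    """
    return math.ceil(dump_bytes / volume_bytes) + spare


print(check_restore_args(["-A"]))
print(volumes_needed(250, 100))  # prints 4
```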

File System Consistency with xfsrestore

When you restore files, you might be restoring some inodes containing bfids that were soft-deleted since the time the dump was taken. (For information about soft-deletes, see “Soft- and Hard-deletes”.) dmaudit(8) will report this as an inconsistency between the file system and the database, indicating that the database entry should not be soft-deleted.

Another form of inconsistency occurs if you duplicate offline or dual-state files by restoring all or part of an existing directory into another directory. In this case, dmaudit reports that two files share the same bfid. If one of the files is subsequently deleted, causing the database entry to be soft-deleted, the dmaudit-reported inconsistency changes to the type described in the previous paragraph.

While these dmaudit-reported inconsistencies may seem serious, they pose no risk of user data loss. The dmhdelete(8) program, which removes unused database entries, always first scans all DMF-managed file systems to make sure that no remaining files reference the database entries it is about to remove. It detects either of these inconsistencies and, in that case, does not remove the affected database entries.

Sites should be aware that inconsistencies between a file system and the DMF database can occur as a result of restoring migrated files, and that it is good practice to run dmaudit after a restore to correct those inconsistencies.
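Both kinds of inconsistency, and the safety rule that protects against them, can be expressed as set operations over the bfids found in file system inodes. The following Python sketch is a toy model, not dmaudit or dmhdelete: find_inconsistencies and safe_hard_delete are hypothetical helpers, and the bfid strings are made up.

```python
from collections import Counter
from typing import Iterable, List, Set


def find_inconsistencies(file_bfids: Iterable[str], soft_deleted: Set[str]) -> List[str]:
    """Report shared bfids and references to soft-deleted entries (toy model)."""
    counts = Counter(file_bfids)
    report = []
    for bfid in sorted(counts):
        if counts[bfid] > 1:
            report.append(f"{bfid}: shared by {counts[bfid]} files")
        if bfid in soft_deleted:
            report.append(f"{bfid}: referenced by a file but soft-deleted")
    return report


def safe_hard_delete(candidates: Set[str], file_bfids: Iterable[str]) -> Set[str]:
    """Keep only candidates that no file references, mirroring the scan described above."""
    return set(candidates) - set(file_bfids)


files = ["b1", "b1", "b2"]  # bfids found in inodes after a restore
print(find_inconsistencies(files, soft_deleted={"b2", "b3"}))
print(safe_hard_delete({"b2", "b3"}, files))  # prints {'b3'}
```

Because a still-referenced bfid is never a candidate for removal, a restore can only create inconsistencies that are reported, never ones that lose data.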

Using dmfill

The dmfill(8) command allows you to fill a restored file system to a specified capacity by recalling offline files. When you execute xfsdump -a, only inodes are dumped for all files that have been migrated (including dual-state files). Therefore, when the file system is restored, only the inodes are restored, not the data. You can use dmfill in conjunction with xfsrestore to restore a corrupted file system to a previously valid state. dmfill recalls migrated files in the reverse order of migration until the requested fill percentage is reached or until there are no more migrated files left to recall on this file system.
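The recall order that dmfill uses can be sketched in Python: sort migrated files by migration time, newest first, and recall until the target fill is reached or no files remain. This is a hypothetical model (dmfill_order is not a DMF interface), and it may overshoot the target by the size of the last file recalled, which the real command may handle differently.

```python
from typing import List, Tuple


def dmfill_order(files: List[Tuple[str, float, int]], used: int, capacity: int,
                 fill_percent: float) -> List[str]:
    """Choose files to recall, most recently migrated first, until the fill target is met.

    Hypothetical model: files holds (path, migration_time, size_in_bytes) tuples.
    """
    target = capacity * fill_percent / 100
    chosen = []
    # Reverse order of migration: the most recently migrated file comes first.
    for path, _migrated, size in sorted(files, key=lambda f: f[1], reverse=True):
        if used >= target:
            break
        chosen.append(path)
        used += size
    return chosen


files = [("old", 1.0, 30), ("mid", 2.0, 30), ("new", 3.0, 30)]
print(dmfill_order(files, used=0, capacity=100, fill_percent=50))  # prints ['new', 'mid']
```

Recalling the most recently migrated files first favors the data users were working with most recently before the failure.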

Database Recovery

The basic strategy for recovering a lost or damaged DMF database is to recreate it by applying journal records to a backup copy of the database. For this reason it is essential that the database backup copies and journal files reside on a different physical device from the production databases; it is also highly desirable that these devices have different controllers and channels. The following sections discuss the database recovery strategy in more detail.
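A quick sanity check of this placement rule can compare the device numbers that os.stat reports. The following Python sketch uses a hypothetical helper, check_separation. Note the limitation: a different st_dev shows only that two paths are on different file systems, not that those file systems are on different physical disks, controllers, or channels, so the check can catch a misconfiguration but cannot prove that the layout is safe.

```python
import os


def check_separation(home_dir, other_dirs):
    """Return the directories in other_dirs that share a file system with home_dir.

    Hypothetical helper: pass the database backup and journal directories as other_dirs.
    """
    home_dev = os.stat(home_dir).st_dev
    return [d for d in other_dirs if os.stat(d).st_dev == home_dev]
```

An empty result is necessary but not sufficient; confirm the physical layout with your volume management tools.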

Database Backups

Configure the run_copy_databases.sh task in the dump_tasks object to generate DMF database backups automatically. See “Configuring Daemon Maintenance Tasks” in Chapter 2, for more information.

There are several databases in the DMF package. The daemon database consists of the following files:

  • HOME_DIR/daemon_name/dbrec.dat

  • HOME_DIR/daemon_name/dbrec.keys

  • HOME_DIR/daemon_name/pathseg.dat

  • HOME_DIR/daemon_name/pathseg.keys

The database definition file (in the same directory) that describes these files and their record structure is named dmd_db.dbd.

Each tape MSP has two databases in the HOME_DIR/msp_name directory:

  • The CAT database (files tpcrdm.dat, tpcrdm.key1.keys, and tpcrdm.key2.keys)

  • The VOL database (files tpvrdm.dat and tpvrdm.vsn.keys)

The database definition file (in the same directory) that describes these files and their record structure is named atmsp_db.dbd.
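Before recovery it helps to confirm that a restored directory holds every file of each database. The following Python sketch copies the file sets from the lists above into a dictionary; the dictionary keys and the helper missing_files are hypothetical, and the .dbd definition files are deliberately left out because the check targets the restored data and key files.

```python
import os

# File sets from the lists above; the keys are illustrative labels, not DMF names.
DATABASE_FILES = {
    "daemon": ["dbrec.dat", "dbrec.keys", "pathseg.dat", "pathseg.keys"],
    "tape_msp": ["tpcrdm.dat", "tpcrdm.key1.keys", "tpcrdm.key2.keys",
                 "tpvrdm.dat", "tpvrdm.vsn.keys"],
}


def missing_files(directory, database):
    """Return the files of the named database that are absent from directory."""
    return [name for name in DATABASE_FILES[database]
            if not os.path.isfile(os.path.join(directory, name))]
```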

Database Recovery Procedures

The DMF daemon and the tape MSP write journal file records for every database transaction. These files contain binary records that cannot be edited by normal methods and that must be applied to an existing database with the dmdbrecover(8) command. The following procedure explains how to recover the daemon and tape MSP databases.


Warning: If you are running multiple MSPs, always ensure that you have the correct journals restored in the correct directories. Recovering a database with incorrect journals can cause irrecoverable problems.


Procedure 7-1. Recovering the Databases

If you lose a database through disk spindle failure or through some form of external corruption, use the following procedure to recover it:

  1. Stop DMF.

  2. If you have configured the run_copy_databases.sh task, copy the database files that were in HOME_DIR from the directory that holds the most recent backup copy.

  3. If you have not configured the run_copy_databases task, reload an old version of the daemon or tape MSP database. Typically, these will be from the most recent dump tapes of your file system.

  4. Ensure that the default JOURNAL_DIR/daemon_name (or JOURNAL_DIR/msp_name) directory contains all of the time-ordered journal files created since the last update of the restored database.

    For the daemon, the files are named dmd_db.yyyymmdd[.hhmmss].

    For the tape MSP, the journal files are named atmsp_db.yyyymmdd[.hhmmss].

  5. Note the time of the last update of the database restored in step 2 or step 3.

  6. Use dmdbrecover to update the restored database with the journal entries from the journal files identified in step 4.
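Identifying the journal files in step 4 amounts to filtering names of the form dmd_db.yyyymmdd[.hhmmss] or atmsp_db.yyyymmdd[.hhmmss] by date and putting them in time order. The following Python sketch covers only that selection; dmdbrecover does the actual recovery. The helper journals_to_apply is hypothetical, and treating a name without the .hhmmss suffix as the newest journal for its day is an assumption based on Example 7-1, where only the current day's file lacks it.

```python
import re
from datetime import date, datetime

# dmd_db.yyyymmdd[.hhmmss] or atmsp_db.yyyymmdd[.hhmmss]
JOURNAL = re.compile(r"^(dmd_db|atmsp_db)\.(\d{8})(?:\.(\d{6}))?$")


def journals_to_apply(names, prefix, backup_day):
    """Return journal files for prefix dated on or after backup_day, in time order."""
    picked = []
    for name in names:
        match = JOURNAL.match(name)
        if match is None or match.group(1) != prefix:
            continue
        day = datetime.strptime(match.group(2), "%Y%m%d").date()
        if day >= backup_day:
            # Assumption: a name without .hhmmss is the newest journal for its day.
            picked.append((day, match.group(3) or "999999", name))
    return [name for _day, _time, name in sorted(picked)]


names = ["dmd_db.19970127.235959", "dmd_db.19970201",
         "dmd_db.19970128.235959", "atmsp_db.19970130.235959"]
print(journals_to_apply(names, "dmd_db", date(1997, 1, 28)))
# prints ['dmd_db.19970128.235959', 'dmd_db.19970201']
```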

Example 7-1. Database Recovery Example

Suppose that the file system containing HOME_DIR was destroyed on February 1, 1997, and that your most recent backup copy of the daemon and tape MSP databases is from January 28, 1997. To recover the database, you would do the following:

  1. Stop DMF.

  2. Ensure that JOURNAL_DIR/daemon_name and JOURNAL_DIR/msp_name contain the following journal files (one or more for each day):

    JOURNAL_DIR/daemon_name

    dmd_db.19970128.235959
    dmd_db.19970129.235959
    dmd_db.19970130.235959
    dmd_db.19970131.235959
    dmd_db.19970201

    JOURNAL_DIR/msp_name

    atmsp_db.19970128.235959
    atmsp_db.19970129.235959
    atmsp_db.19970130.235959
    atmsp_db.19970131.235959
    atmsp_db.1997020

  3. Restore the databases from January 28 to HOME_DIR/daemon_name and/or HOME_DIR/msp_name. The following files should be present:

    HOME_DIR/daemon_name

    dbrec.dat
    dbrec.keys
    pathseg.dat
    pathseg.keys

    HOME_DIR/msp_name

    tpcrdm.dat
    tpcrdm.key1.keys
    tpcrdm.key2.keys
    tpvrdm.dat
    tpvrdm.vsn.keys

  4. Update the database files restored in step 3 by using the following commands:

    dmdbrecover -n daemon_name dmd_db
    dmdbrecover -n msp_name atmsp_db