Chapter 1. Introduction

This chapter provides an overview of SGI® DMF tiered-storage virtualization software. It discusses the following:

DMF Features

DMF software transparently moves file data from high-performance but expensive disk to levels of decreased-performance but inexpensive media known as secondary storage. This lets you cost-effectively maintain a seemingly infinite amount of data without sacrificing accessibility for users.

This section discusses the following features of DMF software:

Automatic Monitoring of Filesystem Space

A managed filesystem is an XFS or CXFS filesystem mounted with the Data Management Application Programming Interface (DMAPI) enabled and for which DMF software can migrate and/or recall migrated data. DMF software continuously monitors managed filesystems on high-performance disk so that it can maintain a certain amount of free space in those filesystems. This free space permits the creation of new files and the recall of previously migrated files. Figure 1-1 describes the concept of the DMF migration cycle between the managed filesystem and the secondary storage.

Figure 1-1. DMF Cycle

DMF software automatically detects a drop below the free-space threshold. DMF software then transparently moves file data from the managed filesystem to the secondary storage by freeing the data blocks of files that have already been migrated. File migration occurs in two stages:

  • Stage One: A file's data is copied (migrated) to secondary storage.

  • Stage Two: After the copy is secure, the file is eligible to have its data blocks released. The release occurs only after the minimum free-space threshold is reached or when a manual request to free a file's disk blocks is made via the dmput -r command. DMF software chooses file data to free according to site-defined policies involving size and access time.

For example, Figure 1-2 shows a configuration where DMF software will free the data blocks of less-recently accessed files (such as the file represented by the letter “A”) so that free space in the managed filesystem stays well above the minimum threshold as new files are added or as previously migrated files (such as those represented by the letters “B” and “E”) are recalled. Despite the movement of data, all content is accessible all of the time.


Note: When configured according to best practices, DMF software makes two copies of migrated data for safety reasons. Data will be recalled from a second copy only if necessary. For simplicity, Figure 1-2 does not show the second copy of file data.


Figure 1-2. Free-Space Minimum Threshold

Easy and Constant Availability of Data

In general, only the most timely data resides on the higher-performance disk; DMF software automatically migrates less timely data to secondary storage. However, all of the data always appears to be online to users and applications using normal access methods, regardless of the data's actual location.

Although DMF software moves file data, it leaves file metadata in place so that users can access files without knowing the actual location of the data. Metadata consists of items such as index nodes (inodes) and directory structure. Migrated files appear as normal files to users and are always easily accessible via high-performance network connections.

Because migrated files remain cataloged in their original directories, users and applications never need to know where the data actually resides; they can access any migrated file using normal processes. In fact, when drilling into directories or listing their contents using standard POSIX-compliant commands, a user cannot determine the location of file data within the storage tier; determining the data's actual residence requires special commands or command options.

A file whose data blocks have been freed is considered from the DMF software perspective to be offline, and its data blocks are therefore available for new active data, either new files or recalled files. However, from the user perspective, the file always appears to be online because the inodes and directories remain in the managed filesystem, allowing users to access the file by normal means.

The only difference users might notice when accessing a file whose data blocks have been freed is a delay in response time, because the data must be retrieved from secondary storage. From the user's perspective, all data always appears to be available online, regardless of its actual location.

Partial-State Files

Managed files can have multiple distinct file regions with different residency states. A region is a contiguous range of bytes that have the same residency state. A file that has more than one region is called a partial-state file. A file that is in a static state (that is, not currently being migrated or unmigrated) can have one region that is online in the managed filesystem for immediate access and another region that is offline and must be recalled in order to be accessed.

Partial-state files provide the following capabilities:

  • Accelerated access to first byte, which allows you to access the beginning of an offline file before the entire file has been recalled.

  • Partial-state file online retention, which allows you to keep a specific region of a file online while freeing the rest of it (for example, if you wanted to keep just the beginning of a file online). See “ranges Clause” in Chapter 7.

  • Partial-state file recall, which allows you to recall a specific region of a file without recalling the entire file. For more information, see the dmput(1) and dmget(1) man pages.
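
The region concept above can be sketched in Python. This is only an illustration of the idea, not DMF code: the Region type, the "online"/"offline" labels, and both helper functions are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    start: int   # first byte offset (inclusive)
    end: int     # last byte offset (exclusive)
    state: str   # illustrative residency label: "online" or "offline"


def is_partial_state(regions):
    """A file that has more than one region is a partial-state file."""
    return len(regions) > 1


def offline_ranges_needed(regions, offset, length):
    """Return the (start, end) byte ranges that would have to be recalled
    before bytes [offset, offset + length) could be read."""
    want_end = offset + length
    needed = []
    for r in regions:
        # Keep only offline regions that overlap the requested byte range.
        if r.state == "offline" and r.start < want_end and r.end > offset:
            needed.append((max(r.start, offset), min(r.end, want_end)))
    return needed
```

With the first 4096 bytes kept online, a read that stays inside that range needs no recall, while a read that crosses the boundary needs only the overlapping offline bytes.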

Safety and Scalability

DMF software transports large volumes of data on behalf of many users and has evolved to satisfy customer requirements for scalability and the safety of data:

  • When you configure the DMF environment using best practices, DMF software creates at least two secondary-storage copies of the data in order to prevent file data loss in the event that a migrated copy is lost. See “Ensuring Data Integrity”.

  • Because system interrupts and occasional storage device failures cannot be avoided, it is essential that the integrity of data be verifiable. Therefore, DMF software also provides tools necessary to validate your storage environment. See “Commands Overview”.

  • DMF with the Parallel Data-Mover Option (referred to as Parallel DMF) lets you scale the DMF I/O capacity in cost-effective increments. A data mover is a node running processes that migrate data to and recall data from secondary storage. In the basic DMF product, the DMF server incorporates the functionality of an integrated data-mover node. This allows the DMF system to reside on a single server and minimizes the cost of a DMF implementation. For users with higher throughput requirements, Parallel DMF allows multiple data movers to operate in parallel, increasing data throughput and enhancing resiliency. The parallel data-mover node's dedicated function is to move data to and from secondary storage. See “Parallel DMF Overview”.

Site-Defined Migration Policies

As a DMF administrator, you determine how disk space capacity is handled by doing the following:

  • Selecting the filesystems that DMF software will manage

  • Specifying the amount of free space that will be maintained on each filesystem

  • Ranking file-selection criteria, such as file size and file age

DMF software selects files for migration and frees data blocks of already migrated files based on site-defined criteria that are specified in a migration policy. For example, a migration policy does the following:

  • Makes the specified number of copies of migrated data. DMF software places those copies on separate secondary-storage targets. SGI strongly recommends that you create at least two secondary-storage copies in order to prevent file data loss in the event that one copy is damaged.

  • Migrates the data at the times specified or when the specified free-space minimum threshold is reached.

  • Optionally keeps a small amount of data in the managed filesystem for each file, even after migration (for use by file managers, in order to avoid unnecessary recall of a file due to directory browsing).

  • Maintains a specified percentage of the managed filesystem free for new data (either new files or recalled files). When the filesystem reaches this threshold, DMF software will free the already-migrated data blocks until the specified percentage of the filesystem is free, normally selecting files by size and last-access time.
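
The free-space behavior in the last item can be sketched in Python. This is a minimal illustration under stated assumptions, not the DMF selection algorithm: the ManagedFile type, the weight formula (size multiplied by days since last access), and the parameter names are hypothetical. In a real site, the ranking comes from the policies in the DMF configuration file.

```python
from dataclasses import dataclass


@dataclass
class ManagedFile:
    name: str
    size: int          # bytes currently held in the managed filesystem
    age_days: float    # days since last access
    migrated: bool     # True once a secondary-storage copy is complete


def weight(f):
    # Hypothetical ranking: larger, less recently accessed files rank higher.
    return f.size * f.age_days


def select_files_to_free(files, capacity, free_now, free_min_pct, free_target_pct):
    """Return the names of already-migrated files whose blocks would be freed."""
    if free_now * 100 >= free_min_pct * capacity:
        return []  # the minimum threshold has not been reached
    target = free_target_pct * capacity / 100
    chosen = []
    candidates = sorted((f for f in files if f.migrated), key=weight, reverse=True)
    for f in candidates:
        if free_now >= target:
            break
        chosen.append(f.name)
        free_now += f.size
    return chosen
```

Files that have not yet been migrated are never candidates, which mirrors the rule that data blocks can be freed only after a secondary-storage copy exists.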

A Variety of Migration Targets

DMF software can migrate data to the following media:

  • Cloud storage

  • Fibre Channel tapes and tape libraries that are supported by the OpenVault or TMF mounting services

  • SCSI low-voltage differential (LVD) tapes and tape libraries


    Note: If you have a high-voltage differential (HVD) tape or tape library that you want to use for the DMF environment, you must contact SGI Professional Services for assistance in obtaining the appropriate HVD-LVD converter.

    The LVD requirement is only for tapes and tape libraries. It does not apply to HVD disk.


  • Disk

  • COPAN RAID sets:

    • COPAN massive array of idle disks (MAID)

    • SGI 400 virtual tape library (VTL)

  • JBFS configurations

  • Another server (via NFS or FTP)

You can also use disk or COPAN RAID sets as a cache in conjunction with another migration target to provide multiple levels of migration; see “Multiple Storage Tiers”.

Support for Fileserving Applications

DMF software supports a range of storage-management applications. In some environments, DMF software is used strictly to manage highly stressed online disk resources. In other environments, it is also used as an organizational tool for safely managing large amounts of data. In all environments, DMF software scales to the storage application and to the characteristics of the available storage devices.

DMF software interoperates with the following:

  • Standard data export services such as NFS and FTP

  • XFS® filesystems

  • CXFS™ clustered filesystems

  • Microsoft® SMB (also known as CIFS) as used by Samba when fileserving to Windows® systems

By combining these services with DMF software, you can configure an SGI system as a high-performance fileserver.

DMF Manager Web Interface

DMF software provides a set of graphical and command-line tools to help you configure, monitor, and manage the DMF system. DMF Manager is a web-based tool you can use to do the following:

  • Configure the DMF environment

  • Install DMF licenses

  • Display status of the DMF environment

  • Display reports about internal DMF processing queues, allowing you to cancel and reprioritize specific requests

  • Start and stop DMF processes

  • Deal with day-to-day DMF operational issues

  • Display performance metrics, including filesystem throughput and volume usage

  • Create custom statistics reports

  • Accommodate tape volumes that are physically not in the tape library

  • Show SGI Linear Tape File System (LTFS) information, configure LTFS, and mount/unmount LTFS tapes

  • Restore filesystems and filesystem components

DMF Manager is useful for all DMF customers from enterprise to high-performance computing and is available via the Firefox® and Internet Explorer® web browsers.

At a glance, you can see whether the DMF environment is operating properly. An icon in the upper-right corner indicates whether the DMF environment is up (green) or down (upside down and red). If something requires attention, DMF Manager makes actions available to identify and resolve problems. The tool volunteers information and provides context-sensitive online help. DMF Manager also displays performance statistics, allowing you to monitor DMF activity, filesystems, and hardware.

Figure 1-3 is an example of the Overview panel. It shows status of the DMF environment, including the following:

  • The DMF environment is up (green icon)

  • There are some warnings that may require action (yellow icon)

  • The /dmi_fs2 filesystem is related to the volume1 and volume2 volume groups (VGs)

Figure 1-3. DMF Manager

DMF Control from Client Platforms

This section discusses the following:

Overview of the Windows Client

The DMF client for Windows systems lets users and administrators control DMF via file shares configured on the Samba server. The Samba server must have SGI enhanced Samba installed, and may be either the DMF server or a CXFS client-only node.

Using Windows Explorer, you can do the following for files on which you have the appropriate permission, depending upon site-specific configuration:

  • View DMF file properties

  • Execute the following DMF user operations:

    • Recall files, similar to the functionality of the dmget(1) command

    • Migrate files, similar to the functionality of the dmput(1) command

    • Set site tags, similar to the functionality of the dmtag(1) command

  • Set a project ID, similar to the functionality of the dmprojid(8) command

See DMF 6 Client Guide for Windows Systems.

Overview of the IRIX, Mac OS X, Linux, and Solaris Clients

Several DMF user commands are available natively on DMF clients running any of the following operating systems (see the DMF release notes for the specific versions that are supported):

  • SGI IRIX®

  • Apple® Mac OS X®

  • Red Hat® Enterprise Linux® (RHEL)

  • SUSE® Linux® Enterprise Server (SLES)

  • Sun™ Solaris™

For more details, see “User Commands”.

High Availability

You can run DMF software in a high-availability (HA) cluster.


Caution: Running DMF in an HA cluster involves configuration requirements and administrative procedures (such as starting/stopping DMF services) that differ from the information in this DMF guide. For more information about DMF and HA, see High Availability Guide for SGI InfiniteStorage.


SOAP Web Service

DMF software provides access to a subset of the DMF client functions via the DMF Simple Object Access Protocol (SOAP) web service. For more information, see Chapter 16, “DMF SOAP Server”.

Direct Archiving

You can use the direct archiving feature to manually copy file data between a POSIX filesystem (such as a Lustre™ filesystem) and DMF secondary storage. To do so, configure the POSIX filesystem for archive use in the DMF configuration file and use the dmarchive(1) command. The POSIX filesystem, known as an archive filesystem, cannot be DMAPI-enabled (that is, it cannot be mounted with the dmi mount option). When using this feature, DMF software copies the file data to DMF secondary storage while placing the metadata in a filesystem that is managed by DMF. See “Use dmarchive to Copy Archive File Data to Secondary Storage” in Chapter 3.

Mounting Services

When you purchase DMF software, you also receive the following mounting services:

  • OpenVault storage library management facility, applicable to SLES or RHEL. See OpenVault Administrator Guide for SGI InfiniteStorage.

  • Tape Management Facility (TMF), applicable to SLES only. See TMF 6 Administrator Guide for SGI InfiniteStorage.

Out-of-Library Tapes

When OpenVault is the mounting service, DMF software will try to retrieve data from an in-library volume before requesting that an out-of-library tape be imported. See “Using Out-of-Library Tapes” in Chapter 5.

How DMF Software Works

This section discusses the following:

File States

DMF software uses the following terminology with regard to the state of a file in a managed filesystem:

  • Regular file (REG) is a file residing only on the high-performance disk in the managed filesystem.

  • Migrating file (MIG) is a file whose copies on secondary storage are in progress.

  • Migrated file is a file that has one or more complete copies on secondary storage and no pending or incomplete offline copies. A migrated file is one of the following from the perspective of DMF software:

    • Dual-state file (DUL) is a file whose data resides both on the high-performance disk and on secondary storage

    • Offline file (OFL) is a file whose data is no longer on the high-performance disk (the data is offline from the DMF perspective, but from the user perspective the data always appears to be available online)

    • Unmigrating file (UNM) is a previously offline file in the process of being recalled to the high-performance disk

    • Partial-state file (PAR) is a file with some combination of dual-state, offline, and/or unmigrating regions

When a file is first migrated, DMF software copies the data to secondary storage but may not immediately free the data in the managed filesystem on the high-performance disk. During this period, the file is considered to be dual-state because it resides in both locations. Like a regular file, a migrated file has an inode. An offline file or a partial-state file requires the intervention of the DMF daemon to access its offline data; a dual-state file is accessed directly from the original that still exists in the managed filesystem.

The operating system informs the DMF daemon when a migrated file is modified. If anything is written to a migrated file, the offline copy is no longer valid, and the file becomes a regular file until it is migrated again.
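
The file states described above can be summarized as a small state machine. The following Python sketch is an illustration only: the state abbreviations come from this section, but the event names and the transition table are hypothetical labels chosen for the example. Partial-state files and direct archiving are omitted for brevity.

```python
from enum import Enum


class FileState(Enum):
    REG = "regular"
    MIG = "migrating"
    DUL = "dual-state"
    OFL = "offline"
    UNM = "unmigrating"


# Allowed transitions keyed by (current state, event).
# The event names are assumptions made for this sketch.
TRANSITIONS = {
    (FileState.REG, "migrate"): FileState.MIG,
    (FileState.MIG, "copies_complete"): FileState.DUL,
    (FileState.DUL, "free_blocks"): FileState.OFL,
    (FileState.OFL, "recall"): FileState.UNM,
    (FileState.UNM, "recall_complete"): FileState.DUL,
    (FileState.DUL, "write"): FileState.REG,  # offline copy is no longer valid
}


def next_state(state, event):
    """Return the state that follows `event`, or raise ValueError."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not valid in state {state.name}") from None
```

Walking a file through migrate, copies_complete, free_blocks, recall, and recall_complete returns it to dual-state, and a subsequent write makes it a regular file again.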

If you are using DMF direct archiving to copy files from a filesystem that is not managed, archiving files are files where the original resides on an archive filesystem (one not managed by DMF software, such as Lustre) and whose offline copies are in progress. When the process completes, the files are offline files.

DMF Methods

The migration process is managed by a daemon-like component called a library server (LS) or media-specific process (MSP):

  • LS (dmatls) transfers data to and from the following types of volumes:

    • Magnetic tape in a tape library (also known as a robotic library or silo)

    • RAID sets in a COPAN MAID system[1]

    • Virtual tapes in an SGI 400 VTL system

    • JBFS configurations

  • Cloud MSP (dmcloudmsp) transfers data to and from a cloud storage system accessible via a network (local or Internet).

  • FTP MSP (dmftpmsp) uses the File Transfer Protocol to transfer data to and from disks of another system on the network.

  • Disk MSP (dmdskmsp) uses a filesystem mounted on the DMF server itself as the location on which to store/recall file data. See “Use an Appropriate Filesystem for a Disk MSP” in Chapter 3.

  • Disk cache manager (DCM) MSP is the disk MSP configured for n-tier capability by using a dedicated filesystem as a cache. DMF software can manage the disk MSP's storage filesystem and further migrate it to secondary storage, thereby using a slower and less-expensive dedicated filesystem as a cache to improve the performance when recalling files. A DCM MSP configuration generally first migrates data to a cache on (for example) serial ATA (SATA) disk and then at a later time migrates the data from the SATA disk to secondary storage on physical tape. The filesystem used by the DCM MSP must be a local XFS or CXFS filesystem.

  • Fast-mount cache configuration is a special configuration of an LS volume group that simultaneously migrates data to a copy on the cache target (such as COPAN MAID or JBFS configurations) with rapid mount and positioning characteristics and to secondary-storage copies on the other targets (such as physical tape). This configuration provides similar functionality to a DCM MSP but does not downwardly migrate data from the cache tier; in this configuration, an entire volume on the cache can be freed immediately when the fullness threshold is reached. See “Fast-Mount Cache Configuration Overview”.

A site can use any combination of DMF methods.

Figure 1-4 and Figure 1-5 summarize these concepts and “Multiple Storage Tiers” provides more details and illustrations.

Figure 1-4. DMF Methods: Before Migrating

Figure 1-5. DMF Methods: After Migrating Data and Freeing Space

Multiple Storage Tiers

The various DMF methods provide multiple storage tiers.

The figures in the following subsections show the use of multiple tiers and the concepts of DMF data migration (in which file data is copied from the managed filesystem to the secondary storage, but the inode remains in place in the managed filesystem) and data recall.


Note: For simplicity, the figures in this chapter do not address a second copy of secondary storage. Data will be recalled from a second copy only if necessary.


Two Tiers

LS and non-cache MSPs (cloud MSP, disk MSP, or FTP MSP) provide two tiers of storage media:

  • Tier-1: Managed filesystem on high-performance disk

  • Tier-2: Secondary storage on cloud storage, disk (including COPAN MAID, COPAN VTL, and JBFS configurations), FTP server, or tape

Figure 1-6 and Figure 1-7 show an example of the process using two tiers.

Figure 1-6. Two Tiers: Migrating File Data

Figure 1-7. Two Tiers: Freeing and Recalling File Data

Three Tiers using DCM MSP

Adding a DCM MSP provides three tiers of storage media:

  • Tier-1: Managed filesystem on high-performance disk

  • Tier-2: Cache on high-capacity, low-cost disk that will downwardly migrate and free data on a file basis

  • Tier-3: Secondary storage on cloud storage, FTP server, or tape

Figure 1-8 and Figure 1-9 show an example of the process using three tiers of storage with a DCM MSP, where data moves first to a cache on lower-performance but less-expensive disk, then to inexpensive storage. The file will be recalled from disk cache as long as it resides there because it is faster than recalling from the third tier.

Figure 1-8. DCM MSP: Migrating File Data

Figure 1-9. DCM MSP: Freeing and Recalling File Data

Three Tiers using Fast-Mount Cache

Adding a fast-mount cache provides three tiers of storage media:

  • Tier-1: Managed filesystem on high-performance disk

  • Tier-2: Fast-mount cache (such as COPAN MAID or JBFS configurations) that will be freed on a volume basis (no downward migration)

  • Tier-3: Secondary storage on cloud storage, FTP server, JBFS configurations, or tape

Figure 1-10 and Figure 1-11 show an example of the process using three tiers of storage, where a copy of the data is simultaneously placed in tier-2 fast-mount cache (such as COPAN MAID or JBFS configurations) and in tier-3 secondary storage (such as tape). The file will be recalled from the cache as long as it resides there because it is faster than recalling from tier-3 storage.


Note: Unlike the DCM MSP, this method does not migrate data from the cache to tier-3; therefore, volumes on the cache can be freed immediately when the fullness threshold is reached.


For more information, see “Fast-Mount Cache Configuration Overview”.

Figure 1-10. Fast-Mount Cache: Migrating File Data

Figure 1-11. Fast-Mount Cache: Freeing and Recalling File Data

Migration Process

You choose both the percentage of the filesystem to migrate and the amount of free space. You as the administrator can manually trigger file migration or file owners can issue manual migration requests.

A file is migrated when the automated space-management controller dmfsfree(8) selects the file or when an owner requests that the file be migrated by using the dmput(1) command.

When the daemon receives a request to migrate a file, it does the following:

  1. Adjusts the state of the file.

  2. Ensures that the necessary MSPs/VGs are active.

  3. Sends a request to the MSPs/VGs, which in turn copy data to the secondary storage media.

When the MSPs/VGs have completed the offline copies, the daemon marks the file as migrated in its database and changes the file to dual-state. If the user specifies the dmput -r option, or if dmfsfree requests that the file's space be released, the daemon releases the data blocks and changes the file state to offline. For more information, see the dmput(1) man page.
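
The request flow above can be sketched in Python. This is a simplified model under stated assumptions, not the DMF daemon's implementation: the VolumeGroup class, the dictionary that represents a file, and the migrate function are all hypothetical, and the copies are simulated rather than written to media.

```python
class VolumeGroup:
    """Hypothetical stand-in for an MSP or VG."""

    def __init__(self, name):
        self.name = name
        self.active = False
        self.copies = []

    def activate(self):
        self.active = True

    def copy(self, path):
        # Simulate a successful copy to secondary storage.
        self.copies.append(path)
        return True


def migrate(file, targets, release=False):
    """Model the daemon's handling of one migration request."""
    file["state"] = "MIG"                    # 1. adjust the state of the file
    for t in targets:                        # 2. ensure the MSPs/VGs are active
        if not t.active:
            t.activate()
    done = [t.copy(file["path"]) for t in targets]   # 3. request the copies
    if done and all(done):
        file["state"] = "DUL"                # all offline copies are complete
        if release:
            file["state"] = "OFL"            # data blocks released
    return file["state"]
```

The release flag models either the dmput -r option or a release request from dmfsfree. The file becomes dual-state only after every requested copy has completed.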


Note: DMF software does not migrate pipes, directories, or UNIX® or Linux special files.


Recall of File Data

This section discusses the following:

Recall from the Appropriate Location

Data is provided to the user from the appropriate location:

  • If a user accesses a dual-state file, the data comes directly from the high-performance disk as normal, providing the fastest access.

  • After the data blocks on the managed filesystem are freed, DMF software automatically recalls the file's data from the secondary storage when the user accesses the file, placing the data back on the managed filesystem; at this point, the file once again becomes a dual-state file. (If the user then changes the file, it returns to being a regular file.)

Order of Recall

When a migrated file must be recalled, a request is made to the DMF daemon. The daemon selects an MSP/VG from its internal list and sends that MSP/VG a request to recall a copy of the file. If more than one MSP/VG has a copy, the first one in the list is used. The list is created from the configuration file.

For illustration purposes, suppose that the DMF configuration file contains the following definitions for an environment using a single library server with two drive groups that each have two volume groups to specify the location of file copies:

  • The dmdaemon object contains the following parameter to identify the library server:

    LS_NAMES          myls
    

  • The libraryserver object defined for myls contains the following parameter to identify the drive groups and their order of selection:

    DRIVE_GROUPS     fruits veggies

  • The drivegroup objects defined for fruits and veggies identify their respective volume groups and their order of selection:

    • For fruits:

      VOLUME_GROUPS     oranges apples

    • For veggies:

      VOLUME_GROUPS     carrots peas

The order in which volumes are chosen for recall is decided by the order in which drive groups and then volume groups are listed in their respective definitions. Given the above, oranges will be tried first and peas will be tried last. If you stopped DMF, reordered the list in DRIVE_GROUPS, and restarted DMF, then carrots would be tried first and apples would be tried last, as shown in Figure 1-12.

Figure 1-12. Recall Order


Note: You must not change these parameters while DMF is running.
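
The ordering rule in this example can be checked with a short Python sketch. The parameter values are taken from the example configuration above. The two helper functions are hypothetical and only model the documented ordering, not how the DMF daemon is implemented.

```python
# Values from the example configuration in this section.
DRIVE_GROUPS = ["fruits", "veggies"]
VOLUME_GROUPS = {
    "fruits": ["oranges", "apples"],
    "veggies": ["carrots", "peas"],
}


def recall_order(drive_groups, volume_groups):
    """List volume groups by drive-group order, then volume-group order."""
    return [vg for dg in drive_groups for vg in volume_groups[dg]]


def pick_recall_source(order, holders):
    """Return the first VG in `order` that holds a copy, or None."""
    return next((vg for vg in order if vg in holders), None)


print(recall_order(DRIVE_GROUPS, VOLUME_GROUPS))
print(recall_order(["veggies", "fruits"], VOLUME_GROUPS))
```

The first call lists oranges first and peas last. With DRIVE_GROUPS reordered to veggies fruits, the second call lists carrots first and apples last.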

For more details, see Chapter 7, “DMF Configuration File”.

Freeing Data Blocks to Make Space for Recalls

If you recall more files than the managed filesystem can currently contain, DMF software migrates other files and frees the data blocks of already-migrated files (according to site-specific policies) until the free space in the filesystem is once again well above the free-space minimum threshold.


Note: A file's data blocks on the managed filesystem can only be freed after the data has been copied to secondary storage.


Fast-Mount Cache Configuration Overview

This section discusses the following:

Cache and Secondary-Storage Targets

You can use a cache migration target with rapid mount and positioning characteristics in conjunction with other secondary-storage targets in a fast-mount cache configuration. For example, consider the following:

  • COPAN MAID and JBFS configurations are faster than physical tapes, but their storage size is finite

  • A physical tape library has an effectively unlimited storage capacity because you can eject full tapes and replace them with empty tapes, but recalling data from tape is slower than recalling data from COPAN MAID or JBFS configurations

The combination of these two targets in a fast-mount cache configuration results in faster recall performance for recently created offline files while also providing secure long-term storage.

How Fast-Mount Cache Differs from a DCM MSP

A fast-mount cache is similar to a DCM MSP in that both provide fast recall of migrated files in the cache tier (tier-2). However, they have the following important differences:

  • DCM MSP:

    • Can be configured to downwardly migrate data from tier-2 to tier-3 as the data ages

    • Only requires that one initial copy be made, although two copies are recommended to prevent data loss (the copy in cache can be downwardly migrated to secondary storage on tier-3)

    • Deletes data from tier-2 on an individual file basis

    • Space on tier-2 cannot be freed immediately if the data does not already have a copy in tier-3 (causing a delay if space is needed quickly)

  • Fast-mount cache:

    • Does not downwardly migrate data from tier-2 to tier-3

    • Always requires that at least two initial copies be made (a copy to the cache and a copy to the secondary storage on tier-3)

    • Deletes data from tier-2 on a volume basis (that is, all files in the volume are deleted at the same time)

    • Tier-2 can be freed immediately when the free-space threshold is reached, without further operational effort


Note: SGI strongly recommends that you migrate at least two copies to secondary-storage targets in order to prevent file data loss in the event that a migrated copy is damaged. When using a fast-mount cache, SGI therefore recommends that you migrate at least three copies (one copy to the cache on tier-2 and two copies to secondary-storage targets at the tier-3 level).


Fast-Mount Cache Implementation

To implement a fast-mount cache, you must configure the DMF environment to make all secondary-storage copies of the data (tier-3 storage on other MSPs/VGs) at the same time as the cache copy (tier-2 storage on the MGs/VGs in the fast-mount cache).

You must also configure a task to empty the fast-mount cache when it reaches the configurable free-space threshold. DMF software immediately empties the oldest full volumes, defined as those with the oldest write dates. Because at least one copy of the data exists elsewhere (most likely on a physical tape), there is no need to wait for the data in the disk cache to migrate to a lower tier (unlike a DCM MSP). Therefore, the freeing of space on the fast-mount cache is very fast because it requires no movement of data.
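
The volume-selection rule above (oldest full volumes first, by write date) can be sketched in Python. This is an illustration only: the Volume type and the bytes_needed parameter are hypothetical, and the amount of space that the real task frees is governed by the DMF configuration rather than by this function.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Volume:
    name: str
    size: int          # bytes occupied on the fast-mount cache
    full: bool         # only full volumes are candidates
    last_write: date   # date of the most recent write to the volume


def volumes_to_free(volumes, bytes_needed):
    """Pick the oldest full volumes until enough space would be freed."""
    chosen, freed = [], 0
    candidates = sorted((v for v in volumes if v.full), key=lambda v: v.last_write)
    for v in candidates:
        if freed >= bytes_needed:
            break
        chosen.append(v.name)   # every file on the volume is deleted together
        freed += v.size
    return chosen
```

No data is moved when a volume is chosen, which is why freeing space on the fast-mount cache is fast: the tier-3 copies already exist.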

Figure 1-10 and Figure 1-11 summarize the concepts of migrating and recalling file data in a fast-mount cache configuration using COPAN MAID as an example.

Also see “Use Fast-Mount Cache Appropriately” in Chapter 3.

Appropriate Use of Fast-Mount Cache

The fast-mount cache configuration is most appropriate for sites that have a high turnover of often-accessed data, where the most recently migrated files are also the most likely to be recalled.

All files on a volume being freed are deleted without regard to their size or last access time. That might mean that a file that is still being actively recalled on a fairly regular basis must be recalled from a VG with slower mount and position characteristics. You can minimize this issue by setting optional configuration parameters so that recently accessed files are copied to another volume within the fast-mount cache before any volumes are freed, using a separate scratch directory, but there may be an associated performance impact.

DMF Server Functions

The DMF server always provides the following services:

  • DMF administration (see “Administration Tasks”)

  • Backups

  • All I/O for data transfer to and from disks that is associated with cloud, FTP, disk, or DCM MSPs (see “How DMF Software Works”)

  • By default, a portion of I/O for data transfer to and from secondary storage (using its integrated data-mover functionality)

Parallel DMF Overview

The individual processes that migrate and recall data are known as data-mover processes. Nodes that run data-mover processes are data movers; this may include the DMF server node if it is configured to use the integrated data-mover functionality and, if you have purchased the Parallel Data-Mover Option, the parallel data-mover nodes. The DMF server and the parallel data-mover nodes can each run multiple data-mover processes.

As shown in Figure 1-13, the basic DMF product (that is, without the Parallel Data-Mover Option) runs data-mover processes on the DMF server. This allows the DMF control system to reside on a single server and minimizes the cost of a DMF implementation. Additional nodes can be installed with DMF client software (see “DMF Control from Client Platforms”).

Figure 1-14 shows the DMF product in a CXFS clustered filesystem environment.


Note: All nodes connect to a network. For simplicity, the network and DMF clients are not shown in the following figures.


Figure 1-13. Basic DMF Product in an NFS Environment


Figure 1-14. Basic DMF Product in a CXFS Environment


For users with higher throughput requirements, Parallel DMF allows additional data movers to operate in parallel with the integrated data-mover functionality on the DMF server, increasing data throughput and enhancing resiliency.

The dedicated function of a parallel data-mover node is to move data from the managed filesystem to volume-based media (COPAN MAID, COPAN VTL, JBFS configurations, or tape) and back into the managed filesystem, using an LS. Offloading the majority of I/O from the integrated data-mover functionality on the DMF server improves I/O throughput.

Because multiple parallel data-mover nodes can be used to move data, DMF software can scale its I/O throughput capabilities. When one parallel data-mover node hits its peak throughput capabilities, you can add more parallel data-mover nodes to the configuration as needed to improve I/O performance. Each parallel data-mover node can improve overall DMF performance by up to its maximum performance. For example, if you have parallel data-mover nodes that each provide up to a 2-GB/s increase, then having a configuration with three of these parallel data-mover nodes would provide a net increase of up to 6 GB/s. Additional drives and filesystem bandwidth may be required to realize the benefit from additional parallel data-mover nodes.
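The arithmetic in this example can be written as a short sketch. The function name and the optional limit parameters are hypothetical; treating drive and filesystem bandwidth as simple caps is an assumption made only to illustrate why additional bandwidth may be needed.

```python
def net_throughput_increase(node_peaks_gbs, drive_limit_gbs=None, filesystem_limit_gbs=None):
    """Upper bound on the added throughput, in GB/s.

    The sum of the per-node peaks is capped by whichever optional limit
    (drive bandwidth or filesystem bandwidth) is lowest.
    """
    total = sum(node_peaks_gbs)
    limits = [x for x in (drive_limit_gbs, filesystem_limit_gbs) if x is not None]
    return min([total] + limits)


# Three nodes that each provide up to 2 GB/s.
print(net_throughput_increase([2, 2, 2]))                     # 6
# The same nodes when the drives can sustain only 4 GB/s.
print(net_throughput_increase([2, 2, 2], drive_limit_gbs=4))  # 4
```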

The basic DMF product can run in an environment with or without CXFS. If DMF software is managing a CXFS filesystem, DMF software will ensure that the filesystem's CXFS metadata server is on the same machine as the DMF server and will use metadata server relocation if necessary to achieve that configuration (see “Configure DMF Appropriately with CXFS™” in Chapter 3). Parallel DMF must always run in a CXFS environment. The parallel data-mover nodes are SGI x86_64 machines that are installed with the SGI DMF Parallel Data Mover software package, which includes the required underlying CXFS software.


Note: From the CXFS cluster point of view, a DMF parallel data-mover node is a CXFS client-only node and therefore counts towards the total number of CXFS cluster nodes. However, the parallel data-mover nodes must be dedicated to DMF data-mover activities; they cannot perform any other functions that would be normal for CXFS client-only nodes.

The parallel data-mover node has specific hardware requirements and must access volume-based media on a port that is not used by CXFS. See “SAN Switch Zoning or Separate SAN Fabric Requirement”.

If you choose Parallel DMF, you must use OpenVault for those drive groups (DGs) that contain drives on parallel data-mover nodes.

Figure 1-15 shows the concept of the DMF product using parallel data-mover nodes in a CXFS cluster with only one server-capable administration node. The parallel data-mover nodes only write data to secondary storage on volume-based media in an LS.

Figure 1-15. Parallel DMF in a CXFS Environment


In a configuration with Parallel DMF, the DMF server still provides the services listed in “DMF Server Functions”.

For more information, see Chapter 8, “Parallel DMF Configuration”.

DMF Databases

The DMF daemon keeps track of migrated files in the daemon database. The key to each file is its bit-file identifier (BFID). For each migrated file, the daemon assigns a BFID that is stored in the file's inode. There is a daemon database record for each copy of a migrated file.

The daemon database also contains information such as the following:

  • The MSP/VG name

  • The MSP/VG key for each copy of a migrated file

When you use an MSP, the daemon database contains all of the information required to track a migrated file.

If you use an LS, there is also the LS database, which contains two tables of records:

  • Catalog (CAT) records track the location of migrated data on volumes. There is one CAT record for each migrated copy of a file. If a migrated copy is divided between multiple volumes, there will be a CAT record for each portion or chunk.

  • Volume (VOL) records contain information about the volumes. There is one VOL record for each volume.
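The relationship among these records can be sketched as follows. This is a simplified model for counting records, not the actual database layout: the class names, field names, key format, and volume names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DaemonRecord:
    bfid: str      # bit-file identifier, the key for the file
    vg_name: str   # name of the VG that holds this copy
    vg_key: str    # key that the VG uses for this copy


@dataclass(frozen=True)
class CatRecord:
    vg_key: str
    volume: str
    chunk: int


@dataclass(frozen=True)
class VolRecord:
    volume: str


def records_for_copy(bfid, vg_name, volumes):
    """Build the records for one migrated copy written across `volumes`."""
    vg_key = f"{vg_name}/{bfid}"  # hypothetical key format
    daemon = DaemonRecord(bfid, vg_name, vg_key)
    cats = [CatRecord(vg_key, vol, n) for n, vol in enumerate(volumes, start=1)]
    return daemon, cats


# One file with two copies; the second copy is split across two volumes.
copy1 = records_for_copy("bfid-0001", "vg_a", ["VOL001"])
copy2 = records_for_copy("bfid-0001", "vg_b", ["VOL101", "VOL102"])

daemon_records = [copy1[0], copy2[0]]
cat_records = copy1[1] + copy2[1]
vol_records = [VolRecord(v) for v in sorted({c.volume for c in cat_records})]

print(len(daemon_records), len(cat_records), len(vol_records))  # 2 3 3
```

In this model, the file has two daemon database records (one per copy), three CAT records (one per chunk), and three VOL records (one per volume used).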

Detailed information about the daemon and LS databases and their associated utilities is provided in “CAT Records” in Chapter 14 and “VOL Records” in Chapter 14.


Note: The databases consist of multiple files. However, these are not text files and cannot be updated by standard utility programs. See “Database Backups” in Chapter 19.

There are also databases for DMF Manager performance records and alerts.

For information about the OpenVault database, see OpenVault Administrator Guide for SGI InfiniteStorage.

Ensuring Data Integrity

DMF software provides capabilities to ensure the integrity of offline data. For example, you can have multiple MSPs/VGs with each managing its own pool of volumes. Therefore, you can configure the DMF environment to copy filesystem data to multiple offline locations.

DMF software stores data that originates in a CXFS or XFS filesystem. Each object stored corresponds to a file in the native filesystem. When a user deletes a file, the inode for that file is removed from the filesystem. Deleting a file that has been migrated begins the process of invalidating the offline image of that file. In the LS, this eventually creates a gap in the volume. To ensure effective use of media, the LS provides a mechanism for reclaiming space lost to invalid data. This process is called volume merging.

Much of the work done by DMF software involves transaction processing that is recorded in databases. The DMF databases provide for full transaction journaling and employ two-phase commit technology. The combination of these two features ensures that DMF software applies only whole transactions to its databases. Additionally, in the event of an unscheduled system interrupt, it is always possible to replay the database journals in order to restore consistency between the DMF databases and the filesystem. DMF utilities also allow you to verify the general integrity of the DMF databases themselves. See “Administration Tasks” for more information.
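The effect of applying only whole transactions during a journal replay can be shown with a minimal sketch. It does not model two-phase commit or the real journal format; the tuple layout and the replay function are hypothetical.

```python
def replay(journal, database=None):
    """Apply only transactions that reached 'commit'; ignore incomplete ones."""
    database = dict(database or {})
    pending = {}
    for txid, action, *payload in journal:
        if action == "begin":
            pending[txid] = []
        elif action == "set":
            pending[txid].append(tuple(payload))
        elif action == "commit":
            for key, value in pending.pop(txid):
                database[key] = value
    return database


journal = [
    (1, "begin"),
    (1, "set", "bfid-1", "VG_A"),
    (1, "commit"),
    (2, "begin"),
    (2, "set", "bfid-2", "VG_A"),
    # The system was interrupted before transaction 2 committed.
]
print(replay(journal))  # {'bfid-1': 'VG_A'}
```

Transaction 2 never reaches its commit entry, so none of its updates appear in the replayed database.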

DMF Architecture

DMF software consists of the DMF daemon and one or more MSPs or LSs. The DMF daemon accepts requests to migrate filesystem data from the DMF administrator or from users. It also communicates with the operating system kernel to maintain a file's migration state in that file's inode.

The DMF daemon is responsible for dispensing a unique bit-file identifier (BFID) for each file that is migrated. The daemon also determines the destination of migration data and forms requests to the appropriate MSP/LS to make offline copies.

The MSP/LS accepts requests from the DMF daemon. For outbound data, the LS accrues requests until the amount of data justifies a volume mount. Requests for data retrieval are satisfied as they arrive. When multiple retrieval requests involve the same volume, all file data is retrieved in a single pass across the volume.
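The two behaviors described above (accruing outbound requests before a mount, and retrieving all requested data from a volume in one pass) can be sketched as follows. The class, function, and threshold names are hypothetical, and the byte threshold is only a stand-in for whatever criteria the LS actually uses to justify a mount.

```python
from collections import defaultdict


class OutboundQueue:
    """Accrue migration requests until enough data justifies a volume mount."""

    def __init__(self, mount_threshold_bytes):
        self.mount_threshold_bytes = mount_threshold_bytes
        self.pending = []

    def add(self, bfid, size_bytes):
        """Queue a request; return a batch to write once enough data has accrued."""
        self.pending.append((bfid, size_bytes))
        if sum(size for _, size in self.pending) >= self.mount_threshold_bytes:
            batch, self.pending = self.pending, []
            return batch
        return None


def plan_recalls(requests):
    """Group recall requests by volume and order each group by position."""
    by_volume = defaultdict(list)
    for bfid, volume, position in requests:
        by_volume[volume].append((position, bfid))
    return {vol: [bfid for _, bfid in sorted(items)] for vol, items in by_volume.items()}


queue = OutboundQueue(mount_threshold_bytes=100)
print(queue.add("a", 40))   # None
print(queue.add("b", 70))   # [('a', 40), ('b', 70)]

print(plan_recalls([("x", "V1", 30), ("y", "V2", 5), ("z", "V1", 10)]))
# {'V1': ['z', 'x'], 'V2': ['y']}
```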

DMF software uses the DMAPI kernel interface defined by the Data Management Interface Group (DMIG). DMAPI is also supported by X/Open, where it is known as the XDSM standard.

Figure 1-16 illustrates the basic DMF architecture. Figure 1-17 shows the architecture of the LS.

Figure 1-16. Basic DMF Architecture


Figure 1-17. LS Architecture


There is one LS process (dmatls) per library, which maintains a database that all of its components share. The entities in the shaded boxes in Figure 1-17 are internal components of the dmatls process. Their functions are as follows:

Drive group (DG) 

The DG is responsible for the management of a group of interchangeable drives located in the library. These drives can be used by multiple VGs (see volume group below) and by other processes, such as backups and interactive users. However, in the latter cases, the DG has no management involvement; the mounting service (TMF or OpenVault) is responsible for ensuring that these possibly competing uses of the drives do not interfere with each other.

The main tasks of the DG are to:

  • Monitor I/O for errors

  • Attempt to classify the errors as volume, drive, or mounting service problems

  • Take preventive action

Volume group (VG) 

The VG holds at most one copy of a migrated file in a pool of volumes, of which it has exclusive use. It can use only the drives managed by a single DG.

Allocation group (AG) 

An AG is a pool of volumes that are transferred to a VG as needed and are returned to the pool when empty, subject to VG configuration parameters. Normally, an AG is configured to serve multiple VGs. Use of an AG is optional. When empty volumes are added to the DMF environment, they may be assigned to an AG via the dmvoladm(8) command.

Resource scheduler 

In a busy environment, it is common for the number of drives requested by VGs to exceed the number available. The purpose of the resource scheduler is to decide which VGs should have first access to drives as they become available and to advise the DG of the result. The DMF administrator can configure the resource scheduler to meet site requirements.

Standard resource scheduler algorithm 

This routine is an internal component of the dmatls process. Standard algorithms are provided with DMF software.

Resource watcher 

The resource watcher monitors the activity of the other components and frequently updates files that contain data of use to the administrator. These are usually HTML files viewable by a web browser, but can also be text files designed for use by awk or perl scripts.

The dmatrc and dmatwc processes are called the read children and write children. They are created by VGs to perform the actual reading and writing of volumes. Unlike most of the other DMF processes that run indefinitely, these processes are created as needed, and are terminated when their specific work has been completed.

Media transports and robotic automounters are also key components of all DMF installations. Generally, DMF software can be used with any transport and automounter that is supported by either OpenVault or TMF. Additionally, DMF software supports absolute block positioning, a media transport capability that allows rapid positioning to an absolute block address on the volume. When this capability is provided by the transport, positioning speed is often three times faster than that obtained when reading the volume to the specified position.

Migrate Groups

A migrate group (MG) is a logical collection of MSPs and VGs that you combine into a set in order to have a single destination for a migrate request. A migration request to the MG will result in the copying of the file to exactly one MSP/VG that is a member of the MG.

You define an MG by adding the migrategroup object to the DMF configuration file. You can use the defined name of the MG in DMF policies and commands, similar to the way in which you use the names of VGs/MSPs. See:

DMF Capacity

The capacity of the DMF environment is measured in several ways, as follows:

  • Total number of files. The daemon database can contain approximately 4 billion entries, and there is one database entry for each copy of a file that DMF software manages. Therefore, if there are two copies of each managed file, DMF software can theoretically manage approximately 2 billion files. The number of files that can be supported with best performance will vary depending upon the workload.

  • Total amount of data. The capacity is limited only by the amount of secondary storage available to DMF software.

  • Total amount of data moved between online and offline media. The number of drives configured for the DMF environment, the number of tape channels, and the number of disk channels all figure highly in the effective bandwidth. In general, DMF software provides full-channel performance to both tape and disk.

  • File size. DMF software can support any file that can be created on the CXFS or XFS filesystem being managed.
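The file-count estimate in the first item of this list is a simple division, sketched here. The constant uses the approximate figure of 4 billion entries stated above; the function name is hypothetical.

```python
DAEMON_DB_ENTRIES = 4_000_000_000  # approximate limit stated in this section


def max_managed_files(copies_per_file, entries=DAEMON_DB_ENTRIES):
    """Theoretical number of files, given one database entry per copy."""
    if copies_per_file < 1:
        raise ValueError("copies_per_file must be at least 1")
    return entries // copies_per_file


print(max_managed_files(2))  # 2000000000
print(max_managed_files(3))  # 1333333333
```

These are theoretical limits only; as noted above, the number of files that can be supported with best performance depends on the workload.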

DMF software has evolved in production-oriented, customer environments. It is designed to make full use of parallel and asynchronous operations, and to consume minimal system overhead while it executes, even in busy environments in which files are constantly moving online or offline. Exceptions to this rule will occasionally occur during infrequent maintenance operations when a full scan of filesystems or databases is performed.

For information about the DMF capacity license, see Chapter 2, “DMF Licensing”.

Requirements


Note: See the InfiniteStorage Software Platform (ISSP) release note and the DMF release note for the supported kernels, update levels, service pack levels, software versions, libraries, and tape devices.

This section discusses the following:

DMF Administrative and Store Directories

The DMF server uses the DMF administrative and store directories to store its databases, log files, journal files, and temporary files. Table 1-1 summarizes the configuration parameters used to define these directories, the variable that represents the value of the parameter in this guide, and the purpose of the directories. For configuration details, see Chapter 7, “DMF Configuration File”.

Table 1-1. DMF Administrative and Store Directories

Configuration Parameter

Variable that Represents the Parameter Value

Example Value

Purpose

HOME_DIR

HOME_DIR

/dmf/home

Specifies the base pathname for directories in which the DMF daemon database, library server (LS) database, and related files reside.

Minimum permission requirement: 711

SPOOL_DIR

SPOOL_DIR

/dmf/spool

Specifies the base pathname for directories in which DMF log files are kept.

Minimum permission requirement: 711

JOURNAL_DIR

JOURNAL_DIR

/dmf/journals

Specifies the base pathname for directories in which the journal files for the daemon database and LS database will be written.

Minimum permission requirement: 711

TMP_DIR

TMP_DIR

/dmf/tmp

Specifies the base pathname for directories in which DMF puts temporary files for its own internal use.

Minimum permission requirement: 711

DATABASE_COPIES

DATABASE_COPIES

/dir1/database_copies and /dir2/database_copies

Specifies one or more directories into which the run_copy_databases.sh task will place a copy of the DMF databases.

DUMP_DESTINATION for integrated backups

DUMP_DESTINATION

/dmf/backups

Specifies the location in which to store backups. This location must be a dedicated filesystem mount point.

DUMP_DESTINATION for disk backups (nonintegrated)

DUMP_DESTINATION

/dmf/backups

Specifies the location in which to store backups.

STORE_DIRECTORY for a DCM MSP

STORE_DIRECTORY

/dmf/dcmmspname_store

Specifies the directory that holds files for a DCM MSP. There is one STORE_DIRECTORY parameter for each DCM MSP.

STORE_DIRECTORY for a disk MSP

STORE_DIRECTORY

/dmf/diskmspname_store

Specifies the directory that holds files for a disk MSP (there is one STORE_DIRECTORY parameter for each disk MSP).

CACHE_DIR

CACHE_DIR

/dmf/cache

(Optional) Specifies the directory in which the VG stores chunks while merging them from sparse volumes.

MOVE_FS

MOVE_FS

/move

(Optional) Specifies one or more scratch directories that are used by dmmove(8) to move files between media-specific processes (MSPs) or volume groups (VGs). You must specify a value for MOVE_FS if you intend to use the dmmove command. The best practice when using MOVE_FS is for it to be dedicated to the dmmove function.

For more information about DMF administrative directories, see “DMF Administrative and Store Directories”.

PostgreSQL Database Server Requirements

The PostgreSQL database is required by DMF for various purposes, including the queue-viewing tools:

  • DMF requires the 9.3.X version of PostgreSQL. See “Apply Appropriate PostgreSQL Updates” in Chapter 3.

  • The PostgreSQL database server is the DMF server.

  • The PostgreSQL database server depends upon the ident service. See “Overview of the Installation and Configuration Steps” in Chapter 5.

  • The PostgreSQL database server runs as the postgres user. The postgres user must have at least 711 permission to the following directories:

    • HOME_DIR/pg_data

    • JOURNAL_DIR/pg_xlog

    • TMP_DIR/pg_tmp

    • SPOOL_DIR/pglogs

  • The firewall must allow access to port 5432 for localhost.

  • SLES: The postgres user must be set to use a shell in /etc/passwd. (By default, the postgres user's shell is set to /bin/false.)
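One way to verify the 711 permission requirement is to test the mode bits of each directory, as in the following minimal sketch. It assumes that "at least 711" means that every permission bit in 711 is set; the function name is hypothetical, and the check looks only at mode bits, not at ownership or at whether the postgres user can actually reach the directory.

```python
import os
import stat
import tempfile

REQUIRED_MODE = 0o711


def has_minimum_mode(path, required=REQUIRED_MODE):
    """True if every permission bit in `required` is set on `path`."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & required == required


# Demonstration on a scratch directory rather than a real DMF directory.
with tempfile.TemporaryDirectory() as scratch:
    os.chmod(scratch, 0o755)
    print(has_minimum_mode(scratch))  # True
    os.chmod(scratch, 0o700)
    print(has_minimum_mode(scratch))  # False
```

On a DMF server, you would pass the actual pathnames, such as the pg_data directory under the value of HOME_DIR.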

For configuration parameters specific to PostgreSQL, see “base Object Parameters” in Chapter 7.

For more information about DMF administrative directories, see “DMF Administrative and Store Directories”.

Server Node Requirements

A DMF server node requires the following:

  • SGI x86_64 hardware

  • One of the following operating systems as documented in the ISSP release note:

    • Red Hat Enterprise Linux (RHEL)

    • SUSE Linux Enterprise Server (SLES)

  • DMF server software and associated products distributed with the ISSP release

Parallel Data-Mover Node Requirements

DMF parallel data-mover nodes require the following:

  • SGI x86_64 hardware

  • Same operating system as the DMF server and CXFS metadata server

  • DMF parallel data-mover node software (which includes the required underlying CXFS client-only software)

If you use Parallel DMF, you must use OpenVault for those DGs that contain drives on parallel data-mover nodes. See “Parallel DMF Overview”.

Mounting Service Requirements

OpenVault requires ksh, not pdksh.

TMF has no requirements specific to DMF software.

License Requirements

DMF software is licensed. See Chapter 2, “DMF Licensing”.

DMAPI Requirement

For filesystems to be managed by DMF software, they must be mounted with the DMAPI interface enabled. See “Install DMAPI” in Chapter 3.

SAN Switch Zoning or Separate SAN Fabric Requirement

Drives must be visible only from the active DMF server, the passive DMF server (if applicable), and the parallel data-mover nodes. The drives must not be visible to any other nodes. You must use one of the following:

  • Independent switches (in a separate SAN fabric)

  • Independent switch zones for CXFS/XVM volume paths and DMF drive paths


Warning: If the drives are visible to any other nodes, such as CXFS client-only nodes (other than those that are dedicated to being parallel data-mover nodes), data can become corrupted or overwritten.

DMF software requires independent paths to drives so that they are not fenced by CXFS. The ports for the drive paths on the switch must be masked from fencing in a CXFS configuration.

XVM must not fail over CXFS filesystem I/O to the paths visible through the tape/disk HBA ports when Fibre Channel port fencing occurs.

DMF Manager Requirements

DMF Manager has the following requirements:

  • The DMF Manager software is installed on the DMF server node.

  • One of the following web browsers:

    • Firefox 3.6 and later (Firefox is the preferred browser)

    • Internet Explorer versions supported under Windows 7 (ensure that the latest security patches are installed)


    Note: DMF Manager might also work with other browsers, but its functionality with them is not tested.


  • Before saving or applying configuration changes, you must make and mount the filesystems used for the DMF administrative directories. See “Configure Filesystems and Directories Appropriately for DMF” in Chapter 3.

DMF SOAP Requirements

To use the DMF SOAP service capability, the software must be installed on the DMF server node.

DMF Direct Archiving Requirements

DMF direct archiving has the following requirements:

  • The archive filesystem must be visible and mounted in the same location on the DMF server and any DMF parallel data-mover nodes. (The DMF server need not be the server of the archive filesystem; for example, the DMF server need not be the Lustre server.)

  • The archive filesystem must be visible to DMF clients from which you want to run the dmarchive(1) command, but those clients may mount the filesystem on a different mount point.

  • The archive filesystem must be mounted on the DMF server and any DMF parallel data-mover nodes so that the root user is able to access the filesystem with root privileges (that is, with root squashing disabled).

  • The archive filesystem must be fast enough to permit efficient streaming to/from secondary storage. If this is not the case, the speed could be so slow as to render DMF software useless; in that situation, copying the file to a managed filesystem via cp(1) and migrating the file may be a better option.

If a filesystem does not meet these requirements, do not add it to the DMF configuration file as an archive filesystem.

Fast-Mount Cache Requirements

The fast-mount cache feature requires the following at a minimum:

  • Migrating at least two copies simultaneously, one copy to the cache (such as COPAN MAID) and at least one copy to a secondary-storage target (such as physical tape).

  • Configuring a task to empty the cache.

However, SGI always recommends that you migrate at least two copies to secondary-storage targets in order to prevent file data loss in the event that a migrated copy is damaged. When using a fast-mount cache, SGI therefore recommends that you migrate at least three copies (one to the cache and two to secondary-storage targets).
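The difference between the minimum requirement and the recommendation can be expressed as a small check. This is a sketch only: the function name and the "cache" and "secondary" labels are hypothetical and do not correspond to DMF configuration syntax.

```python
def check_fast_mount_copies(targets):
    """Evaluate a list of copy targets, one label per migrated copy.

    Each label is either 'cache' (the fast-mount cache) or 'secondary'
    (a secondary-storage target such as physical tape).
    Returns (meets_minimum, meets_recommendation).
    """
    cache = targets.count("cache")
    secondary = targets.count("secondary")
    meets_minimum = cache >= 1 and secondary >= 1
    meets_recommendation = cache >= 1 and secondary >= 2
    return meets_minimum, meets_recommendation


print(check_fast_mount_copies(["cache", "secondary"]))               # (True, False)
print(check_fast_mount_copies(["cache", "secondary", "secondary"]))  # (True, True)
```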

See “Use Fast-Mount Cache Appropriately” in Chapter 3.

Cloud Storage Requirements

DMF software supports the following cloud systems as secondary storage:

  • Scality RING private cloud

  • Amazon Simple Storage Service (S3) public cloud


    Note: Amazon Glacier is not supported.


  • Other products that present a service interface that is compatible with S3, in a private cloud

SGI strongly recommends that you migrate at least two copies to secondary-storage targets in order to prevent file data loss in the event that a migrated copy is damaged. A given cloud can be a single point of failure; therefore, redundant copies within one cloud do not sufficiently protect against data loss. SGI therefore highly recommends that you migrate data to a second location (to another cloud instance, tape, or disk).

Mediaflux Environment Requirements

See Arcitecta's Mediaflux documentation for information about using DMF in a Mediaflux environment. In particular, there are certain requirements for configuring DMF and for using dmaudit(8) when restoring files in a managed filesystem.

Administration Tasks

This section discusses the following aspects of DMF administration:

Initial Planning

DMF software manages two primary resources:

  • Free space on managed filesystems

  • Pools of secondary-storage media

You can configure those resources in a variety of environments, including the following:

  • Support of interactive processing in a general-purpose environment with limited disk space

  • Dedicated fileservers

  • Lights-out operations

You must do the following:

  • Evaluate the environment in which DMF software will run.

  • Plan for a certain capacity, both in the number of files and in the amount of data.

  • Estimate the rate at which you will be moving data between the DMF store of data and the native filesystem.

  • Select autoloaders and media transports that are suitable for the data volume and delivery rates you anticipate.

Installation and Configuration

You will install the DMF server software (which includes the software for TMF and OpenVault) from the ISSP media.

To configure the DMF environment, you must define a set of parameters in the DMF configuration file, typically by using a sample file as a starting point. See:

To make site-specific modifications, see “Customizing DMF” in Chapter 5.

For a detailed example of configuring using COPAN cabinets, see:

  • COPAN MAID for DMF Quick Start Guide

  • SGI 400 VTL for DMF Quick Start Guide

Recurring Administrative Duties

DMF software requires that you perform recurring administrative duties in the following areas:


Note: You can use tasks that automate these duties. A task is a process initiated on a time schedule that you determine, similar to a cron(1) job. Tasks are defined with configuration file parameters and are described in detail in “taskgroup Object” in Chapter 7 and “LS Tasks” in Chapter 7.


Free-Space Management

You must decide how much free space to maintain on each managed filesystem. DMF software has the ability to monitor filesystem capacity and to initiate file migration and the freeing of space when free space falls below the prescribed thresholds. See Chapter 11, “Automated Space Management”.

File Ranking

You must decide which files are most important as migration candidates. When DMF software migrates and frees files, it selects files based on criteria that you choose. The ordered list of files is called the candidate list. Whenever DMF software responds to a critical space threshold, it builds a new migration candidate list for the filesystem that reached the threshold. See “Generating the Candidate List” in Chapter 11.
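The idea of an ordered candidate list can be illustrated with a minimal ranking sketch. The scoring formula, the weight names, and the sample files are hypothetical; the actual ranking is determined by the policies that you configure, as described in Chapter 11.

```python
def candidate_list(files, age_weight=1.0, size_weight=0.0):
    """Order files from most to least attractive migration candidate.

    Each file is a (name, age_days, size_mb) tuple. The score is a
    weighted sum of age and size; a higher score ranks earlier.
    """
    def score(entry):
        _name, age_days, size_mb = entry
        return age_weight * age_days + size_weight * size_mb

    return [entry[0] for entry in sorted(files, key=score, reverse=True)]


files = [("a.dat", 30, 10), ("b.dat", 5, 5000), ("c.dat", 90, 1)]

# Rank by age only: the least recently accessed file comes first.
print(candidate_list(files))
# ['c.dat', 'a.dat', 'b.dat']

# Give size some weight: the large file moves to the front.
print(candidate_list(files, age_weight=1.0, size_weight=0.1))
# ['b.dat', 'c.dat', 'a.dat']
```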

Offline Data Management

DMF software offers the ability to migrate data to multiple locations. Each location is managed by a separate MSP/VG and is usually constrained to a specific type of medium.

Complex strategies are possible when using multiple MSPs, LSs, or VGs. For example, short files can be migrated to a device with rapid mount times, while long files can be routed to a device with extremely high density.

You can describe criteria for MSP/VG selection. When setting up a VG, you assign a pool of volumes for use by that VG. The dmvoladm(8) utility provides management of the VG media pools.

You can configure DMF software to automatically merge volumes that are becoming sparse. With this configuration (using the run_merge_tapes.sh task for either disk or tape), the media pool is merged on a regular basis in order to reclaim unusable space.

Recording media eventually becomes unreliable. Sometimes, media transports become misaligned so that a volume written on one cannot be read from another. The following utilities support management of failing media:

  • dmatread(8) recovers data

  • dmatsnf(8) verifies LS volume integrity

Additionally, the volume merge process built into the LS is capable of effectively recovering data from failed media.

Chapter 14, “Library Servers and Media-Specific Processes”, provides more information on administration.

Data Integrity and Reliability

This section discusses the following things that you must do to maintain the integrity and reliability of data managed by DMF software:

Run Backups

DMF software moves only the data associated with files, not the file inodes or directories, so you must still run filesystem backups in order to preserve the metadata associated with migrated files and their directories. You can configure DMF software to automatically run backups of your managed filesystems. See “Back Up Migrated Filesystems and DMF Databases” in Chapter 3.

The xfsdump(8) and xfsrestore(8) utilities are aware of migrated files. The xfsdump utility can be configured to dump the data blocks for a file only if it has not yet been migrated. Files that are dual-state, partial-state, or offline have only their inodes backed up.

You can establish a policy of migrating 100% of the files in the managed filesystems before starting a backup, thereby leaving only a small amount of data that must be dumped. This practice can greatly increase the availability of the machine on which DMF software is running because, generally, backup commands must be executed in a quiet environment.

You can configure the run_full_dump.sh and run_partial_dump.sh tasks to ensure that all files have been migrated. These tasks can be configured to run when the environment is quiet.

See Chapter 4, “Backups and DMF”.

Audit the Databases and Log Files

Configure DMF software to automatically run dmaudit to examine the consistency and integrity of the databases it uses. DMF databases record all information about stored data. The DMF databases must be synchronized with the filesystems that DMF software manages. Much of the work done by DMF software ensures that the DMF databases remain aligned with the filesystems.

Protect Databases from Loss

You can configure DMF software to periodically copy the databases to other devices on the system to protect them from loss (using the run_copy_databases.sh task). This task also uses the dmdbcheck utility to ensure the integrity of the databases before saving them.

Remove Old Logs and Journals

DMF software uses journal files to record database transactions. Journals can be replayed in the event of an unscheduled system interrupt that causes database corruption. You must ensure that journals are retained in a safe place until a full backup of the DMF databases can be performed.

You can configure the run_remove_logs.sh and run_remove_journals.sh tasks to automatically remove old logs and journals, which will prevent the DMF SPOOL_DIR and JOURNAL_DIR directories from overflowing.

Hard-Delete Database Entries

You can configure the run_hard_deletes.sh task to automatically remove database entries whose files will never be restored from backup media. See “Cleaning Up Obsolete Database Entries” in Chapter 15.

Commands Overview

The DMF administrator has access to a wide variety of commands for controlling the DMF environment. This section discusses the following:


Note: The functionality of some of these commands can be affected by site-defined policies; see “Customizing DMF” in Chapter 5.

The FTP MSP uses no special commands, utilities, or databases.

User Commands

End users can run the following commands on DMF clients to manually store and retrieve their data:

Command

Description

dmarchive(1)

Directly copies data between DMF secondary storage and a POSIX filesystem that is not managed by DMF software, such as Lustre. It is intended to streamline a workflow in which users work in an archive filesystem and later want to archive a copy of their data via DMF software. For more information about the MIN_ARCHIVE_SIZE parameter, see “filesystem Object Parameters” in Chapter 7.

dmattr(1)

Displays whether files are migrated or not by returning a specified set of DMF attributes (for use in shell scripts).

dmcapacity(1)

Displays an estimate of the remaining storage capacity for each VG in each LS. You can optionally choose to report the data formatted into XML or HTML.

dmcopy(1)

Copies all or part of the data from a migrated file to an online file.

dmdu(1)

Displays the number of blocks contained in specified files and directories on a managed filesystem.

dmfind(1)

Displays whether files are migrated or not by searching through files in a directory hierarchy.

dmget(1)

Recalls the specified files.

dmls(1)

Displays whether files are migrated or not by listing the contents of a directory.

dmoper(1)

Displays outstanding requests for operator intervention.

dmput(1)

Migrates the specified files.

dmtag(1)

Allows a site-assigned 32-bit integer to be associated with a specific file (which can be tested in the when clause of particular configuration parameters and in site-defined policies).

dmversion(1)

Displays the version number of the currently installed DMF software.

The DMF libdmfusr.so user library lets you write your own site-defined DMF user commands that use the same application program interface (API) as the above DMF user commands. See Appendix B, “DMF User Library libdmfusr.so”.

Also see Chapter 16, “DMF SOAP Server”.
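As an illustration of how the user commands above might be driven from a site script, the following Python sketch builds argument lists for dmput(1) and dmget(1). It is a minimal sketch, not part of DMF: the helper names (build_dmput, build_dmget, run) are hypothetical, and the only option used is dmput -r, which this guide describes as the manual request to free a file's disk blocks. By default the helper only prints the command line, so it can be tried on a system where DMF is not installed.

```python
import subprocess
from typing import List, Sequence


def build_dmput(paths: Sequence[str], release: bool = False) -> List[str]:
    """Build an argv list for dmput(1), which migrates the specified files.

    Only the -r option (free the disk blocks) is used here because it is
    the one option this guide mentions; no other options are assumed.
    """
    if not paths:
        raise ValueError("at least one path is required")
    argv = ["dmput"]
    if release:
        argv.append("-r")
    argv.extend(paths)
    return argv


def build_dmget(paths: Sequence[str]) -> List[str]:
    """Build an argv list for dmget(1), which recalls the specified files."""
    if not paths:
        raise ValueError("at least one path is required")
    return ["dmget", *paths]


def run(argv: Sequence[str], dry_run: bool = True) -> int:
    """Print the command (dry run) or execute it and return its exit status."""
    if dry_run:
        print(" ".join(argv))
        return 0
    # Executing requires the DMF user commands to be installed on this host.
    return subprocess.run(list(argv), check=False).returncode


if __name__ == "__main__":
    # Hypothetical file names, shown as a dry run only.
    run(build_dmput(["results.dat"], release=True))
    run(build_dmget(["results.dat"]))
```

Keeping the argument construction separate from execution makes the wrapper easy to test and lets a site log the exact command before running it.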

Licensing Commands

The following commands help you to manage DMF licenses:

Command

Description

dmusage(8)

Displays information about the capacity allowed by the DMF licenses and the amount of data that DMF software is currently managing against those licenses.

dmflicense(8)

Prints DMF license information.

Configuration Commands

The DMF configuration file (/etc/dmf/dmf.conf) contains configuration objects and associated configuration parameters that control the way DMF software operates. By changing the values associated with these objects and parameters, you can control the behavior of DMF software. To modify the configuration file, you can use DMF manager. For information about configuration, see:

The following man pages are also related to the configuration file:

Man page

Description

dmf.conf(5)

Describes the DMF configuration objects and parameters in detail.

dmconfig(8)

Prints DMF configuration parameters to standard output.
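To make the idea of configuration objects and their parameters concrete, the following Python sketch reads a small configuration fragment into a dictionary keyed by object name. It is a sketch under stated assumptions: it assumes a stanza layout of the form "define NAME ... enddef" with one "PARAMETER value" pair per line and "#" comments, and the sample object name and values are hypothetical. The dmf.conf(5) man page remains the authoritative description of the file.

```python
from typing import Dict, Optional

# Illustrative fragment only. The object name and the parameter value
# below are hypothetical and are not taken from a real configuration.
SAMPLE = """\
# sample fragment
define /dmfusr1
    TYPE              filesystem
    MIN_ARCHIVE_SIZE  1000000
enddef
"""


def parse_objects(text: str) -> Dict[str, Dict[str, str]]:
    """Parse 'define NAME ... enddef' stanzas into {name: {param: value}}.

    Simplification: everything after '#' on a line is treated as a
    comment, so values that contain '#' are not supported.
    """
    objects: Dict[str, Dict[str, str]] = {}
    current: Optional[Dict[str, str]] = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split(None, 1)
        keyword = fields[0]
        if keyword == "define":
            if len(fields) != 2:
                raise ValueError(f"object name missing: {raw!r}")
            current = objects.setdefault(fields[1], {})
        elif keyword == "enddef":
            current = None
        elif current is not None:
            current[keyword] = fields[1].strip() if len(fields) > 1 else ""
        else:
            raise ValueError(f"parameter outside an object: {raw!r}")
    return objects


if __name__ == "__main__":
    for name, params in parse_objects(SAMPLE).items():
        print(name, params)
```

A script like this is useful only for read-only inspection; changes to the configuration should still go through the supported tools described in this section.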

For detailed examples of configuring DMF with COPAN cabinets, see:

  • COPAN MAID for DMF Quick Start Guide

  • SGI 400 VTL for DMF Quick Start Guide

DMF Daemon and Related Commands

The DMF daemon, dmfdaemon(8), communicates with the kernel through a device driver and receives backup and recall requests from users through a socket. The daemon activates the appropriate MSPs and LSs for file migration and recall, maintaining communication with them through unnamed pipes. It also changes the state of inodes as they pass through each phase of the migration and recall process. In addition, the daemon maintains a database containing entries for every migrated file on the system. Updates to database entries are logged in a journal file for recovery. See Chapter 12, “The DMF Daemon”, for a detailed description of the DMF daemon.
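The role of the journal in recovery can be sketched with a toy model: a backup copy of a database is brought up to date by replaying the updates logged after the backup was taken. The Python below is a generic snapshot-plus-journal illustration only. The record layout, the field names, and the replay function are hypothetical and do not reflect the actual DMF database or journal formats.

```python
from typing import Dict, Iterable, Optional, Tuple

# A journal record is (key, new_value); a value of None means "delete".
# This layout is an assumption made for illustration.
Record = Dict[str, str]
JournalEntry = Tuple[str, Optional[Record]]


def replay(snapshot: Dict[str, Record],
           journal: Iterable[JournalEntry]) -> Dict[str, Record]:
    """Return a new database: the snapshot with journal entries applied in order."""
    # Copy the snapshot so that the backup itself is never modified.
    db = {key: dict(value) for key, value in snapshot.items()}
    for key, value in journal:
        if value is None:
            db.pop(key, None)       # deletion of an entry
        else:
            db[key] = dict(value)   # insert or update of an entry
    return db


if __name__ == "__main__":
    backup = {"file1": {"state": "migrated"}}
    journal = [
        ("file2", {"state": "migrated"}),   # added after the backup
        ("file1", None),                    # removed after the backup
    ]
    print(replay(backup, journal))
```

Because entries are applied strictly in logged order, replaying the same journal against the same backup always yields the same result, which is what makes this kind of recovery repeatable.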


Caution: If used improperly, commands that make changes to the daemon database can cause data to be lost.

The following administrator commands are related to dmfdaemon and the daemon database:

Command

Description

dmaudit(8)

Reports discrepancies between filesystems and the daemon database. This command is executed automatically if you configure the run_audit.sh task.

dmcheck(8)

Checks the DMF installation and configuration and reports any problems.

dmdadm(8)

Performs daemon database administrative functions, such as viewing individual database records.

dmdbcheck(8)

Checks the consistency of a database by validating the location and key values associated with each record and key in the data and key files (also an LS command). If you configure the run_copy_database.sh task, this command is executed automatically as part of the task. The consistency check is completed before the DMF databases are saved.


dmdbrecover(8)

Applies journal records to a restored backup copy of the daemon database or LS database in order to create an up-to-date sane database.

dmdidle(8)

Causes files in pending requests to be flushed to secondary storage, even if this means forcing only a small amount of data to a volume.

dmdstat(8)

Indicates to the caller the current status of dmfdaemon.

dmdstop(8)

Stops the DMF daemon without consideration for related services.


Note: To stop DMF, you should normally use the service(8) command. See “Starting and Stopping the DMF Environment” in Chapter 5.

For instructions about stopping DMF in an HA environment, see High Availability Guide for SGI InfiniteStorage.


dmfdaemon(8)

Starts the DMF daemon without consideration for related services.


Note: To start DMF, you should normally use the service(8) command. See “Starting and Stopping the DMF Environment” in Chapter 5.

For instructions about starting DMF in an HA environment, see High Availability Guide for SGI InfiniteStorage.


dmhdelete(8)

Deletes expired daemon database entries and releases corresponding MSP/VG space, resulting in logically less active data. This command is executed automatically if you configure the run_hard_deletes.sh task.

dmmigrate(8)

Migrates regular files that match specified criteria in the specified filesystems, leaving them as dual-state. This utility is often used to migrate files before running backups of a filesystem, hence minimizing the size of the backup image. It may also be used in a DCM MSP environment to force cache files to be copied to secondary storage if necessary.

dmsnap(8)

Copies the daemon database and the LS database to a specified location. If you configure the run_copy_database.sh task, this command is executed automatically as part of the task.

Space Management Commands

The following commands are associated with automated space management, which allows DMF software to maintain a specified level of free space on a filesystem through automatic file migration:

Command

Description

dmfsfree(8)

Attempts to bring the free space and migrated space of a filesystem into compliance with configured values.

dmfsmon(8)

Monitors the free space levels in filesystems configured with automated space management enabled (auto) and lets you maintain a specified level of free space.

dmscanfs(8)

Scans DMF filesystems or DCM MSP caches and prints status information to stdout.

See Chapter 11, “Automated Space Management”, for details.
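The basic check behind automated space management, comparing a filesystem's free space with a configured minimum, can be sketched in a few lines of Python. This is a minimal sketch, not how dmfsmon(8) is implemented: the function names and the minimum_free_percent argument are hypothetical, and the real thresholds come from the DMF configuration.

```python
import shutil


def free_percent(total_bytes: int, free_bytes: int) -> float:
    """Return free space as a percentage of total capacity."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * free_bytes / total_bytes


def needs_space(path: str, minimum_free_percent: float) -> bool:
    """Return True if the filesystem holding path is below the minimum.

    minimum_free_percent is a hypothetical stand-in for a site-configured
    threshold; it is not a DMF parameter name.
    """
    usage = shutil.disk_usage(path)
    return free_percent(usage.total, usage.free) < minimum_free_percent


if __name__ == "__main__":
    # Check the filesystem that holds the current directory against a
    # hypothetical 10% minimum.
    print(needs_space(".", 10.0))
```

In DMF itself, falling below the threshold is what makes already migrated files candidates to have their data blocks freed; a script like this only reports the condition.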

LS Commands

The following commands manage the CAT and VOL records for the LS:

Command

Description

dmcatadm(8)

Provides maintenance and recovery services for the CAT records in the LS database.

dmvoladm(8)

Provides maintenance and recovery services for the VOL records in the LS database, including the selection of volumes for merge operations.

Most data transfers to and from secondary storage are performed by components internal to the LS. However, the following commands can read LS volumes directly:

Command

Description

dmatread(8)

Copies data directly from LS volumes to disk.

dmatsnf(8)

Audits and verifies the format of LS volumes.

The following commands check for inconsistencies in the LS database:

Command

Description

dmatvfy(8)

Verifies the contents of the LS database against the daemon database. This command is executed automatically if you configure the run_audit.sh task.

dmdbcheck(8)

Checks the consistency of a database by validating the location and key values associated with each record and key in the data and key files.

DCM MSP Commands

The following commands support the DCM MSP:

Command

Description

dmdskfree(8)

Manages file space within the disk cache and as needed migrates files to a lower tier and/or removes them from the disk cache.

dmdskvfy(8)

Verifies disk MSP file copies against the daemon database.

Disk MSP Command

The following command supports the disk MSP:

Command

Description

dmdskvfy(8)

Verifies disk MSP file copies against the daemon database.

Other Commands

The following commands are also available:

Command

Description

dmcancel(8)

Cancels some types of DMF requests by request ID.

dmclripc(8)

Frees system interprocess communication (IPC) resources and token files used by dmlockmgr and its clients when abnormal termination prevents orderly exit processing.

dmcollect(8)

Collects relevant details for problem analysis when DMF software is not functioning properly. You should run this command before submitting a bug report to SGI Support, should this ever be necessary.

dmcopan(8)

Provides detail about a COPAN MAID volume serial number (VSN) and its associated metadata.

dmdate(8)

Performs calculations on dates for administrative support scripts.

dmdump(8)

Creates a text copy of an inactive database file or a text copy of an inactive complete daemon database.


dmdumpj(8)

Creates a text copy of DMF journal transactions.

dmfill(8)

Recalls migrated files to fill a percentage of a filesystem. This command is mainly used in conjunction with backup and restore commands to return a corrupted filesystem to a previously known valid state.

dmlockmgr(8)

Invokes the database lock manager. The lock manager is an independent process that communicates with all applications that use the DMF databases, mediates record lock requests, and facilitates the automatic transaction recovery mechanism.

dmmove(8)

Moves copies of a migrated file's data to the specified MSPs/VGs.

dmmvtree(8)

Moves files from one managed filesystem to another without requiring that file data be recalled.

dmov_keyfile(8)

Creates the file of DMF OpenVault keys, ensuring that the contents of the file are semantically correct and have the correct file permissions. This command removes any DMF keys in the file for the OpenVault server system and adds new keys at the front of the file.

dmov_loadtapes(8)

Scans a library for volumes not imported into the OpenVault database and allows the user to select a portion of them to be used by a VG. The selected volumes are imported into the OpenVault database, assigned to the DMF application, and added to the LS database. This command can perform the equivalent actions for the filesystem backup scripts; just use the name of the associated task group instead of the name of a VG.

dmov_makecarts(8)

Makes the volumes in one or more LS databases accessible through OpenVault by importing into the OpenVault database any volumes unknown to it and by registering all volumes to the DMF application not yet so assigned. This command can perform the equivalent actions for the filesystem backup scripts; just use the name of the associated task group instead of the name of a VG.

dmprojid(8)

Sets or displays a file's site-defined project ID.

dmqview(8)

Displays information about the internal DMF processing queues.

dmrepri(8)

Reprioritizes recall and dmcopy requests by daemon request ID.

dmrestore(8)

Restores DMF attributes to a file.


Caution: This command is intended to be used in certain recovery scenarios; used incorrectly, it could cause DMF file-consistency issues. You should use this command only at the direction of SGI Support.


dmselect(8)

Selects migrated files based on given criteria. The output of this command can be used as input to dmmove(8).

dmsort(8)

Sorts files of blocked records.

dmstat(8)

Displays a variety of status information about the DMF environment, including details about the requests currently being processed by the daemon, statistics about requests that have been processed since the daemon last started, and details of current drive usage by VGs.

dmtapestat(8)

Displays drive metrics for the entire DMF installation. You execute this command as root from the DMF server.

dmunput(8)

Recalls files and removes them from DMF management. You execute this command as root.

dmxfsrestore(8)

Calls the xfsrestore(8) command to restore files backed up to volumes that were produced by DMF administrative maintenance scripts.

sgi-ltfs

Manages Linear Tape File System (LTFS) cartridges and drives.

tsreport(8)

Displays information about tape drive errors, alerts, and usage when the ts tape driver is used. The tsreport command is included in the apd RPM.



[1] For historical reasons, these volumes are sometimes referred to as tapes in command output and documentation.