Chapter 2. Planning an XFS Filesystem

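This chapter discusses the following:

Choosing the Filesystem Block Size
Choosing the Filesystem Directory Block Size
Choosing the Log Type and Size
Choosing Allocation Groups and Stripe Units
Repartitioning the Disks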

Choosing the Filesystem Block Size

XFS lets you choose the logical block size for each filesystem by using the -b size= option of the mkfs.xfs command. (Physical disk blocks remain 512 bytes.)

For XFS filesystems on disk partitions and logical volumes and for the data subvolume of filesystems on logical volumes, the block size guidelines are as follows:

The minimum block size is 512 bytes, and the filesystem block size must be a power of two. Small block sizes increase allocation overhead, which decreases filesystem performance. In general, a block size of 512 bytes is recommended only for filesystems under 100 MB and for filesystems with many small files.
The default block size is 4096 bytes (4 KB). This is the recommended block size for filesystems over 100 MB.
The maximum block size is the page size of the kernel, which is 4 KB on x86 systems (both 32-bit and 64-bit) and is configurable on ia64 systems. Because large block sizes can waste space, in general block sizes should not be larger than 4096 bytes (4 KB).

Block sizes are specified in bytes as follows:

Decimal (default)
Octal (prefixed by 0)
Hexadecimal (prefixed by 0x or 0X)

If the number has the suffix “K”, it is multiplied by 1024.
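The notation and limits above can be sketched as a small parser. This is an illustrative helper, not part of mkfs.xfs: the function name parse_block_size and its page_size parameter are hypothetical, the 4096-byte page size is the x86 value quoted above, and accepting a lowercase “k” is an assumption.

```python
def parse_block_size(text: str, page_size: int = 4096) -> int:
    """Parse a block size given as decimal, octal (0...), or hex (0x...).

    An optional K suffix multiplies the value by 1024. The result must be
    a power of two between 512 bytes and the kernel page size.
    """
    s = text.strip()
    multiplier = 1
    if s and s[-1] in "kK":  # lowercase "k" is accepted here as an assumption
        multiplier = 1024
        s = s[:-1]

    if s.lower().startswith("0x"):
        value = int(s, 16)   # hexadecimal, prefixed by 0x or 0X
    elif len(s) > 1 and s.startswith("0"):
        value = int(s, 8)    # octal, prefixed by 0
    else:
        value = int(s, 10)   # decimal (default)

    size = value * multiplier
    if size < 512 or size > page_size:
        raise ValueError(f"block size {size} is outside 512..{page_size} bytes")
    if size & (size - 1):
        raise ValueError(f"block size {size} is not a power of two")
    return size
```

For example, "4K", "4096", "0x1000", and "010000" all describe the same 4096-byte block size, while "1000" is rejected because it is not a power of two.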

Choosing the Filesystem Directory Block Size

To select a logical block size for the filesystem directory that is greater than the logical block size of the filesystem, use the -n option of the mkfs.xfs command. This lets you choose a filesystem block size to match the distribution of data file sizes without adversely affecting directory operation performance. Using this option could improve performance for a filesystem with many small files, such as a news or mail filesystem. In this case, the filesystem logical block size could be small (512 bytes, 1 KB, or 2 KB) and the logical block size for the filesystem directory could be large (4 KB or 8 KB); this can improve the performance of directory lookups because the tree storing the index information has larger blocks and less depth.

You should consider setting a logical block size for a filesystem directory that is greater than the logical block size for the filesystem if you are supporting an application that reads directories (with the readdir(3C) or getdents(2) system calls) far more often than it creates and removes files. Using a small filesystem block size reduces both the disk space and the I/O needed for the small files.
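As a rough sketch of the small-block, large-directory-block configuration described above, the following helper assembles the corresponding mkfs.xfs arguments. The function name mkfs_size_args is hypothetical, and the validation rules are limited to what this section states (powers of two, directory block size not smaller than the filesystem block size).

```python
from typing import List


def mkfs_size_args(block_size: int, dir_block_size: int) -> List[str]:
    """Return mkfs.xfs arguments for a filesystem and directory block size."""
    for name, value in (("block", block_size), ("directory block", dir_block_size)):
        if value < 512 or value & (value - 1):
            raise ValueError(f"{name} size {value} must be a power of two >= 512")
    if dir_block_size < block_size:
        raise ValueError("directory block size must not be smaller than block size")
    # -b size= sets the filesystem block size; -n size= sets the directory block size.
    return ["-b", f"size={block_size}", "-n", f"size={dir_block_size}"]
```

A mail or news spool might use mkfs_size_args(1024, 8192), which yields the arguments -b size=1024 -n size=8192.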

The data needed to perform a readdir operation is segregated from the index information. Directory data blocks can be “read-ahead” in a readdir. Performing read-ahead improves the readdir performance dramatically. Because the data needed for a readdir operation and index information are separate in a directory block, the offset in a directory is limited to 32 bits.

Choosing the Log Type and Size

This section discusses the following:

Log Type: Internal vs External
Log Size
mkfs.xfs Command-Line Options for Logs

Log Type: Internal vs External

Each XFS filesystem has a log that contains filesystem journaling records. There are two types of logs:

Internal log: Maintains log records in approximately the center of the disk partition or data subvolume. The chosen starting point is the allocation group (AG) closest to the center of the filesystem, rounding up if necessary. For example, if there are 33 AGs numbered AG0 through AG32, then the 17th AG (AG16) is chosen; in the case of 32 AGs, AG16 is still chosen due to rounding.

Note: When using the ibound mount option, the chosen AG will still be the middle of the filesystem, not the middle of the user-extents region. However, the log will always exist within the user-extents region. See “ibound Extent Allocation Policy” in Chapter 7.

External log: Maintains log records in a dedicated log subvolume. You should create an external log in any of the following circumstances:

The data and log records should be on different partitions
The data and the log subvolume of a logical volume should be on different partitions or should use different subvolume configurations
The log subvolume of a logical volume should be striped independently from the data subvolume
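The choice of AG for an internal log can be reproduced with integer arithmetic. The sketch below is inferred only from the two examples given for the internal log (33 AGs and 32 AGs both select AG16); the function name internal_log_ag is hypothetical, and this is not taken from the mkfs.xfs source.

```python
def internal_log_ag(ag_count: int) -> int:
    """Return the index of the AG nearest the center, rounding up.

    AGs are numbered from 0, so the midpoint of indexes 0..ag_count-1 is
    (ag_count - 1) / 2. Rounding that up is the same as ag_count // 2.
    """
    if ag_count < 1:
        raise ValueError("a filesystem has at least one AG")
    return ag_count // 2
```

With 33 AGs the midpoint is exactly 16; with 32 AGs the midpoint is 15.5, which rounds up to 16.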

Log Size

The maximum log size for either an internal log or an external log is 2,136,997,888 bytes (that is, 10 MB less than 2 GB), which equates to 521728 4-KB blocks. In addition, the size of an internal log cannot be larger than the AG size.
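The figures above can be checked with a few lines of arithmetic. This is a worked example only; the constant and function names are hypothetical.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

# Maximum log size: 10 MB less than 2 GB.
MAX_LOG_BYTES = 2 * GIB - 10 * MIB


def max_log_blocks(block_size: int) -> int:
    """Maximum log size expressed in filesystem blocks of the given size."""
    return MAX_LOG_BYTES // block_size


def max_internal_log_bytes(ag_size_bytes: int) -> int:
    """An internal log is additionally capped by the AG size."""
    return min(MAX_LOG_BYTES, ag_size_bytes)
```

Here MAX_LOG_BYTES evaluates to 2,136,997,888 and max_log_blocks(4096) to 521728, matching the values in the text. For a filesystem with 16 MB AGs, the internal log can be no larger than 16 MB.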

For most filesystems, SGI recommends the default log size:

For an internal log, the default log size depends on the filesystem size, filesystem block size, and filesystem directory block size. The default ranges from 512 filesystem blocks up to the maximum log size.
For an external log, the default log size is the entire size of the specified log device, up to the maximum log size. You should create a volume or partition of the desired size before creating the filesystem and then let mkfs determine the size of the external log.

Note: Although it is possible to set the size explicitly by using the mkfs command, doing so is much less reliable.

For a filesystem with very high transaction activity, SGI recommends using the maximum log size.

Note the following:

The larger the log, the more outstanding transactions that XFS can support.
Using the maximum log size can increase the filesystem mount time after a crash.
The amount of disk space required for log records is proportional to the transaction rate and the size of transactions on the filesystem, not the size of the filesystem. Larger block sizes result in larger transactions.
Transactions from directory updates (for example, the mkdir and rmdir commands and the creat() and unlink() system calls) cause more log data to be generated.
The disk space dedicated to the log does not show up in listings from the df command, nor can you access it with a filename.

mkfs.xfs Command-Line Options for Logs

At the mkfs.xfs command line, include the following options according to the circumstances:

Internal log:
- Default size: no special options are required
- Maximum size: -l size=521728b
External log:
- Location: -l logdev=device, where device is the location of the external log subvolume

For more details, see the mkfs.xfs(8) man page.
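The three cases listed in this section can be sketched as a helper that returns the extra mkfs.xfs arguments. The function name log_options and its parameters are hypothetical, and the device path used in the example is a placeholder rather than a recommendation.

```python
from typing import List, Optional

# Per the text, 521728 is the maximum log size counted in 4-KB blocks, so
# this option value assumes a 4096-byte filesystem block size.
MAX_LOG_OPTION = "size=521728b"


def log_options(external_device: Optional[str] = None,
                maximum_internal: bool = False) -> List[str]:
    """Return the -l arguments for the log configurations listed above."""
    if external_device is not None:
        if maximum_internal:
            raise ValueError("maximum_internal applies only to an internal log")
        # External log: mkfs sizes the log from the device itself.
        return ["-l", f"logdev={external_device}"]
    if maximum_internal:
        return ["-l", MAX_LOG_OPTION]
    # Internal log with the default size needs no special options.
    return []
```

For instance, log_options(external_device="/dev/sdb1") returns the arguments -l logdev=/dev/sdb1, where /dev/sdb1 stands in for whatever partition or subvolume holds the log.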

Choosing Allocation Groups and Stripe Units

If you are using the ibound mount option available with enhanced XFS, the first set of AGs (determined by the ibound value) is designated as the metadata region and the remaining AGs are designated as the user-extents region. SGI recommends that the metadata region consist of at least 8 AGs. (See “ibound Mount Option for SSD Media” in Chapter 7.)

You can select the number of AGs when you create an XFS filesystem or, alternatively, you can select the size of an AG. The larger the number of AGs, the more parallelism can be achieved when allocating blocks and inodes. You should avoid selecting a very large number of AGs or an AG size that will yield a very large number of AGs; a large number of AGs causes an unreasonable amount of CPU time to be used when the filesystem is close to full.

The minimum AG size is 16 MB; the maximum size is just under 4 GB.

The default number of AGs is 8, unless the filesystem is smaller than 128 MB or larger than 8 GB:

Smaller than 128 MB: the default number of AGs is fewer than 8, since the minimum AG size is 16 MB. In this case, the data section, by default, will be divided into as many AGs as possible that are at least 16 MB.
Larger than 8 GB but smaller than 64 GB: the default number of AGs is greater than 8, with each AG approximately 1 GB in size.
Larger than 64 GB: the default number of AGs is still greater than 8, but the AG size is 4 GB.
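The default AG count rules above can be approximated as follows. This is a sketch of the prose only, not the algorithm mkfs.xfs actually runs: the function name default_ag_count is hypothetical, AG sizes are treated as exactly 1 GB and 4 GB even though the text says “approximately” and the maximum is just under 4 GB, and the behavior at exactly 64 GB is an assumption because the text does not specify it.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3
MIN_AG_BYTES = 16 * MIB


def ceil_div(a: int, b: int) -> int:
    """Integer division rounding up."""
    return -(-a // b)


def default_ag_count(fs_bytes: int) -> int:
    """Approximate the default number of AGs for a filesystem size."""
    if fs_bytes < 128 * MIB:
        # As many AGs as possible that are each at least 16 MB.
        return max(1, fs_bytes // MIN_AG_BYTES)
    if fs_bytes <= 8 * GIB:
        return 8
    if fs_bytes < 64 * GIB:
        # AGs of roughly 1 GB each.
        return ceil_div(fs_bytes, GIB)
    # AGs of roughly 4 GB each (exactly 64 GB is handled here by assumption).
    return ceil_div(fs_bytes, 4 * GIB)
```

Under these assumptions a 64 MB filesystem gets 4 AGs, a 1 GB filesystem gets 8, a 16 GB filesystem gets 16, and a 128 GB filesystem gets 32.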

XFS lets you select the stripe unit for a RAID device or stripe volume. This ensures that data allocations, inode allocations, and the internal log will be aligned along stripe units when the end-of-file is extended and the file size is larger than 512 KB. You specify stripe units in 512-byte block units or in bytes. See the mkfs.xfs(8) man page for information on specifying stripe units.

When you specify a stripe unit, you also specify a stripe width in 512-byte block units or in bytes. The stripe width must be a multiple of the stripe unit. The stripe width will be the preferred I/O size returned in the stat() system call. See the mkfs.xfs(8) man page for information on specifying stripe width.
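The relationship between stripe unit and stripe width can be checked before running mkfs.xfs. The helper below is a hypothetical sketch: it takes both values in bytes, verifies that the width is a multiple of the unit, and converts both to 512-byte block units. Requiring each value to be a multiple of 512 is an assumption made so that the conversion is exact.

```python
from typing import Tuple

SECTOR_BYTES = 512


def stripe_in_sectors(stripe_unit_bytes: int,
                      stripe_width_bytes: int) -> Tuple[int, int]:
    """Validate a stripe geometry and return it in 512-byte block units."""
    for name, value in (("unit", stripe_unit_bytes), ("width", stripe_width_bytes)):
        if value <= 0 or value % SECTOR_BYTES:
            raise ValueError(f"stripe {name} must be a positive multiple of 512")
    if stripe_width_bytes % stripe_unit_bytes:
        raise ValueError("stripe width must be a multiple of the stripe unit")
    return (stripe_unit_bytes // SECTOR_BYTES,
            stripe_width_bytes // SECTOR_BYTES)
```

A 64 KB stripe unit across four data disks gives a 256 KB stripe width, so stripe_in_sectors(65536, 262144) returns (128, 512).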

When the -b (block size) option of the mkfs.xfs command is also specified, you can use the -d su= and -d sw= options to specify the stripe unit and stripe width, respectively, in filesystem blocks.

For a RAID device, the default stripe unit is 0, indicating that the feature is disabled. You should configure the stripe unit and width sizes of RAID devices in order to avoid unexpected performance anomalies caused by the filesystem doing non-optimal I/O operations to the RAID unit. For example, if a block write is not aligned on a RAID stripe unit boundary and is not a full stripe unit, the RAID will be forced to do a read/modify/write cycle to write the data. This can have a significant performance impact. When the stripe unit size is set properly, XFS avoids unaligned accesses.
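The alignment condition in the example above can be expressed as a simple predicate. This is a deliberately simplified model, assuming that a write avoids the read/modify/write cycle only when it starts on a stripe unit boundary and covers whole stripe units; real RAID controllers may behave differently. The function name needs_read_modify_write is hypothetical.

```python
def needs_read_modify_write(offset: int, length: int, stripe_unit: int) -> bool:
    """Return True if a write is unaligned or covers a partial stripe unit.

    All arguments are in bytes. In this simplified model, any write that
    does not start on a stripe unit boundary, or whose length is not a
    whole number of stripe units, forces a read/modify/write cycle.
    """
    if length <= 0 or stripe_unit <= 0 or offset < 0:
        raise ValueError("offset must be >= 0; length and stripe_unit must be > 0")
    starts_aligned = offset % stripe_unit == 0
    whole_units = length % stripe_unit == 0
    return not (starts_aligned and whole_units)
```

With a 64 KB stripe unit, a 64 KB write at offset 0 is fully aligned, while the same write at offset 4096, or a 4 KB write at offset 0, touches a partial stripe unit.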

For a striped volume, the stripe unit that was specified when the volume was created is provided by default.

Repartitioning the Disks

Many system administrators may find that they want or need to repartition disks when they switch to XFS filesystems and/or logical volumes. Some of the reasons to consider repartitioning are:

Repartitioning can result in a larger pool of free space for all of the formerly separate filesystems.
If you plan to use logical volumes, you may want to put the XFS log into a small subvolume. This requires disk repartitioning to create a small partition for the log subvolume.
If you plan to use logical volumes, you may want to repartition to create disk partitions of equal size that can be striped or plexed.
