Appendix E. System Tunable Parameters

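This appendix discusses the following:

  • “Static Parameters that are Site-Configurable”

  • “Dynamic Parameters that are Site-Configurable”

  • “Dynamic Parameters for Debugging Purposes Only”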

SGI recommends that you use the same settings on all applicable nodes in the cluster.


Note: Before changing any parameter, you should understand the ramifications of doing so on your system. Contact your SGI support person for guidance.

To manipulate these parameters on a running system, you can use the Linux sysctl command or the IRIX systune command. For more information, see the sysctl(1M), systune(1M), and modules.conf(5) man pages.

Linux organizes the tunables in a hierarchy; therefore, you must specify the entire "path" to the tunable. The first part of that path is given under the "Location" entry in the following sections. For example, the full path to the tunable cxfsd_sync_force is fs.cxfs.cxfsd_sync_force.

Example of a query using sysctl:

server-admin# sysctl fs.cxfs.cxfsd_sync_force
fs.cxfs.cxfsd_sync_force = 8372224

Example of setting a value using sysctl:

server-admin# sysctl fs.cxfs.cxfsd_sync_force=0
fs.cxfs.cxfsd_sync_force = 0


Note: There cannot be spaces around the = character when setting a value.
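
The path and assignment rules above can be sketched as a pair of small shell helpers. This is only an illustration: the function names tunable_path and tunable_assignment are hypothetical and are not part of CXFS or sysctl. The script only prints the argument; it does not change any setting.

```shell
#!/bin/sh
# Hypothetical helpers (not part of CXFS or sysctl).

# Join a "Location" prefix and a tunable name into the full sysctl path.
tunable_path() {
    # $1 = location prefix (for example, fs.cxfs); $2 = tunable name
    printf '%s.%s\n' "$1" "$2"
}

# Build a "path=value" argument with no spaces around the = character.
tunable_assignment() {
    # $1 = location prefix; $2 = tunable name; $3 = value
    printf '%s=%s\n' "$(tunable_path "$1" "$2")" "$3"
}

# Print the argument only. On a real node you would pass it to sysctl:
#   sysctl "$(tunable_assignment fs.cxfs cxfsd_sync_force 0)"
tunable_assignment fs.cxfs cxfsd_sync_force 0
```

Quoting the whole assignment as a single argument is one way to make sure no space slips in around the = character.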

See also “Set System Tunable Parameters Appropriately” in Chapter 2.

Static Parameters that are Site-Configurable

Static parameters require a reboot to take effect. On IRIX, you must build and boot new kernels, which happens automatically during a normal boot process. On any of the Linux flavors supported in this CXFS release, you must specify the parameter in /etc/modprobe.conf.local.
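
As a rough illustration of a Linux entry, the sketch below prints a line in the standard modprobe "options module parameter=value" form. It assumes that CXFS static parameters are given in that form and that the module name is the one shown in parentheses under "Location" (for example, sgi-cell); confirm both with SGI support before editing /etc/modprobe.conf.local. The function name modprobe_options_line is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print an "options" line in standard modprobe syntax.
# Assumption: CXFS static parameters are specified this way in
# /etc/modprobe.conf.local, using the module name listed under "Location".
modprobe_options_line() {
    # $1 = module name; $2 = parameter name; $3 = value
    printf 'options %s %s=%s\n' "$1" "$2" "$3"
}

# Print the line only; nothing is written to /etc/modprobe.conf.local.
modprobe_options_line sgi-cell mtcp_hb_period 500
```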

mtcp_hb_local_options

Specifies how CXFS kernel heartbeat is generated for a Linux node. You should only change this value at the recommendation of SGI support.

Legal values:

  • 0x0 uses the standard heartbeat generation routine (default).

  • 0x1 uses the interrupt timer list instead of a kernel thread.

  • 0x3 uses a heartbeat generation routine that avoids some memory allocation problems that may occur on nodes with large CPU counts that run massively parallel jobs.

Location:

  • Linux: kernel.cell (sgi-cell module)

mtcp_hb_period

Specifies (in hundredths of a second) the length of time that CXFS waits for CXFS kernel heartbeat from other nodes before declaring node failure. SGI recommends a value of 500 (5 seconds). You should only change this value at the recommendation of SGI support. The same value must be used on all nodes in the cluster.

Range of values:

  • Default: 500

  • Minimum: 100

  • Maximum: 12000


Note: If your cluster includes large Altix systems (greater than 64 processors), you may want to use a larger value, such as 6000 (60 seconds) or 12000 (120 seconds). However, the larger the timeout, the longer it takes the cluster to recognize a failed node and start recovery of the shared resources granted to that node. See “Avoid CXFS Kernel Heartbeat Issues on Large SGI Altix Systems” in Chapter 2.

Location:

  • IRIX: /var/sysgen/mtune/cell

  • Linux: kernel.cell (sgi-cell module)
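
Because mtcp_hb_period is expressed in hundredths of a second, a quick conversion can help when checking a proposed value against the documented range of 100 through 12000. The following is a minimal sketch; the function name hb_period_seconds is hypothetical and the script does not read or set the tunable.

```shell
#!/bin/sh
# Hypothetical helper: validate an mtcp_hb_period value and print it in seconds.
hb_period_seconds() {
    # $1 = value in hundredths of a second
    case "$1" in
        # Reject empty input, non-digits, and leading zeros (which shell
        # arithmetic would otherwise treat as octal).
        ''|*[!0-9]*|0?*) echo "not a plain decimal integer: $1" >&2; return 1 ;;
    esac
    if [ "$1" -lt 100 ] || [ "$1" -gt 12000 ]; then
        echo "out of range (100-12000): $1" >&2
        return 1
    fi
    # Whole seconds, then the remaining hundredths as two digits.
    printf '%d.%02d\n' $(( $1 / 100 )) $(( $1 % 100 ))
}

hb_period_seconds 500     # prints 5.00
hb_period_seconds 6000    # prints 60.00
```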

mtcp_hb_watchdog

Controls the behavior of the CXFS kernel heartbeat monitor watchdog. This facility monitors the generation of heartbeats in the kernel.

Range of values:

  • 0 specifies that there is no use of watchdog (default)

  • 1 specifies that watchdog expiration causes CXFS shutdown

  • 2 specifies that watchdog expiration causes panic

Location:

  • IRIX: /var/sysgen/mtune/cell

  • Linux: kernel.cell (sgi-cell module)

mtcp_nodelay

Specifies whether to enable or disable TCP_NODELAY on CXFS message channels.

Range of values:

  • 0 disables

  • 1 enables (default)

Location:

  • IRIX: /var/sysgen/mtune/cell

  • Linux: kernel.cell (sgi-cell module)

mtcp_rpc_thread

Specifies whether metadata messages are sent from a separate thread in order to save stack space.

Range of values:

  • 0 disables (default for most nodes)

  • 1 enables

Location:

  • IRIX: /var/sysgen/mtune/cell

  • Linux: kernel.cell (sgi-cell module)

rhelpd_max

Specifies the maximum number of rhelpd threads to run. The rhelpd threads help with recovery and relocation tasks. They are used for asynchronous inode reconstruction, parallel recoveries, and so on. The rhelpd thread pool is global and is created at module load time.

Range of values:

  • Default: 0, which specifies an automatically calculated value that will be 4 times the number of CPUs, as long as it is in the range 0 through 128. To disable automatic rhelpd_max calculation, set rhelpd_max to a non-zero value.

  • Minimum: 0

  • Maximum: 128

Location:

  • IRIX: /var/sysgen/mtune/cxfs

  • Linux: fs.cxfs (sgi-cxfs module)
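
The automatic rhelpd_max calculation can be approximated as follows. This sketch assumes that a result above 128 is capped at 128, which is one reading of "as long as it is in the range 0 through 128"; the actual kernel behavior may differ. The function name rhelpd_max_auto is hypothetical.

```shell
#!/bin/sh
# Hypothetical model of the automatic rhelpd_max value.
# Assumption: 4 x CPUs, capped at the documented maximum of 128.
rhelpd_max_auto() {
    # $1 = number of CPUs
    rhelpd_n=$(( $1 * 4 ))
    if [ "$rhelpd_n" -gt 128 ]; then
        rhelpd_n=128
    fi
    echo "$rhelpd_n"
}

rhelpd_max_auto 8     # prints 32
rhelpd_max_auto 64    # prints 128 (4 x 64 = 256 exceeds the cap)
```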

rhelpd_min

Specifies the minimum number of rhelpd threads to run.

Range of values:

  • Default: 0, which specifies an automatically calculated value that is 4 times the number of CPUs or 128, whichever is smaller. To disable automatic rhelpd_min calculation, set rhelpd_min to a non-zero value. When the value is set explicitly, the maximum is 8.

  • Minimum: 0

  • Maximum: 8

Location:

  • IRIX: /var/sysgen/mtune/cxfs

  • Linux: fs.cxfs (sgi-cxfs module)

Dynamic Parameters that are Site-Configurable

Dynamic parameters take effect as soon as they are changed.

cms_local_fail_action

Specifies the action to take when a local node detects that it has failed.

Range of values:

  • 0 withdraws from the cluster (default)

  • 1 halts

  • 2 reboots

Location:

  • IRIX: /var/sysgen/mtune/cell

  • Linux: kernel.cell (sgi-cell module)

cxfs_client_push_period

Specifies (in hundredths of a second) how long a client may delay telling the metadata server that it has updated the atime timestamp of a file. The default for both cxfs_client_push_period and cxfs_server_push_period is 1/4 of a second, so atime updates are delayed by up to 1/2 second by default. See also “cxfs_server_push_period”.

Range of values:

  • Default: 25

  • Minimum: 0

  • Maximum: 1000

Location:

  • IRIX: /var/sysgen/mtune/cxfs

  • Linux: fs.cxfs (sgi-cxfs module)
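
The 1/2-second figure follows from adding the client and server push periods. The sketch below reproduces that arithmetic in milliseconds, assuming the worst-case delay is simply the sum of the two periods, as the description implies. The function name atime_delay_ms is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: worst-case atime propagation delay in milliseconds.
# Assumption: the delay is the sum of the client and server push periods.
atime_delay_ms() {
    # $1 = cxfs_client_push_period; $2 = cxfs_server_push_period
    # Both are in hundredths of a second; 1 hundredth = 10 ms.
    echo $(( ($1 + $2) * 10 ))
}

atime_delay_ms 25 25    # prints 500 (the default: 1/2 second)
```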

cxfs_dcvn_timeout

Specifies the time-out (in seconds) of the dcvn idle period before returning tokens to the server.

Range of values:

  • Default: 60

  • Minimum: 5

  • Maximum: 3600

Location:

  • IRIX: /var/sysgen/mtune/cxfs

  • Linux: fs.cxfs (sgi-cxfs module)

cxfs_extents_delta

Specifies whether or not to optimize the way extent lists are sent across the private network by sending a delta when possible.

Range of values:

  • 0 does not optimize

  • 1 optimizes (default)

Location:

  • IRIX: /var/sysgen/mtune/cxfs

  • Linux: fs.cxfs (sgi-cxfs module)

cxfs_punch_hole_restrict

Specifies whether or not to allow exported files to have their extents freed by DMAPI via dm_punch_hole().

Range of values:

  • 0 allows extents to be freed (default)

  • 1 does not allow extents to be freed

Location:

  • IRIX: /var/sysgen/mtune/cxfs

  • Linux: fs.cxfs (sgi-cxfs module)

cxfs_relocation_ok

Specifies whether relocation is disabled or enabled (must be specified on the active metadata server).

Range of values:

  • 0 disables relocation (default)

  • 1 enables relocation


Note: Relocation is disabled by default and is only supported on standby nodes. See:


Location:

  • Linux: fs.cxfs (sgi-cxfs module)

cxfs_server_push_period

Specifies (in hundredths of a second) how long a metadata server may delay broadcasting to the clients that it has updated the atime timestamp. The default for both cxfs_client_push_period and cxfs_server_push_period is 1/4 of a second, so atime updates are delayed by up to 1/2 second by default. See also “cxfs_client_push_period”.

Range of values:

  • Default: 25

  • Minimum: 0

  • Maximum: 1000

Location:

  • Linux: fs.cxfs (sgi-cxfs module)

cxfsd_max

Specifies the maximum number of cxfsd threads to run per CXFS filesystem. (The cxfsd threads do the disk block allocation for delayed allocation buffers in CXFS and the flushing of buffered data for files that are being removed from the local cache by the metadata server.) The threads are allocated at filesystem mount time. The value of the cxfsd_max parameter at mount time remains in effect for a filesystem until it is unmounted.

Range of values:

  • Default: 0, which specifies the value of cxfsd_min + 2. (The value for cxfsd_max is always at least cxfsd_min + 2, even if that forces the kernel to increase the value beyond 2048.) To disable automatic cxfsd_max calculation, set cxfsd_max to a non-zero value.

  • Minimum: 16

  • Maximum: 2048


Note: The value for cxfsd_max cannot be less than the value specified for cxfsd_min.

Location:

  • IRIX: /var/sysgen/mtune/cxfs

  • Linux: fs.cxfs (sgi-cxfs module)

cxfsd_min

Specifies the minimum number of cxfsd threads to run per CXFS filesystem. The value of the cxfsd_min parameter at mount time remains in effect for a filesystem until it is unmounted.

Range of values:

  • Default: 0, which specifies an automatically calculated value that will be 2 times the number of CPUs (the number of actual running cxfsd threads is dynamic), as long as it is in the range 16 through 2048. To disable automatic cxfsd_min calculation, set cxfsd_min to a non-zero value.

  • Minimum: 16

  • Maximum: 2048

Location:

  • IRIX: /var/sysgen/mtune/cxfs

  • Linux: fs.cxfs (sgi-cxfs module)
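
The relationship between the automatic cxfsd_min and cxfsd_max values can be sketched as follows. This assumes that 2 times the number of CPUs is held within the range 16 through 2048 by clamping, which is one reading of the description; the actual kernel behavior may differ. The function names cxfsd_min_auto and cxfsd_max_auto are hypothetical.

```shell
#!/bin/sh
# Hypothetical model of the automatic cxfsd thread limits.

cxfsd_min_auto() {
    # $1 = number of CPUs
    # Assumption: 2 x CPUs, clamped to the documented range 16..2048.
    cxfsd_n=$(( $1 * 2 ))
    if [ "$cxfsd_n" -lt 16 ]; then cxfsd_n=16; fi
    if [ "$cxfsd_n" -gt 2048 ]; then cxfsd_n=2048; fi
    echo "$cxfsd_n"
}

cxfsd_max_auto() {
    # $1 = effective cxfsd_min
    # The automatic cxfsd_max is cxfsd_min + 2, even if that exceeds 2048.
    echo $(( $1 + 2 ))
}

min=$(cxfsd_min_auto 64)
echo "cxfsd_min=$min cxfsd_max=$(cxfsd_max_auto "$min")"    # prints cxfsd_min=128 cxfsd_max=130
```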

mtcp_mesg_validate

Enables checksumming. Normally, this is not needed and is only used if TCP data corruption is suspected.

Range of values:

  • 0 performs no validation (default)

  • 1 generates checksums, but does not perform validation

  • 2 generates and validates checksums, warns (via a SYSLOG message) on validation failure

  • 3 generates and validates checksums, warns and returns an error message on validation failure

  • 4 generates and validates checksums, warns and panics on validation error

Location:

  • IRIX: /var/sysgen/mtune/cell

  • Linux: kernel.cell (sgi-cell module)

Dynamic Parameters for Debugging Purposes Only


Caution: These parameters are provided for debugging purposes. You should only reset these parameters if advised to do so by SGI support.


cxfs_client_range_age_max

Specifies the maximum age of a granted range, measured in generations, before a client will voluntarily return it.

Range of values:

  • Default: 10

  • Minimum: 0

  • Maximum: 1000

Location:

  • IRIX: /var/sysgen/mtune/cell

  • Linux: kernel.cell (sgi-cell module)

See also “cxfs_server_range_age_max”.

cxfs_recovery_slowdown

Slows down recovery by inserting delays (measured in milliseconds).

Range of values:

  • Default: 0

  • Minimum: 0

  • Maximum: 60000

Location:

  • IRIX: /var/sysgen/mtune/cell

  • Linux: kernel.cell (sgi-cell module)

cxfs_recovery_timeout_panic

Specifies the action taken when a node with stalled recovery is discovered.

Legal values:

  • 0 shuts down a node with stalled recovery (default)

  • 1 panics a node with stalled recovery

Location:

  • IRIX: /var/sysgen/mtune/cell

  • Linux: kernel.cell (sgi-cell module)

cxfs_recovery_timeout_period

Specifies the time in seconds between recovery time-out polls.

Range of values:

  • Default: 60

  • Minimum: 0 (disables recovery polls)

  • Maximum: 3600

Location:

  • IRIX: /var/sysgen/mtune/cell

  • Linux: kernel.cell (sgi-cell module)

cxfs_recovery_timeout_stalled

Specifies the time in seconds after which a node whose status is not changing is considered to have a stalled recovery.

Range of values:

  • Default: 600

  • Minimum: 0 (disables time-out)

  • Maximum: 3600

Location:

  • IRIX: /var/sysgen/mtune/cell

  • Linux: kernel.cell (sgi-cell module)

cxfs_recovery_timeout_start

Specifies the time in seconds following a recovery before the recovery time-out monitoring begins.

Range of values:

  • Default: 60

  • Minimum: 0

  • Maximum: 3600

Location:

  • IRIX: /var/sysgen/mtune/cell

  • Linux: kernel.cell (sgi-cell module)

cxfs_server_range_age_max

Specifies the maximum age of a granted range, measured in generations, before the server will recall it.

Range of values:

  • Default: 10

  • Minimum: 0

  • Maximum: 1000

Location:

  • IRIX: /var/sysgen/mtune/cell

  • Linux: kernel.cell (sgi-cell module)

See also “cxfs_client_range_age_max”.