Chapter 15. PCI Device Drivers

The Peripheral Component Interconnect (PCI) bus, initially designed at Intel Corp, is standardized by the PCI Special Interest Group (PCI SIG), a nonprofit consortium of vendors (see “Standards Documents” and “Internet Resources”).

The PCI bus is designed to be a high-performance local bus to connect peripherals to memory and a processor. In many personal computers based on Intel and Motorola processors, the primary system bus is a PCI bus. A wide range of vendors make devices that plug into the PCI bus.

The PCI bus is supported by the O2 workstation and related workstation types. This chapter describes the PCI hardware in those systems and the driver/kernel interface for PCI access.

A PCI driver is a kernel-level device driver like other drivers. For information on the architecture of a kernel-level device driver and on how to build and debug one, see Part III, “Kernel-Level Drivers.”

PCI Bus in Silicon Graphics Workstations

This section contains an overview of the main features of PCI hardware attachment, for use as background material for software designers. Hardware designers can obtain a detailed technical paper on PCI hardware through the Silicon Graphics Developer Program. Design issues such as power supply capacities, card dimensions, signal latencies, and arbitration are covered in that material.

PCI Bus and System Bus

In no Silicon Graphics system is the PCI bus the primary system bus. The primary system bus is always a proprietary bus that connects one or more CPUs with high-performance graphics adapters and main memory. The PCI bus adapter is connected (or “bridged,” in PCI terminology) to the system bus, as shown in Figure 15-1.

Figure 15-1. PCI Bus In Relation to System Bus


The PCI adapter is a custom circuit with these main functions:

  • To act as a PCI bus target when a PCI bus master requests a read or write to memory

  • To act as a PCI bus master when a CPU requests a PIO operation

  • To manage PCI bus arbitration, allocating bus use to devices as they request it

  • To interface PCI interrupt signals to the system bus and the CPU

Different SGI systems have different PCI adapter ASICs. Although all adapters conform to the PCI standard level 2.1, there are significant differences between them in capacities, in optional features such as support for the 64-bit extension, and in performance details such as memory-access latencies.

Buses, Slots, Cards, and Devices

A system may contain one or more PCI bus adapters. Each bus connects one or more physical packages. The PCI standard allows up to 32 physical packages on a bus. A “package” may consist of a card plugged into a slot on the bus. However, a “package” can also consist of an internal chipset mounted directly on the system board, using the PCI bus and occupying one or more virtual slots on the bus. For example, the SCSI adapter in the O2 workstation occupies the first two virtual slots of the PCI bus in that system.

Each physical package can implement from one to eight functions. A PCI function is an independent device with its own configuration registers in PCI configuration space, and its own address decoders.

In Silicon Graphics systems, each PCI function is integrated into IRIX as a device. A PCI device driver manages one or more devices in this sense. A driver does not manage a particular package, or card, or bus slot; it manages one or more logical devices.

PCI Implementation in O2 Workstations

In the O2 and related workstation types, a proprietary system bus connects the CPU, multimedia devices (audio, video, and graphics) and main memory.

The PCI bus adapter interfaces one PCI bus to this system bus. The PCI bus adapter is a unit on the system bus, on a par with the other devices. The PCI bus adapter competes with the CPU and with multimedia I/O for the use of main memory.

The built-in SCSI adapter, which is located on the main system board, is logically connected to the PCI bus and takes the place of the first two “slots” on the PCI bus, so that the first actual slot is number 2.

Unsupported PCI Signals

In the O2, the PCI adapter implements a standard, 32-bit PCI bus operating at 33 MHz. The following optional signal lines are not supported.

  • The LOCK# signal is ignored; atomic access to memory is not supported.

  • The cache-snoop signals SBO# and SDONE are ignored.

  • The JTAG signals are not supported.

64-bit Address and Data Support

The O2 PCI adapter supports 64-bit data transfers, but not 64-bit addressing. All bus addresses are 32 bits, that is, all PCI bus virtual addresses are in the 4 GB range. The Dual Address Cycle (DAC) command is not supported (or needed).

The 64-bit extension signals AD[63:32], C/BE#[7:4], REQ64# and ACK64# are pulled up as required by the PCI standard.

When the PCI bus adapter operates as a bus master (as it does when implementing a PIO load or store for the CPU), the PCI adapter generates 32-bit data cycles.

When the PCI bus adapter operates as a bus target (as it does when a PCI bus master transfers data using DMA), the PCI adapter does not respond to REQ64#, and hence 64-bit data transfers are accomplished in two 32-bit data phases as described in the PCI specification.

Configuration Register Initialization

When the IRIX kernel probes the PCI bus and finds an active device, it initializes the device configuration registers as follows:

Command Register

The enabling bits for I/O Access, Memory Access, and Master are set to 1. Other bits, such as Memory Write and Invalidate and Fast Back-to-Back are left at 0.

Cache Line Size

0x20 (thirty-two 32-bit words, or 128 bytes).

Latency Timer

0x30 (48 clocks).

Base Address registers

Each register that requests memory or I/O address space is programmed with a starting address. In the O2 system, memory addresses are always greater than 0x8000 0000.

The device driver is free to set any other configuration parameters in the pfxattach() entry point (see “Attaching a Device”).
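The initial values listed above can be pictured with a small user-space model. This is only a sketch: the 64-byte array stands in for a device's configuration header, the offsets and Command register bit positions are those of the standard PCI configuration header, and the names CFG_*, CMD_*, model_kernel_init(), and cache_line_bytes() are invented for illustration and are not IRIX interfaces.

```c
#include <stdint.h>

/* Standard PCI configuration header offsets, in bytes. */
#define CFG_COMMAND        0x04  /* 16-bit Command register */
#define CFG_CACHE_LINE     0x0C  /* 8-bit Cache Line Size, in 32-bit words */
#define CFG_LATENCY_TIMER  0x0D  /* 8-bit Latency Timer, in PCI clocks */

/* Command register enable bits. */
#define CMD_IO_ACCESS      0x0001
#define CMD_MEM_ACCESS     0x0002
#define CMD_MASTER         0x0004

/* Hypothetical model of the values the kernel stores after probing. */
static void model_kernel_init(uint8_t hdr[64])
{
    uint16_t cmd = CMD_IO_ACCESS | CMD_MEM_ACCESS | CMD_MASTER;

    hdr[CFG_COMMAND]       = (uint8_t)(cmd & 0xFF); /* config space is little-endian */
    hdr[CFG_COMMAND + 1]   = (uint8_t)(cmd >> 8);
    hdr[CFG_CACHE_LINE]    = 0x20;                  /* 32 words */
    hdr[CFG_LATENCY_TIMER] = 0x30;                  /* 48 clocks */
}

/* Convert the Cache Line Size register (32-bit words) to bytes. */
static unsigned cache_line_bytes(const uint8_t hdr[64])
{
    return hdr[CFG_CACHE_LINE] * 4u;
}
```

A real driver would reach these registers through a PIO map on configuration space rather than through a local array.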

Address Spaces Supported

The relationship between the PCI bus address space and the system memory physical address space differs from one system type to another.

The O2 workstation and related systems support a 1 GB physical memory address space (30 bits of physical address used). Any part of physical address space can be mapped into PCI bus address space for purposes of DMA access from a PCI bus master device. The device driver ensures correct mapping through the use of a DMA map object (see “Allocating DMA Maps”).

Physical memory can be mapped to the PCI bus in normal order or byte-swapped order. Byte-swapping is done on the basis of 64-bit units. When a PCI bus master uses a byte-swapped DMA address as its target, and writes the 64-bit data item 0x0807 0605 0403 0201, the data 0x0102 0304 0506 0708 is delivered to memory.
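The effect of a byte-swapped mapping on one 64-bit unit can be sketched in ordinary C. The function name swap64() is invented for this illustration; in the real system the adapter hardware performs the swap, and no driver code is involved.

```c
#include <stdint.h>

/* Reverse the byte order of one 64-bit unit, as a byte-swapped
 * DMA mapping does for each 64-bit item that passes through it. */
static uint64_t swap64(uint64_t v)
{
    uint64_t out = 0;
    int i;

    for (i = 0; i < 8; i++) {
        out = (out << 8) | (v & 0xFF); /* move lowest byte of v to the top of out */
        v >>= 8;
    }
    return out;
}
```

Applied to the value in the text, swap64(0x0807060504030201) yields 0x0102030405060708, and applying the function twice returns the original value.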

PIO Address Mapping

For PIO purposes (the CPU loading and storing in device space), memory space defined by each PCI device in its configuration registers is allocated in the upper two gigabytes of the PCI address space, above 0x8000 0000. These addresses are allocated dynamically, based on the contents of the configuration registers of active devices. The I/O address space requested by each PCI device in its configuration registers is also allocated dynamically as the system comes up.

It is possible for a PCI device to request (in the initial state of its Base Address Registers) that its address space be allocated in the first 1 MB of the PCI bus. This request cannot be honored in the O2 workstation. Devices that cannot decode bus addresses above 0x8000 0000 are not supported.
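A device states its address-space request in the low-order bits of each Base Address Register. The following sketch decodes those bits as they are laid out in the standard PCI configuration header (bit 0 selects I/O or memory space; for memory, bits 2:1 give the type, with the value 01 requesting placement below 1 MB). The names bar_kind_t, bar_kind(), and bar_requests_below_1mb() are hypothetical helpers, not part of the IRIX interface.

```c
#include <stdint.h>

typedef enum {
    BAR_IO,             /* I/O space */
    BAR_MEM32,          /* memory, anywhere in 32-bit space */
    BAR_MEM_BELOW_1MB,  /* memory, must be below 1 MB */
    BAR_MEM64,          /* memory, 64-bit register pair */
    BAR_RESERVED
} bar_kind_t;

/* Classify the request encoded in the initial value of a BAR. */
static bar_kind_t bar_kind(uint32_t bar)
{
    if (bar & 0x1)
        return BAR_IO;
    switch ((bar >> 1) & 0x3) {
    case 0:  return BAR_MEM32;
    case 1:  return BAR_MEM_BELOW_1MB;
    case 2:  return BAR_MEM64;
    default: return BAR_RESERVED;
    }
}

/* True for the request that, per the text, cannot be honored in the O2. */
static int bar_requests_below_1mb(uint32_t bar)
{
    return bar_kind(bar) == BAR_MEM_BELOW_1MB;
}
```

A driver writer could use a check of this kind while bringing up a new card, to confirm that the card does not depend on low-memory placement.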

Device drivers get a virtual address to use in PIO access by creating a PIO map (see “Managing PIO Maps for PCI”).

Slot Priority and Bus Arbitration

Two devices that are built in to the workstation take the positions of PCI bus slots 0 and 1. Actual bus slots begin with slot 2 and go up to the maximum for the system (just the one slot in O2).

The PCI adapter maintains two priority groups. The lower-priority group is arbitrated in round-robin style. The higher-priority group uses fixed priorities based on slot number, with the higher-numbered slot having the higher fixed priority.

The IRIX kernel assigns slots to priority groups dynamically by storing values in an adapter register. There is no kernel interface for changing this priority assignment. The audio and the available PCI slots are in the higher priority group.

Interrupt Signal Distribution

The PCI adapter can present eight unique interrupt signals to the system CPU. The IRIX kernel uses these interrupt signals to distinguish between the sources of PCI bus interrupts. The system interrupt numbers 0 through 7 are distributed across the PCI bus slots as shown in Table 15-1 (“n.c.” means no connection).

Table 15-1. PCI Interrupt Distribution to System Interrupt Numbers

PCI Interrupt   Slot 0              Slot 1              Slot 2     Slot 3           Slot 4
                (built-in device)   (built-in device)              (When Present)   (When Present)
INTA#           system 0            n.c.                system 2   system 3         system 4
INTB#           n.c.                system 1            system 5   system 7         system 6
INTC#           n.c.                n.c.                system 6   system 5         system 7
INTD#           n.c.                n.c.                system 7   system 6         system 5

Each physical PCI slot has a unique system interrupt number for its INTA# signal. The INTB#, INTC#, and INTD# signals are connected in a spiral pattern to three system interrupt numbers.
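The distribution in Table 15-1 can be written down as a lookup table, which makes the spiral pattern easy to inspect. This is an illustrative sketch only: the array sys_intr, the constant NC, and the function system_interrupt() are invented names, and the kernel does not expose such a table to drivers.

```c
#define NC (-1)  /* "n.c." in Table 15-1: no connection */

enum { INTA, INTB, INTC, INTD };

/* Rows are PCI interrupt lines, columns are slots 0 through 4. */
static const int sys_intr[4][5] = {
    /* slot:   0   1   2  3  4 */
    /* INTA# */ {  0, NC,  2, 3, 4 },
    /* INTB# */ { NC,  1,  5, 7, 6 },
    /* INTC# */ { NC, NC,  6, 5, 7 },
    /* INTD# */ { NC, NC,  7, 6, 5 },
};

/* Return the system interrupt number for a line and slot, or NC. */
static int system_interrupt(int line, int slot)
{
    if (line < INTA || line > INTD || slot < 0 || slot > 4)
        return NC;
    return sys_intr[line][slot];
}
```

Reading down the columns for slots 2, 3, and 4 shows system interrupts 5, 6, and 7 rotating from one slot to the next.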

Driver/Kernel Interface for PCI Access

A PCI device driver manages the operation of one or more devices. In this section, “device” has two meanings.

  • A device can be a function associated with one set of configuration registers on a PCI card. A PCI card can contain up to eight such functions, but each configuration space is treated as a separate device by IRIX.

  • A logical device is a device special file in the /dev filesystem that refers to the original PCI device. For example, a PCI card that contains a serial port might be associated with /dev/ttyd12, /dev/ttyf12, and /dev/ttym12.

Besides the usual driver entry points for a block or character driver, a PCI driver has to supply the pfxattach() entry point. This entry point is called to initialize the PCI device and, optionally, to identify any additional logical devices for that PCI device. Entry points named pfxdetach() and pfxerror() are optional.

Besides the usual DDI/DKI functions, the PCI driver calls on kernel functions unique to the PCI bus. These functions are introduced in the following topics. For a summary, see “PCI Function Summary”.

Overview of PCI Driver Structure

A PCI driver is a kernel-level device driver that has the general structure described in Chapter 8, “Structure of a Kernel-Level Driver.” It uses the driver/kernel interface described in Chapter 9, “Device Driver/Kernel Interface.” A PCI driver can be loadable or it can be linked with the kernel. In general it is configured into IRIX as described in Chapter 10, “Building and Installing a Driver.”

PCI hardware configuration is more dynamic than the configuration of the VME, EISA or SCSI buses. With other types of bus, the driver learns the hardware configuration when the driver is loaded, and the configuration remains static afterward. IRIX support for the PCI bus is designed to allow support for dynamic reconfiguration in future systems. In principle, a PCI driver has to be designed to allow devices to be attached and detached at any time (no detaching is done in the current release).

The general sequence of operations of a PCI driver is as follows:

  1. In the pfxinit() entry point, the driver calls a kernel function to register itself as a PCI driver, specifying the kind of device it supports.

  2. When the kernel discovers a device of this type, it calls the pfxattach() entry point of the driver. The driver creates PIO maps and (optionally) DMA maps to use in addressing the device; initializes the device; and if necessary, registers an interrupt handler.

  3. In the normal upper-half entry points such as pfxopen(), pfxread(), and pfxstrategy(), the driver operates the device and transfers data.

  4. If the kernel learns of a bus address error on the PCI bus, it can call each driver's pfxerror() entry point to find out which device caused the error. (This feature is not implemented in the current release.)

  5. Conceptually, if the kernel learns that the device is being detached, the kernel calls the driver's pfxdetach() entry point. The driver notes the device is unusable and stops servicing it through upper-half entry points. (This feature is not implemented in the current release.)

Driver Flag Bits

As described under “Driver Flag Constant”, the pfxdevflag public name is a byte containing flags for driver characteristics. Since a PCI driver is inevitably a new driver, with no heritage in older versions of IRIX or UNIX, Silicon Graphics, Inc. strongly recommends that you design it from the start to be compatible with multiprocessors. The implications of this are discussed under “Planning for Multiprocessor Use”.

Initializing and Registering the Driver

A PCI driver must register with the kernel in order to receive notification that devices exist. In the current release this is done in two stages. First the driver calls pciio_add_attach() to introduce itself to the kernel.

extern int pciio_add_attach(
    __int32_t (*attach)(dev_t), /* pfx_attach */
    __int32_t (*detach)(dev_t), /* pfx_detach */
    __int32_t (*error)(dev_t, __int32_t ), /* pfx_error */
    char *driver_name,          /* prefix string "pfx" */
    int major);                 /* driver major number */

The arguments passed are as follows:

attach

Address of the driver's pfxattach() entry point. This is required.

detach

NULL, or address of the driver's pfxdetach() entry point if implemented.

error

NULL, or address of the driver's pfxerror() entry point if implemented.

driver_name

Address of the driver's prefix as a character string.

major

The major number supported by this driver. This is the number given in the third field of the descriptive line (see “Descriptive Line”).

This call associates the driver's pfxattach(), pfxdetach(), and pfxerror() entry points with the driver's prefix string. The pfxdetach() and pfxerror() addresses may be passed as NULL when these entry points are not implemented (and there is no need to implement them in the current release).

The fourth argument (prefix string) and last argument (list of major numbers) allow the PCI support to relate this driver to device special files.

This function call is required only in the current release of IRIX. In order to make it conditional in future releases, enclose it in a test of the compiler variable _EARLY_PCI, as shown in Example 15-1.


Tip: You can create a static list of major numbers as a global variable in the driver descriptive file. See “Variables Section” for an example of just such an array.

The next step—and the only step that will be required for future releases—is to call the pciio_driver_register() function.

extern int pciio_driver_register(
   pciio_vendor_id_t vendor_id,   /* card's vendor number */
   pciio_device_id_t device_id,   /* card's device number */
   char *driver_prefix,           /* driver prefix */
   unsigned flags);

This call specifies the PCI vendor ID and device ID numbers as they appear in the PCI configuration space of any device that this driver supports. The third argument is a character string containing the driver's prefix string (as passed to pciio_add_attach()). The kernel uses this string to search the switch tables to find the addresses of the driver's pfxattach() and pfxdetach() entry points.

Example 15-1 shows a hypothetical example of driver registration.

Example 15-1. Driver Registration


__int32_t hypo_attach(dev_t); /* forward declaration */
int hypo_init()
{
   int ret;            /* result of each registration call */
   int allMyMajors[2];
   ...
   allMyMajors[0] = myMajNum; /* see Example 10-1 */
   allMyMajors[1] = 0;
#ifdef _EARLY_PCI /* need to call pciio_add_attach */
   ret = pciio_add_attach(hypo_attach, /* attach entry point */
                          NULL,NULL, /* detach, error not done */
                          "hypo_",   /* prefix string */
                          allMyMajors); /* list of one major# */
   if (!ret) ...
#endif
   /* register last, after all global data is initialized */
   ret = pciio_driver_register(HYPO_VENDOR,HYPO_DEVID,"hypo_",0);
   if (!ret) ...
}

A device driver can register multiple times to handle multiple combinations of vendor ID and device ID.


Tip: You should defer the call to pciio_driver_register() to the end of the pfxinit() routine, when all global data has been initialized. The reason is that, if there is an available device of the specified type, pfxattach() might be called immediately, before the pciio_driver_register() function returns. In a multiprocessor, pfxattach() can be called concurrently with the return of pciio_driver_register() and following code.

A loadable driver, when called at its pfxunload() entry point, can unregister before unloading; but that is not required. (See “Unloading”).

Attaching a Device

The IRIX support for PCI is designed to allow for future support for hot-swapping and for multinode systems in which devices, slots, buses, and whole nodes come online and offline dynamically. In principle, a PCI device can be attached or detached at any time, meaning that the pfxattach() and pfxdetach() entry points could be called at any time.

In practice, the only PCI-using systems supported by IRIX 6.3 for O2 are workstations that do not permit hot-swapping. Also, the administrator commands that would force a device to detach are not defined as yet. In current systems, pfxattach() is only called at boot time; and pfxdetach() and pfxerror() are not called. Nevertheless the driver/kernel interface is designed for future flexibility. For future portability you can design your driver as if that flexibility existed now.

Matching A Device to A Driver

When the system boots up, the kernel probes the PCI bus configuration space and takes a census of active devices. For each device it notes

  • Vendor and device ID numbers

  • Requested size of memory space

  • Requested size of I/O space

The kernel assigns starting bus addresses for memory and I/O space and sets these addresses in the Base Address Registers (BARs) in the device. Then the kernel looks for a driver that has registered a matching set of vendor and device IDs using pciio_driver_register().

If no matching driver has registered, the device remains inactive. For example, the driver might be a loadable driver that has not been loaded as yet. When the driver is loaded and registers, the kernel will match it to any unattached devices.

When the kernel matches a device to its registered driver, the kernel calls the driver's pfxattach() entry point. It passes one argument, a vertex_hdl_t, which acts as an opaque handle to a kernel object that describes this device. This handle is used to:

  • Store and retrieve the driver's private information about the device

  • Request PIO and DMA maps on the device

  • Register an interrupt handler for the device

Allocating Storage for Device Information

A driver needs to save information about each device, usually in a structure. Fields in a typical structure might include:

  • Locks or semaphores used for mutual exclusion among upper-half entry points and between them and the interrupt handler.

  • Addresses of allocated PIO and DMA maps for this device (see “Allocating PIO Maps”).

  • Address of an interrupt connection object for the device (see “Registering an Interrupt Handler”).

  • In a block driver, anchors for a queue of buf_t objects being filled or emptied.

  • Device status flags.

A problem is that at initialization time a driver does not know how many devices it will be asked to manage. For a workstation such as O2 you can expect the number will be small, but you should allow for portability to server-class systems that support dozens of devices. In the past this problem has been handled by allocating an array of a fixed number of information structures, indexed by the device minor number.

This is not a good solution for a PCI driver because PCI configuration is so dynamic. In addition, a loadable driver loses the contents of its global variables when it unloads. The IRIX PCI support gives you a different way.

In a PCI driver, you dynamically allocate memory for an information structure to hold information about the one device being attached. (See “General-Purpose Allocation”.) You save the address of the structure in the kernel's hardware vertex, using the device_info_set() function, which associates an arbitrary pointer with a vertex_hdl_t.

extern void device_info_set(vertex_hdl_t, arbitrary_info_t);

The information structure can easily be recovered in any top-half routine; see “Locating Device Information”.

Allocating PIO Maps

For almost any device you need at least one PIO map. You use a PIO map to read the configuration space or the I/O space of a device. For a bus-master device you will need at least one DMA map. These maps can be allocated when the device is attached, and the addresses of the maps can be stored in the device information structure.

A PIO map is created by pciio_piomap_alloc(). It takes a vertex_hdl_t, a flag for the bus address space, and an offset in that space as its principal arguments.

extern pciio_piomap_t pciio_piomap_alloc (
      vertex_hdl_t vhdl,       /* set up mapping for this device */
      device_desc_t dev_desc,  /* device descriptor (null) */
      pciio_space_t space,     /* which address space */
      iopaddr_t pcipio_addr,   /* starting offset in space */
      size_t byte_count,
      size_t byte_count_max,   /* maximum size of a mapping */
      unsigned flags);         /* defined in sys/pio.h */

The arguments are as follows:

vhdl

The vertex_hdl_t received by the pfxattach() routine. Every map must be associated to a specific device at its hardware graph vertex.

dev_desc

Device descriptor structure with one field set (see text following).

space

Constant specifying the space to map: PCIIO_PIOMAP_CFG (configuration space), or PCIIO_PIOMAP_WIN(n).

pcipio_addr

Offset within the selected space (typically 0).

byte_count

Span of the total area over which this map might be applied.

byte_count_max

Maximum size of the area that will be mapped at any one time. When the map is always used for the same area, byte_count and byte_count_max are the same. When the map can be used for smaller segments within a larger area, byte_count_max is the limit of one segment and byte_count the size of the total extent.

flags

Endian treatment: PCIIO_PIOMAP_BIGEND (device uses IRIX format) or PCIIO_PIOMAP_LITTLEEND (device uses Intel format).

The device descriptor structure type device_desc_t is declared in pciio.h. Only one field is inspected, intr_swlevel. It must be set to one of the interrupt levels of type pl_t as declared in ddi.h, typically plhi.

A PIO map that you will use to access device configuration registers is based on a space of PCIIO_PIOMAP_CFG. The space selection PCIIO_PIOMAP_WIN(n) means that this map is to be based on Base Address register n, from 0 through 5, in the PCI configuration space. The device configuration registers specify whether a given base address register defines memory or I/O space. When the space is defined by a 64-bit base address register, use the lower number, the index of the word that contains the configuration bits.
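The relationship between byte_count and byte_count_max, described in the argument list above, comes down to simple arithmetic when a map is reused for successive segments of a larger area. The helpers below are a sketch of that arithmetic only; segments_needed() and segment_size() are invented names and are not kernel functions.

```c
#include <stddef.h>

/* Number of segments, each at most byte_count_max bytes, needed
 * to cover a total extent of byte_count bytes. */
static size_t segments_needed(size_t byte_count, size_t byte_count_max)
{
    if (byte_count_max == 0)
        return 0;
    return (byte_count + byte_count_max - 1) / byte_count_max;
}

/* Size of segment number index (counting from 0); 0 when the
 * index lies past the end of the extent. */
static size_t segment_size(size_t byte_count, size_t byte_count_max,
                           size_t index)
{
    size_t start = index * byte_count_max;

    if (byte_count_max == 0 || start >= byte_count)
        return 0;
    return (byte_count - start < byte_count_max)
               ? byte_count - start
               : byte_count_max;
}
```

When byte_count and byte_count_max are equal, segments_needed() returns 1, which corresponds to a map that is always used for the same area.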

Example 15-2 shows a function that allocates a PIO map. The address space selection is passed as an argument, as is the size of the space to map. The function assumes the map should start at offset 0 in the selected space.

Example 15-2. Allocation of PCI PIO Map


pciio_piomap_t makeMap(vertex_hdl_t dev, int BAR, size_t size)
{
   struct device_desc_s ddesc;
   ddesc.intr_swlevel = plhi;
   return pciio_piomap_alloc(
         dev,         /* vertex handle */
         &ddesc,      /* dev descriptor w/ intr level in it */
         BAR,         /* space, _CFG or _WIN(n) */
         0,           /* starting offset */
         size,size,   /* size to map */
         0);          /* default endian */
}

Allocating DMA Maps

A DMA map is created by pciio_dmamap_alloc(), which takes a vertex_hdl_t, a size, and flags regarding the treatment of the mapping.

extern pciio_dmamap_t pciio_dmamap_alloc(
      vertex_hdl_t dev,         /* set up mappings for this device */
      device_desc_t dev_desc,  /* device descriptor */
      size_t byte_count_max,   /* max size of a mapping */
      unsigned flags);         /* defined in dma.h */

Other map functions are discussed under “Managing PIO Maps for PCI” and “Managing DMA Maps for PCI”.

Reading the Device Configuration

Typically a PCI driver needs to read the device configuration registers and possibly write to them. These are PIO operations. To access the configuration, create a PIO map for the configuration space and extract an address from it. Present this address to pciio_config_get() to fetch a word from configuration space.

Example 15-3. Reading PCI Configuration Space


pciio_piomap_t cfg_map;
typedef volatile __uint32_t cfg_reg;
cfg_reg *cfg_ptr;
__uint32_t cfg_value;
cfg_map = pciio_piomap_alloc(
               vhdl,  /* as received in attach() */
               NULL,  /* default device descriptor */
               PCIIO_SPACE_CFG, /* map to config space */
               (iopaddr_t)0, /* at offset 0 */
               64,64, /* size of PCI config header */
               0 ); /* no flags */
cfg_ptr = (cfg_reg *) pciio_piomap_addr(
               cfg_map, /* the map to use */
               0,    /* start at the beginning */
               64 ); /* amount I care about */
cfg_value = pciio_config_get(cfg_ptr,4); /* read the register */

For a PCI bus master device, the pfxattach() function should make sure that the Cache Line Size register specifies 128 bytes (the size of a cache line in all Silicon Graphics systems). Because the register counts 32-bit words, the value to store is 0x20.

Registering an Interrupt Handler

For devices that can interrupt, a key step during pfxattach() is to register an interrupt handler for the device. This is done in a two-step process. First you create an interrupt connection object; then you use that object to specify the interrupt handling function.

The interrupt connection is created with pciio_intr_alloc(), which takes a vertex_hdl_t and a flag for the interrupt line that the device uses.

extern pciio_intr_t pciio_intr_alloc(
         vertex_hdl_t dev,         /* which PCI device */
         device_desc_t dev_desc,   /* device descriptor */
         pciio_intr_line_t lines,  /* which line(s) will be used */
         vertex_hdl_t owner_dev);         /* owner of this intr */

The interrupt object is used in establishing a handler, and it is needed later to stop taking interrupts. You will probably want to save its address in the device information structure for later use (see “Inactivating an Interrupt Handler”).

After creating the interrupt object, you establish a handler using pciio_intr_connect(). Its principal arguments are the interrupt object, a handler address, and a value to be passed to the handler when it is called (this will typically be the address of the device information structure you are preparing).

extern int pciio_intr_connect(
         pciio_intr_t intr_hdl,      /* pciio intr resource handle */
         intr_func_t intr_func,     /* pciio intr handler */
         intr_arg_t intr_arg,       /* arg to intr handler */
         void *thread);             /* intr thread to use */

If a device will interrupt on line C, interrupt setup could resemble Example 15-4.

Example 15-4. Setting Up a PCI Interrupt Handler


pciio_intr_t intobj;
extern void int_handler(devinfo*);
int retcode;
intobj = pciio_intr_alloc(
               vhdl, /* as received in attach() */
               0,    /* device descriptor is n.a. for pci */
               PCIIO_INTR_LINE_C, /* the line it uses */
               (vertex_hdl_t) 0);
retcode = pciio_intr_connect(
               intobj, /* the interrupt object */
               (intr_func_t) int_handler, /* the handler */
               (intr_arg_t) pDevInfo, /* dev info as input */
               (void*)0 ); /* threads are next release */
if (!retcode) cmn_err(CE_WARN,"oh fiddlesticks");


Tip: Interrupts are enabled when the pfxattach() entry point is called. If the PCI device is in a state that can produce an interrupt, the interrupt handling function can be called before pciio_intr_connect() returns. Make sure that all global data used by the interrupt handler has been initialized.



Note: PCI devices can share the four PCI interrupt lines. As a result, in some cases the kernel cannot tell which device caused an interrupt. When there is any doubt, the kernel calls all the interrupt handlers that are registered to that interrupt line. For this reason, your interrupt handler must not assume that its device did cause the interrupt. It should always test to see if an interrupt is really pending, and exit immediately when one is not.
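The test recommended in the preceding note can be sketched as follows. Everything device-specific here is hypothetical: the status bit HYPO_INT_PENDING, the structure hypo_devinfo_t, and the way the interrupt is acknowledged are invented for illustration, and a real driver would reach the status register through a PIO map rather than through an ordinary pointer.

```c
#include <stdint.h>

#define HYPO_INT_PENDING 0x1u  /* hypothetical "interrupt pending" bit */

typedef struct {
    volatile uint32_t *status_reg; /* device status register */
    unsigned serviced;             /* count of interrupts handled */
} hypo_devinfo_t;

/* Handler for a shared interrupt line: do nothing unless this
 * device really has an interrupt pending. */
static void hypo_intr(hypo_devinfo_t *dev)
{
    uint32_t status = *dev->status_reg;

    if (!(status & HYPO_INT_PENDING))
        return;                    /* another device on the line interrupted */

    /* Acknowledge the interrupt (device-specific; assumed here). */
    *dev->status_reg = status & ~HYPO_INT_PENDING;
    dev->serviced++;
}
```

The early return is the important part: the handler can be called for interrupts raised by any device that shares the line.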


Return Value from Attach

The return code from pfxattach() is tested by the kernel. The driver can reject an attachment. When your driver cannot allocate memory, or fails due to another problem, it should:

  • Use cmn_err() to document the problem (see “Using cmn_err”).

  • Release any objects such as PIO and DMA maps that were created.

  • Release any space allocated to the device such as a device information structure.

  • Return an informative return code which might be meaningful in future releases.
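One common way to organize the cleanup steps listed above is to release resources in the reverse of the order in which they were acquired. The following user-space sketch shows that shape only. It is not kernel code: fake_alloc() stands in for whatever kernel allocation and map-creation calls the driver uses, fprintf() stands in for cmn_err(), and hypo_info_t and hypo_attach_model() are invented names.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    void *pio_map;  /* stand-in for a PIO map */
    void *dma_map;  /* stand-in for a DMA map */
} hypo_info_t;

/* Stand-in allocator that can be told to fail on a given call. */
static int fail_at = -1;
static int alloc_calls = 0;

static void *fake_alloc(size_t n)
{
    return (alloc_calls++ == fail_at) ? NULL : malloc(n);
}

/* Model of an attach routine: returns 0 on success, or an error
 * code after releasing everything it had acquired. */
static int hypo_attach_model(hypo_info_t **out)
{
    hypo_info_t *info = fake_alloc(sizeof *info);

    if (!info)
        goto fail;
    info->pio_map = fake_alloc(16);
    if (!info->pio_map)
        goto fail_info;
    info->dma_map = fake_alloc(16);
    if (!info->dma_map)
        goto fail_pio;
    *out = info;
    return 0;

fail_pio:
    free(info->pio_map);
fail_info:
    free(info);
fail:
    fprintf(stderr, "hypo: attach failed, out of memory\n");
    return ENOMEM;
}
```

Each label undoes exactly the steps that completed before the failure, so no partially built device information structure is left behind.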

More than one driver can register to support the same vendor ID and device ID. When the first driver fails to complete the attachment, the kernel continues on to test the next, until all have refused or one accepts. The pfxdetach() entry point can only be called if the pfxattach() entry point returns success (0).

Establishing Logical Devices

Some kinds of physical devices are represented by multiple device special files in /dev. For example, each serial port appears as at least four devices /dev/tty*. A tape drive can appear under different names, and a disk device has two device special files for each disk partition, one in /dev/dsk and one in /dev/rdsk (raw, or character access). Each logical device represents a slightly different treatment of the same physical device.

The pfxattach() entry point initializes the real PCI device, but it must also create hardware vertexes to represent the logical devices that should be associated with the same real PCI device. This is done in three steps:

  • Create a new hardware vertex connected to the attached vertex, using hwgraph_device_add().

  • Associate the new vertex with the minor number of the logical device, using hwgraph_device_add_minor().

  • Associate the new vertex with the same device information structure, using device_info_set().

The hwgraph_device_add() function has the following prototype:

int hwgraph_device_add(vertex_hdl_t vhdl, /* parent vertex */
                       char *name, /* name of the device */
                       char *prefix, /* driver prefix */
                       vertex_hdl_t *new_vrtx) /* return result */

The name argument is not significant in the current release, but it will be significant and visible to users in a future release. It should be one word or numeric characters to label this vertex of the hardware graph, for example “0” (logical unit number) or “nonswap” (feature or access method).

For a simplified example, see Example 15-5. This hypothetical code, which would be part of the hypo_attach() entry point, creates two logical devices. The device minor number of the first is 0x01; the second is 0x02—a simplified version of the conventions for minor numbers of tape or serial devices, in which the minor number bits represent device options or features.

Example 15-5. Creating Logical Devices for a PCI Device


vertex_hdl_t subdev;
int ret;
my_dev_info_t *pDev; /* struct stored in PCI vertex */
...
ret = hwgraph_device_add(vhdl, /* attach() input */
                         "left", /* name of minor 01 */
                         "hypo_", /* driver prefix */
                         &subdev); /* output here */
if (ret)...
ret = hwgraph_device_add_minor(subdev,(minor_t)0x01);
if (ret)...
device_info_set(subdev,pDev); /* same info struct as the PCI vertex */
ret = hwgraph_device_add(vhdl, /* attach() input */
                         "right", /* name of minor 02 */
                         "hypo_", /* driver prefix */
                         &subdev); /* output here */
if (ret)...
ret = hwgraph_device_add_minor(subdev,(minor_t)0x02);
if (ret)...
device_info_set(subdev,pDev); /* same info struct as the PCI vertex */
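
When a driver creates more than two logical devices, the same three steps can be driven from a table. The following is a minimal sketch under stated assumptions, not a definitive implementation. The hypo_units table and hypo_add_logical_devices() are hypothetical names. The stand-in definitions at the top exist only so that the fragment compiles outside the kernel; their types and behavior are assumptions, and a real driver would use the kernel's own declarations. The calls follow the argument usage shown in Example 15-5.

```c
#include <stddef.h>

/* --- User-space stand-ins (assumptions) so the sketch compiles
 *     outside the kernel; a driver uses the kernel's own versions. --- */
typedef int vertex_hdl_t;
typedef unsigned minor_t;
static int n_added;                     /* counts vertexes created */

int hwgraph_device_add(vertex_hdl_t vhdl, char *name, char *prefix,
                       vertex_hdl_t *new_vrtx)
{ (void)name; (void)prefix; *new_vrtx = vhdl * 10 + ++n_added; return 0; }

int hwgraph_device_add_minor(vertex_hdl_t v, minor_t m)
{ (void)v; (void)m; return 0; }

void device_info_set(vertex_hdl_t v, void *info) { (void)v; (void)info; }

/* --- Sketch: one table row per logical device --- */
static const struct { char *name; minor_t minor; } hypo_units[] = {
    { "left",  0x01 },
    { "right", 0x02 },
};

/* Returns 0 on success, or the first nonzero status encountered. */
int hypo_add_logical_devices(vertex_hdl_t vhdl, void *pDev)
{
    size_t i;

    for (i = 0; i < sizeof hypo_units / sizeof hypo_units[0]; i++) {
        vertex_hdl_t subdev;
        int ret = hwgraph_device_add(vhdl, hypo_units[i].name,
                                     "hypo_", &subdev);
        if (ret) return ret;
        ret = hwgraph_device_add_minor(subdev, hypo_units[i].minor);
        if (ret) return ret;
        device_info_set(subdev, pDev);  /* all units share one structure */
    }
    return 0;
}
```

A pfxattach() entry point could call hypo_add_logical_devices(vhdl, pDev) once it has initialized the device information structure.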

Normal Operation

While handling normal operations on the device, the driver needs to locate device information from the upper-half entry points, and needs to translate addresses using maps.

Locating Device Information

The driver upper-half entry points are called to implement requests from user processes or filesystems that need to open, read, write, map or control the device. These calls can occur at any time, and on a multiprocessor several of them can run concurrently on different CPUs.

The user process refers to a device through a file descriptor opened to a device special file. The primary argument to any upper-half entry point is the dev_t, a value that distinguishes the device. Traditional drivers extract device numbers from the dev_t (see “The Device Number Types”).

The first thing any upper-half entry point needs to do is to locate the per-device information structure that was prepared in the pfxattach() entry point (see “Allocating Storage for Device Information”). You do this by calling device_info_get(). However, that function takes a vertex_hdl_t. You get that from dev_to_vhdl(). The code, which is repeated over and over in a PCI driver, resembles Example 15-6.

Example 15-6. Retrieving Device Information


vertex_hdl_t vhdl = dev_to_vhdl(dev);
my_dev_info_t *pDev = (my_dev_info_t *)device_info_get(vhdl);
if (!pDev) return(ENXIO);                    /* not attached, or detached */
if (!(pDev->status & USABLE)) return(ENXIO); /* device forced offline */

The only variation is the data type of the device information structure. In the pfxopen() entry point, the dev_t is received by reference (as a pointer), not by value, so it must be dereferenced before it is passed to dev_to_vhdl().

Verifying Device Usability

If device_info_get() returns NULL, then the device has not been attached, the pfxattach() entry point did not allocate an information structure, or the pfxdetach() entry point has been called. In each of these cases, the upper-half routine should return ENXIO (no such device). This test is shown in Example 15-6.

Example 15-6 also shows another test. In the next release of IRIX, a driver will be able to implement a pfxdisable() entry point, called to make a device temporarily unusable. Even in the current release, your driver might find reasons, such as a persistent device error, to force a device offline. A single flag bit in the device information structure represents this state. Again, a return of ENXIO is appropriate.
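
Because the lookup and both tests are repeated in every upper-half entry point, they can be gathered into one helper. This is only a sketch under stated assumptions: hypo_lookup() and hypo_open() are hypothetical names, the USABLE bit value and the layout of my_dev_info_t are invented for illustration, and the stand-ins for dev_to_vhdl() and device_info_get() exist only so that the fragment compiles outside the kernel.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>                  /* dev_t */

/* --- User-space stand-ins (assumptions) for kernel services --- */
typedef int vertex_hdl_t;
static void *sim_info[4];               /* info pointer per vertex */
vertex_hdl_t dev_to_vhdl(dev_t dev) { return (vertex_hdl_t)(dev & 3); }
void *device_info_get(vertex_hdl_t v) { return sim_info[v]; }

/* --- Sketch --- */
#define USABLE 0x1                      /* hypothetical status bit */
typedef struct my_dev_info { unsigned status; } my_dev_info_t;

/* Locate the per-device structure. Returns 0 and sets *ppDev on
 * success, or ENXIO if the device is absent or offline. */
int hypo_lookup(dev_t dev, my_dev_info_t **ppDev)
{
    my_dev_info_t *pDev =
        (my_dev_info_t *)device_info_get(dev_to_vhdl(dev));

    if (!pDev) return ENXIO;                    /* never attached, or detached */
    if (!(pDev->status & USABLE)) return ENXIO; /* forced offline */
    *ppDev = pDev;
    return 0;
}

/* pfxopen() receives a pointer to the dev_t, so dereference it.
 * The remaining open() arguments are omitted from this sketch. */
int hypo_open(dev_t *devp)
{
    my_dev_info_t *pDev;

    return hypo_lookup(*devp, &pDev);
}
```

Keeping the tests in one place means that a later change to the offline policy touches a single function rather than every entry point.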

Managing PIO Maps for PCI

The functions that are used to manage PIO maps are summarized in Table 15-2.

Table 15-2. Functions for PIO Maps for PCI

Function

Purpose and Operation

pciio_piomap_addr(D3)

Get a kernel virtual address from a PIO map for a specific offset and length.

pciio_piomap_alloc(D3)

Create a PIO map object, specifying the bus address space, base offset, and length it needs to cover.

pciio_piomap_done(D3)

Make a PIO map inactive until it is next needed (may release hardware resources associated with the map).

pciio_piomap_free(D3)

Release a PIO map object.

pciio_piotrans_addr(D3)

Request immediate translation of a bus address to a kernel virtual address without use of a PIO map. Returns NULL unless this system supports fixed PIO addressing.

pciio_config_get(D3)

Fetch a 32-bit value from configuration space using an address returned by pciio_piomap_addr().

pciio_config_set(D3)

Store a 32-bit value into configuration space using an address returned by pciio_piomap_addr().

Maps are allocated with pciio_piomap_alloc(). Its use is covered under “Allocating PIO Maps”, because you typically will allocate the PIO maps you need while attaching the device.

You use a PIO map by calling pciio_piomap_addr(). This function takes a map, an offset within the PCI address space described by the map, and the number of bytes that will be accessed through the returned address.

extern caddr_t pciio_piomap_addr(
            pciio_piomap_t pciio_piomap,   /* mapping resources */
            iopaddr_t pciio_addr,   /* map for this pcipio address */
            size_t byte_count);    /* map this many bytes */

The pciio_addr argument is added to the base offset specified to pciio_piomap_alloc(), and that in turn is added to the base address assigned by the kernel to this device, to arrive at the bus address needed. The byte_count argument specifies how many bytes beyond the bus address you may access.

The returned value is a kernel virtual address that is mapped to the requested PCI bus address for at least that many bytes. Use the address as a pointer to a “volatile” variable. In systems in which PCI space is hard-wired to specific memory addresses, pciio_piomap_alloc() is a simple function and pciio_piomap_addr() is a trivial one, so they should be used for portability's sake even in systems where you know the mapping.

An example of using a PIO map to access configuration space is shown under “Reading the Device Configuration”.

Once you have extracted an address using pciio_piomap_addr(), the map is active. It remains active until you call either pciio_piomap_done() or pciio_piomap_free(). In some systems, it costs nothing to keep a PIO map active. In other systems, an active PIO map may tie up global hardware resources. It is a good idea to call pciio_piomap_done() when the address is no longer needed.
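
The activate, use, and release sequence can be sketched as follows. This is an illustration under stated assumptions rather than a definitive implementation: hypo_read_status() and HYPO_STATUS_REG are hypothetical, the single-argument form of pciio_piomap_done() is assumed, and the stand-in types and functions only imitate the kernel services so that the fragment compiles in user space (the real pciio_piomap_addr() returns a caddr_t, and the real map type is opaque).

```c
#include <stddef.h>
#include <stdint.h>

/* --- User-space stand-ins (assumptions) for kernel services --- */
typedef struct sim_piomap { uint8_t *base; int active; } *pciio_piomap_t;
typedef unsigned long iopaddr_t;

char *pciio_piomap_addr(pciio_piomap_t map, iopaddr_t addr,
                        size_t byte_count)
{ (void)byte_count; map->active = 1; return (char *)map->base + addr; }

void pciio_piomap_done(pciio_piomap_t map) { map->active = 0; }

/* --- Sketch --- */
#define HYPO_STATUS_REG 0x10            /* hypothetical register offset */

/* Read one 32-bit register, then let the map go inactive. */
uint32_t hypo_read_status(pciio_piomap_t map)
{
    volatile uint32_t *reg = (volatile uint32_t *)
        pciio_piomap_addr(map, HYPO_STATUS_REG, sizeof(uint32_t));
    uint32_t value = *reg;              /* a single PIO read */

    pciio_piomap_done(map);             /* may release adapter resources */
    return value;
}
```

The pointer is declared volatile so that the compiler performs exactly the device access written in the source and does not cache or reorder it.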

Some systems also support a one-step translation function, pciio_piotrans_addr(). This function takes a combination of the arguments of pciio_piomap_alloc() and pciio_piomap_addr(), and returns a translated address. In effect, it combines creating a map, using the map, and freeing the map, into a single step. However, this function can fail in systems that do not use hard-wired bus maps. The two-step process of allocating a map and then interrogating it is more general.

Access to configuration space is done in two steps. First you use pciio_piomap_addr() or pciio_piotrans_addr() to get a virtual address; then you pass the address to pciio_config_get() or pciio_config_set(). In some systems, the configuration space can be accessed directly with fetch and store instructions. In these systems, these two functions are implemented as macros.

Using Byte-Level PCI Addresses

When you use PIO to fetch or store 32-bit values on 32-bit-aligned PCI addresses, PIO works as you would expect, and a 32-bit value is fetched or stored.

However, when you use PIO to fetch or store less than 32 bits—either a 16-bit value or an 8-bit value—you must use an address that takes account of byte-swapping. The least significant address bits are summarized in Table 15-3.

Table 15-3. Least Significant Address Bytes for Short PIO

Binary Offset Within 32-bit Memory Word    LSB for 16-Bit Access    LSB for 8-bit Access

0x00                                       0x02                     0x03
0x01                                       n.a.                     0x02
0x02                                       0x00                     0x01
0x03                                       n.a.                     0x00

You can deal with this complication in any of three ways, as follows:

  • Always fetch and store 32-bit words. Unpack smaller units in memory.

  • Declare the device data as a structure and arrange the order of short fields in the structure so that the least significant address bits work out correctly. For example, if the device offers the following logical structure in its PCI memory space:

    00--> dma base addr, 4 bytes
    04--> dma counter, 4 bytes
    08--> control reg, 2 bytes
    0A--> status reg, 1 byte
    0B--> byte fifo, 1 byte
    

    Declare this as a C structure as follows:

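    struct {
       unsigned        dma_addr;    /* device offset 00 */
       unsigned        dma_count;   /* device offset 04 */
       unsigned char   byte_fifo;   /* device offset 0B, addressed at 08 */
       unsigned char   status;      /* device offset 0A, addressed at 09 */
       unsigned short  control;     /* device offset 08, addressed at 0A */
    };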
    

  • Write C macros for byte-address and halfword-address. The macros would use exclusive-OR to invert two or one (respectively) of the least-significant bits.
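
The three approaches can be compared in a short sketch. All names here (PCI_BYTE_ADDR, PCI_HALF_ADDR, struct hypo_regs, and the hypo_ unpacking functions) are hypothetical. The unpacking functions assume that the byte at the lowest PCI offset arrives as the least significant byte of the fetched 32-bit value, which is what Table 15-3 implies for a big-endian CPU; verify this against the actual hardware before relying on it.

```c
#include <stddef.h>
#include <stdint.h>

/* Macro approach, per Table 15-3: an 8-bit access inverts the two
 * low address bits; a 16-bit access inverts only bit 1. */
#define PCI_BYTE_ADDR(off)  ((off) ^ 0x3)
#define PCI_HALF_ADDR(off)  ((off) ^ 0x2)

/* Structure approach: short fields are declared in swizzled order. */
struct hypo_regs {
    uint32_t dma_addr;      /* device offset 0x00 */
    uint32_t dma_count;     /* device offset 0x04 */
    uint8_t  byte_fifo;     /* device offset 0x0B, addressed at 0x08 */
    uint8_t  status;        /* device offset 0x0A, addressed at 0x09 */
    uint16_t control;       /* device offset 0x08, addressed at 0x0A */
};

/* Whole-word approach: fetch the 32-bit word at device offset 0x08
 * and unpack it in memory. Assumes the lowest PCI byte is the least
 * significant byte of the value (see the lead-in). */
static inline uint16_t hypo_control(uint32_t w) { return (uint16_t)(w & 0xFFFF); }
static inline uint8_t  hypo_status(uint32_t w)  { return (uint8_t)((w >> 16) & 0xFF); }
static inline uint8_t  hypo_fifo(uint32_t w)    { return (uint8_t)(w >> 24); }
```

The macros and the structure agree with each other: the field offsets in struct hypo_regs are exactly the values the macros produce for the device's logical offsets.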

Managing DMA Maps for PCI

The functions that are used to manage simple DMA maps are summarized in Table 15-4.

Table 15-4. Functions for Simple DMA Maps for PCI

Function

Purpose and Operation

pciio_dmamap_alloc(D3)

Create a DMA map object, specifying the maximum extent of memory the map will have to cover.

pciio_dmamap_addr(D3)

Get the bus virtual address corresponding to a memory address for a specified length.

pciio_dmamap_drain(D3)

Complete all DMA transfers and flush any write-gather or prefetch buffers in the PCI bus adapter.

pciio_dmamap_done(D3)

Make a DMA map inactive (may release hardware resources associated with the map).

pciio_dmamap_free(D3)

Release a DMA map object.

pciio_dmatrans_addr(D3)

Request immediate translation of the address of a contiguous memory buffer to a bus address. Returns NULL unless this system supports fixed DMA addressing.

Maps are allocated with pciio_dmamap_alloc(). Its use is covered under “Allocating DMA Maps”, because you typically will allocate the maps you need while attaching the device.

You obtain a mapping for a single, contiguous span of virtual memory by calling pciio_dmamap_addr(). Its principal arguments are a map, a memory address, and a length.

extern iopaddr_t pciio_dmamap_addr(
         pciio_dmamap_t dmamap,      /* use these mapping resources */
         paddr_t paddr,             /* map for this address */
         size_t byte_count);        /* map this many bytes */

The value returned is a bus address that you can program into a bus master device. When the device accesses that address, it is accessing the specified memory location.

Once you have extracted an address using pciio_dmamap_addr(), the map is active. It remains active until you call either pciio_dmamap_done() or pciio_dmamap_free(). In some systems, it costs nothing to keep a DMA map active. In other systems, an active map may tie up global hardware resources. It is a good idea to call pciio_dmamap_done() when the I/O operation is complete.

In systems in which PCI space is hard-wired to specific memory addresses, pciio_dmamap_alloc() is a short function and pciio_dmamap_addr() is a trivial one. However, these systems also support a one-step translation function, pciio_dmatrans_addr(). This function takes a combination of the arguments of pciio_dmamap_alloc() and pciio_dmamap_addr(), and returns a translated address. In effect, it combines creating a map, using the map, and freeing the map, into a single step.
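
A driver that must run on both kinds of system can try the one-step translation first and fall back to the map when it is not available. The sketch below illustrates only that decision and rests on several assumptions. The sim_dmatrans_addr() function is a simplified stand-in for pciio_dmatrans_addr(), whose real argument list is longer. The stand-in for pciio_dmamap_addr() follows the prototype shown above but has invented behavior. hypo_bus_addr(), sim_fixed_dma, and the address offsets are hypothetical.

```c
#include <stddef.h>

/* --- User-space stand-ins (assumptions) for kernel services --- */
typedef unsigned long iopaddr_t;
typedef unsigned long paddr_t;
typedef struct sim_dmamap { int active; } *pciio_dmamap_t;
static int sim_fixed_dma;       /* nonzero: fixed DMA addressing exists */

/* Simplified stand-in for pciio_dmatrans_addr(). Returns 0 (NULL)
 * when one-step translation is not supported. */
iopaddr_t sim_dmatrans_addr(paddr_t paddr, size_t byte_count)
{ (void)byte_count; return sim_fixed_dma ? paddr + 0x80000000UL : 0; }

iopaddr_t pciio_dmamap_addr(pciio_dmamap_t dmamap, paddr_t paddr,
                            size_t byte_count)
{ (void)byte_count; dmamap->active = 1; return paddr + 0x40000000UL; }

/* --- Sketch: prefer one-step translation, fall back to the map.
 *     *used_map tells the caller whether pciio_dmamap_done() must
 *     be called when the I/O operation completes. --- */
iopaddr_t hypo_bus_addr(pciio_dmamap_t map, paddr_t paddr,
                        size_t len, int *used_map)
{
    iopaddr_t bus = sim_dmatrans_addr(paddr, len);

    *used_map = 0;
    if (bus == 0) {
        bus = pciio_dmamap_addr(map, paddr, len);
        *used_map = 1;
    }
    return bus;
}
```

Recording which path was taken matters because only the map path leaves a map active that must later be made inactive.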

Managing Address-Length Lists

In some cases you are not sure whether a memory buffer is contiguous, or perhaps you are sure that it is not. In this case you need to create a list of memory addresses and lengths—one address and length for each segment of memory. Then you need to translate the segments into a list of bus addresses. The list of bus addresses can be programmed into the device one at a time or, if the device supports scatter/gather, you can program all of the list of addresses for transfer in sequence.

Support for these cases is provided by address-length lists, an abstract data type that is supported by a family of functions. The next major release of IRIX contains a complete family of functions for address-length lists. The current release supports a subset necessary to use with DMA maps. The functions related to address-length lists are summarized in Table 15-5.

Table 15-5. Functions for DMA Using Address-Length Lists

Function

Purpose and Operation

alenlist_destroy(D3)

Release an address-length list.

alenlist_get(D3)

Retrieve the next address and length pair from a list.

buf_to_alenlist(D3)

Create an address-length list to describe the buffer represented by a buf_t object.

kvaddr_to_alenlist(D3)

Create an address-length list to describe a buffer in kernel virtual memory.

pciio_dmamap_list(D3)

Convert an address-length list of memory addresses into an address-length list of corresponding bus addresses.

pciio_dmatrans_list(D3)

Request immediate conversion of an address-length list of memory addresses into an address-length list of corresponding bus addresses. Returns NULL unless this system supports fixed DMA mapping.

The function buf_to_alenlist() is called in a pfxstrategy() entry point. It takes a buf_t and returns an address-length list that describes each segment of memory in the buffer that the buf_t describes (see “Structure buf_t” and “Entry Point strategy()”). The function kvaddr_to_alenlist() takes the address and length of any buffer in kernel virtual memory and returns an address-length list to describe that extent of memory.

When you are ready to perform DMA to a buffer, you create an address-length list to describe the buffer, and pass that through pciio_dmamap_list(). This function returns a new address-length list in which the memory addresses have been replaced by PCI bus addresses.

You step through the contents of the converted address-length list using alenlist_get(), which returns successive pairs of values—an address and a length—from the list. You program each pair of values into the bus master device.
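
The loop that programs the device can be sketched as follows. This is a user-space illustration under stated assumptions, not kernel code: struct sim_alenlist and sim_alenlist_get() are toy stand-ins that play the roles of the converted address-length list and alenlist_get(), whose real argument list is not reproduced here. The scatter/gather table, HYPO_MAX_SG, and hypo_program_sg() are hypothetical.

```c
#include <stddef.h>

typedef unsigned long iopaddr_t;

/* --- Stand-ins (assumptions): a toy list and a simplified getter --- */
struct sim_pair { iopaddr_t addr; size_t len; };
struct sim_alenlist { const struct sim_pair *pairs; size_t count, next; };

/* Returns 1 and fills addr and len while pairs remain, else 0. */
int sim_alenlist_get(struct sim_alenlist *l, iopaddr_t *addr, size_t *len)
{
    if (l->next >= l->count) return 0;
    *addr = l->pairs[l->next].addr;
    *len  = l->pairs[l->next].len;
    l->next++;
    return 1;
}

/* Hypothetical scatter/gather descriptor table of the device. */
#define HYPO_MAX_SG 8
struct hypo_sg { iopaddr_t addr; size_t len; };

/* Copy successive pairs into the table. Returns the number of entries
 * programmed, or -1 if the buffer has more segments than the table
 * can hold. */
int hypo_program_sg(struct sim_alenlist *l, struct hypo_sg *tbl)
{
    iopaddr_t addr;
    size_t len;
    int n = 0;

    while (sim_alenlist_get(l, &addr, &len)) {
        if (n == HYPO_MAX_SG) return -1;
        tbl[n].addr = addr;
        tbl[n].len  = len;
        n++;
    }
    return n;
}
```

A device without scatter/gather support would instead start one transfer per pair and wait for each to complete before fetching the next pair.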

Detaching A Device

In the next release of IRIX, the pfxdetach() entry point is called when the kernel decides to detach a PCI device. This can be caused by a hardware failure or by administrator action. In practice, it does not happen at all in workstations supported by IRIX 6.3 for O2. You may provide the entry point, but it is not called.

Inactivating an Interrupt Handler

The functions for managing interrupt handlers are summarized in Table 15-6.

Table 15-6. Functions for Managing PCI Interrupt Handlers

Function

Purpose and Operation

pciio_intr_alloc(D3)

Create an interrupt object that enables interrupts to flow from a specified device.

pciio_intr_connect(D3)

Associate an interrupt object with an interrupt handler function.

pciio_intr_disconnect(D3)

Remove the association between an interrupt object and a handler function.

pciio_intr_free(D3)

Release an interrupt object.

The allocation of an interrupt handler is covered under “Registering an Interrupt Handler”. When detaching a device, call pciio_intr_disconnect() to break the association between the interrupt and the handler function.

Inactivating Maps and Releasing Objects

There are typically various allocated objects—PIO and DMA maps, interrupt objects—that are addressed from the device information structure stored for the device. All such objects should be released at this time.

Unloading

When a loadable PCI driver is called at its pfxunload() entry point, indicating that the kernel would like to unload it, it must take great pains not to leave any dangling pointers (see “Entry Point unload()”). A driver should not unload when it has any registered interrupt handlers.

A driver does not have to unregister itself as a PCI driver before unloading. Nor does it have to detach any devices it has attached. However, if any devices are open or memory mapped, the driver should not unload.

If the kernel discovers a device and wants this driver to attach it, the kernel will reload the driver. If the driver has already attached one or more devices, the driver's information about the state of those devices is safely stored in the hardware vertex. When a process wants to open one of the devices, the driver will be reloaded automatically, and will be able to find the device information again.

PCI Function Summary

Table 15-7 contains a summary of the PCI-related kernel functions, in alphabetical order.

Table 15-7. PCI-Related Kernel Functions

Function

Purpose and Operation

Discussed

alenlist_destroy(D3)

Release an address-length list.

page 397

alenlist_get(D3)

Retrieve the next address and length pair from a list.

page 397

buf_to_alenlist(D3)

Create an address-length list to describe the buffer represented by a buf_t object.

page 397

hwgraph_device_add(D3)

Add a device vertex for a logical device.

page 392

hwgraph_device_add_minor(D3)

Associate a logical device vertex with a minor number.

page 392

kvaddr_to_alenlist(D3)

Create an address-length list to describe a buffer in kernel virtual memory.

page 397

device_info_get(D3)

Retrieve the address of device information from the hardware graph vertex.

page 393

device_info_set(D3)

Store the address of device information in the hardware graph vertex.

page 393

pciio_config_get(D3)

Fetch a defined register from configuration space using a base address returned by pciio_piomap_addr().

page 389

pciio_config_set(D3)

Store a value into one of the defined fields of configuration space using an address returned by pciio_piomap_addr().

page 389

pciio_dmamap_addr(D3)

Get the bus virtual address corresponding to a memory address for a specified length.

page 397

pciio_dmamap_alloc(D3)

Create a DMA map object, specifying the maximum extent of memory the map will have to cover.

page 387

pciio_dmamap_done(D3)

Make a DMA map inactive (may release hardware resources associated with the map).

page 397

pciio_dmamap_drain(D3)

Complete all DMA transfers and flush any write-gather or prefetch buffers in the PCI bus adapter.

page 397

pciio_dmamap_free(D3)

Release a DMA map object.

page 397

pciio_dmamap_list(D3)

Convert an address-length list of memory addresses into an address-length list of corresponding bus addresses.

page 397

pciio_dmatrans_addr(D3)

Request immediate translation of the address of a contiguous memory buffer to a bus address. Returns NULL unless this system supports fixed DMA addressing.

page 397

pciio_dmatrans_list(D3)

Request immediate conversion of an address-length list of memory addresses into an address-length list of corresponding bus addresses. Returns NULL unless this system supports fixed DMA mapping.

page 397

pciio_driver_register(D3)

Notify the kernel that this driver is ready, and tell the vendor and device numbers it supports.

page 383

pciio_driver_unregister(D3)

Notify the kernel that this driver is not available (for example, because the driver is unloading).

 

pciio_intr_alloc(D3)

Create an interrupt object that enables interrupts to flow from a specified device.

page 390

pciio_intr_connect(D3)

Associate an interrupt object with an interrupt handler function.

page 390

pciio_intr_disconnect(D3)

Remove the association between an interrupt object and a handler function.

 

pciio_intr_free(D3)

Release an interrupt object.

 

pciio_piomap_addr(D3)

Get a kernel virtual address from a PIO map for a specific offset and length.

page 394

pciio_piomap_alloc(D3)

Create a PIO map object, specifying the bus address space, base offset, and length it needs to cover.

page 387

pciio_piomap_done(D3)

Make a PIO map inactive until it is next needed (may release hardware resources associated with the map).

page 394

pciio_piomap_free(D3)

Release a PIO map object.

page 394

pciio_piotrans_addr(D3)

Request immediate translation of a bus address to a kernel virtual address without use of a PIO map. Returns NULL unless this system supports fixed PIO addressing.

page 394