Linux Device Drivers (3rd Edition)
Jonathan Corbet, Greg Kroah-Hartman, Alessandro Rubini

15.4. Direct Memory Access


Direct memory access, or DMA, is the advanced topic that completes our
overview of memory issues. DMA is the hardware mechanism that allows
peripheral components to transfer their I/O data directly to and from
main memory without the need to involve the system processor. Use of
this mechanism can greatly increase throughput to and from a device,
because a great deal of computational overhead is eliminated.


15.4.1. Overview of a DMA Data Transfer


Before introducing the programming details, let's
review how a DMA transfer takes place, considering only input
transfers to simplify the discussion.

Data transfer can be triggered in two ways: either the software asks
for data (via a function such as read) or the
hardware asynchronously pushes data to the system.

In the first case, the steps involved can be summarized as follows:

  1. When a process calls read, the driver method
    allocates a DMA buffer and instructs the hardware to transfer its
    data into that buffer. The process is put to sleep.

  2. The hardware writes data to the DMA buffer and raises an interrupt
    when it's done.

  3. The interrupt handler gets the input data, acknowledges the
    interrupt, and awakens the process, which is now able to read data.

The second
case comes about when DMA is used asynchronously. This happens, for
example, with data acquisition devices that go on pushing data even
if nobody is reading them. In this case, the driver should maintain a
buffer so that a subsequent read call will
return all the accumulated data to user space. The steps involved in
this kind of transfer are slightly different:

  1. The hardware raises an interrupt to announce that new data has
    arrived.

  2. The interrupt handler allocates a buffer and tells the hardware where
    to transfer its data.

  3. The peripheral device writes the data to the buffer and raises
    another interrupt when it's done.

  4. The handler dispatches the new data, wakes any relevant process, and
    takes care of housekeeping.



A variant of the asynchronous approach
is often seen with network cards. These cards often expect to see a
circular buffer (often called a
DMA ring buffer) established in memory shared
with the processor; each incoming packet is placed in the next
available buffer in the ring, and an interrupt is signaled. The
driver then passes the network packets to the rest of the kernel and
places a new DMA buffer in the ring.

The processing steps in all of these cases emphasize that efficient
DMA handling relies on interrupt reporting. While it is possible to
implement DMA with a polling driver, it wouldn't
make sense, because a polling driver would waste the performance
benefits that DMA offers over the easier processor-driven
I/O.[4]

[4] There are, of course, exceptions to everything;
see Section 15.2.6
for a demonstration of how
high-performance network drivers are best implemented using
polling.


Another relevant item introduced here is the DMA buffer. DMA requires
device drivers to allocate one or more special buffers suited to DMA.
Note that many drivers allocate their buffers at initialization time
and use them until shutdown; the word allocate in the previous lists,
therefore, means "get hold of a previously allocated buffer."


15.4.2. Allocating the DMA Buffer


This section covers the allocation
of DMA buffers at a
low level; we introduce a higher-level interface shortly, but it is
still a good idea to understand the material presented here.

The main issue that arises with DMA buffers is that, when they are
bigger than one page, they must occupy contiguous pages in physical
memory because the device transfers data using the ISA or PCI system
bus, both of which carry physical addresses. It's
interesting to note that this constraint doesn't
apply to the SBus (see Section 12.5),
which uses
virtual addresses on the peripheral bus. Some architectures
can also use virtual addresses on the PCI bus,
but a portable driver cannot count on that capability.

Although DMA buffers can be allocated either at system boot or at
runtime, modules can allocate their buffers only at runtime.
Driver
writers must take care to allocate the right kind of memory when it
is used for DMA operations; not all memory zones are suitable. In
particular, high memory may not work for DMA on some systems and with
some devices; the peripherals simply cannot work with addresses
that high.

Most devices on modern buses can handle 32-bit addresses, meaning
that normal memory allocations work just fine for them. Some PCI
devices, however, fail to implement the full PCI standard and cannot
work with 32-bit addresses. And ISA devices, of course, are limited
to 24-bit addresses only.

For devices with this kind of limitation, memory should be allocated
from the DMA zone by adding the GFP_DMA flag to
the kmalloc or
get_free_pages call. When this flag is present,
only memory that can be addressed with 24 bits is allocated.
Alternatively, you can use the generic DMA layer (which we discuss
shortly) to allocate buffers that work around your
device's limitations.
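
For example, a driver for a device limited to 24-bit addresses might
obtain its buffer with code along the following lines. This is only a
sketch; the mydev_device structure, its buffer field, and the buffer
size are hypothetical.

#include <linux/slab.h>
#include <linux/errno.h>

#define MYDEV_BUF_SIZE 4096     /* hypothetical buffer size */

static int mydev_alloc_buffer(struct mydev_device *mydev)
{
    /* GFP_DMA keeps the allocation within the 24-bit addressable DMA zone */
    mydev->buffer = kmalloc(MYDEV_BUF_SIZE, GFP_KERNEL | GFP_DMA);
    if (!mydev->buffer)
        return -ENOMEM;
    return 0;
}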


15.4.2.1 Do-it-yourself allocation




We have seen how get_free_pages can allocate up to a few megabytes (as
order can range up to MAX_ORDER, currently 11), but high-order
requests are prone to fail even when the requested buffer is far less
than 128 KB, because system memory becomes fragmented over time.[5]

[5] The word fragmentation is usually
applied to disks to express the idea that files are not stored
consecutively on the magnetic medium. The same concept applies to
memory, where each virtual address space gets scattered throughout
physical RAM, and it becomes difficult to retrieve consecutive free
pages when a DMA buffer is requested.


When the kernel cannot return the requested amount of memory or when
you need more than 128 KB (a common requirement for PCI frame
grabbers, for example), an alternative to returning
-ENOMEM is to allocate memory at boot time or
reserve the top of physical RAM for your buffer. We described
allocation at boot time in Section 8.6,
but it is not available to
modules. Reserving the top of RAM is accomplished by passing a
mem= argument to the kernel at boot time. For
example, if you have 256 MB, the argument mem=255M
keeps the kernel from using the top megabyte. Your module could later
use the following code to gain access to such memory:

dmabuf = ioremap (0xFF00000 /* 255M */, 0x100000 /* 1M */);

The allocator, part of the sample code
accompanying the book, offers a simple API to probe and manage such
reserved RAM and has been used successfully on several architectures.
However, this trick doesn't work when you have a
high-memory system (i.e., one with more physical memory than could
fit in the CPU address space).

Another option, of course, is to allocate your buffer with the
__GFP_NOFAIL allocation flag. This approach does,
however, severely stress the memory management subsystem, and it runs
the risk of locking up the system altogether; it is best avoided
unless there is truly no other way.

If you are going to such lengths to allocate a large DMA buffer,
however, it is worth putting some thought into alternatives. If your
device can do scatter/gather I/O, you can allocate your buffer in
smaller pieces and let the device do the rest. Scatter/gather I/O can
also be used when performing direct I/O into user space, which may
well be the best solution when a truly huge buffer is required.


15.4.3. Bus Addresses



A device driver using DMA has to talk to hardware connected to the
interface bus, which uses physical addresses, whereas program code
uses virtual addresses.

As a matter of fact, the situation is slightly more complicated than
that. DMA-based hardware uses bus, rather than
physical, addresses. Although ISA and PCI bus
addresses are simply physical addresses on the PC, this is not true
for every platform. Sometimes the interface bus is connected through
bridge circuitry that maps I/O addresses to different physical
addresses. Some systems even have a page-mapping scheme that can make
arbitrary pages appear contiguous to the peripheral bus.

At the lowest level (again, we'll look at a
higher-level solution shortly), the Linux kernel provides a portable
solution by exporting the following functions, defined in
<asm/io.h>. The use of these functions is
strongly discouraged, because they work properly only on systems with
a very simple I/O architecture; nonetheless, you may encounter them
when working with kernel code.

unsigned long virt_to_bus(volatile void *address);
void *bus_to_virt(unsigned long address);


These functions perform a simple
conversion between kernel logical addresses and bus addresses. They
do not work in any situation where an I/O memory management unit must
be programmed or where bounce buffers must be used. The right way of
performing this conversion is with the generic DMA layer, so we now
move on to that topic.


15.4.4. The Generic DMA Layer


DMA operations, in the
end, come down to allocating a buffer and
passing bus addresses to your device. However, the task of writing
portable drivers that perform DMA safely and correctly on all
architectures is harder than one might think. Different systems have
different ideas of how cache coherency should work; if you do not
handle this issue correctly, your driver may corrupt memory. Some
systems have complicated bus hardware that can make the DMA task
easier, or harder. And not all systems can perform DMA out of
all parts of memory. Fortunately, the kernel provides a bus- and
architecture-independent DMA layer that hides most of these issues
from the driver author. We strongly encourage you to use this layer
for DMA operations in any driver you write.

Many of the functions below require a pointer to a struct
device. This structure is the low-level representation of a
device within the Linux device model. It is not something that
drivers often have to work with directly, but you do need it when
using the generic DMA layer. Usually, you can find this structure
buried inside the bus-specific structure that describes your device.
For example, it can be found as the dev field in
struct pci_dev or struct usb_device. The device structure is
covered in detail in Chapter 14.

Drivers that use the following functions should include
<linux/dma-mapping.h>.


15.4.4.1 Dealing with difficult hardware

The first question that must be answered before attempting DMA is
whether the given device is capable of such an operation on the
current host. Many devices are limited in the range of memory they
can address, for a number of reasons. By default, the kernel assumes
that your device can perform DMA to any 32-bit address. If this is
not the case, you should inform the kernel of that fact with a call
to:

    int dma_set_mask(struct device *dev, u64 mask);

The mask should show the bits that your device can
address; if it is limited to 24 bits, for example, you would pass
mask as 0x0FFFFFF. The return
value is nonzero if DMA is possible with the given
mask; if dma_set_mask returns
0, you are not able to use DMA operations with
this device. Thus, the initialization code in a driver for a device
limited to 24-bit DMA operations might look like:

if (dma_set_mask (dev, 0xffffff))
    card->use_dma = 1;
else {
    card->use_dma = 0; /* We'll have to live without DMA */
    printk (KERN_WARNING "mydev: DMA not supported\n");
}

Again, if your device supports normal, 32-bit DMA operations, there
is no need to call dma_set_mask.


15.4.4.2 DMA mappings

A DMA mapping is a combination of allocating a DMA buffer and
generating an address for that buffer that is accessible by the
device. It is tempting to get that address with a simple call to
virt_to_bus, but there are strong reasons for avoiding that approach.
The first of those is that reasonable hardware comes with an IOMMU
that provides a set of mapping registers for the bus. The IOMMU can
arrange for any physical memory to appear within the address range
accessible by the device, and it can cause physically scattered
buffers to look contiguous to the device. Making use of the IOMMU
requires using the generic DMA layer; virt_to_bus is not up to the
task.

Note that not all architectures have an IOMMU; in particular, the
popular x86 platform has no IOMMU support. A properly written driver
need not be aware of the I/O support hardware it is running over,
however.


Setting up a useful address for the device may also, in some cases,
require the establishment of a bounce buffer. Bounce buffers are
created when a driver attempts to perform DMA on an address that is
not reachable by the peripheral device (a high-memory address, for
example). Data is then copied to and from the bounce buffer as
needed. Needless to say, use of bounce buffers can slow things down,
but sometimes there is no alternative.

DMA mappings must also address the issue of cache coherency. Remember
that modern processors keep copies of recently accessed memory areas
in a fast, local cache; without this cache, reasonable performance is
not possible. If your device changes an area of main memory, it is
imperative that any processor caches covering that area be
invalidated; otherwise the processor may work with an incorrect image
of main memory, and data corruption results. Similarly, when your
device uses DMA to read data from main memory, any changes to that
memory residing in processor caches must be flushed out first. These
cache coherency issues can create no end of obscure and
difficult-to-find bugs if the
programmer is not careful. Some architectures manage cache coherency
in the hardware, but others require software support. The generic DMA
layer goes to great lengths to ensure that things work correctly on
all architectures, but, as we will see, proper behavior requires
adherence to a small set of rules.

The DMA mapping sets up a new type, dma_addr_t, to
represent bus addresses. Variables of type
dma_addr_t should be treated as opaque by the
driver; the only allowable operations are to pass them to the DMA
support routines and to the device itself. As a bus address,
dma_addr_t may lead to unexpected problems if used
directly by the CPU.

The PCI code distinguishes between two types of DMA mappings,
depending on how long the DMA buffer is expected to stay around:

Coherent DMA mappings


These mappings usually exist for the life of the driver. A coherent
buffer must be simultaneously available to both the CPU and the
peripheral (other types of mappings, as we will see later, can be
available only to one or the other at any given time). As a result,
coherent mappings must live in cache-coherent memory. Coherent
mappings can be expensive to set up and use.


Streaming DMA mappings


Streaming mappings are usually set up
for a single operation. Some architectures allow for significant
optimizations when streaming mappings are used, as we see, but these
mappings also are subject to a stricter set of rules in how they may
be accessed. The kernel developers recommend the use of streaming
mappings over coherent mappings whenever possible. There are two
reasons for this recommendation. The first is that, on systems that
support mapping registers, each DMA mapping uses one or more of them
on the bus. Coherent mappings, which have a long lifetime, can
monopolize these registers for a long time, even when they are not
being used. The other reason is that, on some hardware, streaming
mappings can be optimized in ways that are not available to coherent
mappings.



The two mapping types must be manipulated in different ways;
it's time to look at the details.


15.4.4.3 Setting up coherent DMA mappings

A driver can set up a coherent mapping with a call to
dma_alloc_coherent:

void *dma_alloc_coherent(struct device *dev, size_t size,
                         dma_addr_t *dma_handle, int flag);

This function handles both the allocation and the mapping of the
buffer. The first two arguments are the device structure and the size
of the buffer needed. The function returns the result of the DMA
mapping in two places. The return value from the function is a kernel
virtual address for the buffer, which may be used by the driver; the
associated bus address, meanwhile, is returned in
dma_handle. Allocation is handled in this function
so that the buffer is placed in a location that works with DMA;
usually the memory is just allocated with
get_free_pages (but note that the size is in
bytes, rather than an order value). The flag
argument is the usual GFP_ value describing how
the memory is to be allocated; it is usually either
GFP_KERNEL or
GFP_ATOMIC (when running in atomic context).


When the buffer is no longer needed (usually at module unload time),
it should be returned to the system with dma_free_coherent:

void dma_free_coherent(struct device *dev, size_t size,
                       void *vaddr, dma_addr_t dma_handle);

Note that this function, like many of the generic DMA functions,
requires that all of the size, CPU address, and bus address arguments
be provided.
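
To make the pairing concrete, here is a minimal sketch; the
mydev_device structure, its cpu_buf and bus_addr fields, and the
buffer size are hypothetical, and the driver is assumed to be a PCI
driver holding a struct pci_dev pointer.

#include <linux/dma-mapping.h>

#define MYDEV_COHERENT_SIZE 8192    /* hypothetical buffer size */

static int mydev_setup_coherent(struct mydev_device *mydev, struct pci_dev *pdev)
{
    mydev->cpu_buf = dma_alloc_coherent(&pdev->dev, MYDEV_COHERENT_SIZE,
                                        &mydev->bus_addr, GFP_KERNEL);
    if (!mydev->cpu_buf)
        return -ENOMEM;
    return 0;
}

static void mydev_release_coherent(struct mydev_device *mydev, struct pci_dev *pdev)
{
    /* The size, the CPU address, and the bus address must all be passed back */
    dma_free_coherent(&pdev->dev, MYDEV_COHERENT_SIZE,
                      mydev->cpu_buf, mydev->bus_addr);
}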


15.4.4.4 DMA pools

A DMA pool is an allocation mechanism for small, coherent DMA
mappings. Mappings obtained from dma_alloc_coherent may have a
minimum size of one page. If your device needs smaller DMA areas
than that, you
should probably be using a DMA pool. DMA pools are also useful in
situations where you may be tempted to perform DMA to small areas
embedded within a larger structure. Some very obscure driver bugs
have been traced down to cache coherency problems with structure
fields adjacent to small DMA areas. To avoid this problem, you should
always allocate areas for DMA operations explicitly, away from other,
non-DMA data structures.

The DMA pool functions are defined in
<linux/dmapool.h>.

A DMA pool must be created before use with a call to:

struct dma_pool *dma_pool_create(const char *name, struct device *dev,
                                 size_t size, size_t align,
                                 size_t allocation);

Here, name is a name for the pool,
dev is your device structure,
size is the size of the buffers to be allocated
from this pool, align is the required hardware
alignment for allocations from the pool (expressed in bytes), and
allocation is, if nonzero, a memory boundary that
allocations should not exceed. If allocation is
passed as 4096, for example, the buffers allocated from this pool do
not cross 4-KB boundaries.

When you are done with a pool, it
can be freed with:

void dma_pool_destroy(struct dma_pool *pool);

You should return all allocations to the pool before destroying it.

Allocations are handled with dma_pool_alloc:

void *dma_pool_alloc(struct dma_pool *pool, int mem_flags,
                     dma_addr_t *handle);

For this call, mem_flags is the usual set of
GFP_ allocation flags. If all goes well, a region
of memory (of the size specified when the pool was created) is
allocated and returned. As with
dma_alloc_coherent, the address of the resulting
DMA buffer is returned as a kernel virtual address and stored in
handle as a bus address.

Unneeded buffers should be returned to the pool with:

void dma_pool_free(struct dma_pool *pool, void *vaddr, dma_addr_t addr);
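
Putting the pool calls together, a driver that needs a small, 16-byte
descriptor might use something like the sketch below; the mydev_device
structure and its fields are hypothetical, and teardown (dma_pool_free
for each descriptor, then dma_pool_destroy) is the mirror image of
this setup.

static int mydev_setup_descriptor(struct mydev_device *mydev, struct device *dev)
{
    mydev->pool = dma_pool_create("mydev_desc", dev, 16, 16, 0);
    if (!mydev->pool)
        return -ENOMEM;

    mydev->desc = dma_pool_alloc(mydev->pool, GFP_KERNEL, &mydev->desc_bus);
    if (!mydev->desc) {
        dma_pool_destroy(mydev->pool);
        return -ENOMEM;
    }
    /* Fill in the 16-byte descriptor and hand mydev->desc_bus to the device */
    return 0;
}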


15.4.4.5 Setting up streaming DMA mappings

Streaming mappings have a more complicated interface than the
coherent variety, for a number of reasons. These mappings expect to
work with a buffer that has already been allocated by the driver and,
therefore, have to deal with addresses that they did not choose. On
some architectures, streaming mappings can also have multiple,
discontiguous pages and multipart "scatter/gather" buffers. For all
of these reasons, streaming mappings have their own set of mapping
functions.

When setting up a streaming mapping, you must tell the kernel in
which direction the data is moving. Some symbols (of type
enum dma_data_direction) have
been defined for this purpose:

DMA_TO_DEVICE

DMA_FROM_DEVICE





These two symbols should be reasonably
self-explanatory. If data is being sent to the device (in response,
perhaps, to a write system call),
DMA_TO_DEVICE should be used; data going to the
CPU, instead, is marked with DMA_FROM_DEVICE.


DMA_BIDIRECTIONAL



If data can move in either direction, use DMA_BIDIRECTIONAL.


DMA_NONE



This symbol is provided only as a debugging aid. Attempts to use
buffers with this "direction" cause a kernel panic.



It may be tempting to just pick DMA_BIDIRECTIONAL
at all times, but driver authors should resist that temptation. On
some architectures, there is a performance penalty to pay for that
choice.

When you have a single buffer to transfer, map it with
dma_map_single:

dma_addr_t dma_map_single(struct device *dev, void *buffer, size_t size,
                          enum dma_data_direction direction);

The return value is the bus address that you can pass to the device
or NULL if something goes wrong.

Once the transfer is complete, the mapping should be deleted with
dma_unmap_single:

void dma_unmap_single(struct device *dev, dma_addr_t dma_addr, size_t size,
                      enum dma_data_direction direction);

Here, the size and direction
arguments must match those used to map the buffer.

Some important rules apply to streaming DMA mappings:

  • The buffer must be used only for a transfer that matches the
    direction value given when it was mapped.

  • Once a buffer has been mapped, it belongs to the device, not the
    processor. Until the buffer has been unmapped, the driver should not
    touch its contents in any way. Only after
    dma_unmap_single has been called is it safe for
    the driver to access the contents of the buffer (with one exception
    that we see shortly). Among other things, this rule implies that a
    buffer being written to a device cannot be mapped until it contains
    all the data to write.

  • The buffer must not be unmapped while DMA is still active, or serious
    system instability is guaranteed.


You may be wondering why the driver can no longer work with a buffer
once it has been mapped. There are actually two reasons why this rule
makes sense. First, when a buffer is mapped for DMA, the kernel must
ensure that all of the data in that buffer has actually been written
to memory. It is likely that some data is in the
processor's cache when
dma_unmap_single is issued, and must be
explicitly flushed. Data written to the buffer by the processor after
the flush may not be visible to the device.




Second, consider what happens if the buffer to be mapped is in a
region of
memory that is not accessible to the device. Some architectures
simply fail in this case, but others create a bounce buffer. The
bounce buffer is just a separate region of memory that
is accessible to the device. If a buffer is
mapped with a direction of DMA_TO_DEVICE, and a
bounce buffer is required, the contents of the original buffer are
copied as part of the mapping operation. Clearly, changes to the
original buffer after the copy are not seen by the device. Similarly,
DMA_FROM_DEVICE bounce buffers are copied back to
the original buffer by dma_unmap_single; the
data from the device is not present until that copy has been done.

Incidentally, bounce buffers are one reason why it is important to
get the direction right. DMA_BIDIRECTIONAL bounce
buffers are copied both before and after the operation, which is
often an unnecessary waste of CPU cycles.

Occasionally a driver needs to access the contents of a streaming
DMA buffer without unmapping it. A call has been provided to make
this possible:

void dma_sync_single_for_cpu(struct device *dev, dma_addr_t bus_addr,
                             size_t size, enum dma_data_direction direction);

This function should be called before the processor accesses a
streaming DMA buffer. Once the call has been made, the CPU
"owns" the DMA buffer and can work
with it as needed. Before the device accesses the buffer, however,
ownership should be transferred back to it with:

void dma_sync_single_for_device(struct device *dev, dma_addr_t bus_addr,
                                size_t size, enum dma_data_direction direction);

The processor, once again, should not access the DMA buffer after
this call has been made.
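
As a rough sketch of how the two calls bracket CPU access (the
buffer, its size, its bus address, and the examine_partial_data
helper are all hypothetical, and the buffer is assumed to have been
mapped with DMA_FROM_DEVICE):

/* The buffer was mapped earlier with:
 *   bus_addr = dma_map_single(dev, buffer, size, DMA_FROM_DEVICE); */
dma_sync_single_for_cpu(dev, bus_addr, size, DMA_FROM_DEVICE);
/* The CPU now owns the buffer and may look at the data received so far */
examine_partial_data(buffer, size);
/* Give ownership back before the device touches the buffer again */
dma_sync_single_for_device(dev, bus_addr, size, DMA_FROM_DEVICE);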


15.4.4.6 Single-page streaming mappings

Occasionally, you may want to set up a mapping on a buffer for which
you have a struct page pointer; this can happen, for example, with
user-space buffers mapped with get_user_pages. To set up and tear down
streaming mappings using struct
page pointers, use the following:

dma_addr_t dma_map_page(struct device *dev, struct page *page,
                        unsigned long offset, size_t size,
                        enum dma_data_direction direction);
void dma_unmap_page(struct device *dev, dma_addr_t dma_address,
                    size_t size, enum dma_data_direction direction);

The offset and size arguments
can be used to map part of a page. It is recommended, however, that
partial-page mappings be avoided unless you are really sure of what
you are doing. Mapping part of a page can lead to cache coherency
problems if the allocation covers only part of a cache line; that, in
turn, can lead to memory corruption and extremely difficult-to-debug
bugs.
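
A minimal sketch of a single-page mapping follows; it assumes that
page is a struct page pointer the driver already holds (obtained, for
example, via get_user_pages) and that dev is the corresponding
struct device pointer, with error handling omitted.

dma_addr_t bus_addr;

/* Map the whole page for a transfer toward the device */
bus_addr = dma_map_page(dev, page, 0, PAGE_SIZE, DMA_TO_DEVICE);
/* ... hand bus_addr to the device and wait for the transfer to finish ... */
dma_unmap_page(dev, bus_addr, PAGE_SIZE, DMA_TO_DEVICE);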


15.4.4.7 Scatter/gather mappings


Scatter/gather mappings are a
special type of streaming DMA mapping. Suppose you have several
buffers, all of which need to be transferred to or from the device.
This situation can come about in several ways, including from a
readv or writev system
call, a clustered disk I/O request, or a list of pages in a mapped
kernel I/O buffer. You could simply map each buffer, in turn, and
perform the required operation, but there are advantages to mapping
the whole list at once.




Many devices can accept a scatterlist of array pointers and lengths,
and transfer them all in one DMA operation; for
example, "zero-copy" networking is
easier if packets can be built in multiple pieces. Another reason to
map scatterlists as a whole is to take advantage of systems that have
mapping registers in the bus hardware. On such systems, physically
discontiguous pages can be assembled into a single, contiguous array
from the device's point of view. This technique
works only when the entries in the scatterlist are equal to the page
size in length (except the first and last), but when it does work, it
can turn multiple operations into a single DMA, and speed things up
accordingly.

Finally, if a bounce buffer must be used, it makes sense to coalesce
the entire list into a single buffer (since it is being copied
anyway).

So now you're convinced that mapping of scatterlists
is worthwhile in some situations. The first step in mapping a
scatterlist is to create and fill in an array of
struct scatterlist describing
the buffers to be transferred. This structure is architecture
dependent, and is described in
<asm/scatterlist.h>. However, it always
contains three fields:

struct page *page;


The struct page pointer
corresponding to the buffer to be used in the scatter/gather
operation.


unsigned int length;

unsigned int offset;


The length of that buffer and its offset within the page




To map a scatter/gather DMA operation, your driver should set the
page, offset, and length fields in a struct scatterlist entry for
each buffer to be transferred. Then call:

int dma_map_sg(struct device *dev, struct scatterlist *sg, int nents,
               enum dma_data_direction direction);

where nents is the number of scatterlist entries
passed in. The return value is the number of DMA buffers to transfer;
it may be less than nents.

For each buffer in the input scatterlist,
dma_map_sg determines the proper bus address to
give to the device. As part of that task, it also coalesces buffers
that are adjacent to each other in memory. If the system your driver
is running on has an I/O memory management unit,
dma_map_sg also programs that
unit's mapping registers, with the possible result
that, from your device's point of view, you are able
to transfer a single, contiguous buffer. You will never know what the
resulting transfer will look like, however, until after the call.

Your driver should transfer each buffer returned by
dma_map_sg. The bus address and length of each
buffer are stored in the struct scatterlist
entries, but their location in the structure varies from one
architecture to the next. Two macros have been defined to make it
possible to write portable code:

dma_addr_t sg_dma_address(struct scatterlist *sg);



Returns the bus (DMA) address from this scatterlist entry.


unsigned int sg_dma_len(struct scatterlist *sg);



Returns the length of this buffer.



Again, remember that the address and length of the buffers to
transfer may be different from what was passed in to
dma_map_sg.

Once the transfer is complete, a scatter/gather mapping is unmapped
with a call to dma_unmap_sg:

void dma_unmap_sg(struct device *dev, struct scatterlist *list,
                  int nents, enum dma_data_direction direction);

Note that nents must be the number of entries that
you originally passed to dma_map_sg and not the
number of DMA buffers the function returned to you.

Scatter/gather mappings are streaming DMA mappings, and the same
access rules apply to them as to the single variety. If you must
access a mapped scatter/gather list, you must synchronize it first:

void dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
                         int nents, enum dma_data_direction direction);
void dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
                            int nents, enum dma_data_direction direction);
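
Pulling the scatter/gather calls together, a driver might map and walk
a short list of buffers roughly as follows. This is only a sketch: the
NBUFS constant, the pages and lengths arrays, and the
start_device_transfer helper are hypothetical, and error handling is
omitted.

struct scatterlist sgl[NBUFS];
int i, nmapped;

for (i = 0; i < NBUFS; i++) {
    sgl[i].page   = pages[i];
    sgl[i].offset = 0;
    sgl[i].length = lengths[i];
}

nmapped = dma_map_sg(dev, sgl, NBUFS, DMA_TO_DEVICE);

/* Program one device transfer per buffer actually returned */
for (i = 0; i < nmapped; i++)
    start_device_transfer(sg_dma_address(&sgl[i]), sg_dma_len(&sgl[i]));

/* When everything has completed, unmap with the original nents (NBUFS) */
dma_unmap_sg(dev, sgl, NBUFS, DMA_TO_DEVICE);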


15.4.4.8 PCI double-address cycle mappings

Normally, the DMA support layer works with 32-bit bus addresses,
possibly restricted by a specific device's DMA mask.
The PCI bus, however, also supports a 64-bit addressing mode, the
double-address cycle (DAC). The generic DMA
layer does not support this mode for a couple of reasons, the first
of which being that it is a PCI-specific feature. Also, many
implementations of DAC are buggy at best, and, because DAC is slower
than a regular, 32-bit DMA, there can be a performance cost. Even so,
there are applications where using DAC can be the right thing to do;
if you have a device that is likely to be working with very large
buffers placed in high memory, you may want to consider implementing
DAC support. This support is available only for the PCI bus, so
PCI-specific routines must be used.

To use DAC, your driver must include
<linux/pci.h>. You must set a separate DMA
mask:

int pci_dac_set_dma_mask(struct pci_dev *pdev, u64 mask);

You can use DAC addressing only if this call returns
0.

A special type (dma64_addr_t) is used for DAC
mappings. To establish one of these mappings, call
pci_dac_page_to_dma:

dma64_addr_t pci_dac_page_to_dma(struct pci_dev *pdev, struct page *page,
                                 unsigned long offset, int direction);

DAC mappings, you will notice, can be made only from
struct page pointers (they
should live in high memory, after all, or there is no point in using
them); they must be created a single page at a time. The
direction argument is the PCI equivalent of the
enum dma_data_direction used in the generic DMA
layer; it should be PCI_DMA_TODEVICE,
PCI_DMA_FROMDEVICE, or
PCI_DMA_BIDIRECTIONAL.

DAC mappings require no external resources, so there is no need to
explicitly release them after use. It is necessary, however, to treat
DAC mappings like other streaming mappings, and observe the rules
regarding buffer ownership. There is a set of functions for
synchronizing DMA buffers that is analogous to the generic variety:

void pci_dac_dma_sync_single_for_cpu(struct pci_dev *pdev,
                                     dma64_addr_t dma_addr,
                                     size_t len,
                                     int direction);
void pci_dac_dma_sync_single_for_device(struct pci_dev *pdev,
                                        dma64_addr_t dma_addr,
                                        size_t len,
                                        int direction);
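
A rough sketch of how these DAC calls might fit together follows; pdev
is assumed to be the driver's struct pci_dev pointer, page a struct
page that lives in high memory, and only a single page is mapped.

if (pci_dac_set_dma_mask(pdev, 0xffffffffffffffffULL) == 0) {
    dma64_addr_t bus_addr;

    bus_addr = pci_dac_page_to_dma(pdev, page, 0, PCI_DMA_TODEVICE);
    /* ... hand bus_addr to the device and wait for the DMA to complete ... */
    pci_dac_dma_sync_single_for_cpu(pdev, bus_addr, PAGE_SIZE,
                                    PCI_DMA_TODEVICE);
}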


15.4.4.9 A simple PCI DMA example

As an example of how the DMA mappings
might be used, we present a simple example of DMA coding for a PCI
device. The actual form of DMA operations on the PCI bus is very
dependent on the device being driven. Thus, this example does not
apply to any real device; instead, it is part of a hypothetical
driver called dad (DMA Acquisition Device). A
driver for this device might define a transfer function like this:

int dad_transfer(struct dad_dev *dev, int write, void *buffer,
                 size_t count)
{
    dma_addr_t bus_addr;

    /* Map the buffer for DMA */
    dev->dma_dir = (write ? DMA_TO_DEVICE : DMA_FROM_DEVICE);
    dev->dma_size = count;
    bus_addr = dma_map_single(&dev->pci_dev->dev, buffer, count,
                              dev->dma_dir);
    dev->dma_addr = bus_addr;

    /* Set up the device; writeb/writel take the value first, then the register */
    writeb(DAD_CMD_DISABLEDMA, dev->registers.command);
    writeb(write ? DAD_CMD_WR : DAD_CMD_RD, dev->registers.command);
    writel(cpu_to_le32(bus_addr), dev->registers.addr);
    writel(cpu_to_le32(count), dev->registers.len);

    /* Start the operation */
    writeb(DAD_CMD_ENABLEDMA, dev->registers.command);
    return 0;
}

This function maps the buffer to be transferred and starts the device
operation. The other half of the job must be done in the interrupt
service routine, which looks something like this:

void dad_interrupt(int irq, void *dev_id, struct pt_regs *regs)
{
    struct dad_dev *dev = (struct dad_dev *) dev_id;

    /* Make sure it's really our device interrupting */

    /* Unmap the DMA buffer */
    dma_unmap_single(&dev->pci_dev->dev, dev->dma_addr,
                     dev->dma_size, dev->dma_dir);

    /* Only now is it safe to access the buffer, copy to user, etc. */
    ...
}

Obviously, a great deal of detail has been left out of this example,
including whatever steps may be required to prevent attempts to start
multiple, simultaneous DMA operations.


15.4.5. DMA for ISA Devices


The ISA bus allows for two kinds of DMA
transfers: native DMA and ISA bus master DMA. Native DMA uses
standard DMA-controller circuitry on the motherboard to drive the
signal lines on the ISA bus. ISA bus master DMA, on the other hand,
is handled entirely by the peripheral device. The latter type of DMA
is rarely used and doesn't require discussion here,
because it is similar to DMA for PCI devices, at least from the
driver's point of view. An example of an ISA bus
master is the 1542 SCSI controller, whose driver is
drivers/scsi/aha1542.c in the kernel sources.

As far as native DMA is concerned, there are three entities involved
in a DMA data transfer on the ISA bus:

The 8237 DMA controller (DMAC)


The controller holds information about the DMA transfer, such as the
direction, the memory address, and the size of the transfer. It also
contains a counter that tracks the status of ongoing transfers. When
the controller receives a DMA request signal, it gains control of the
bus and drives the signal lines so that the device can read or write
its data.


The peripheral device


The device must activate the DMA request signal when
it's ready to transfer data. The actual transfer is
managed by the DMAC; the hardware device sequentially reads or writes
data onto the bus when the controller strobes the device. The device
usually raises an interrupt when the transfer is over.


The device driver


The driver has little to do; it provides the DMA controller with the
direction, bus address, and size of the transfer. It also talks to
its peripheral to prepare it for transferring the data and responds
to the interrupt when the DMA is over.



The original DMA controller used in the PC could manage four
"channels," each associated with
one set of DMA registers. Four devices could store their DMA
information in the controller at the same time. Newer PCs contain the
equivalent of two DMAC devices:[6] the
second controller (master) is connected to the system processor, and
the first (slave) is connected to channel 0 of the
second controller.[7]

[6] These circuits are
now part of the motherboard's chipset, but a few
years ago they were two separate 8237 chips.

[7] The original PCs had only one
controller; the second was added in 286-based platforms. However, the
second controller is connected as the master because it handles
16-bit transfers; the first transfers only eight bits at a time and
is there for backward compatibility.


The channels are numbered from 0-7: channel 4 is not available to ISA
peripherals, because it is used internally to cascade the slave
controller onto the master. The available channels are, thus, 0-3 on
the slave (the 8-bit channels) and 5-7 on the master (the 16-bit
channels). The size of any DMA transfer, as stored in the controller,
is a 16-bit number representing the number of bus cycles. The maximum
transfer size is, therefore, 64 KB for the slave controller (because
it transfers eight bits in one cycle) and 128 KB for the master
(which does 16-bit transfers).

Because the DMA controller is a system-wide resource, the kernel
helps deal with it. It uses a DMA registry to provide a
request-and-free mechanism for the DMA channels and a set of
functions to configure channel information in the DMA controller.


15.4.5.1 Registering DMA usage



You should be used to kernel registries; we've already seen them for
I/O ports and interrupt lines. The DMA channel registry is similar to
the others. After
<asm/dma.h> has been included, the
following functions can be used to obtain and release ownership of a
DMA channel:

int request_dma(unsigned int channel, const char *name); 
void free_dma(unsigned int channel);

The channel argument is a number between 0 and 7
or, more precisely, a positive number less than
MAX_DMA_CHANNELS. On the PC,
MAX_DMA_CHANNELS is defined as
8 to match the hardware. The
name argument is a string identifying the device.
The specified name appears in the file
/proc/dma, which can be read by user programs.

The return value from request_dma is
0 for success and -EINVAL or
-EBUSY if there was an error. The former means
that the requested channel is out of range, and the latter means that
another device is holding the channel.


We recommend that you take the
same care with DMA channels as with I/O ports and interrupt lines;
requesting the channel at open time is much
better than requesting it from the module initialization function.
Delaying the request allows some sharing between drivers; for
example, your sound card and your analog I/O interface can share the
DMA channel as long as they are not used at the same time.

We also suggest that you request the DMA channel
after you've requested the
interrupt line and that you release it before
the interrupt. This is the conventional order for requesting the two
resources; following the convention avoids possible deadlocks. Note
that every device using DMA needs an IRQ line as well; otherwise, it
couldn't signal the completion of data transfer.

In a typical case, the code for open looks like
the following, which refers to our hypothetical
dad module. The dad device
as shown uses a fast interrupt handler without support for shared IRQ
lines.

int dad_open (struct inode *inode, struct file *filp)
{
    struct dad_device *my_device;
    int error;

    /* ... */
    if ( (error = request_irq(my_device->irq, dad_interrupt,
                              SA_INTERRUPT, "dad", NULL)) )
        return error; /* or implement blocking open */
    if ( (error = request_dma(my_device->dma, "dad")) ) {
        free_irq(my_device->irq, NULL);
        return error; /* or implement blocking open */
    }
    /* ... */
    return 0;
}

The close implementation that matches the
open just shown looks like this:

void dad_close (struct inode *inode, struct file *filp)
{
    struct dad_device *my_device;

    /* ... */
    free_dma(my_device->dma);
    free_irq(my_device->irq, NULL);
    /* ... */
}

Here's how the /proc/dma file
looks on a system with the sound card installed:

merlino% cat /proc/dma
1: Sound Blaster8
4: cascade

It's interesting to note that the default sound
driver gets the DMA channel at system boot and never releases it. The
cascade entry is a placeholder, indicating that
channel 4 is not available to drivers, as explained earlier.


15.4.5.2 Talking to the DMA controller

After registration, the main part of the driver's
job consists of configuring the DMA controller for proper operation.
This task is not trivial, but fortunately, the kernel exports all the
functions needed by the typical driver.


The driver needs to configure the
DMA controller either when read or
write is called, or when preparing for
asynchronous transfers. This latter task is performed either at
open time or in response to an
ioctl command, depending on the driver and the
policy it implements. The code shown here is the code that is
typically called by the read or
write device methods.

This subsection provides a quick overview of the internals of the DMA
controller so you understand the code introduced here. If you want to
learn more, we'd urge you to read
<asm/dma.h> and some hardware manuals
describing the PC architecture. In particular, we
don't deal with the issue of 8-bit versus 16-bit
data transfers. If you are writing device drivers for ISA device
boards, you should find the relevant information in the hardware
manuals for the devices.

The DMA controller is a shared resource, and confusion could arise if
more than one processor attempts to program it simultaneously. For
that reason, the controller is protected by a spinlock, called
dma_spin_lock. Drivers should not manipulate the
lock directly; however, two functions have been provided to do that
for you:

unsigned long claim_dma_lock( );



Acquires the DMA spinlock. This function also blocks interrupts on the local
processor; therefore, the return value is a set of flags describing
the previous interrupt state; it must be passed to the following
function to restore the interrupt state when you are done with the
lock.


void release_dma_lock(unsigned long flags);



Returns the DMA spinlock and restores the previous interrupt status.



The spinlock should be held when using the functions described next.
It should not be held during the actual I/O,
however. A driver should never sleep when holding a spinlock.

The information that must be loaded into the controller consists of
three items: the RAM address, the number of atomic items that must be
transferred (in bytes or words), and the direction of the transfer.
To this end, the following functions are exported by
<asm/dma.h>:

void set_dma_mode(unsigned int channel, char mode);



Indicates whether the channel must read from the device
(DMA_MODE_READ) or write to it
(DMA_MODE_WRITE). A third mode exists,
DMA_MODE_CASCADE, which is used to release control
of the bus. Cascading is the way the first controller is connected to
the top of the second, but it can also be used by true ISA bus-master
devices. We won't discuss bus mastering here.


void set_dma_addr(unsigned int channel, unsigned int addr);



Assigns the address of the DMA buffer. The function stores the 24 least
significant bits of addr in the controller. The
addr argument must be a bus
address (see the Section 15.4.3 earlier in this chapter).


void set_dma_count(unsigned int channel, unsigned int count);



Assigns the number of bytes to transfer. The count
argument represents bytes for 16-bit channels as well; in this case,
the number must be even.



In addition to these functions, there are a number of housekeeping
facilities that must be used when dealing with DMA devices:

void disable_dma(unsigned int channel);



A DMA channel can be disabled within the controller. The channel should
be disabled before the controller is configured to prevent improper
operation. (Otherwise, corruption can occur because the controller is
programmed via 8-bit data transfers and, therefore, none of the
previous functions is executed atomically).


void enable_dma(unsigned int channel);



This function tells the controller that the DMA channel contains valid
data.


int get_dma_residue(unsigned int channel);



The driver sometimes needs to know whether a DMA transfer has been
completed. This function returns the number of bytes that are still
to be transferred. The return value is 0 after a
successful transfer and is unpredictable (but not
0) while the controller is working. The
unpredictability springs from the need to obtain the 16-bit residue
through two 8-bit input operations.


void clear_dma_ff(unsigned int channel)



This function clears the DMA flip-flop. The flip-flop is used to control
access to 16-bit registers. The registers are accessed by two
consecutive 8-bit operations, and the flip-flop is used to select the
least significant byte (when it is clear) or the most significant
byte (when it is set). The flip-flop automatically toggles when eight
bits have been transferred; the programmer must clear the flip-flop
(to set it to a known state) before accessing the DMA registers.



Using these functions, a driver can implement a function like the
following to prepare for a DMA transfer:

int dad_dma_prepare(int channel, int mode, unsigned int buf,
                    unsigned int count)
{
    unsigned long flags;

    flags = claim_dma_lock( );
    disable_dma(channel);
    clear_dma_ff(channel);
    set_dma_mode(channel, mode);
    set_dma_addr(channel, virt_to_bus(buf));
    set_dma_count(channel, count);
    enable_dma(channel);
    release_dma_lock(flags);
    return 0;
}

Then, a function like the next one is used to check for
successful completion of DMA:

int dad_dma_isdone(int channel)
{
    int residue;
    unsigned long flags = claim_dma_lock ( );

    residue = get_dma_residue(channel);
    release_dma_lock(flags);
    return (residue == 0);
}

The only thing that remains to be done is to configure the device
board. This device-specific task usually consists of reading or
writing a few I/O ports. Devices differ in significant ways. For
example, some devices expect the programmer to tell the hardware how
big the DMA buffer is, and sometimes the driver has to read a value
that is hardwired into the device. For configuring the board, the
hardware manual is your only friend.

