Linux Device Drivers (3rd Edition) [Electronic resources]

Jonathan Corbet, Greg Kroah-Hartman, Alessandro Rubini

9.1. I/O Ports and I/O Memory


Every peripheral device is controlled by writing and reading its
registers. Most of the time a device has several registers, and
they are accessed at consecutive addresses, either in the memory
address space or in the I/O address space.

At the hardware level, there is no conceptual difference between
memory regions and I/O regions: both of them are accessed by
asserting electrical signals on the address bus and control bus
(i.e., the read and write
signals)[1] and by reading from or writing to the data bus.

[1] Not all computer platforms use a
read and a write signal;
some have different means to address external circuits. The
difference is irrelevant at the software level, however, and
we'll assume all have read and
write to simplify the discussion.


While some CPU manufacturers implement a single address space in
their chips, others decided that peripheral devices are different
from memory and, therefore, deserve a separate address space. Some
processors (most notably the x86 family) have separate
read and write electrical
lines for I/O ports and special CPU instructions to access ports.

Because peripheral devices are built to fit a peripheral bus, and the
most popular I/O buses are modeled on the personal computer, even
processors that do not have a separate address space for I/O ports
must fake reading and writing I/O ports when accessing some
peripheral devices, usually by means of external chipsets or extra
circuitry in the CPU core. The latter solution is common within tiny
processors meant for embedded use.

For the same reason, Linux implements the concept of I/O ports on all
computer platforms it runs on, even on platforms where the CPU
implements a single address space. The implementation of port access
sometimes depends on the specific make and model of the host computer
(because different models use different chipsets to map bus
transactions into memory address space).

Even if the peripheral bus has a separate address space for I/O
ports, not all devices map their registers to I/O ports. While use of
I/O ports is common for ISA peripheral boards, most PCI devices map
registers into a memory address region. This I/O memory approach is
generally preferred, because it doesn't require the
use of special-purpose processor instructions; CPU cores access
memory much more efficiently, and the compiler has much more freedom
in register allocation and addressing-mode selection when accessing
memory.


9.1.1. I/O Registers and Conventional Memory




Despite the strong similarity between hardware registers and memory,
a programmer accessing I/O registers must be careful to avoid being
tricked by CPU (or compiler) optimizations that can modify the
expected I/O behavior.

The main difference between I/O registers and RAM is that I/O
operations have side effects, while memory operations have none: the
only effect of a memory write is storing a value to a location, and a
memory read returns the last value written there. Because memory
access speed is so critical to CPU performance, the no-side-effects
case has been optimized in several ways: values are cached and
read/write instructions are reordered.

The compiler can cache data values into CPU registers without writing
them to memory, and even if it stores them, both write and read
operations can operate on cache memory without ever reaching physical
RAM. Reordering can also happen both at the compiler level and at the
hardware level: often a sequence of instructions can be executed more
quickly if it is run in an order different from that which appears in
the program text, for example, to prevent interlocks in the RISC
pipeline. On CISC processors, operations that take a significant
amount of time can be executed concurrently with other, quicker ones.

These optimizations are transparent and benign when applied to
conventional memory (at least on uniprocessor systems), but they can
be fatal to correct I/O operations, because they interfere with those
"side effects" that are the main
reason why a driver accesses I/O registers. The processor cannot
anticipate a situation in which some other process (running on a
separate processor, or something happening inside an I/O controller)
depends on the order of memory access. The compiler or the CPU may
just try to outsmart you and reorder the operations you request; the
result can be strange errors that are very difficult to debug.
Therefore, a driver must ensure that no caching is performed and no
read or write reordering takes place when accessing registers.



The problem with hardware caching is the easiest to face: the
underlying hardware is already configured (either automatically or by
Linux initialization code) to disable any hardware cache when
accessing I/O regions (whether they are memory or port regions).


The solution to compiler optimization and hardware reordering is to
place a memory barrier between operations that must
be visible to the hardware (or to another processor) in a particular
order. Linux provides four macros to cover all possible ordering
needs:

#include <linux/kernel.h>

void barrier(void)


This function tells the compiler to insert a memory barrier but has
no effect on the hardware. Compiled code stores to memory all values
that are currently modified and resident in CPU registers, and
rereads them later when they are needed. A call to
barrier prevents compiler optimizations across
the barrier but leaves the hardware free to do its own reordering.


#include <asm/system.h>

void rmb(void);

void read_barrier_depends(void);

void wmb(void);

void mb(void);


These functions insert hardware memory barriers in the compiled
instruction flow; their actual instantiation is platform dependent.
An rmb (read memory barrier) guarantees that any
reads appearing before the barrier are completed prior to the
execution of any subsequent read. wmb guarantees
ordering in write operations, and mb
guarantees both. Each of these functions is a superset of
barrier.



read_barrier_depends is a special, weaker form
of read barrier. Whereas rmb prevents the
reordering of all reads across the barrier,
read_barrier_depends blocks only the reordering
of reads that depend on data from other reads. The distinction is
subtle, and it does not exist on all architectures. Unless you
understand exactly what is going on, and you have a reason to believe
that a full read barrier is exacting an excessive performance cost,
you should probably stick to using rmb.

void smp_rmb(void);

void smp_read_barrier_depends(void);

void smp_wmb(void);

void smp_mb(void);


These versions of the barrier macros insert hardware barriers only
when the kernel is compiled for SMP systems; otherwise, they all
expand to a simple barrier call.



A typical usage of memory barriers in a device driver
may have this sort of form:

writel(io_destination_address, dev->registers.addr);
writel(io_size, dev->registers.size);
writel(DEV_READ, dev->registers.operation);
wmb( );
writel(DEV_GO, dev->registers.control);

In this case, it is important to be sure that all of the device
registers controlling a particular operation have been properly set
prior to telling it to begin. The memory barrier enforces the
completion of the writes in the necessary order.



Because memory barriers affect
performance, they should be used only where they are really needed.
The different types of barriers can also have different performance
characteristics, so it is worthwhile to use the most specific type
possible. For example, on the x86 architecture, wmb( )
currently does nothing, since writes outside the processor are not
reordered. Reads are reordered, however, so mb( ) is
slower than wmb( ).

It is worth noting that most of the other kernel primitives dealing
with synchronization, such as spinlock and
atomic_t operations, also function as memory
barriers. Also worthy of note is that some peripheral buses (such as
the PCI bus) have caching issues of their own; we discuss those when
we get to them in later chapters.




Some
architectures allow the efficient combination of an assignment and a
memory barrier. The kernel provides a few macros that perform this
combination; in the default case, they are defined as follows:

#define set_mb(var, value)  do {var = value; mb(  );}  while (0)
#define set_wmb(var, value) do {var = value; wmb( );} while (0)
#define set_rmb(var, value) do {var = value; rmb( );} while (0)

Where appropriate, <asm/system.h> defines
these macros to use architecture-specific instructions that
accomplish the task more quickly. Note that
set_rmb is defined only by a small number of
architectures. (The use of a do...while construct
is a standard C idiom that causes the expanded macro to work as a
normal C statement in all contexts.)

