Linux Device Drivers (3rd Edition) [Electronic resources]

Jonathan Corbet, Greg Kroah-Hartman, Alessandro Rubini


15.2. The mmap Device Operation


Memory mapping is one of the most interesting features of modern
Unix systems. As far as drivers are concerned, memory mapping can be
implemented to provide user programs with direct access to device
memory.

A definitive example of mmap usage can be seen
by looking at a subset of the virtual memory areas for the X Window
System server:

cat /proc/731/maps
000a0000-000c0000 rwxs 000a0000 03:01 282652 /dev/mem
000f0000-00100000 r-xs 000f0000 03:01 282652 /dev/mem
00400000-005c0000 r-xp 00000000 03:01 1366927 /usr/X11R6/bin/Xorg
006bf000-006f7000 rw-p 001bf000 03:01 1366927 /usr/X11R6/bin/Xorg
2a95828000-2a958a8000 rw-s fcc00000 03:01 282652 /dev/mem
2a958a8000-2a9d8a8000 rw-s e8000000 03:01 282652 /dev/mem
...

The full list of the X server's VMAs is lengthy, but
most of the entries are not of interest here. We do see, however,
four separate mappings of /dev/mem, which give
some insight into how the X server works with the video card. The
first mapping is at a0000, which is the standard
location for video RAM in the 640-KB ISA hole. Further down, we see a
large mapping at e8000000, an address which is
above the highest RAM address on the system. This is a direct mapping
of the
video memory on the adapter.

These regions can also be seen in /proc/iomem:

000a0000-000bffff : Video RAM area
000c0000-000ccfff : Video ROM
000d1000-000d1fff : Adapter ROM
000f0000-000fffff : System ROM
d7f00000-f7efffff : PCI Bus #01
e8000000-efffffff : 0000:01:00.0
fc700000-fccfffff : PCI Bus #01
fcc00000-fcc0ffff : 0000:01:00.0

Mapping a device means associating a range of user-space addresses to
device memory. Whenever the program reads or writes in the assigned
address range, it is actually accessing the device. In the X server
example, using mmap allows quick and easy access
to the video card's memory. For a
performance-critical application like this, direct access makes a
large difference.


As
you might suspect, not every device lends itself to the
mmap abstraction; it makes no sense, for
instance, for serial ports and other stream-oriented devices. Another
limitation of mmap is that mapping is
PAGE_SIZE grained. The kernel can manage virtual
addresses only at the level of page tables; therefore, the mapped
area must be a multiple of PAGE_SIZE and must live
in physical memory starting at an address that is a multiple of
PAGE_SIZE. The kernel forces size granularity by
making a region slightly bigger if its size isn't a
multiple of the page size.

These limits are not a big constraint for drivers, because the
program accessing the device is device dependent anyway. Since the
program must know about how the device works, the programmer is not
unduly bothered by the need to see to details like page alignment. A
bigger constraint exists when ISA devices are used on some non-x86
platforms, because their hardware view of ISA may not be contiguous.
For example, some Alpha computers see ISA memory as a scattered set
of 8-bit, 16-bit, or 32-bit items, with no direct mapping. In such
cases, you can't use mmap at
all. The inability to perform direct mapping of ISA addresses to
Alpha addresses is due to the incompatible data transfer
specifications of the two systems. Whereas early Alpha processors
could issue only 32-bit and 64-bit memory accesses, ISA can do only
8-bit and 16-bit transfers, and there's no way to
transparently map one protocol onto the other.

There are sound advantages to using
mmap when it's feasible to do
so. For instance, we have already looked at the X server, which
transfers a lot of data to and from video memory; mapping the graphic
display to user space dramatically improves the throughput, as
opposed to an lseek/write
implementation. Another typical example is a program controlling a
PCI device. Most PCI peripherals map their control registers to a
memory address, and a high-performance application might prefer to
have direct access to the registers instead of repeatedly having to
call ioctl to get its work done.


The
mmap method is part of the
file_operations structure and is invoked when the
mmap system call is issued. With
mmap, the kernel performs a good deal of work
before the actual method is invoked, and, therefore, the prototype of
the method is quite different from that of the system call. This is
unlike calls such as ioctl and
poll, where the kernel does not do much before
calling the method.

The system call is declared as follows (as described in the
mmap(2) manual page):

mmap (caddr_t addr, size_t len, int prot, int flags, int fd, off_t offset)

On the other hand, the file operation is declared as:

int (*mmap) (struct file *filp, struct vm_area_struct *vma);

The filp argument in the method is the same as
that introduced in Chapter 3,
while vma contains the information about the
virtual address range that is used to access the device. Therefore,
much of the work has been done by the kernel; to implement
mmap, the driver only has to build suitable page
tables for the address range and, if necessary, replace
vma->vm_ops with a new set of operations.

There are two ways of building the page tables: doing it all at once
with a function called remap_pfn_range or doing it
a page at a time via the nopage VMA method. Each
method has its advantages and limitations. We start with the
"all at once" approach, which is
simpler. From there, we add the complications needed for a real-world
implementation.


15.2.1. Using remap_pfn_range


The job of building new page
tables
to map a range of physical addresses is handled by
remap_pfn_range and
io_remap_page_range, which have the following
prototypes:

int remap_pfn_range(struct vm_area_struct *vma,
                    unsigned long virt_addr, unsigned long pfn,
                    unsigned long size, pgprot_t prot);
int io_remap_page_range(struct vm_area_struct *vma,
                        unsigned long virt_addr, unsigned long phys_addr,
                        unsigned long size, pgprot_t prot);

The value returned by the function is the usual 0
or a negative error code. Let's look at the exact
meaning of the function's arguments:

vma


The virtual memory area into which the page range is being mapped.


virt_addr


The user virtual address where remapping should begin. The function
builds page tables for the virtual address range between
virt_addr and virt_addr+size.


pfn


The page frame number corresponding to the physical address to which
the virtual address should be mapped. The page frame number is simply
the physical address right-shifted by PAGE_SHIFT
bits. For most uses, the vm_pgoff field of the VMA
structure contains exactly the value you need. The function affects
physical addresses from (pfn<<PAGE_SHIFT) to
(pfn<<PAGE_SHIFT)+size.


size


The dimension, in bytes, of the area being remapped.


prot


The "protection" requested for the
new VMA. The driver can (and should) use the value found in
vma->vm_page_prot.




The arguments to
remap_pfn_range are fairly straightforward, and
most of them are already provided to you in the VMA when your
mmap method is called. You may be wondering why
there are two functions, however. The first
(remap_pfn_range) is intended for situations
where pfn refers to actual system RAM, while
io_remap_page_range should be used when
phys_addr points to I/O memory. In practice, the
two functions are identical on every architecture except the SPARC,
and you see remap_pfn_range used in most
situations. In the interest of writing portable drivers, however, you
should use the variant of remap_pfn_range that
is suited to your particular situation.

One other complication has to do with caching: usually, references to
device memory should not be cached by the processor. Often the system
BIOS sets things up properly, but it is also possible to disable
caching of specific VMAs via the protection field. Unfortunately,
disabling caching at this level is highly processor dependent. The
curious reader may wish to look at the
pgprot_noncached function from
drivers/char/mem.c to see
what's involved. We won't discuss
the topic further here.


15.2.2. A Simple Implementation


If your driver needs to do a simple, linear mapping of device memory
into a user address space, remap_pfn_range is
almost all you really need to do the job. The following code is
derived from drivers/char/mem.c and shows how
this task is performed in a typical module called
simple (Simple Implementation Mapping Pages with
Little Enthusiasm):

static int simple_remap_mmap(struct file *filp, struct vm_area_struct *vma)
{
    if (remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
                        vma->vm_end - vma->vm_start,
                        vma->vm_page_prot))
        return -EAGAIN;

    vma->vm_ops = &simple_remap_vm_ops;
    simple_vma_open(vma);
    return 0;
}

As you can see, remapping memory is just a matter of calling
remap_pfn_range to create the necessary page
tables.


15.2.3. Adding VMA Operations


As
we have seen, the vm_area_struct structure
contains a set of operations that may be applied to the VMA.


Now we look at providing those
operations in a simple way. In particular, we provide
open and close operations
for our VMA. These operations are called whenever a process opens or
closes the VMA; in particular, the open method
is invoked anytime a process forks and creates a new reference to the
VMA. The open and close VMA
methods are called in addition to the processing performed by the
kernel, so they need not reimplement any of the work done there. They
exist as a way for drivers to do any additional processing that they
may require.

As it turns out, a simple driver such as simple
need not do any extra processing in particular. So we have created
open and close methods,
which print a message to the system log informing the world that they
have been called. Not particularly useful, but it does allow us to
show how these methods can be provided, and see when they are
invoked.

To this end, we override the default
vma->vm_ops with operations that call
printk:

void simple_vma_open(struct vm_area_struct *vma)
{
    printk(KERN_NOTICE "Simple VMA open, virt %lx, phys %lx\n",
           vma->vm_start, vma->vm_pgoff << PAGE_SHIFT);
}

void simple_vma_close(struct vm_area_struct *vma)
{
    printk(KERN_NOTICE "Simple VMA close.\n");
}

static struct vm_operations_struct simple_remap_vm_ops = {
    .open  = simple_vma_open,
    .close = simple_vma_close,
};

To make these operations active for a specific mapping, it is
necessary to store a pointer to
simple_remap_vm_ops in the
vm_ops field of the relevant VMA. This is usually
done in the mmap method. If you turn back to the
simple_remap_mmap example, you see these lines
of code:

vma->vm_ops = &simple_remap_vm_ops;
simple_vma_open(vma);

Note the explicit call to simple_vma_open. Since
the open method is not invoked on the initial
mmap, we must call it explicitly if we want it
to run.


15.2.4. Mapping Memory with nopage


Although remap_pfn_range works

well for many, if not most, driver
mmap implementations, sometimes it is necessary
to be a little more flexible. In such situations, an implementation
using the nopage VMA method may be called for.

One situation in which the
nopage approach is useful can be brought about
by the mremap system call, which is used by
applications to change the bounding addresses of a mapped region. As
it happens, the kernel does not notify drivers directly when a mapped
VMA is changed by mremap. If the VMA is reduced
in size, the kernel can quietly flush out the unwanted pages without
telling the driver. If, instead, the VMA is expanded, the driver
eventually finds out by way of calls to nopage
when mappings must be set up for the new pages, so there is no need
to perform a separate notification. The nopage
method, therefore, must be implemented if you want to support the
mremap system call. Here, we show a simple
implementation of nopage for the
simple device.

The nopage

method, remember, has the following prototype:

struct page *(*nopage)(struct vm_area_struct *vma,
                       unsigned long address, int *type);


When
a user process attempts to access a page in a VMA that is not present
in memory, the associated nopage function is
called. The address parameter contains the virtual
address that caused the fault, rounded down to the beginning of the
page. The nopage function must locate and return
the struct page pointer that refers to the page
the user wanted. This function must also take care to increment the
usage count for the page it returns by calling the
get_page macro:

 get_page(struct page *pageptr);

This step is necessary to keep the reference counts correct on the
mapped pages. The kernel maintains this count for every page; when
the count goes to 0, the kernel knows that the
page may be placed on the free list. When a VMA is unmapped, the
kernel decrements the usage count for every page in the area. If your
driver does not increment the count when adding a page to the area,
the usage count becomes 0 prematurely, and the
integrity of the system is compromised.

The nopage method should also store the type of
fault in the location pointed to by the type
argument, but only if that argument is not
NULL. In device drivers, the proper value for
type will invariably be
VM_FAULT_MINOR.

If you are using nopage, there is usually very
little work to be done when mmap is called; our
version looks like this:

static int simple_nopage_mmap(struct file *filp, struct vm_area_struct *vma)
{
    unsigned long offset = vma->vm_pgoff << PAGE_SHIFT;

    if (offset >= __pa(high_memory) || (filp->f_flags & O_SYNC))
        vma->vm_flags |= VM_IO;
    vma->vm_flags |= VM_RESERVED;

    vma->vm_ops = &simple_nopage_vm_ops;
    simple_vma_open(vma);
    return 0;
}

The main thing mmap has to do is to replace the
default (NULL) vm_ops pointer
with our own operations. The nopage method then
takes care of "remapping" one page
at a time and returning the address of its struct
page structure. Because we are just implementing a
window onto physical memory here, the remapping step is simple: we
only need to locate and return a pointer to the
struct page for the desired
address. Our nopage method looks like the
following:

struct page *simple_vma_nopage(struct vm_area_struct *vma,
                               unsigned long address, int *type)
{
    struct page *pageptr;
    unsigned long offset = vma->vm_pgoff << PAGE_SHIFT;
    unsigned long physaddr = address - vma->vm_start + offset;
    unsigned long pageframe = physaddr >> PAGE_SHIFT;

    if (!pfn_valid(pageframe))
        return NOPAGE_SIGBUS;
    pageptr = pfn_to_page(pageframe);
    get_page(pageptr);
    if (type)
        *type = VM_FAULT_MINOR;
    return pageptr;
}

Since, once again, we are simply mapping main memory here, the
nopage function need only find the correct
struct page for the faulting
address and increment its reference count. Therefore, the required
sequence of events is to calculate the desired physical address, and
turn it into a page frame number by right-shifting it
PAGE_SHIFT bits. Since user space can give us any
address it likes, we must ensure that we have a valid page frame; the
pfn_valid function does that for us. If the
address is out of range, we return NOPAGE_SIGBUS,
which causes a bus signal to be delivered to the calling process.
Otherwise, pfn_to_page gets the necessary
struct page pointer; we can
increment its reference count (with a call to
get_page) and return it.

The nopage method normally returns a pointer to
a struct page. If, for some reason, a normal page
cannot be returned (e.g., the requested address is beyond the
device's memory region),
NOPAGE_SIGBUS can be returned to signal the error;
that is what the simple code above does.
nopage can also return
NOPAGE_OOM to indicate failures caused by resource
limitations.

Note that this implementation works for ISA memory regions but not
for those on the PCI bus. PCI memory is mapped above the highest
system memory, and there are no entries in the system memory map for
those addresses. Because there is no struct page
to return a pointer to, nopage cannot be used in
these situations; you must use remap_pfn_range
instead.

If the nopage method is left
NULL, kernel code that handles page faults maps
the zero page to the faulting virtual address. The zero
page
is a copy-on-write page that reads as
0 and that is used, for example, to map the BSS
segment. Any process referencing the zero page sees exactly that: a
page filled with zeroes. If the process writes to the page, it ends
up modifying a private copy. Therefore, if a process extends a mapped
region by calling mremap, and the driver
hasn't implemented nopage, the
process ends up with zero-filled memory instead of a segmentation
fault.


15.2.5. Remapping Specific I/O Regions


All the examples we've seen
so
far are reimplementations of
/dev/mem; they remap physical addresses into
user space. The typical driver, however, wants to map only the small
address range that applies to its peripheral device, not all memory.
In order to map to user space only a subset of the whole memory
range, the driver needs only to play with the offsets. The following
does the trick for a driver mapping a region of
simple_region_size bytes, beginning at physical
address simple_region_start (which should be
page-aligned):

unsigned long off = vma->vm_pgoff << PAGE_SHIFT;
unsigned long physical = simple_region_start + off;
unsigned long vsize = vma->vm_end - vma->vm_start;
unsigned long psize = simple_region_size - off;

if (vsize > psize)
    return -EINVAL; /* spans too high */
remap_pfn_range(vma, vma->vm_start, physical >> PAGE_SHIFT,
                vsize, vma->vm_page_prot);

In addition to calculating the offsets, this code introduces a check
that reports an error when the program tries to map more memory than
is available in the I/O region of the target device. In this code,
psize is the physical I/O size that is left after
the offset has been specified, and vsize is the
requested size of virtual memory; the function refuses to map
addresses that extend beyond the allowed memory range.

Note that the user process can always
use mremap to extend its mapping, possibly past
the end of the physical device area. If your driver fails to define a
nopage method, it is never notified of this
extension, and the additional area maps to the zero page. As a driver
writer, you may well want to prevent this sort of behavior; mapping
the zero page onto the end of your region is not an explicitly bad
thing to do, but it is highly unlikely that the programmer wanted
that to happen.

The simplest way to prevent extension of the mapping is to implement
a simple nopage method that always causes a bus
signal to be sent to the faulting process. Such a method would look
like this:

struct page *simple_nopage(struct vm_area_struct *vma,
                           unsigned long address, int *type)
{
    return NOPAGE_SIGBUS; /* send a SIGBUS */
}

As we have seen, the nopage method is called
only when the process dereferences an address that is within a known
VMA but for which there is currently no valid page table entry. If we
have used remap_pfn_range to map the entire
device region, the nopage method shown here is
called only for references outside of that region. Thus, it can
safely return NOPAGE_SIGBUS to signal an error. Of
course, a more thorough implementation of nopage
could check to see whether the faulting address is within the device
area, and perform the remapping if that is the case. Once again,
however, nopage does not work with PCI memory
areas, so extension of PCI mappings is not possible.


15.2.6. Remapping RAM


An interesting limitation of
remap_pfn_range


is that it gives access only to reserved
pages and physical addresses above the top of physical memory. In
Linux, a page of physical addresses is marked as
"reserved" in the memory map to
indicate that it is not available for memory management. On the PC,
for example, the range between 640 KB and 1 MB is marked as reserved,
as are the pages that host the kernel code itself. Reserved pages are
locked in memory and are the only ones that can be safely mapped to
user space; this limitation is a basic requirement for system
stability.

Therefore, remap_pfn_range
won't allow you to remap conventional addresses,
which include the ones you obtain by calling
get_free_page. Instead, it maps in the zero
page. Everything appears to work, with the exception that the process
sees private, zero-filled pages rather than the remapped RAM that it
was hoping for. Nonetheless, the function does everything that most
hardware drivers need it to do, because it can remap high PCI buffers
and ISA memory.


The
limitations of remap_pfn_range can be seen by
running mapper, one of the sample programs in
misc-progs in the files provided on
O'Reilly's FTP site.
mapper is a simple tool that can be used to
quickly test the mmap system call; it maps
read-only parts of a file specified by command-line options and dumps
the mapped region to standard output. The following session, for
instance, shows that /dev/mem
doesn't map the physical page located at address 64
KB; instead, we see a page full of zeros (the host computer in
this example is a PC, but the result would be the same on other
platforms):

morgana.root# ./mapper /dev/mem 0x10000 0x1000 | od -Ax -t x1
mapped "/dev/mem" from 65536 to 69632
000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
001000

The inability of remap_pfn_range to deal with
RAM suggests that memory-based devices like
scull can't easily implement
mmap, because its device memory is conventional
RAM, not I/O memory. Fortunately, a relatively easy workaround is
available to any driver that needs to map RAM into user space; it
uses the nopage method that we have seen
earlier.


15.2.6.1 Remapping RAM with the nopage method

The way to map real RAM to user

space is to use vm_ops->nopage to deal with
page faults one at a time. A sample implementation is part of the
scullp module, introduced in Chapter 8.

scullp is a page-oriented char device. Because
it is page oriented, it can implement mmap on
its memory. The code implementing memory mapping uses some of the
concepts introduced in Section 15.1.

Before examining the code, let's look at the design
choices that affect the mmap implementation in
scullp:

  • scullp doesn't release device
    memory as long as the device is mapped. This is a matter of policy
    rather than a requirement, and it is different from the behavior of
    scull and similar devices, which are truncated
    to a length of 0 when opened for writing. Refusing
    to free a mapped scullp device allows a process
    to overwrite regions actively mapped by another process, so you can
    test and see how processes and device memory interact. To avoid
    releasing a mapped device, the driver must keep a count of active
    mappings; the vmas field in the device structure
    is used for this purpose.

  • Memory mapping is performed only when the scullp
    order parameter (set at module load time) is
    0. The parameter controls how __get_free_pages
    is invoked (see Section 8.3).
    The zero-order limitation
    (which forces pages to be allocated one at a time, rather than in
    larger groups) is dictated by the internals of
    __get_free_pages, the allocation function used by
    scullp. To maximize allocation performance, the
    Linux kernel maintains a list of free pages for each allocation
    order, and only the reference count of the first page in a cluster is
    incremented by get_free_pages and decremented by
    free_pages. The mmap method
    is disabled for a scullp device if the
    allocation order is greater than zero, because
    nopage deals with single pages rather than
    clusters of pages. scullp simply does not know
    how to properly manage reference counts for pages that are part of
    higher-order allocations. (Return to Section 8.3.1
    if you need a refresher on
    scullp and the memory allocation order value.)


The zero-order limitation is mostly
intended to keep the code simple. It is possible
to correctly implement mmap for multipage
allocations by playing with the usage count of the pages, but it
would only add to the complexity of the example without introducing
any interesting information.

Code that is intended to map RAM according to the rules just outlined
needs to implement the open,
close, and nopage VMA
methods; it also needs to access the memory map to adjust the page
usage counts.

This implementation of scullp_mmap is very
short, because it relies on the nopage function
to do all the interesting work:

int scullp_mmap(struct file *filp, struct vm_area_struct *vma)
{
    struct inode *inode = filp->f_dentry->d_inode;

    /* refuse to map if order is not 0 */
    if (scullp_devices[iminor(inode)].order)
        return -ENODEV;

    /* don't do anything here: "nopage" will fill the holes */
    vma->vm_ops = &scullp_vm_ops;
    vma->vm_flags |= VM_RESERVED;
    vma->vm_private_data = filp->private_data;
    scullp_vma_open(vma);
    return 0;
}

The purpose of the if statement is to avoid
mapping devices whose allocation order is not 0.
scullp's operations are stored
in the vm_ops field, and a pointer to the device
structure is stashed in the vm_private_data field.
At the end, vm_ops->open is called to update
the count of active mappings for the device.

open and close simply keep
track of the mapping count and are defined as follows:

void scullp_vma_open(struct vm_area_struct *vma)
{
    struct scullp_dev *dev = vma->vm_private_data;
    dev->vmas++;
}

void scullp_vma_close(struct vm_area_struct *vma)
{
    struct scullp_dev *dev = vma->vm_private_data;
    dev->vmas--;
}

Most of the work is then performed by nopage. In
the scullp implementation, the
address parameter to nopage
is used to calculate an offset into the device; the offset is then
used to look up the correct page in the scullp
memory tree:

struct page *scullp_vma_nopage(struct vm_area_struct *vma,
                               unsigned long address, int *type)
{
    unsigned long offset;
    struct scullp_dev *ptr, *dev = vma->vm_private_data;
    struct page *page = NOPAGE_SIGBUS;
    void *pageptr = NULL; /* default to "missing" */

    down(&dev->sem);
    offset = (address - vma->vm_start) + (vma->vm_pgoff << PAGE_SHIFT);
    if (offset >= dev->size) goto out; /* out of range */

    /*
     * Now retrieve the scullp device from the list, then the page.
     * If the device has holes, the process receives a SIGBUS when
     * accessing the hole.
     */
    offset >>= PAGE_SHIFT; /* offset is a number of pages */
    for (ptr = dev; ptr && offset >= dev->qset;) {
        ptr = ptr->next;
        offset -= dev->qset;
    }
    if (ptr && ptr->data) pageptr = ptr->data[offset];
    if (!pageptr) goto out; /* hole or end-of-file */
    page = virt_to_page(pageptr);

    /* got it, now increment the count */
    get_page(page);
    if (type)
        *type = VM_FAULT_MINOR;
  out:
    up(&dev->sem);
    return page;
}

scullp uses memory obtained with
get_free_pages. That memory is addressed using
logical addresses, so all scullp_nopage has to
do to get a struct page pointer
is to call virt_to_page.

The scullp device now works as expected, as you
can see in this sample output from the mapper
utility. Here, we send a directory listing of
/dev (which is long) to the
scullp device and then use the
mapper utility to look at pieces of that listing
with mmap:

morgana% ls -l /dev > /dev/scullp
morgana% ./mapper /dev/scullp 0 140
mapped "/dev/scullp" from 0 (0x00000000) to 140 (0x0000008c)
total 232
crw------- 1 root root 10, 10 Sep 15 07:40 adbmouse
crw-r--r-- 1 root root 10, 175 Sep 15 07:40 agpgart
morgana% ./mapper /dev/scullp 8192 200
mapped "/dev/scullp" from 8192 (0x00002000) to 8392 (0x000020c8)
d0h1494
brw-rw---- 1 root floppy 2, 92 Sep 15 07:40 fd0h1660
brw-rw---- 1 root floppy 2, 20 Sep 15 07:40 fd0h360
brw-rw---- 1 root floppy 2, 12 Sep 15 07:40 fd0H360


15.2.7. Remapping Kernel Virtual Addresses


Although it's rarely


necessary,
it's interesting to see how a driver can map a
kernel virtual address to user space using mmap.
A true kernel virtual address, remember, is an address returned by a
function such as
vmalloc; that is, a virtual address mapped
in the kernel page tables. The code in this section is taken from
scullv, which is the module that works like
scullp but allocates its storage through
vmalloc.

Most of the scullv implementation is like the
one we've just seen for scullp,
except that there is no need to check the order
parameter that controls memory allocation. The reason for this is
that vmalloc allocates its pages one at a time,
because single-page allocations are far more likely to succeed than
multipage allocations. Therefore, the allocation order problem
doesn't apply to vmalloced
space.

Beyond that, there is only one difference between the
nopage implementations used by
scullp and scullv. Remember
that scullp, once it found the page of interest,
would obtain the corresponding struct
page pointer with
virt_to_page. That function does not work with
kernel virtual addresses, however. Instead, you must use
vmalloc_to_page. So the final part of the
scullv version of nopage
looks like:

/*
 * After scullv lookup, "page" is now the address of the page
 * needed by the current process. Since it's a vmalloc address,
 * turn it into a struct page.
 */
page = vmalloc_to_page(pageptr);

/* got it, now increment the count */
get_page(page);
if (type)
    *type = VM_FAULT_MINOR;
out:
up(&dev->sem);
return page;

Based on this discussion, you might also want to map addresses
returned by ioremap to user space. That would be
a mistake, however; addresses from ioremap are
special and cannot be treated like normal kernel virtual addresses.
Instead, you should use remap_pfn_range to remap
I/O memory areas into user space.

