Linux Device Drivers (3rd Edition) [Electronic resources] (text version)

8.1. The Real Story of kmalloc

The kmalloc allocation engine is a powerful tool
and easily learned because of its similarity to
malloc. The function is fast (unless it blocks)
and doesn't clear the memory it obtains; the
allocated region still holds its previous content.^[1] The allocated region is also
contiguous in physical memory. In the next few sections, we talk in
detail about kmalloc, so you can compare it with
the memory allocation techniques that we discuss later.

^[1] Among other things, this implies that you should explicitly
clear any memory that might be exposed to user space or written to a
device; otherwise, you risk disclosing information that should be
kept private.
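
The footnote's advice can be sketched as follows. This is only a minimal sketch under stated assumptions: alloc_cleared is a hypothetical helper, not a kernel function, and the #else branch supplies user-space stand-ins for kmalloc and kfree (the stand-in deliberately fills the block with junk to mimic leftover contents) so that the logic can be compiled and tried outside the kernel.

```c
#ifdef __KERNEL__
#include <linux/slab.h>
#include <linux/string.h>
#else
/* User-space stand-ins (assumptions, for experimentation only). */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#define GFP_KERNEL 0
static void *kmalloc(size_t size, int flags)
{
    void *p = malloc(size);

    (void)flags;
    if (p)
        memset(p, 0xAA, size);  /* mimic data left by a previous user */
    return p;
}
static void kfree(const void *p) { free((void *)p); }
#endif

/*
 * Hypothetical helper: allocate a buffer that will later be copied to
 * user space or written to a device, and clear it first so that stale
 * kernel data cannot leak out.
 */
static void *alloc_cleared(size_t size)
{
    void *buf = kmalloc(size, GFP_KERNEL);  /* may sleep */

    if (!buf)
        return NULL;            /* kmalloc can fail; always check */
    memset(buf, 0, size);       /* kmalloc itself does not clear */
    return buf;
}
```

Kernels newer than the one described here also provide kzalloc, which performs the allocation and the clearing in a single call.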

8.1.1. The Flags Argument

Remember that the prototype for
kmalloc is:

#include <linux/slab.h>
void *kmalloc(size_t size, int flags);

The first argument to kmalloc is the size of the
block to be allocated. The second argument, the allocation flags, is
much more interesting, because it controls the behavior of
kmalloc in a number of ways.

The most commonly used flag, GFP_KERNEL, means that
the allocation (internally performed by calling, eventually,
__get_free_pages, which is the source of the
GFP_ prefix) is performed on behalf of a process
running in kernel space. In other words, the calling
function is executing a system call on behalf of a process. Using
GFP_KERNEL means that kmalloc
can put the current process to sleep waiting for a page when called
in low-memory situations. A function that allocates memory using
GFP_KERNEL must, therefore, be reentrant and
cannot be running in atomic context. While the current process
sleeps, the kernel takes proper action to locate some free memory,
either by flushing buffers to disk or by swapping out memory from a
user process.

GFP_KERNEL
isn't always the right allocation flag to use;
sometimes kmalloc is called from outside a
process's context. This type of call can happen, for
instance, in interrupt handlers, tasklets, and kernel timers. In this
case, the current process should not be put to
sleep, and the driver should use a flag of
GFP_ATOMIC instead. The kernel normally tries to
keep some free pages around in order to fulfill atomic allocations.
When GFP_ATOMIC is used,
kmalloc can use even the last free page. If that
last page does not exist, however, the allocation fails.
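
Because an atomic allocation fails rather than waits, the caller has to cope with a NULL return on the spot. The following is a minimal sketch of that pattern, not code from any real driver: struct sample, record_sample, pending, and dropped are hypothetical names, the #else branch is a user-space stand-in whose fake_no_memory switch simulates an exhausted system, and the locking that a real driver would need around the list is omitted for brevity.

```c
#ifdef __KERNEL__
#include <linux/slab.h>
#include <linux/errno.h>
#else
/* User-space stand-ins (assumptions, for experimentation only). */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#define GFP_ATOMIC 0
static int fake_no_memory;      /* set to 1 to make the stand-in fail */
static void *kmalloc(size_t size, int flags)
{
    (void)flags;
    return fake_no_memory ? NULL : malloc(size);
}
#endif

struct sample {                 /* hypothetical per-event record */
    struct sample *next;
    int value;
};

static struct sample *pending;  /* simple LIFO list of queued samples */
static unsigned long dropped;   /* events lost to allocation failure */

/*
 * Meant to be called from an interrupt handler, tasklet, or timer.
 * It must not sleep, so it uses GFP_ATOMIC and handles failure by
 * dropping the event instead of waiting for memory.
 */
static int record_sample(int value)
{
    struct sample *s = kmalloc(sizeof(*s), GFP_ATOMIC);

    if (!s) {
        dropped++;
        return -ENOMEM;
    }
    s->value = value;
    s->next = pending;
    pending = s;
    return 0;
}
```

Counting the dropped events, as done here, is one possible policy; a driver might equally well preallocate its buffers in process context with GFP_KERNEL so that the interrupt path never has to allocate at all.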

Other flags can be used in place of or in addition to
GFP_KERNEL and GFP_ATOMIC,
although those two cover most of the needs of device drivers. All the
flags are defined in <linux/gfp.h>, and
individual flags are prefixed with a double underscore, such as
__GFP_DMA. In addition, there are symbols that
represent frequently used combinations of flags; these lack the
prefix and are sometimes called allocation priorities.
The latter include:

GFP_ATOMIC

Used to allocate memory from interrupt handlers and other code
outside of a process context. Never sleeps.

GFP_KERNEL

Normal allocation of kernel memory. May sleep.

GFP_USER

Used to allocate memory for user-space pages; it may sleep.

GFP_HIGHUSER

Like GFP_USER, but allocates from high memory, if any.
High memory is described in the next subsection.

GFP_NOIO

GFP_NOFS

These flags function like GFP_KERNEL, but they add
restrictions on what the kernel can do to satisfy the request. A
GFP_NOFS allocation is not allowed to perform any
filesystem calls, while GFP_NOIO disallows the initiation of any
I/O at all. They are used primarily
in the filesystem and virtual memory code where an allocation may be
allowed to sleep, but recursive filesystem calls would be a bad idea.

The allocation flags listed above can be augmented by ORing in any
of the following flags, which change how the allocation is carried
out:

__GFP_DMA

This flag requests allocation to happen in the DMA-capable memory
zone. The exact meaning is platform-dependent and is explained in the
following section.

__GFP_HIGHMEM

This flag indicates that the allocated memory may be located in high
memory.

__GFP_COLD

Normally, the memory allocator
tries to return "cache warm"
pages, that is, pages that are likely to be found in the processor cache.
Instead, this flag requests a
"cold" page, which has not been
used in some time. It is useful for allocating pages for DMA reads,
where presence in the processor cache is not useful. See
Chapter 15 for a full discussion
of how to allocate DMA buffers.

__GFP_NOWARN

This rarely used flag prevents the kernel from issuing warnings (with
printk) when an allocation cannot be satisfied.

__GFP_HIGH

This flag marks a high-priority request, which is allowed to consume
even the last pages of memory set aside by the kernel for
emergencies.

__GFP_REPEAT

__GFP_NOFAIL

__GFP_NORETRY

These flags modify how the allocator behaves when it has difficulty
satisfying an allocation. __GFP_REPEAT means
"try a little harder" by repeating
the attempt, but the allocation can still fail. The __GFP_NOFAIL
flag tells the allocator never to fail; it
works as hard as needed to satisfy the request. Use of __GFP_NOFAIL
is very strongly discouraged; there will
probably never be a valid reason to use it in a device driver.
Finally, __GFP_NORETRY tells the allocator to
give up immediately if the requested memory is not available.
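
The relationship between the priorities and the double-underscore flags can be modeled in a few lines of user-space C. This is a sketch for illustration only: every DEMO_ name and bit value is made up, and the "may sleep", "may start I/O", and "may call the filesystem" bits are an assumption chosen to mirror how the priorities are described above. The real definitions live in <linux/gfp.h> and vary between kernel versions.

```c
#include <assert.h>

/* Made-up bit values for illustration; real ones are in <linux/gfp.h>. */
#define DEMO_DMA      0x01u   /* stands in for __GFP_DMA           */
#define DEMO_WAIT     0x02u   /* assumed bit: caller may sleep      */
#define DEMO_HIGH     0x04u   /* stands in for __GFP_HIGH          */
#define DEMO_IO       0x08u   /* assumed bit: may start I/O         */
#define DEMO_FS       0x10u   /* assumed bit: may call filesystems  */
#define DEMO_NOWARN   0x20u   /* stands in for __GFP_NOWARN        */
#define DEMO_NORETRY  0x40u   /* stands in for __GFP_NORETRY       */

/* Priorities modeled as ready-made combinations of the bits above. */
#define DEMO_ATOMIC  (DEMO_HIGH)
#define DEMO_NOIO    (DEMO_WAIT)
#define DEMO_NOFS    (DEMO_WAIT | DEMO_IO)
#define DEMO_KERNEL  (DEMO_WAIT | DEMO_IO | DEMO_FS)

static int may_sleep(unsigned int flags)   { return (flags & DEMO_WAIT) != 0; }
static int may_do_io(unsigned int flags)   { return (flags & DEMO_IO) != 0; }
static int may_call_fs(unsigned int flags) { return (flags & DEMO_FS) != 0; }

/* A quiet, give-up-early request built by ORing modifiers into a priority. */
static unsigned int quiet_optional_request(void)
{
    unsigned int flags = DEMO_KERNEL | DEMO_NOWARN | DEMO_NORETRY;

    assert(may_sleep(flags));   /* modifiers leave the base priority intact */
    return flags;
}
```

In driver code, the corresponding call would look like kmalloc(size, GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY), which is a reasonable choice for an optional buffer whose absence the driver can tolerate.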

8.1.1.1 Memory zones

Both __GFP_DMA and __GFP_HIGHMEM
have a platform-dependent role, although their use is valid for all
platforms.

The Linux kernel knows about a minimum of
three memory zones: DMA-capable memory, normal
memory, and high memory. While allocation normally happens in the
normal zone, setting either of the bits just
mentioned requires memory to be allocated from a different zone. The
idea is that every computer platform that must know about special
memory ranges (instead of considering all RAM equivalent) will fall
into this abstraction.

DMA-capable memory is memory that lives in a
preferential address range, where peripherals can perform DMA access.
On most sane platforms, all memory lives in this zone. On the x86,
the DMA zone is used for the first 16 MB of RAM, where legacy ISA
devices can perform DMA; PCI devices have no such limit.

High memory is a mechanism used to allow access to
(relatively) large amounts of memory on 32-bit platforms. This memory
cannot be directly accessed from the kernel without first setting up
a special mapping and is generally harder to work with. If your
driver uses large amounts of memory, however, it will work better on
large systems if it can use high memory. See Section 1.8 in Chapter 15 for a detailed
description of how high memory works and how to use it.

Whenever a new page is allocated to fulfill a memory allocation
request, the kernel builds a list of zones that can be used in the
search. If __GFP_DMA is specified, only the DMA
zone is searched: if no memory is available at low addresses,
allocation fails. If no special flag is present, both normal and DMA
memory are searched; if __GFP_HIGHMEM is set, all
three zones are searched for a free page. (Note, however, that
kmalloc cannot allocate high memory.)
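
The zone selection just described can be modeled in user-space C. The sketch below is a simplification, not the code in mm/page_alloc.c: all DEMO_ names are hypothetical, and the search order (the most specialized permitted zone first, DMA memory as the last resort) is an assumption that the text above does not spell out.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_DMA      0x1u   /* stands in for __GFP_DMA     */
#define DEMO_HIGHMEM  0x2u   /* stands in for __GFP_HIGHMEM */

enum demo_zone { DEMO_ZONE_DMA, DEMO_ZONE_NORMAL, DEMO_ZONE_HIGHMEM };

/*
 * Fill 'out' (room for three entries) with the zones to search, most
 * preferred first, and return how many entries were written.
 */
static size_t build_zone_list(unsigned int flags, enum demo_zone out[3])
{
    size_t n = 0;

    if (flags & DEMO_DMA) {
        out[n++] = DEMO_ZONE_DMA;       /* DMA only, no fallback */
        return n;
    }
    if (flags & DEMO_HIGHMEM)
        out[n++] = DEMO_ZONE_HIGHMEM;
    out[n++] = DEMO_ZONE_NORMAL;
    out[n++] = DEMO_ZONE_DMA;           /* assumed last resort */
    return n;
}

/*
 * Return the first listed zone that has a free page, or -1 if none
 * does. free_pages[] is indexed by enum demo_zone.
 */
static int pick_zone(unsigned int flags, const int free_pages[3])
{
    enum demo_zone list[3];
    size_t i, n = build_zone_list(flags, list);

    assert(n <= 3);
    for (i = 0; i < n; i++)
        if (free_pages[list[i]] > 0)
            return (int)list[i];
    return -1;
}
```

With no pages free in the DMA zone, for example, a DEMO_DMA request fails in this model even though the other zones still have memory, which matches the behavior described for __GFP_DMA.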

The situation is more complicated on
nonuniform memory access (NUMA)
systems. As a general rule, the allocator attempts to locate memory
local to the processor performing the allocation, although there are
ways of changing that behavior.

The mechanism behind memory zones is
implemented in mm/page_alloc.c, while
initialization of the zone resides in platform-specific files,
usually in mm/init.c within the
arch tree. We'll revisit these
topics in Chapter 15.

8.1.2. The Size Argument

The kernel manages the system's physical
memory, which is available only in
page-sized chunks. As a result, kmalloc looks
rather different from a typical user-space
malloc implementation. A simple, heap-oriented
allocation technique would quickly run into trouble; it would have a
hard time working around the page boundaries. Thus, the kernel uses a
special page-oriented allocation technique to get the best use from
the system's RAM.

Linux handles memory allocation by creating a set of pools of memory
objects of fixed sizes. Allocation requests are handled by going to a
pool that holds sufficiently large objects and handing an entire
memory chunk back to the requester. The memory management scheme is
quite complex, and the details of it are not normally all that
interesting to device driver writers.

The one thing driver developers should keep in mind, though, is that
the kernel can allocate only certain predefined, fixed-size byte
arrays. If you ask for an arbitrary amount of memory,
you're likely to get slightly more than you asked
for, up to twice as much. Also, programmers should remember that the
smallest allocation that kmalloc can handle is
as big as 32 or 64 bytes, depending on the page size used by the
system's architecture.

There is an upper limit to the size of memory chunks that can be
allocated by kmalloc. That limit varies
depending on architecture and kernel configuration options. If your
code is to be completely portable, it cannot count on being able to
allocate anything larger than 128 KB. If you need more than a few
kilobytes, however, there are better ways than
kmalloc to obtain memory, which we describe
later in this chapter.
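
The effect of the fixed-size pools on a request can be illustrated with a small user-space model. This is a sketch under simplifying assumptions, not the kernel's allocator: it assumes pools whose object sizes are powers of two, a 32-byte minimum, and the portable 128 KB ceiling mentioned above. The DEMO_ constants and demo_kmalloc_bucket are hypothetical names, and real kernels may provide additional intermediate sizes.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MIN_ALLOC  32u             /* text: 32 or 64 bytes      */
#define DEMO_MAX_ALLOC  (128u * 1024u)  /* text: portable upper bound */

/*
 * Return the pool object size that a request of 'size' bytes would be
 * served from in this model, or 0 if the request is too large.
 */
static size_t demo_kmalloc_bucket(size_t size)
{
    size_t bucket = DEMO_MIN_ALLOC;

    if (size > DEMO_MAX_ALLOC)
        return 0;
    while (bucket < size)
        bucket *= 2;                    /* next larger pool */
    assert(bucket <= DEMO_MAX_ALLOC);
    return bucket;
}
```

In this model a 100-byte request consumes a 128-byte object, and a request for 2049 bytes consumes 4096, which is the "up to twice as much" case. When a structure's size is under your control, keeping it at or just below a pool size avoids that waste.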
