Linux Device Drivers (3rd Edition) [Electronic resources] نسخه متنی

4.5. Debugging System Faults

Even if you've used all the

monitoring and debugging techniques,
sometimes bugs remain in the driver, and the system faults when the
driver is executed. When this happens, it's
important to be able to collect as much information as possible to
solve the problem.

Note that "fault"
doesn't mean
"panic." The Linux code is robust
enough to respond gracefully to most errors: a fault usually results
in the destruction of the current process while the system goes on
working. The system can panic, and it may if a
fault happens outside of a process's context or if
some vital part of the system is compromised. But when the problem is
due to a driver error, it usually results only in the sudden death of
the process unlucky enough to be using the driver. The only
unrecoverable damage when a process is destroyed is that some memory
allocated to the process's context is lost; for
instance, dynamic lists allocated by the driver through
kmalloc might be lost. However, since the kernel
calls the close operation for any open device
when a process dies, your driver can release what was allocated by
the open method.

Even though an oops usually does not bring down the entire system,
you may well find yourself needing to reboot after one happens. A
buggy driver can leave hardware in an unusable state, leave kernel
resources in an inconsistent state, or, in the worst case, corrupt
kernel memory in random places. Often you can simply unload your
buggy driver and try again after an oops. If, however, you see
anything that suggests that the system as a whole is not well, your
best bet is usually to reboot immediately.

We've already said that when kernel code misbehaves,
an informative message is printed on the console. The next section
explains how to decode and use such messages. Even though they appear
rather obscure to the novice, processor dumps are full of interesting
information, often sufficient to pinpoint a program bug without the
need for additional testing.

4.5.1. Oops Messages

Most bugs show themselves in

NULL pointer dereferences or by the use of other
incorrect pointer values. The usual outcome of such bugs is an oops
message.

Almost any address used by
the processor is a virtual address and is mapped to physical
addresses through a complex structure of page tables (the exceptions
are physical addresses used with the memory management subsystem
itself). When an invalid pointer is dereferenced, the paging
mechanism fails to map the pointer to a physical address, and the
processor signals a page fault to the operating
system. If the address is not valid, the kernel is not able to
"page in" the missing address; it
(usually) generates an oops if this happens while the processor is in
supervisor mode.

An oops displays the processor status at the time of the fault,
including the contents of the CPU registers and other seemingly
incomprehensible information. The message is generated by
printk statements in the fault handler
(arch/*/kernel/traps.c) and is dispatched as
described earlier in Section 4.2.1).

Let's look at one such message.
Here's what results from dereferencing a
NULL pointer on a PC running Version 2.6 of the
kernel. The most relevant information here is the instruction pointer
(EIP), the address of the faulty instruction.

Unable to handle kernel NULL pointer dereference at virtual address 00000000
printing eip:
d083a064
Oops: 0002 [#1]
SMP 
CPU:    0
EIP:    0060:[<d083a064>]    Not tainted
EFLAGS: 00010246   (2.6.6) 
EIP is at faulty_write+0x4/0x10 [faulty]
eax: 00000000   ebx: 00000000   ecx: 00000000   edx: 00000000
esi: cf8b2460   edi: cf8b2480   ebp: 00000005   esp: c31c5f74
ds: 007b   es: 007b   ss: 0068
Process bash (pid: 2086, threadinfo=c31c4000 task=cfa0a6c0)
Stack: c0150558 cf8b2460 080e9408 00000005 cf8b2480 00000000 cf8b2460 cf8b2460 
fffffff7 080e9408 c31c4000 c0150682 cf8b2460 080e9408 00000005 cf8b2480 
00000000 00000001 00000005 c0103f8f 00000001 080e9408 00000005 00000005 
Call Trace:
[<c0150558>] vfs_write+0xb8/0x130
[<c0150682>] sys_write+0x42/0x70
[<c0103f8f>] syscall_call+0x7/0xb
Code: 89 15 00 00 00 00 c3 90 8d 74 26 00 83 ec 0c b8 00 a6 83 d0

This message was generated by writing to a device owned by the
faulty
module, a module built deliberately
to demonstrate failures. The implementation of the
write
method of
faulty.c is trivial:

ssize_t faulty_write (struct file *filp, const char _ _user *buf, size_t count,
loff_t *pos)
{
/* make a simple fault by dereferencing a NULL pointer */
*(int *)0 = 0;
return 0;
}

As you can see, what we do here is dereference a
NULL pointer. Since 0 is never
a valid pointer value, a fault occurs, which the kernel turns into
the oops message shown earlier. The calling process is then killed.

The faulty module has a different fault
condition in its
read
implementation:

ssize_t faulty_read(struct file *filp, char _ _user *buf,
size_t count, loff_t *pos)
{
int ret;
char stack_buf[4];
/* Let's try a buffer overflow  */
memset(stack_buf, 0xff, 20);
if (count > 4)
count = 4; /* copy 4 bytes to the user */
ret = copy_to_user(buf, stack_buf, count);
if (!ret)
return count;
return ret;
}

This method copies a string into a
local variable; unfortunately, the string is longer than the
destination array. The resulting buffer overflow causes an oops when
the function returns. Since the return instruction
brings the instruction pointer to nowhere land, this kind of fault is
much harder to trace, and you can get something such as the
following:

EIP:    0010:[<00000000>]
Unable to handle kernel paging request at virtual address ffffffff
printing eip:
ffffffff
Oops: 0000 [#5]
SMP 
CPU:    0
EIP:    0060:[<ffffffff>]    Not tainted
EFLAGS: 00010296   (2.6.6) 
EIP is at 0xffffffff
eax: 0000000c   ebx: ffffffff   ecx: 00000000   edx: bfffda7c
esi: cf434f00   edi: ffffffff   ebp: 00002000   esp: c27fff78
ds: 007b   es: 007b   ss: 0068
Process head (pid: 2331, threadinfo=c27fe000 task=c3226150)
Stack: ffffffff bfffda70 00002000 cf434f20 00000001 00000286 cf434f00 fffffff7 
bfffda70 c27fe000 c0150612 cf434f00 bfffda70 00002000 cf434f20 00000000 
00000003 00002000 c0103f8f 00000003 bfffda70 00002000 00002000 bfffda70 
Call Trace:
[<c0150612>] sys_read+0x42/0x70
[<c0103f8f>] syscall_call+0x7/0xb
Code:  Bad EIP value.

In this case, we see only part of the call stack
(vfs_read and faulty_read
are missing), and the kernel complains about a "bad
EIP value." That complaint, and the offending
address (ffffffff) listed at the beginning are
both hints that the kernel stack has been corrupted.

In general, when you are confronted with an oops, the first thing to
do is to look at the location where the problem happened, which is
usually listed separately from the call stack. In the first oops
shown above, the relevant line is:

EIP is at faulty_write+0x4/0x10 [faulty]

Here we see that we were in the function
faulty_write
,
which is located in the faulty module (which is
listed in square brackets). The hex numbers indicate that the
instruction pointer was 4 bytes into the function, which appears to
be 10 (hex) bytes long. Often that is enough to figure out what the
problem is.

If you need more information, the call stack shows you how you got to
where things fell apart. The stack itself is printed in hex form;
with a bit of work, you can often determine the values of local
variables and function parameters from the stack listing. Experienced
kernel developers can benefit from a certain amount of pattern
recognition here; for example, if we look at the stack listing from
the faulty_read

oops:

Stack: ffffffff bfffda70 00002000 cf434f20 00000001 00000286 cf434f00 fffffff7 
bfffda70 c27fe000 c0150612 cf434f00 bfffda70 00002000 cf434f20 00000000 
00000003 00002000 c0103f8f 00000003 bfffda70 00002000 00002000 bfffda70

The ffffffff at the top of the stack is part of
our string that broke things. On the x86 architecture, by default,
the user-space stack starts just below 0xc0000000;
thus, the recurring value 0xbfffda70 is probably a
user-space stack address; it is, in fact, the address of the buffer
passed to the read system call, replicated each
time it is passed down the kernel call chain. On the x86 (again, by
default), kernel space starts at 0xc0000000, so
values above that are almost certainly kernel-space addresses, and so
on.

Finally, when looking at oops listings, always be on the lookout for
the "slab poisoning" values
discussed at the beginning of this chapter. Thus, for example, if you
get a kernel oops where the offending address is
0xa5a5a5a5, you are almost certainly forgetting to
initialize dynamic memory somewhere.

Please note that you see a symbolic call stack (as shown above) only
if your kernel is built with the CONFIG_KALLSYMS
option turned on. Otherwise, you see a bare, hexadecimal listing,
which is far less useful until you have decoded it in
other ways.

4.5.2. System Hangs

Although most bugs in kernel code end up
as oops messages, sometimes they can completely hang the system. If
the system hangs, no message is printed. For example, if the code
enters an endless loop, the kernel stops scheduling,^[3] and the system
doesn't respond to any action, including the magic
Ctrl-Alt-Del combination. You have two choices for dealing with
system hangseither prevent them beforehand or be able to debug
them after the fact.

^[3] Actually, multiprocessor systems still schedule on the other
processors, and even a uniprocessor machine might reschedule if
kernel preemption is enabled. For the most common case (uniprocessor
with preemption disabled), however, the system stops scheduling
altogether.

You can prevent an endless
loop by inserting schedule invocations at
strategic points. The schedule call (as you
might guess) invokes the scheduler and, therefore, allows other
processes to steal CPU time from the current process. If a process is
looping in kernel space due to a bug in your driver, the
schedule calls enable you to kill the process
after tracing what is happening.

You should be aware, of course, that
any call to schedule may create an additional
source of reentrant calls to your driver, since it allows other
processes to run. This reentrancy should not normally be a problem,
assuming that you have used suitable locking in your driver. Be sure,
however, not to call schedule any time that your
driver is holding a spinlock.

If your driver really hangs the system,
and you don't know where to insert
schedule calls, the best way to go may be to add
some print messages and write them to the console (by changing the
console_loglevel value if need be).

Sometimes the system may appear to be
hung, but it isn't. This can happen, for example, if
the keyboard remains locked in some strange way. These false hangs
can be detected by looking at the output of a program you keep
running for just this purpose. A clock or system load meter on your
display is a good status monitor; as long as it continues to update,
the scheduler is working.

An indispensable tool for many lockups is
the "magic SysRq key," which is
available on most architectures. Magic SysRq is invoked with the
combination of the Alt and SysRq keys on the PC keyboard, or with
other special keys on other platforms (see
Documentation/sysrq.txt for details), and is
available on the serial console as well. A third key, pressed along
with these two, performs one of a number of useful actions:

r

Turns off keyboard raw mode; useful in situations where a crashed
application (such as the X server) may have left your keyboard in a
strange state.

k

Invokes the "

secure attention key"
(SAK) function. SAK kills all processes running on the current
console, leaving you with a clean terminal.

s

Performs an emergency synchronization of all disks.

u

Umount. Attempts to remount all disks in a read-only mode. This
operation, usually invoked immediately after s,
can save a lot of filesystem checking time in cases where the system
is in serious trouble.

b

Boot. Immediately reboots the system. Be sure to synchronize and
remount the disks first.

p

Prints processor registers information.

t

Prints the current task list.

m

Prints memory information.

Other magic SysRq functions exist; see sysrq.txt
in the Documentation directory of the kernel
source for the full list. Note that magic SysRq must be explicitly
enabled in the kernel configuration and that most distributions do
not enable it, for obvious security reasons. For a system used to
develop drivers, however, enabling magic SysRq is worth the trouble
of building a new kernel in itself. Magic SysRq may be disabled at
runtime with a command such as the following:

echo 0 > /proc/sys/kernel/sysrq

You should consider disabling it if unprivileged users can reach your
system keyboard, to prevent accidental or willing damages. Some
previous kernel versions had sysrq disabled by
default, so you needed to enable it at runtime by writing 1 to that
same /proc/sys file.

The

sysrq operations are exceedingly useful, so they
have been made available to system administrators who
can't reach the console. The file
/proc/sysrq-trigger is a write-only entry point,
where you can trigger a specific sysrq action by
writing the associated command character; you can then collect any
output data from the kernel logs. This entry point to
sysrq is always working, even if
sysrq is disabled on the console.

If you are experiencing a "live
hang," in which your driver is stuck in a loop but
the system as a whole is still functioning, there are a couple of
techniques worth knowing. Often, the SysRq p
function points the finger directly at the guilty routine. Failing
that, you can also use the kernel profiling function. Build a kernel
with profiling enabled, and boot it with profile=2
on the command line. Reset the profile counters with the
readprofile utility, then send your driver into
its loop. After a little while, use readprofile
again to see where the kernel is spending its time. Another more
advanced alternative is oprofile, that you may
consider as well. The file
Documentation/basic_profiling.txt tells you
everything you need to know to get started with the profilers.

One precaution worth using when chasing system hangs is to mount all
your disks as read-only (or unmount them). If the disks are read-only
or unmounted, there's no risk of damaging the
filesystem or leaving it in an inconsistent state. Another
possibility is using a computer that mounts all of its filesystems
via NFS, the network file system. The
"NFS-Root" capability must be
enabled in the kernel, and special parameters must be passed at boot
time. In this case, you'll avoid filesystem
corruption without even resorting to SysRq, because filesystem
coherence is managed by the NFS server, which is not brought
down by
your device driver.

Linux Device Drivers (3rd Edition) [Electronic resources] نسخه متنی

فارسی

کردی

العربیه

اردو

Türkçe

Русский

English

Français

کانال فیلم من

تبیان من

فایلهای من

کتابخانه من

پنل پیامکی

وبلاگ من

اینجــــا یک کتابخانه دیجیتالی است

با بیش از 100000 منبع الکترونیکی رایگان به زبان فارسی ، عربی و انگلیسی