Linux Device Drivers (3rd Edition)

Jonathan Corbet, Greg Kroah-Hartman, Alessandro Rubini


4.3. Debugging by Querying


The previous section
described how
printk works and how it can be used. What it
didn't talk about are its disadvantages.

Heavy use of printk can slow down the system noticeably, even if you lower
console_loglevel to avoid loading the console device, because syslogd keeps
syncing its output files; thus, every line that is printed causes a disk
operation. From syslogd's perspective, this is the right behavior: it tries
to write everything to disk in case the system crashes right after printing
the message. However, you don't want to slow down your system just for the
sake of debugging messages. This problem can be solved by prefixing the name
of your log file as it appears in /etc/syslog.conf with a hyphen.[2] The
problem with changing the configuration file is that the modification will
likely remain there after you are done debugging, even though during normal
system operation you do want messages to be flushed to disk as soon as
possible. An alternative to such a permanent change is running a program
other than klogd (such as cat /proc/kmsg, as suggested earlier), but this
may not provide a suitable environment for normal system operation.

[2] The hyphen, or minus sign, is a
"magic" marker to prevent
syslogd from flushing the file to disk at every
new message, documented in syslog.conf(5), a
manpage worth reading.


More often than not, the best way to get relevant information is to
query the system when you need the information, instead of
continually producing data. In fact, every Unix system provides many
tools for obtaining system information: ps,
netstat, vmstat, and so on.

A few techniques are available to driver developers for querying the
system: creating a file in the /proc filesystem,
using the ioctl driver method, and exporting
attributes via sysfs. The use of
sysfs requires a fair amount of background on the
driver model, which is discussed in Chapter 14.


4.3.1. Using the /proc Filesystem


The /proc filesystem is a special,
software-created filesystem that is used by the kernel to export
information to the world. Each file under /proc
is tied to a kernel function that generates the
file's "contents"
on the fly when the file is read. We have already seen some of these
files in action; /proc/modules, for example,
always returns a list of the currently loaded modules.
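You can observe this on-the-fly behavior from user space. The following sketch is ordinary user-space C, not driver code. The helper names reported_size and bytes_read are our own, and the sketch assumes a Linux system with /proc mounted. It compares the size that stat reports for /proc/self/status with the number of bytes a read actually returns.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Size the filesystem claims for path (st_size), or -1 on error. */
static long reported_size(const char *path)
{
    struct stat st;

    assert(path != NULL);
    if (stat(path, &st) != 0)
        return -1;
    return (long)st.st_size;
}

/* Bytes actually delivered by reading path to end of file, or -1 on error. */
static long bytes_read(const char *path)
{
    char buf[512];
    size_t n;
    long total = 0;
    FILE *f;

    assert(path != NULL);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    while ((n = fread(buf, 1, sizeof buf, f)) > 0)
        total += (long)n;
    fclose(f);
    return total;
}
```

On typical Linux kernels, reported_size("/proc/self/status") returns 0 while bytes_read on the same path returns a positive count. The file has no stored contents, only a kernel function that produces them when the file is read.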

/proc is heavily used in the Linux system. Many
utilities on a modern Linux distribution, such as
ps, top, and
uptime, get their information from
/proc. Some device drivers also export
information via /proc, and yours can do so as
well. The /proc filesystem is dynamic, so your
module can add or remove entries at any time.

Fully featured /proc entries can be complicated
beasts; among other things, they can be written to as well as read
from. Most of the time, however, /proc entries
are read-only files. This section concerns itself with the simple
read-only case. Those who are interested in implementing something
more complicated can look here for the basics; the kernel source may
then be consulted for the full picture.

Before we continue, however, we should mention that adding files
under /proc is discouraged. The
/proc filesystem is seen by the kernel
developers as a bit of an uncontrolled mess that has gone far beyond
its original purpose (which was to provide information about the
processes running in the system). The recommended way of making
information available in new code is via sysfs. As mentioned earlier, however,
working with sysfs requires an understanding of the Linux device model, and we
do not get to that until Chapter 14. Meanwhile, files under
/proc are slightly easier to create, and they
are entirely suitable for debugging purposes, so we cover them here.


4.3.1.1 Implementing files in /proc

All modules that work with /proc
should include
<linux/proc_fs.h> to get the declarations of the proper
functions.

To create a read-only /proc
file, your driver must implement a
function to produce the data when the file is read. When some process
reads the file (using the read system call), the
request reaches your module by means of this function.
We'll look at this function first and get to the
registration interface later in this section.

When a process reads from your /proc file, the
kernel allocates a page of memory (i.e., PAGE_SIZE
bytes) where the driver can write data to be returned to user space.
That buffer is passed to your function, which is a method called
read_proc:

int (*read_proc)(char *page, char **start, off_t offset, int count,
                 int *eof, void *data);

The page pointer is the buffer where
you'll write your data; start is
used by the function to say where the interesting data has been
written in page (more on this later);
offset and count have the same
meaning as for the read method. The
eof argument points to an integer that must be set
by the driver to signal that it has no more data to return, while
data is a driver-specific data pointer you can use
for internal bookkeeping.

This function should return the number of bytes of data actually
placed in the page buffer, just like the
read method does for other files. Other output
values are *eof and *start.
eof is a simple flag, but the use of the
start value is somewhat more complicated; its
purpose is to help with the implementation of large (greater than one
page) /proc files.

The start parameter has a somewhat unconventional
use. Its purpose is to indicate where (within
page) the data to be returned to the user is
found. When your read_proc method is called, *start will be
NULL. If you leave it NULL, the
kernel assumes that the data has been put into
page as if offset were
0; in other words, it assumes a simple-minded
version of read_proc, which places the entire
contents of the virtual file in page without
paying attention to the offset parameter. If,
instead, you set *start to a
non-NULL value, the kernel assumes that the data
pointed to by *start takes
offset into account and is ready to be returned
directly to the user. In general, simple
read_proc methods that return tiny amounts of
data just ignore start. More complex methods set
*start to page and only place
data beginning at the requested offset there.
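The two conventions are easier to see side by side in a small user-space model. This is only a sketch under simplifying assumptions. toy_proc_read is our hypothetical stand-in for the caller's logic; the real code lives in fs/proc/generic.c and also handles the small-integer case discussed below. In the sketch, long replaces off_t, and both producers are invented for the example. whole_file_read is the simple-minded kind that leaves *start NULL, and offset_aware_read honors offset and sets *start to page.

```c
#include <assert.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096

/* Same shape as read_proc, with long standing in for off_t. */
typedef int toy_read_proc_t(char *page, char **start, long offset,
                            int count, int *eof, void *data);

static const char toy_text[] = "alpha\nbeta\ngamma\n";

/* Simple-minded producer: always writes the whole file, leaves *start NULL. */
static int whole_file_read(char *page, char **start, long offset,
                           int count, int *eof, void *data)
{
    (void)start; (void)offset; (void)count; (void)data;
    memcpy(page, toy_text, sizeof toy_text - 1);
    *eof = 1;
    return (int)(sizeof toy_text - 1);
}

/* Offset-aware producer: writes only data from offset on, sets *start = page. */
static int offset_aware_read(char *page, char **start, long offset,
                             int count, int *eof, void *data)
{
    long len = (long)(sizeof toy_text - 1);
    int n;

    (void)data;
    if (offset >= len) {
        *eof = 1;
        return 0;
    }
    n = (int)(len - offset);
    if (n > count)
        n = count;
    else
        *eof = 1;
    memcpy(page, toy_text + offset, (size_t)n);
    *start = page;              /* data at page already accounts for offset */
    return n;
}

/* Caller side: copy up to count bytes for file position *pos into dst. */
static int toy_proc_read(toy_read_proc_t *fn, char *dst, int count, long *pos)
{
    static char page[TOY_PAGE_SIZE];
    char *start = NULL;
    int eof = 0;
    int n;

    assert(count > 0);
    if (count > TOY_PAGE_SIZE)
        count = TOY_PAGE_SIZE;
    n = fn(page, &start, *pos, count, &eof, NULL);
    if (n <= 0)
        return 0;
    if (start == NULL) {        /* producer ignored offset: skip it here */
        n -= (int)*pos;
        if (n <= 0)
            return 0;
        start = page + *pos;
    }
    if (n > count)
        n = count;
    memcpy(dst, start, (size_t)n);
    *pos += n;
    return n;
}

/* Read the whole virtual file chunk bytes at a time; returns its length. */
static int toy_read_all(toy_read_proc_t *fn, char *out, int outsize, int chunk)
{
    long pos = 0;
    int total = 0, n;

    while (total + chunk < outsize &&
           (n = toy_proc_read(fn, out + total, chunk, &pos)) > 0)
        total += n;
    out[total] = '\0';
    return total;
}
```

Reading either producer four bytes at a time with toy_read_all reassembles the same 17-byte file. The only difference is which side skips the first offset bytes.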

There has long been another major issue
with /proc files, which start
is meant to solve as well. Sometimes the ASCII representation of
kernel data structures changes between successive calls to
read, so the reader process could find
inconsistent data from one call to the next. If
*start is set to a small integer value, the caller
uses it to increment filp->f_pos independently
of the amount of data you return, thus making
f_pos an internal record number of your
read_proc procedure. If, for example, your read_proc
function is returning information from a big array of structures, and
five of those structures were returned in the first call,
*start could be set to 5. The
next call provides that same value as the offset; the driver then
knows to start returning data from the sixth structure in the array.
This is acknowledged as a "hack" by
its authors and can be seen in
fs/proc/generic.c.
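The record-number hack can be modeled in the same way. In this hypothetical sketch, record_read treats offset as a record index, emits only whole records, and returns the number of records (not bytes) through *start. toy_hack_read then advances the file position by that count, as described above. The names and the three-byte "rN" records are illustrations only. The comparison of start against page mirrors the kernel's test, but here it is done on uintptr_t values.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096
#define TOY_NRECS 5

/* Producer: offset is a record index; emit whole records that fit in count
 * and report how many were emitted through *start (the hack). */
static int record_read(char *page, char **start, long offset,
                       int count, int *eof, void *data)
{
    int len = 0;
    long rec = offset;

    (void)data;
    while (rec < TOY_NRECS) {
        char line[32];
        int n = snprintf(line, sizeof line, "r%ld\n", rec);

        if (len + n > count)
            break;                      /* never split a record */
        memcpy(page + len, line, (size_t)n);
        len += n;
        rec++;
    }
    if (rec >= TOY_NRECS)
        *eof = 1;
    *start = (char *)(uintptr_t)(rec - offset);     /* records, not bytes */
    return len;
}

/* Caller side: data always begins at page; position advances by records. */
static int toy_hack_read(char *dst, int count, long *pos)
{
    static char page[TOY_PAGE_SIZE];
    char *start = NULL;
    int eof = 0;
    int n = record_read(page, &start, *pos, count, &eof, NULL);

    if (n <= 0)
        return 0;
    /* A small integer compares below any real page address. */
    assert((uintptr_t)start < (uintptr_t)page);
    memcpy(dst, page, (size_t)n);
    *pos += (long)(uintptr_t)start;
    return n;
}

/* Read everything chunk bytes at a time; returns total bytes of text. */
static int toy_hack_read_all(char *out, int outsize, int chunk, long *pos)
{
    int total = 0, n;

    while (total + chunk < outsize &&
           (n = toy_hack_read(out + total, chunk, pos)) > 0)
        total += n;
    out[total] = '\0';
    return total;
}
```

Reading in 7-byte chunks returns all 15 bytes of text. The final position, however, is 5 (the number of records), not 15.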

Note that there is a better way to implement large
/proc files; it's called
seq_file, and we'll discuss it
shortly. First, though, it is time for an example. Here is a simple
(if somewhat ugly) read_proc implementation for
the scull device:

int scull_read_procmem(char *buf, char **start, off_t offset,
                       int count, int *eof, void *data)
{
    int i, j, len = 0;
    int limit = count - 80; /* Don't print more than this */

    for (i = 0; i < scull_nr_devs && len <= limit; i++) {
        struct scull_dev *d = &scull_devices[i];
        struct scull_qset *qs = d->data;
        if (down_interruptible(&d->sem))
            return -ERESTARTSYS;
        len += sprintf(buf + len, "\nDevice %i: qset %i, q %i, sz %lu\n",
                       i, d->qset, d->quantum, d->size);
        for (; qs && len <= limit; qs = qs->next) { /* scan the list */
            len += sprintf(buf + len, " item at %p, qset at %p\n",
                           qs, qs->data);
            if (qs->data && !qs->next) /* dump only the last item */
                /* check limit here too: a full qset can exceed a page */
                for (j = 0; j < d->qset && len <= limit; j++) {
                    if (qs->data[j])
                        len += sprintf(buf + len,
                                       " % 4i: %8p\n",
                                       j, qs->data[j]);
                }
        }
        up(&d->sem);
    }
    *eof = 1;
    return len;
}

This is a fairly typical read_proc
implementation. It assumes that there will never be a need to
generate more than one page of data and so ignores the
start and offset values. It is,
however, careful not to overrun its buffer, just in case.
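The fixed 80-byte margin is safe only as long as no single sprintf call produces more than 80 bytes. An alternative that does not depend on that estimate is to check the return value of snprintf and stop at the first line that would not fit. snprintf is available in the kernel as well as in user space. The following user-space sketch shows the pattern; struct toy_dev and format_devices are hypothetical stand-ins, not code from scull.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the fields scull prints. */
struct toy_dev {
    int qset;
    int quantum;
    unsigned long size;
};

/* Append one line per device to buf (capacity count bytes), never writing
 * past it and never leaving a truncated line. Returns bytes used. */
static int format_devices(char *buf, int count,
                          const struct toy_dev *devs, int ndevs)
{
    int len = 0;
    int i;

    assert(count > 0);
    buf[0] = '\0';
    for (i = 0; i < ndevs; i++) {
        int n = snprintf(buf + len, (size_t)(count - len),
                         "Device %i: qset %i, q %i, sz %lu\n",
                         i, devs[i].qset, devs[i].quantum, devs[i].size);
        if (n < 0 || n >= count - len) {
            buf[len] = '\0';            /* drop the truncated tail */
            break;
        }
        len += n;
    }
    return len;
}
```

Each line for a device with qset 1000, quantum 4000, and size 0 is 34 bytes long. An 80-byte buffer therefore holds exactly two complete lines, and the third is dropped rather than cut off.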


4.3.1.2 An older interface

If you read through the kernel source,
you may encounter code
implementing /proc files with an older
interface:

int (*get_info)(char *page, char **start, off_t offset, int count);

All of the arguments have the same meaning as they do for
read_proc, but the eof and
data arguments are missing. This interface is
still supported, but it could go away in the future; new code should
use the read_proc interface instead.


4.3.1.3 Creating your /proc file

Once you have a read_proc function defined, you need to connect it to
an entry in the /proc hierarchy. This is done with a call to
create_proc_read_entry:

struct proc_dir_entry *create_proc_read_entry(const char *name,
                               mode_t mode, struct proc_dir_entry *base,
                               read_proc_t *read_proc, void *data);

Here, name is the name of the file to create,
mode is the protection mask for the file (it can
be passed as 0 for a system-wide default), base
indicates the directory in which the file should be created (if
base is NULL, the file is
created in the /proc root),
read_proc is the read_proc
function that implements the file, and data is
ignored by the kernel (but passed to read_proc).
Here is the call used by scull to make its
/proc function available as
/proc/scullmem:

create_proc_read_entry("scullmem", 0 /* default mode */,
                       NULL /* parent dir */, scull_read_procmem,
                       NULL /* client data */);

Here, we create a file called scullmem directly
under /proc, with the default, world-readable
protections.

The directory entry pointer can be used to
create entire directory hierarchies
under /proc. Note, however, that an entry may be
more easily placed in a subdirectory of /proc
simply by giving the directory name as part of the name of the
entry, as long as the directory itself already exists. For
example, an (often ignored) convention says that
/proc entries associated with device drivers
should go in the subdirectory driver/;
scull could place its entry there simply by
giving its name as driver/scullmem.


Entries in /proc,
of course, should be removed when the module is unloaded.
remove_proc_entry is the function that undoes
what create_proc_read_entry already did:

remove_proc_entry("scullmem", NULL /* parent dir */);

Failure to remove entries can result in calls at unwanted times, or,
if your module has been unloaded, kernel crashes.

When using /proc files as shown, you must
remember a few nuisances of the implementation; it is no surprise that its
use is discouraged nowadays.

The most important problem is with
removal of
/proc entries. Such removal may well happen
while the file is in use, because there is no owner associated with
/proc entries, so using them
doesn't affect the module's
reference count. For example, the problem is easily triggered by running
sleep 100 < /proc/myfile just before removing
the module.

Another issue concerns registering two entries with the same name.
The kernel trusts the driver and doesn't check if
the name is already registered, so if you are not careful you might
end up with two or more entries with the same name. This is a problem
known to happen in classrooms, and such entries are
indistinguishable, both when you access them and when you call
remove_proc_entry.


4.3.1.4 The seq_file interface

As we noted above, the
implementation of large files under
/proc is a little awkward. Over time,
/proc methods have become notorious for buggy
implementations when the amount of output grows large. As a way of
cleaning up the /proc code and making life
easier for kernel programmers, the seq_file
interface was added. This interface provides a simple set of
functions for the implementation of large kernel virtual files.

The seq_file interface assumes that you are
creating a virtual file that steps through a sequence of items that
must be returned to user space. To use seq_file,
you must create a simple "iterator"
object that can establish a position within the sequence, step
forward, and output one item in the sequence. It may sound
complicated, but, in fact, the process is quite simple.
We'll step through the creation of a
/proc file in the scull
driver to show how it is done.

The first step, inevitably, is the inclusion of
<linux/seq_file.h>. Then you must create
four iterator methods, called start,
next, stop, and
show.

The start method is always called first. The prototype for this function is:

void *start(struct seq_file *sfile, loff_t *pos);

The sfile argument can almost always be ignored. pos is an
integer position indicating where the reading should start. The
interpretation of the position is entirely up to the implementation;
it need not be a byte position in the resulting file. Since
seq_file implementations typically step through a
sequence of interesting items, the position is often interpreted as a
cursor pointing to the next item in the sequence. The
scull driver interprets each device as one item
in the sequence, so the incoming pos is simply an
index into the scull_devices array. Thus, the
start method used in scull
is:

static void *scull_seq_start(struct seq_file *s, loff_t *pos)
{
    if (*pos >= scull_nr_devs)
        return NULL; /* No more to read */
    return scull_devices + *pos;
}

The return value, if non-NULL, is a private value
that can be used by the iterator implementation.

The next function should move the iterator to
the next position, returning NULL if there is
nothing left in the sequence. This method's
prototype is:

void *next(struct seq_file *sfile, void *v, loff_t *pos);

Here, v is the iterator as returned from the
previous call to start or
next, and pos is the current
position in the file. next should increment the
value pointed to by pos; depending on how your
iterator works, you might (though probably won't)
want to increment pos by more than one.
Here's what scull does:

static void *scull_seq_next(struct seq_file *s, void *v, loff_t *pos)
{
    (*pos)++;
    if (*pos >= scull_nr_devs)
        return NULL;
    return scull_devices + *pos;
}

When the kernel is done with the iterator, it calls stop to clean up:

void stop(struct seq_file *sfile, void *v);

The scull implementation has no cleanup work to
do, so its stop method is empty.

It is worth noting that the seq_file code, by
design, does not sleep or perform other nonatomic tasks between the
calls to start and stop.
You are also guaranteed to see one stop call
sometime shortly after a call to start.
Therefore, it is safe for your start method to
acquire semaphores or spinlocks. As long as your other
seq_file methods are atomic, the whole sequence of
calls is atomic. (If this paragraph does not make sense to you, come
back to it after you've read the next chapter.)

In between these calls, the kernel calls the show method to
actually output something interesting to user space. This
method's prototype is:

int show(struct seq_file *sfile, void *v);

This method should create output for the item in the sequence
indicated by the iterator v. It should not use printk, however; instead,
there is a special set of functions for seq_file output:

int seq_printf(struct seq_file *sfile, const char *fmt, ...);


This is the printf equivalent for
seq_file implementations; it takes the usual
format string and additional value arguments. You must also pass it
the seq_file structure given to the
show function, however. If
seq_printf returns a nonzero value, it means
that the buffer has filled, and output is being discarded. Most
implementations ignore the return value, however.


int seq_putc(struct seq_file *sfile, char c);

int seq_puts(struct seq_file *sfile, const char *s);


These are the equivalents of the user-space putc
and puts functions.


int seq_escape(struct seq_file *m, const char *s, const char *esc);


This function is equivalent to seq_puts with the
exception that any character in s that is also
found in esc is printed in octal format. A common
value for esc is " \t\n\\",
which keeps embedded white space from messing up the output and
possibly confusing shell scripts.


int seq_path(struct seq_file *sfile, struct vfsmount *m, struct dentry *dentry,
             char *esc);


This function can be used for outputting the file name associated
with a given directory entry. It is unlikely to be useful in device
drivers; we have included it here for completeness.
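The escaping rule of seq_escape is simple enough to model in user space. In this sketch, toy_seq_escape is our own name, and it writes to a plain buffer instead of a seq_file. Like seq_escape, it writes each character found in esc as a backslash followed by three octal digits, so a space becomes \040 and a tab becomes \011.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy s into out, escaping any character listed in esc as "\ooo".
 * Returns the length written (excluding NUL), or -1 if out is too small
 * (out's contents are then unspecified). */
static long toy_seq_escape(char *out, size_t outsize, const char *s,
                           const char *esc)
{
    size_t n = 0;

    assert(out != NULL && s != NULL && esc != NULL);
    if (outsize == 0)
        return -1;
    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char)*s;

        if (strchr(esc, c) != NULL) {
            if (n + 4 >= outsize)
                return -1;              /* no room for "\ooo" plus NUL */
            out[n++] = '\\';
            out[n++] = (char)('0' + ((c >> 6) & 07));
            out[n++] = (char)('0' + ((c >> 3) & 07));
            out[n++] = (char)('0' + (c & 07));
        } else {
            if (n + 1 >= outsize)
                return -1;
            out[n++] = (char)c;
        }
    }
    out[n] = '\0';
    return (long)n;
}
```

With esc set to " \t\n\\", the string "a b<TAB>c" becomes a\040b\011c. A shell script that splits on white space then sees it as a single field.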



Getting back to our example, the show method
used in scull is:

static int scull_seq_show(struct seq_file *s, void *v)
{
    struct scull_dev *dev = (struct scull_dev *) v;
    struct scull_qset *d;
    int i;

    if (down_interruptible(&dev->sem))
        return -ERESTARTSYS;
    seq_printf(s, "\nDevice %i: qset %i, q %i, sz %lu\n",
               (int) (dev - scull_devices), dev->qset,
               dev->quantum, dev->size);
    for (d = dev->data; d; d = d->next) { /* scan the list */
        seq_printf(s, " item at %p, qset at %p\n", d, d->data);
        if (d->data && !d->next) /* dump only the last item */
            for (i = 0; i < dev->qset; i++) {
                if (d->data[i])
                    seq_printf(s, " % 4i: %8p\n",
                               i, d->data[i]);
            }
    }
    up(&dev->sem);
    return 0;
}

Here, we finally interpret our
"iterator" value, which is simply a
pointer to a scull_dev structure.

Now that it has a full set of iterator operations,
scull must package them up and connect them to a
file in /proc. The first step is done by filling
in a seq_operations structure:

static struct seq_operations scull_seq_ops = {
    .start = scull_seq_start,
    .next  = scull_seq_next,
    .stop  = scull_seq_stop,
    .show  = scull_seq_show
};
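The seq_file core, not your driver, decides when these four methods are called. To make the calling sequence concrete, here is a single-pass user-space model of it. Everything in it is hypothetical: the toy_ names stand in for struct seq_file, seq_printf, and seq_read, and the model skips details such as carrying the position across read calls. It does show the order of calls: start, then show and next until next returns NULL, then exactly one stop. It also shows how output from a show call that overflows the buffer is discarded.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

struct toy_seq {                /* stand-in for struct seq_file */
    char *buf;
    size_t size, count;
    int overflow;
};

struct toy_seq_ops {            /* same four hooks as seq_operations */
    void *(*start)(struct toy_seq *m, long *pos);
    void *(*next)(struct toy_seq *m, void *v, long *pos);
    void (*stop)(struct toy_seq *m, void *v);
    int (*show)(struct toy_seq *m, void *v);
};

/* Like seq_printf: append text, or flag overflow instead of truncating. */
static int toy_printf(struct toy_seq *m, const char *fmt, ...)
{
    va_list ap;
    int len;

    if (m->overflow)
        return -1;
    va_start(ap, fmt);
    len = vsnprintf(m->buf + m->count, m->size - m->count, fmt, ap);
    va_end(ap);
    if (len < 0 || (size_t)len >= m->size - m->count) {
        m->overflow = 1;
        return -1;
    }
    m->count += (size_t)len;
    return 0;
}

/* Driver side, mirroring scull: each device is one item. */
struct toy_dev { int qset, quantum; unsigned long size; };
#define TOY_NR_DEVS 3
static struct toy_dev toy_devs[TOY_NR_DEVS] = {
    {1000, 4000, 0}, {1000, 4000, 0}, {1000, 4000, 0}
};
static int toy_stop_calls;

static void *toy_start(struct toy_seq *m, long *pos)
{
    (void)m;
    return *pos >= TOY_NR_DEVS ? NULL : &toy_devs[*pos];
}

static void *toy_next(struct toy_seq *m, void *v, long *pos)
{
    (void)m; (void)v;
    ++*pos;
    return *pos >= TOY_NR_DEVS ? NULL : &toy_devs[*pos];
}

static void toy_stop(struct toy_seq *m, void *v)
{
    (void)m; (void)v;
    toy_stop_calls++;           /* a real driver might release a lock here */
}

static int toy_show(struct toy_seq *m, void *v)
{
    const struct toy_dev *d = v;

    return toy_printf(m, "Device %i: qset %i, q %i, sz %lu\n",
                      (int)(d - toy_devs), d->qset, d->quantum, d->size);
}

static const struct toy_seq_ops toy_ops = {
    toy_start, toy_next, toy_stop, toy_show
};

/* One start/show/next.../stop pass into buf; returns bytes produced.
 * On overflow the partial record is dropped and the pass ends. */
static size_t toy_seq_read(const struct toy_seq_ops *ops, char *buf, size_t size)
{
    struct toy_seq m = { buf, size, 0, 0 };
    long pos = 0;
    void *v;

    assert(size > 0);
    for (v = ops->start(&m, &pos); v != NULL; v = ops->next(&m, v, &pos)) {
        size_t before = m.count;

        ops->show(&m, v);
        if (m.overflow) {
            m.count = before;
            break;
        }
    }
    ops->stop(&m, v);
    buf[m.count] = '\0';
    return m.count;
}
```

With a 50-byte buffer, only the first 34-byte line fits. The second show overflows, its partial output is dropped, and stop is still called once. Roughly speaking, the real seq_read keeps the dropped record for the next read call, and it enlarges its buffer when even a single record does not fit.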

With that structure in place, we must create a file implementation
that the kernel understands. We do not use the
read_proc method described previously; when
using seq_file, it is best to connect to
/proc at a slightly lower level. That means
creating a file_operations structure (yes, the
same structure used for char drivers) implementing all of the
operations needed by the kernel to handle reads and seeks on the
file. Fortunately, this task is straightforward. The first step is to
create an open method that connects the file to
the seq_file operations:

static int scull_proc_open(struct inode *inode, struct file *file)
{
    return seq_open(file, &scull_seq_ops);
}

The call to seq_open connects the
file structure with our sequence operations
defined above. As it turns out, open is the only
file operation we must implement ourselves, so we can now set up our
file_operations structure:

static struct file_operations scull_proc_ops = {
    .owner   = THIS_MODULE,
    .open    = scull_proc_open,
    .read    = seq_read,
    .llseek  = seq_lseek,
    .release = seq_release
};

Here we specify our own open method, but use the
canned methods seq_read,
seq_lseek, and seq_release
for everything else.

The final step is to create the actual file in
/proc:

entry = create_proc_entry("scullseq", 0, NULL);
if (entry)
    entry->proc_fops = &scull_proc_ops;

Rather than using create_proc_read_entry, we
call the lower-level create_proc_entry, which
has this prototype:

struct proc_dir_entry *create_proc_entry(const char *name,
                                         mode_t mode,
                                         struct proc_dir_entry *parent);

The arguments are the same as their equivalents in
create_proc_read_entry: the name of the file,
its protections, and the parent directory.

With the above code, scull has a new
/proc entry that looks much like the previous
one. It is superior, however, because it works regardless of how
large its output becomes, it handles seeks properly, and it is
generally easier to read and maintain. We recommend the use of
seq_file for the implementation of files that
contain more than a very small number of lines of output.


4.3.2. The ioctl Method


ioctl, which we show you how to use in Chapter 6, is a system call
that acts on a file descriptor; it receives a number that identifies
a command to be performed and (optionally) another argument, usually
a pointer. As an alternative to using the /proc
filesystem, you can implement a few ioctl
commands tailored for debugging. These commands can copy relevant
data structures from the driver to user space where you can examine
them.

Using ioctl this way to get information is
somewhat more difficult than using /proc,
because you need another program to issue the
ioctl and display the results. This program must
be written, compiled, and kept in sync with the module
you're testing. On the other hand, the driver-side
code can be easier than what is needed to implement a
/proc file.
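A minimal sketch of the user-space half might look like the following. Everything in it is hypothetical: struct toy_debug_info, the TOY_IOC_GETDEBUG command, query_debug, and print_debug are illustrations, not part of scull. The only real pieces are the standard open, ioctl, and close calls and the _IOR macro, which builds the command number and is covered in Chapter 6.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical snapshot of driver state; it must match the driver's own
 * definition exactly, which is why the tool has to be kept in sync. */
struct toy_debug_info {
    int qset;
    int quantum;
    unsigned long size;
};

/* Hypothetical command: 'k' is scull's ioctl magic, 0x80 is arbitrary.
 * _IOR encodes direction (read), type, number, and argument size. */
#define TOY_IOC_GETDEBUG _IOR('k', 0x80, struct toy_debug_info)

/* Open the device, fetch the snapshot, close it.
 * Returns 0 on success or -1 with errno set by open or ioctl. */
static int query_debug(const char *path, struct toy_debug_info *info)
{
    int fd, ret, saved;

    assert(path != NULL && info != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ret = ioctl(fd, TOY_IOC_GETDEBUG, info);
    saved = errno;              /* keep ioctl's errno across close */
    close(fd);
    errno = saved;
    return ret < 0 ? -1 : 0;
}

void print_debug(const struct toy_debug_info *info)
{
    printf("qset %d, quantum %d, size %lu\n",
           info->qset, info->quantum, info->size);
}
```

Against a driver that implemented the command, query_debug would fill info for print_debug to display. Against /dev/null or a missing path, it simply returns -1. If the driver's structure changes, the tool must be rebuilt to match; this is the maintenance cost mentioned above.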

There are times when ioctl is the best way to
get information, because it runs faster than reading
/proc. If some work must be performed on the
data before it's written to the screen, retrieving
the data in binary form is more efficient than reading a text file.
In addition, ioctl doesn't
require splitting data into fragments smaller than a page.

Another interesting advantage of the
ioctl approach is that information-retrieval
commands can be left in the driver even when debugging would
otherwise be disabled. Unlike a /proc file,
which is visible to anyone who looks in the directory (and too many
people are likely to wonder "what that strange file
is"), undocumented ioctl
commands are likely to remain unnoticed. In addition, they will still
be there should something weird happen to the driver. The
only drawback is that the module will be slightly bigger.

