Linux Device Drivers (3rd Edition) [Electronic resources] (text version)

7.1. Measuring Time Lapses

The kernel keeps track of the flow of time by means of timer
interrupts. Interrupts are covered in detail in Chapter 10.

Timer interrupts are generated by the
system's timing hardware at regular intervals; this
interval is programmed at boot time by the kernel according to the
value of HZ, which is an architecture-dependent
value defined in <linux/param.h> or a
subplatform file included by it. Default values in the distributed
kernel source range from 50 to 1200 ticks per second on real
hardware, down to 24 for software simulators. Most platforms run at
100 or 1000 interrupts per second; the popular x86 PC defaults to
1000, although it used to be 100 in previous versions (up to and
including 2.4). As a general rule, even if you know the value of
HZ, you should never count on that specific value
when programming.

It is possible to change the value of HZ for those
who want systems with a different clock interrupt frequency. If you
change HZ in the header file, you need to
recompile the kernel and all modules with the new value. You might
want to raise HZ to get a more fine-grained
resolution in your asynchronous tasks, if you are willing to pay the
overhead of the extra timer interrupts to achieve your goals.
Actually, raising HZ to 1000 was pretty common
with x86 industrial systems using Version 2.4 or 2.2 of the kernel.
With current versions, however, the best approach to the timer
interrupt is to keep the default value for HZ, by
virtue of our complete trust in the kernel developers, who have
certainly chosen the best value. Besides, some internal calculations
are currently implemented only for HZ in the range
from 12 to 1535 (see <linux/timex.h> and
RFC-1589).

Every time a timer interrupt occurs,
the value of an internal kernel counter is incremented. The counter
is initialized to 0 at system boot, so it
represents the number of clock ticks since last boot. The counter is
a 64-bit variable (even on 32-bit architectures) and is called
jiffies_64. However, driver writers normally
access the jiffies variable, an
unsigned long that is the same
as either jiffies_64 or its least significant
bits. Using jiffies is usually preferred because
it is faster, and accesses to the 64-bit
jiffies_64 value are not necessarily atomic on all
architectures.

In addition to the low-resolution kernel-managed jiffy mechanism,
some CPU platforms feature a high-resolution counter that software
can read. Although its actual use varies somewhat across platforms,
it's sometimes a very powerful tool.

7.1.1. Using the jiffies Counter

The counter and the utility functions to read it live in
<linux/jiffies.h>, although
you'll usually just include
<linux/sched.h>, which automatically pulls
jiffies.h in. Needless to say, both
jiffies and jiffies_64 must be
considered read-only.

Whenever your code needs to remember the current value of
jiffies, it can simply access the
unsigned long variable, which
is declared as volatile to tell the compiler not to optimize memory
reads. You need to read the current counter whenever your code needs
to calculate a future time stamp, as shown in the following example:

#include <linux/jiffies.h>
unsigned long j, stamp_1, stamp_half, stamp_n;
j = jiffies;                      /* read the current value */
stamp_1    = j + HZ;              /* 1 second in the future */
stamp_half = j + HZ/2;            /* half a second */
stamp_n    = j + n * HZ / 1000;   /* n milliseconds */

This code has no problem with jiffies wrapping
around, as long as different values are compared in the right way.
Even though on 32-bit platforms the counter wraps around only once
every 50 days when HZ is 1000, your code should be
prepared to face that event. To compare your cached value (like
stamp_1 above) and the current value, you should
use one of the following macros:

#include <linux/jiffies.h>
int time_after(unsigned long a, unsigned long b);
int time_before(unsigned long a, unsigned long b);
int time_after_eq(unsigned long a, unsigned long b);
int time_before_eq(unsigned long a, unsigned long b);

The first evaluates true when a, as a snapshot
of jiffies, represents a time after
b, the second evaluates true when time
a is before time b, and the
last two compare for "after or
equal" and "before or
equal." The code works by converting the values to
signed long, subtracting them, and comparing the result. If you need
to know the difference between two instances of
jiffies in a safe way, you can use the same trick:
diff = (long)t2 - (long)t1;.
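
The wraparound-safe comparison can be tried outside the kernel. The following is a minimal user-space sketch, not the kernel's own definition: the tick_after, tick_before, and tick_diff names are hypothetical, and the sketch subtracts the unsigned values first and then reinterprets the result as signed, which relies on the usual two's-complement conversion.

```c
/* User-space models of the comparison helpers; the tick_ names are
 * hypothetical and exist only for this sketch. */

/* True when a represents a time after b, even across a wraparound. */
static int tick_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/* True when a represents a time before b. */
static int tick_before(unsigned long a, unsigned long b)
{
    return tick_after(b, a);
}

/* Signed distance from t1 to t2; meaningful only while the two
 * snapshots are less than half the counter range apart. */
static long tick_diff(unsigned long t1, unsigned long t2)
{
    return (long)(t2 - t1);
}
```

With a start value six ticks below the wrap point and a deadline ten ticks later, the deadline is numerically smaller than the start, so a plain > comparison gives the wrong answer while tick_after still reports the deadline as later.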

You can convert a jiffies difference to milliseconds trivially
through:

msec = diff * 1000 / HZ;
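
One caveat with the expression above is that diff * 1000 can overflow a 32-bit long when the difference is large. The following user-space sketch shows one way around that, by converting whole seconds and the leftover ticks separately. The function name ticks_to_msec is hypothetical, and the tick rate is passed as a parameter only so that the sketch can be run outside the kernel.

```c
/* Convert a tick difference to milliseconds without forming the
 * intermediate product diff * 1000. hz stands in for the kernel's HZ. */
static unsigned long ticks_to_msec(unsigned long diff, unsigned long hz)
{
    unsigned long sec = diff / hz;   /* whole seconds */
    unsigned long rem = diff % hz;   /* leftover ticks, always < hz */

    return sec * 1000 + rem * 1000 / hz;
}
```

The result is the same as diff * 1000 / hz whenever that expression does not overflow, because rem * 1000 never exceeds hz * 1000.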

Sometimes, however, you need to exchange time representations with
user space programs that tend to represent time values with
struct timeval and
struct timespec. The two
structures represent a precise time quantity with two numbers:
seconds and microseconds are used in the older and popular
struct timeval, and seconds and
nanoseconds are used in the newer struct
timespec. The kernel exports four helper functions
to convert time values expressed as jiffies to and from those
structures:

#include <linux/time.h>
unsigned long timespec_to_jiffies(struct timespec *value);
void jiffies_to_timespec(unsigned long jiffies, struct timespec *value);
unsigned long timeval_to_jiffies(struct timeval *value);
void jiffies_to_timeval(unsigned long jiffies, struct timeval *value);
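
To make the arithmetic behind such conversions concrete, here is a user-space model. It is a sketch under stated assumptions, not the kernel implementation: struct model_timespec is a stand-in with the same two fields as struct timespec, the function names are hypothetical, the tick rate is assumed to divide one billion evenly, and rounding a partial tick up is a choice made here so that a timeout never comes out shorter than requested.

```c
/* Stand-in for struct timespec; hypothetical, for this sketch only. */
struct model_timespec {
    long tv_sec;    /* seconds */
    long tv_nsec;   /* nanoseconds, 0 to 999999999 */
};

/* Seconds plus nanoseconds to ticks, rounding a partial tick up. */
static unsigned long model_timespec_to_ticks(const struct model_timespec *ts,
                                             unsigned long hz)
{
    unsigned long nsec_per_tick = 1000000000UL / hz;
    unsigned long ticks = (unsigned long)ts->tv_sec * hz;

    ticks += ((unsigned long)ts->tv_nsec + nsec_per_tick - 1) / nsec_per_tick;
    return ticks;
}

/* Ticks back to seconds plus nanoseconds. */
static void model_ticks_to_timespec(unsigned long ticks, unsigned long hz,
                                    struct model_timespec *ts)
{
    ts->tv_sec = (long)(ticks / hz);
    ts->tv_nsec = (long)((ticks % hz) * (1000000000UL / hz));
}
```

At a tick rate of 1000, one and a half seconds map to 1500 ticks, and a single nanosecond is rounded up to one full tick.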

Accessing the 64-bit jiffy count is not as straightforward as
accessing jiffies. While on 64-bit computer
architectures the two variables are actually one, access to the value
is not atomic for 32-bit processors. This means you might read the
wrong value if both halves of the variable get updated while you are
reading them. It's extremely unlikely
you'll ever need to read the 64-bit counter, but in
case you do, you'll be glad to know that the kernel
exports a specific helper function that does the proper locking for
you:

#include <linux/jiffies.h>
u64 get_jiffies_64(void);

In the above prototype, the u64 type is used. This
is one of the types defined by
<linux/types.h>
and represents an
unsigned 64-bit type.
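
The inconsistent read described above can be reproduced deterministically in user space. The sketch below models a 64-bit counter as two 32-bit halves and forces a tick to land between the two loads; all names are hypothetical, and the "interrupt" is just a function call placed at the worst possible moment.

```c
#include <stdint.h>

/* A 64-bit counter as a 32-bit processor sees it: two separate words. */
struct split_counter {
    uint32_t lo;
    uint32_t hi;
};

/* One timer tick: increment the low half and carry into the high half. */
static void counter_tick(struct split_counter *c)
{
    if (++c->lo == 0)
        c->hi++;
}

/* The true value, assembled while nothing else touches the counter. */
static uint64_t counter_value(const struct split_counter *c)
{
    return ((uint64_t)c->hi << 32) | c->lo;
}

/* An unlocked read in which a tick arrives between the two loads. */
static uint64_t torn_read(struct split_counter *c)
{
    uint32_t lo = c->lo;    /* first load: old low half */
    uint32_t hi;

    counter_tick(c);        /* simulated timer interrupt */
    hi = c->hi;             /* second load: new high half */
    return ((uint64_t)hi << 32) | lo;
}
```

Starting from 0xFFFFFFFF, the torn read returns 0x1FFFFFFFF although the counter only advanced to 0x100000000, which is the kind of error the locking in get_jiffies_64 prevents.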

If you're wondering how 32-bit platforms update both
the 32-bit and 64-bit counters at the same time, read the linker
script for your platform (look for a file whose name matches
vmlinux*.lds*). There, the
jiffies symbol is defined to access the least
significant word of the 64-bit value, according to whether the
platform is little-endian or big-endian. Actually, the same trick is
used for 64-bit platforms, so that the unsigned long and u64 variables are accessed at
the same address.
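
The effect of placing one symbol on the least significant word of the other can be imitated in plain C. The following user-space sketch is only an illustration of the address arithmetic, not of the linker script itself; the function names are hypothetical, and it assumes a machine that is either plainly little-endian or plainly big-endian.

```c
#include <stdint.h>
#include <string.h>

/* Byte offset of the least significant 32-bit word inside a 64-bit
 * object on the machine running this code: 0 on little-endian
 * machines, 4 on big-endian ones. */
static size_t low_word_offset(void)
{
    const uint64_t probe = 1;
    uint32_t first;

    memcpy(&first, &probe, sizeof(first));
    return first == 1 ? 0 : 4;
}

/* Read the low 32 bits of a 64-bit counter through that offset, the
 * way a 32-bit symbol overlaid on the counter would see it. */
static uint32_t read_low_word(const uint64_t *counter64)
{
    uint32_t low;

    memcpy(&low, (const unsigned char *)counter64 + low_word_offset(),
           sizeof(low));
    return low;
}
```

Whatever the byte order, read_low_word returns the same value as truncating the 64-bit counter to 32 bits.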

Finally, note that the actual clock frequency is almost completely
hidden from user space. The macro HZ always
expands to 100 when user-space programs include
param.h, and every counter reported to user
space is converted accordingly. This applies to
clock(3), times(2), and any
related function. The only evidence available to users of the
HZ value is how fast timer interrupts happen, as
shown in /proc/interrupts. For example, you can
obtain HZ by dividing this count by the system
uptime reported in /proc/uptime.
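
The division just described is simple enough to sketch. The helper below is hypothetical and deliberately leaves out the parsing of /proc/interrupts and /proc/uptime: it assumes you have already extracted the timer interrupt count for one processor and the uptime in seconds, and that the timer has been ticking at a steady rate since boot.

```c
/* Estimate the tick rate from a timer interrupt count and the uptime.
 * Both inputs are assumed to have been read by the caller. */
static double estimate_hz(unsigned long long timer_count,
                          double uptime_seconds)
{
    if (uptime_seconds <= 0.0)
        return 0.0;     /* no meaningful estimate yet */
    return (double)timer_count / uptime_seconds;
}
```

For instance, one million timer interrupts over 1000 seconds of uptime correspond to a tick rate of 1000.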

7.1.2. Processor-Specific Registers

If you need to measure very short time intervals or you need
extremely high precision in your figures, you can resort to
platform-dependent resources, a choice of precision over portability.

In modern processors, the pressing
demand for empirical performance figures is thwarted by the intrinsic
unpredictability of instruction timing in most CPU designs due to
cache memories, instruction scheduling, and branch prediction. As a
response, CPU manufacturers introduced a way to count clock cycles as
an easy and reliable way to measure time lapses. Therefore, most
modern processors include a counter register that is steadily
incremented once at each clock cycle. Nowadays, this clock counter is
the only reliable way to carry out high-resolution timekeeping tasks.

The details differ from platform to platform: the register may or may
not be readable from user space, it may or may not be writable, and
it may be 64 or 32 bits wide. In the last case, you must be prepared
to handle overflows just like we did with the jiffy counter. The
register may even not exist for your platform, or it can be
implemented in an external device by the hardware designer, if the
CPU lacks the feature and you are dealing with a special-purpose
computer.

Whether or not the register can be zeroed, we strongly discourage
resetting it, even when hardware permits. You might not, after all,
be the only user of the counter at any given time; on some platforms
supporting SMP, for example, the kernel depends on such a counter to
be synchronized across processors. Since you can always measure
differences between values, as long as that difference
doesn't exceed the overflow time, you can get the
work done without claiming exclusive ownership of the register by
modifying its current value.

The most renowned counter register
is the TSC (timestamp counter), introduced in x86 processors with the
Pentium and present in all CPU designs ever since, including the
x86_64 platform. It is a 64-bit register that counts CPU clock
cycles; it can be read from both kernel space and user space.

After including <asm/msr.h> (an
x86-specific header whose name stands for
"machine-specific registers"), you
can use one of these macros:

rdtsc(low32,high32);
rdtscl(low32);
rdtscll(var64);

The first macro atomically reads the 64-bit value into two 32-bit
variables; the next one ("read low
half") reads the low half of the register into a
32-bit variable, discarding the high half; the last reads the 64-bit
value into a long long variable, hence, the name.
All of these macros store values into their arguments.
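
The relation between the two 32-bit halves and the full 64-bit value is plain bit arithmetic. The following user-space sketch uses a hypothetical helper name and ordinary integers in place of the register; it only shows how the halves fit together.

```c
#include <stdint.h>

/* Rebuild the 64-bit counter value from its two 32-bit halves. */
static uint64_t tsc_combine(uint32_t low32, uint32_t high32)
{
    return ((uint64_t)high32 << 32) | low32;
}
```

Truncating the combined value to 32 bits gives back the low half, which is the value the "read low half" macro stores.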

Reading the low half of the counter is enough for most common uses of
the TSC. A 1-GHz CPU overflows it only once every 4.3 seconds, so you
won't need to deal with multiregister variables if
the time lapse you are benchmarking reliably takes less time.
However, as CPU frequencies rise over time and as timing requirements
increase, you'll most likely need to read the 64-bit
counter more often in the future.
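
The overflow figure follows from dividing the 2^32 counts of a 32-bit value by the clock rate. The sketch below, with hypothetical function names, computes that period and also shows that an unsigned subtraction of two low-half samples stays correct across a single overflow, in the same way as with the jiffy counter.

```c
#include <stdint.h>

/* Seconds until a 32-bit cycle counter wraps at the given clock rate. */
static double wrap_seconds_32(double clock_hz)
{
    return 4294967296.0 / clock_hz;     /* 2^32 cycles */
}

/* Cycles elapsed between two low-half samples; the result is correct
 * as long as the counter wrapped at most once in between. */
static uint32_t cycles_elapsed_32(uint32_t start, uint32_t end)
{
    return (uint32_t)(end - start);
}
```

At 1 GHz the period comes out just under 4.3 seconds; at higher clock rates it shrinks proportionally.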

As an example using only the low half of the register, the following
lines measure the execution of the instruction itself:

unsigned long ini, end;
rdtscl(ini); rdtscl(end);   /* two back-to-back reads of the low half */
printk("time lapse: %lu\n", end - ini);

Some of the other platforms offer similar functionality, and kernel
headers offer an architecture-independent function that you can use
instead of rdtsc. It is called
get_cycles, defined in
<asm/timex.h> (included by
<linux/timex.h>). Its prototype is:

#include <linux/timex.h>
cycles_t get_cycles(void);

This function is defined for every platform, and it always returns
0 on the platforms that have no cycle-counter
register. The cycles_t type is an appropriate
unsigned type to hold the value read.

Despite the availability of an architecture-independent function,
we'd like to take the opportunity to show an example of inline
assembly code. To this end, we implement an rdtscl function for MIPS
processors that works in the same way as the x86 one.

We base the example on MIPS because most MIPS processors feature a
32-bit counter as register 9 of their internal
"coprocessor 0." To access the
register, readable only from kernel space, you can define the
following macro that executes a "move from
coprocessor 0" assembly instruction:^[1]

^[1] The trailing nop instruction is required
to prevent the compiler from accessing the target register in the
instruction immediately following mfc0. This
kind of interlock is typical of RISC processors, and the compiler can
still schedule useful instructions in the delay slots. In this case,
we use nop because inline assembly is a black
box for the compiler and no optimization can be performed.

#define rdtscl(dest) __asm__ __volatile__("mfc0 %0,$9; nop" : "=r" (dest))

With this macro in place, the MIPS processor can execute the same
code shown earlier for the x86.

With gcc inline assembly, the allocation of
general-purpose registers is left to the compiler. The macro just
shown uses %0 as a placeholder for
"argument 0," which is later
specified as "any register (r)
used as output (=)." The macro
also states that the output register must correspond to the C
expression dest. The syntax for inline assembly is
very powerful but somewhat complex, especially for architectures that
have constraints on what each register can do (namely, the x86
family). The syntax is described in the gcc
documentation, usually available in the info
documentation tree.
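
The operand syntax can be tried on any architecture with an empty assembly template, which emits no instruction at all. The following sketch assumes a gcc-compatible compiler; the function name is hypothetical, and the "0" input constraint, which is not used by the macro above, asks the compiler to place the input in the same register as operand 0.

```c
/* Pass a value through an empty inline assembly statement.
 * "=r" (out): operand 0 is an output held in any general register.
 * "0" (in):   the input must live in the same register as operand 0.
 * Since the template is empty, out ends up holding the value of in. */
static unsigned long asm_pass_through(unsigned long in)
{
    unsigned long out;

    __asm__ __volatile__("" : "=r" (out) : "0" (in));
    return out;
}
```

The compiler picks the register on its own, exactly as it does for dest in the MIPS macro; the C code never names it.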

The short C-code fragment shown in this section has been run on a
K7-class x86 processor and a MIPS VR4181 (using the macro just
described). The former reported a time lapse of 11 clock ticks and
the latter just 2 clock ticks. The small figure was expected, since
RISC processors usually execute one instruction per clock cycle.

There is one other thing worth knowing about timestamp counters: they
are not necessarily synchronized across processors in an SMP system.
To be sure of getting a coherent value, you should disable preemption
for code that is querying the counter.
