Perl Cd Bookshelf [Electronic resources] نسخه متنی

اینجــــا یک کتابخانه دیجیتالی است

با بیش از 100000 منبع الکترونیکی رایگان به زبان فارسی ، عربی و انگلیسی

Perl Cd Bookshelf [Electronic resources] - نسخه متنی

| نمايش فراداده ، افزودن یک نقد و بررسی
افزودن به کتابخانه شخصی
ارسال به دوستان
جستجو در متن کتاب
بیشتر
تنظیمات قلم

فونت

اندازه قلم

+ - پیش فرض

حالت نمایش

روز نیمروز شب
جستجو در لغت نامه
بیشتر
لیست موضوعات
توضیحات
افزودن یادداشت جدید

8.27. Program: Flat File Indexes


It sometimes happens that you
need to jump directly to a particular line number in a file, but the
lines vary in length, so you can't use Recipe 8.12. Although you could start at the beginning of
the file and read every line, this is inefficient if you're making
multiple queries.

The solution is to build an index of fixed-width records, one per
line. Each record contains the offset in the data file of the
corresponding line. The subroutine in Example 8-10
takes the data file and a filehandle to send the index to. It reads a
record at a time and prints the current offset in the file to the
index, packed into a big-ending unsigned 32-bit integer; see the
documentation for the pack function in
perlfunc(1) for alternative storage types.

Example 8-10. build_index


  # usage: build_index(*DATA_HANDLE, *INDEX_HANDLE)
sub build_index {
my $data_file = shift;
my $index_file = shift;
my $offset = 0;
while (<$data_file>) {
print $index_file pack("N", $offset);
$offset = tell($data_file);
}
}

Once you have an index, it becomes easy to read a particular line
from the data file. Jump to that record in the index, read the
offset, and jump to that position in the data file. The next line you
read will be the one you want. Example 8-11 returns
the line, given the line number and the index and data file handles.

Example 8-11. line_with_index


  # usage: line_with_index(*DATA_HANDLE, *INDEX_HANDLE, $LINE_NUMBER)
# returns line or undef if LINE_NUMBER was out of range
sub line_with_index {
my $data_file = shift;
my $index_file = shift;
my $line_number = shift;
my $size; # size of an index entry
my $i_offset; # offset into the index of the entry
my $entry; # index entry
my $d_offset; # offset into the data file
$size = length(pack("N", 0));
$i_offset = $size * ($line_number-1);
seek($index_file, $i_offset, 0) or return;
read($index_file, $entry, $size);
$d_offset = unpack("N", $entry);
seek($data_file, $d_offset, 0);
return scalar(<$data_file>);
}

To use these subroutines, just say:

open($fh,  "<", $file)         or die "Can't open $file for reading: $!\n";
open($index, "+>", $file.idx)
or die "Can't open $file.idx for read/write: $!\n";
build_index($fh, $index);
$line = line_with_index($file, $index, $seeking);

The next step is to cache the index file between runs of the program,
so you're not building it each time. This is shown in Example Recipe 8.12. Then add locking for concurrent access, and
check time stamps on the files to see whether a change to the data
file has made an old index file out of date.

Example 8-12. cache_line_index


  #!/usr/bin/perl -w
# cache_line_index - index style
# build_index and line_with_index from above
@ARGV = = 2 or
die "usage: print_line FILENAME LINE_NUMBER";
($filename, $line_number) = @ARGV;
open(my $orig, "<", $filename)
or die "Can't open $filename for reading: $!";
# open the index and build it if necessary
# there's a race condition here: two copies of this
# program can notice there's no index for the file and
# try to build one. This would be easily solved with
# locking
$indexname = "$filename.index";
sysopen(my $idx, $indexname, O_CREAT|O_RDWR)
or die "Can't open $indexname for read/write: $!";
build_index($orig, $idx) if -z $indexname; # XXX: race unless lock
$line = line_with_index($orig, $idx, $line_number);
die "Didn't find line $line_number in $filename" unless defined $line;
print $line;


/ 875