C.Plus.Plus.Primer.4th.Edition [Electronic resources] (text version)

A.3. The IO Library Revisited

In Chapter 8 we introduced the basic architecture and most commonly used parts of the IO library. This appendix completes our coverage of the IO library.

A.3.1. Format State

In addition to a condition state (Section 8.2, p. 287), each iostream object also maintains a format state that controls the details of how IO is formatted. The format state controls aspects of formatting such as the notational base for an integral value, the precision of a floating-point value, the width of an output element, and so on. The library also defines a set of manipulators (listed in Tables A.2 (p. 829) and A.3 (p. 833)) for modifying the format state of an object. Simply put, a manipulator is a function or object that can be used as an operand to an input or output operator. A manipulator returns the stream object to which it is applied, so we can output multiple manipulators and data in a single statement.

When we read or write a manipulator, no data are read or written. Instead, an action is taken. Our programs have already used one manipulator, endl, which we "write" to an output stream as if it were a value. But endl isn't a value; instead, it performs an operation: It writes a newline and flushes the buffer.

A.3.2. Many Manipulators Change the Format State

Many manipulators change the format state of the stream. They change how floating-point numbers are printed, whether a bool is displayed as a numeric value or as the bool literals true or false, and so forth.

Manipulators that change the format state of the stream usually leave the format state changed for all subsequent IO.

Most of the manipulators that change the format state provide set/unset pairs; one manipulator sets the format state to a new value and the other unsets it, restoring the normal default formatting.

The fact that a manipulator makes a persistent change to the format state can be useful when we have a set of IO operations that want to use the same formatting. Indeed, some programs take advantage of this aspect of manipulators to reset the behavior of one or more formatting rules for all of their input or output. In such cases, the fact that a manipulator changes the stream is a desirable property.

However, many programs (and, more importantly, programmers) expect the state of the stream to match the normal library defaults. In these cases, leaving the state of the stream in a nonstandard state can lead to errors.

It is usually best to undo any state change made by a manipulator. Ordinarily, a stream should be in its ordinary, default state after every IO operation.

Using flags Operations to Restore the Format State

An even better approach to managing changes to format state uses the flags operations. The flags operations are similar to the rdstate and setstate operations that manage the condition state of the stream. In this case, the library defines a pair of flags functions:

flags() with no arguments returns the stream's current format state. The value returned is a library-defined type named fmtflags.

flags(arg) takes a fmtflags argument and sets the stream's format as indicated by the argument.

We can use these functions to remember and restore the format state of either an input or output stream:


void display(ostream& os)
{
    // remember the current format state
    ostream::fmtflags curr_fmt = os.flags();
    // do output that uses manipulators that change the format state of os
    os.flags(curr_fmt);   // restore the original format state of os
}
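A minimal sketch of this save-and-restore pattern, assuming a hypothetical helper write_hex that must not leave its format changes behind. It writes to an ostringstream rather than cout so that the result can be checked:

```cpp
#include <ios>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: writes n in hexadecimal with a base prefix,
// then restores whatever format state os had on entry.
void write_hex(std::ostream& os, int n)
{
    std::ios_base::fmtflags saved = os.flags();  // remember current state
    os << std::showbase << std::hex << n;        // modify the format state
    os.flags(saved);                             // restore the original state
}

// Writes 15 before and after calling write_hex; the last 15 is
// printed in decimal because write_hex restored the flags.
std::string demo_flags_restore()
{
    std::ostringstream out;
    out << 15 << ' ';
    write_hex(out, 15);
    out << ' ' << 15;
    return out.str();
}
```

Here demo_flags_restore() returns "15 0xf 15": the final 15 is printed in decimal without a prefix because write_hex put the saved flags back.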

A.3.3. Controlling Output Formats

Many of the manipulators allow us to change the appearance of our output. There are two broad categories of output control: controlling the presentation of numeric values and controlling the amount and placement of padding.

Controlling the Format of Boolean Values

One example of a manipulator that changes the formatting state of its object is the boolalpha manipulator. By default, bool values print as 1 or 0. A true value is written as the integer 1 and a false value as 0. We can override this formatting by applying the boolalpha manipulator to the stream:


cout << "default bool values: "
     << true << " " << false
     << "\nalpha bool values: "
     << boolalpha
     << true << " " << false
     << endl;

When executed, the program generates the following:


default bool values: 1 0
alpha bool values: true false

Once we "write" boolalpha on cout, we've changed how cout will print bool values from this point on. Subsequent operations that print bools will print them as either true or false.

To undo the format state change to cout, we must apply noboolalpha:


bool bool_val = true;  // must be initialized before it is printed
cout << boolalpha      // sets internal state of cout
     << bool_val
     << noboolalpha;   // resets internal state to default formatting

Now we change the formatting of bool values only to print bool_val and immediately reset the stream back to its initial state.

Specifying the Base for Integral Values

By default, integral values are written and read in decimal notation. The programmer can change the notational base to octal or hexadecimal or back to decimal (the representation of floating-point values is unaffected) by using the manipulators hex, oct, and dec:


const int ival = 15, jval = 1024; // const, so values never change
cout << "default: ival = " << ival
     << " jval = " << jval << endl;
cout << "printed in octal: ival = " << oct << ival
     << " jval = " << jval << endl;
cout << "printed in hexadecimal: ival = " << hex << ival
     << " jval = " << jval << endl;
cout << "printed in decimal: ival = " << dec << ival
     << " jval = " << jval << endl;

When compiled and executed, the program generates the following output:


default: ival = 15 jval = 1024
printed in octal: ival = 17 jval = 2000
printed in hexadecimal: ival = f jval = 400
printed in decimal: ival = 15 jval = 1024

Notice that like boolalpha, these manipulators change the format state. They affect the immediately following output, and all subsequent integral output, until the format is reset by invoking another manipulator.
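Because these manipulators also govern how input is read, a stream set to hex keeps parsing hexadecimal digits until it is switched back. A small sketch using an istringstream; read_hex_then_dec is a hypothetical name used only for illustration:

```cpp
#include <ios>
#include <sstream>

// Hypothetical helper: reads the first integer in hexadecimal and,
// after switching back with dec, the second in decimal.
void read_hex_then_dec(const char* text, int& first, int& second)
{
    std::istringstream in(text);
    in >> std::hex >> first;   // "ff" is read as 255
    in >> std::dec >> second;  // "10" is read as ten, not sixteen
}
```

Called with the text "ff 10", it stores 255 in first and 10 in second. Without the dec, the 10 would be read as sixteen.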

Indicating Base on the Output

By default, when we print numbers, there is no visual cue as to what notational base was used. Is 20, for example, really 20, or an octal representation of 16? When printing numbers in decimal mode, the number is printed as we expect. If we need to print octal or hexadecimal values, it is likely that we should also use the showbase manipulator. The showbase manipulator causes the output stream to use the same conventions as used for specifying the base of an integral constant:

A leading 0x indicates hexadecimal

A leading 0 indicates octal

The absence of either indicates decimal

Here is the program revised to use showbase:


const int ival = 15, jval = 1024; // const so values never change
cout << showbase; // show base when printing integral values
cout << "default: ival = " << ival
     << " jval = " << jval << endl;
cout << "printed in octal: ival = " << oct << ival
     << " jval = " << jval << endl;
cout << "printed in hexadecimal: ival = " << hex << ival
     << " jval = " << jval << endl;
cout << "printed in decimal: ival = " << dec << ival
     << " jval = " << jval << endl;
cout << noshowbase; // reset state of the stream

The revised output makes it clear what the underlying value really is:


default: ival = 15 jval = 1024
printed in octal: ival = 017 jval = 02000
printed in hexadecimal: ival = 0xf jval = 0x400
printed in decimal: ival = 15 jval = 1024

The noshowbase manipulator resets cout so that it no longer displays the notational base of integral values.

By default, hexadecimal values are printed in lowercase with a lowercase x. We could display the X and the hex digits a through f as uppercase by applying the uppercase manipulator.


cout << uppercase << showbase << hex
     << "printed in hexadecimal: ival = " << ival
     << " jval = " << jval << endl
     << nouppercase << endl;

The preceding program generates the following output:


printed in hexadecimal: ival = 0XF jval = 0X400

To revert to the lowercase x, we apply the nouppercase manipulator.

Controlling the Format of Floating-Point Values

There are three aspects of formatting floating-point values that we can control:

Precision: how many digits are printed

Notation: whether to print in decimal or scientific notation

Handling of the decimal point for floating-point values that are whole numbers

By default, floating-point values are printed using six digits of precision. If the value has no fractional part, then the decimal point is omitted. Whether the number is printed using decimal or scientific notation depends on the value of the floating-point number being printed. The library chooses a format that enhances readability of the number. Very large and very small values are printed using scientific notation. Other values use fixed decimal.
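One way to observe these defaults is to send a few values through a freshly constructed ostringstream, which starts out in the default format state. default_format is a hypothetical helper:

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: formats d using a new stream's default settings
// (precision 6, neither fixed nor scientific).
std::string default_format(double d)
{
    std::ostringstream out;
    out << d;
    return out.str();
}
```

With this sketch, 3.14159265 prints as 3.14159, 100.0 as 100, 1234567.0 as 1.23457e+06, and 0.0000001 as 1e-07.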

Specifying How Much Precision to Print

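By default, precision controls the total number of digits that are printed. We can change the precision either by calling the precision member function or by using the setprecision manipulator. The precision member is overloaded (Section 7.8, p. 265): One version takes an int value and sets the precision to that new value. It returns the previous precision value. The other version takes no arguments and returns the current precision value. The setprecision manipulator takes an argument, which it uses to set the precision.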

Table A.2. Manipulators Defined in iostream

  boolalpha      Display true and false as strings
x noboolalpha    Display true and false as 1 and 0
  showbase       Generate prefix indicating numeric base
x noshowbase     Do not generate notational base prefix
  showpoint      Always display decimal point
x noshowpoint    Only display decimal point if fraction
  showpos        Display + in nonnegative numbers
x noshowpos      Do not display + in nonnegative numbers
  uppercase      Print 0X in hexadecimal, E in scientific
x nouppercase    Print 0x in hexadecimal, e in scientific
x dec            Display in decimal numeric base
  hex            Display in hexadecimal numeric base
  oct            Display in octal numeric base
  left           Add fill characters to right of value
x right          Add fill characters to left of value
  internal       Add fill characters between sign and value
  fixed          Display floating-point in decimal notation
  scientific     Display floating-point in scientific notation
  flush          Flush ostream buffer
  ends           Insert null character
  endl           Insert newline, then flush ostream buffer
  unitbuf        Flush buffers after every output operation
x nounitbuf      Restore normal buffer flushing
x skipws         Skip whitespace with input operators
  noskipws       Do not skip whitespace with input operators
  ws             "Eat" whitespace

x indicates default stream state

The following program illustrates the different ways we can control the precision used when printing floating-point values:


// cout.precision reports current precision value
cout << "Precision: " << cout.precision()
     << ", Value: "   << sqrt(2.0) << endl;
// cout.precision(12) asks that 12 digits of precision be printed
cout.precision(12);
cout << "Precision: " << cout.precision()
     << ", Value: "   << sqrt(2.0) << endl;
// alternative way to set precision using setprecision manipulator
cout << setprecision(3);
cout << "Precision: " << cout.precision()
     << ", Value: "   << sqrt(2.0) << endl;

When compiled and executed, the program generates the following output:


Precision: 6, Value: 1.41421
Precision: 12, Value: 1.41421356237
Precision: 3, Value: 1.41

This program calls the library sqrt function, which is found in the cmath header. The sqrt function is overloaded and can be called on a float, double, or long double argument. It returns the square root of its argument.

The setprecision manipulators and other manipulators that take arguments are defined in the iomanip header.
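A sketch contrasting the two ways of setting precision, assuming hypothetical helper names and an ostringstream in place of cout. Note that precision(n) hands back the old value, which is convenient for restoring it later:

```cpp
#include <cmath>
#include <iomanip>
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper: formats sqrt(2.0) with n digits of precision set
// through the precision member; previous receives the old precision.
std::string sqrt2_precision_member(std::streamsize n, std::streamsize& previous)
{
    std::ostringstream out;
    previous = out.precision(n);   // precision(n) returns the old value
    out << std::sqrt(2.0);
    return out.str();
}

// Hypothetical helper: same formatting, using setprecision from <iomanip>.
std::string sqrt2_setprecision(int n)
{
    std::ostringstream out;
    out << std::setprecision(n) << std::sqrt(2.0);
    return out.str();
}
```

sqrt2_precision_member(12, prev) yields 1.41421356237 and sets prev to 6, the default; sqrt2_setprecision(3) yields 1.41.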

Controlling the Notation

By default, the notation used to print floating-point values depends on the size of the number: If the number is either very large or very small, it will be printed in scientific notation; otherwise, fixed decimal is used. The library chooses the notation that makes the number easiest to read.

When printing a floating-point number as a plain number (as opposed to printing money, or a percentage, where we want to control the appearance of the value), it is usually best to let the library choose the notation to use. The one time to force either scientific or fixed decimal is when printing a table in which the decimal points should line up.

If we want to force either scientific or fixed notation, we can do so by using the appropriate manipulator: The scientific manipulator changes the stream to use scientific notation. As with printing the x on hexadecimal integral values, we can also control the case of the e in scientific mode through the uppercase manipulator. The fixed manipulator changes the stream to use fixed decimal.

These manipulators change the default meaning of the precision for the stream. After executing either scientific or fixed, the precision value controls the number of digits after the decimal point. By default, precision specifies the total number of digits, both before and after the decimal point. Using fixed or scientific lets us print numbers lined up in columns. This strategy ensures that the decimal point is always in a fixed position relative to the fractional part being printed.
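To see how fixed and scientific change the meaning of the precision, here is a sketch that formats one value three ways with a precision of 2. three_notations is a hypothetical name, and the output goes to a local ostringstream so no reset is needed:

```cpp
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper: formats d with precision 2 in default,
// fixed, and scientific notation, separated by spaces.
std::string three_notations(double d)
{
    std::ostringstream out;
    out.precision(2);
    out << d << ' ';                 // default: 2 significant digits
    out << std::fixed << d << ' ';   // fixed: 2 digits after the point
    out << std::scientific << d;     // scientific: 2 digits after the point
    return out.str();
}
```

For 3.14159 it produces "3.1 3.14 3.14e+00": two significant digits by default, but two digits after the decimal point under fixed and scientific.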

Reverting to Default Notation for Floating-Point Values

Unlike the other manipulators, there is no manipulator to return the stream to its default state in which it chooses a notation based on the value being printed. Instead, we must call the unsetf member to undo the change made by either scientific or fixed. To return the stream to default handling of floating-point values, we pass the unsetf function a library-defined value named floatfield:


// reset to default handling for notation
cout.unsetf(ostream::floatfield);

Except for undoing their effect, using these manipulators is like using any other manipulator:


cout << sqrt(2.0) << '\n' << endl;
cout << "scientific: " << scientific << sqrt(2.0) << '\n'
     << "fixed decimal: " << fixed << sqrt(2.0) << "\n\n";
cout << uppercase
     << "scientific: " << scientific << sqrt(2.0) << '\n'
     << "fixed decimal: " << fixed << sqrt(2.0) << endl
     << nouppercase;
// reset to default handling for notation
cout.unsetf(ostream::floatfield);
cout << '\n' << sqrt(2.0) << endl;

produces the following output:


1.41421

scientific: 1.414214e+00
fixed decimal: 1.414214

scientific: 1.414214E+00
fixed decimal: 1.414214

1.41421

Printing the Decimal Point

By default, when the fractional part of a floating-point value is 0, the decimal point is not displayed. The showpoint manipulator forces the decimal point to be printed:


cout << 10.0 << endl;        // prints 10
cout << showpoint << 10.0    // prints 10.0000
     << noshowpoint << endl; // revert to default handling of decimal point

The noshowpoint manipulator reinstates the default behavior. The next output expression will have the default behavior, which is to suppress the decimal point if the floating-point value has a 0 fractional part.

Padding the Output

When printing data in columns, we often want fairly fine control over how the data are formatted. The library provides several manipulators to help us accomplish the control we might need:

setw to specify the minimum space for the next numeric or string value.

left to left-justify the output.

right to right-justify the output. Output is right-justified by default.

internal controls placement of the sign on negative values. internal left-justifies the sign and right-justifies the value, padding any intervening space with blanks.

setfill lets us specify an alternative character to use when padding the output. By default, the value is a space.

setw, like endl, does not make a lasting change to the state of the output stream. It determines the size of only the next output.

The following program illustrates these manipulators:


int i = -16;
double d = 3.14159;
// pad first column to use minimum of 12 positions in the output
cout << "i: " << setw(12) << i << "next col" << '\n'
     << "d: " << setw(12) << d << "next col" << '\n';
// pad first column and left-justify all columns
cout << left
     << "i: " << setw(12) << i << "next col" << '\n'
     << "d: " << setw(12) << d << "next col" << '\n'
     << right; // restore normal justification
// pad first column and right-justify all columns
cout << right
     << "i: " << setw(12) << i << "next col" << '\n'
     << "d: " << setw(12) << d << "next col" << '\n';
// pad first column but put the padding internal to the field
cout << internal
     << "i: " << setw(12) << i << "next col" << '\n'
     << "d: " << setw(12) << d << "next col" << '\n';
// pad first column, using # as the pad character
cout << setfill('#')
     << "i: " << setw(12) << i << "next col" << '\n'
     << "d: " << setw(12) << d << "next col" << '\n'
     << setfill(' '); // restore normal pad character

When executed, this program generates


i:          -16next col
d:      3.14159next col
i: -16         next col
d: 3.14159     next col
i:          -16next col
d:      3.14159next col
i: -         16next col
d:      3.14159next col
i: -#########16next col
d: #####3.14159next col
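The difference between setw and the persistent manipulators can be checked directly. In this sketch (width_demo is a hypothetical name), setw pads only the value that follows it, while setfill and left stay in effect:

```cpp
#include <iomanip>
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper: setw applies to one output; setfill and left persist.
std::string width_demo()
{
    std::ostringstream out;
    out << std::setw(5) << 1 << 2 << '|';           // only the 1 is padded
    out << std::setfill('*') << std::left
        << std::setw(4) << 7 << std::setw(4) << 8;  // fill and left stay set
    return out.str();
}
```

It returns "    12|7***8***": the 2 and the '|' are not padded, but both later fields use the * fill and left justification.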

Table A.3. Manipulators Defined in iomanip

setfill(ch)        Fill whitespace with ch
setprecision(n)    Set floating-point precision to n
setw(w)            Read or write value to w characters
setbase(b)         Output integers in base b
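With an argument of 16, 8, or 10, setbase has the same effect as hex, oct, or dec. A short sketch, with setbase_demo a hypothetical helper:

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: prints n in base 16, 8, and 10 using setbase.
std::string setbase_demo(int n)
{
    std::ostringstream out;
    out << std::setbase(16) << n << ' '
        << std::setbase(8) << n << ' '
        << std::setbase(10) << n;
    return out.str();
}
```

setbase_demo(255) returns "ff 377 255".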

A.3.4. Controlling Input Formatting

By default, the input operators ignore whitespace (blank, tab, newline, formfeed, and carriage return). The following loop


while (cin >> ch)
    cout << ch;

given the input sequence


a b   c
d

executes four times to read the characters a through d, skipping the intervening blanks, possible tabs, and newline characters. The output from this program is


abcd

The noskipws manipulator causes the input operator to read, rather than skip, whitespace. To return to the default behavior, we apply the skipws manipulator:


cin >> noskipws;      // set cin so that it reads whitespace
while (cin >> ch)
    cout << ch;
cin >> skipws;        // reset cin to default state so that it discards whitespace

Given the same input as before, this loop makes seven iterations, reading whitespace as well as the characters in the input. This loop generates


a b    c
d
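The same experiment can be run on an istringstream so that the input is fixed. In this sketch (copy_chars is a hypothetical name), the flag chooses between the default skipws behavior and noskipws:

```cpp
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper: copies every char that >> extracts from text.
// With keep_whitespace set, noskipws makes >> read whitespace too.
std::string copy_chars(const std::string& text, bool keep_whitespace)
{
    std::istringstream in(text);
    if (keep_whitespace)
        in >> std::noskipws;
    std::string result;
    char ch;
    while (in >> ch)
        result += ch;
    return result;
}
```

For the input "a b\tc\nd", the default mode yields "abcd", while noskipws returns the input unchanged.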

A.3.5. Unformatted Input/Output Operations

So far, our programs have used only formatted IO operations. The input and output operators (>> and <<) format the data they read or write according to the data type being handled. The input operators ignore whitespace; the output operators apply padding, precision, and so on.

The library also provides a rich set of low-level operations that support unformatted IO. These operations let us deal with a stream as a sequence of uninterpreted bytes rather than as a sequence of data types, such as char, int, string, and so on.

A.3.6. Single-Byte Operations

Several of the unformatted operations deal with a stream one byte at a time. They read rather than ignore whitespace. For example, we could use the unformatted IO operations get and put to read the characters one at a time:


char ch;
while (cin.get(ch))
    cout.put(ch);

This program preserves the whitespace in the input. Its output is identical to the input. Given the same input as read by the previous program that used noskipws, this program generates the same output:


a b    c
d

Table A.4. Single-Byte Low-Level IO Operations

is.get(ch)        Puts next byte from the istream is in character ch. Returns is.
os.put(ch)        Puts character ch onto the ostream os. Returns os.
is.get()          Returns next byte from is as an int.
is.putback(ch)    Puts character ch back on is; returns is.
is.unget()        Moves is back one byte; returns is.
is.peek()         Returns the next byte as an int but doesn't remove it.

Putting Back onto an Input Stream

Sometimes we need to read a character in order to know that we aren't ready for it yet. In such cases, we'd like to put the character back onto the stream. The library gives us three ways to do so, each of which has subtle differences from the others:

peek returns a copy of the next character on the input stream but does not change the stream. The value returned by peek stays on the stream and will be the next one retrieved.

unget backs up the input stream so that whatever value was last returned is still on the stream. We can call unget even if we do not know what value was last taken from the stream.

putback is a more specialized version of unget: It puts back the last value read from the stream but takes an argument that must be the same as the one that was last read. Few programs use putback because the simpler unget does the same job with fewer constraints.

In general, we are guaranteed to be able to put back at most one value before the next read. That is, we are not guaranteed to be able to call putback or unget successively without an intervening read operation.
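A typical use of peek is to stop just before a character we do not want yet. This sketch, built on hypothetical helpers read_digits and get_then_unget, collects a run of digits and leaves the first non-digit on the stream, then shows unget pushing back a character that was just read:

```cpp
#include <cctype>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: collects the leading run of digits from in.
// peek inspects the next character without extracting it, so the
// first non-digit stays on the stream for the next read.
std::string read_digits(std::istream& in)
{
    std::string digits;
    // peek returns an int: a character value converted through
    // unsigned char, or EOF; isdigit accepts both
    while (std::isdigit(in.peek()))
        digits += static_cast<char>(in.get());
    return digits;
}

// Hypothetical helper: reads one character, then unget makes that
// same character the next one extracted.
char get_then_unget(std::istream& in)
{
    char ch = 0;
    in.get(ch);
    in.unget();
    return ch;
}
```

Given an istringstream holding "123abc", read_digits returns "123" and the next get returns 'a'.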

int Return Values from Input Operations

The version of get that takes no argument and the peek function return a character from the input stream as an int. This fact can be surprising; it might seem more natural to have these functions return a char.

The reason that these functions return an int is to allow them to return an end-of-file marker. A given character set is allowed to use every value in the char range to represent an actual character. Thus, there is no extra value in that range to use to represent end-of-file.

Instead, these functions convert the character to unsigned char and then promote that value to int. As a result, even if the character set has characters that map to negative values, the int returned from these operations will be a nonnegative value (Section 2.1.1, p. 36). By returning end-of-file as a negative value, the library guarantees that end-of-file will be distinct from any legitimate character value. Rather than requiring us to know the actual value returned, the library defines a preprocessor name, EOF (from the cstdio header), that we can use to test whether the value returned from get is end-of-file. It is essential that we use an int to hold the return from these functions:


int ch;   // NOTE: int, not char!!!!
// loop to read and write all the data in the input
while ((ch = cin.get()) != EOF)
    cout.put(ch);

This program operates identically to the one on page 834, the only difference being the version of get that is used to read the input.
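To see why the int matters, the sketch below reads a string containing a byte with value 0xFF (octal 377), which on many machines is what -1 becomes when stored in a signed char. count_chars is a hypothetical name, and EOF comes from the cstdio header:

```cpp
#include <cstdio>   // EOF
#include <sstream>
#include <string>

// Hypothetical helper: counts the characters in text using the
// int-returning get; ch is an int so 0xFF arrives as 255, not EOF.
int count_chars(const std::string& text)
{
    std::istringstream in(text);
    int ch;          // int, not char
    int count = 0;
    while ((ch = in.get()) != EOF)
        ++count;
    return count;
}
```

Applied to the three bytes 'a', 0xFF, 'b', count_chars returns 3, because get returns the middle byte as 255, which never compares equal to EOF.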

A.3.7. Multi-Byte Operations

Other unformatted IO operations deal with chunks of data at a time. These operations can be important if speed is an issue, but like other low-level operations they are error-prone. In particular, these operations require us to allocate and manage the character arrays (Section 4.3.1, p. 134) used to store and retrieve data.

The multi-byte operations are listed in Table A.5 (p. 837). It is worth noting that the get member is overloaded; there is a third version that reads a sequence of characters.

Caution: Low-Level Routines Are Error-Prone

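A common programming error is to assign the return from get, or from one of the other int-returning functions, to a char rather than an int. The compiler will not detect this error; instead, what happens depends on the machine and on the input data. For example, on a machine in which chars are implemented as unsigned chars, this loop will run forever:


char ch; // Using a char here invites disaster!
// return from cin.get is converted from int to char and
// then compared to an int
while ((ch = cin.get()) != EOF)
    cout.put(ch);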

The problem is that when get returns EOF, that value will be converted to an unsigned char value. That converted value is no longer equal to the integral value of EOF, and the loop will continue forever.

At least that error is likely to be caught in testing. On machines for which chars are implemented as signed chars, we can't say with confidence what the behavior of the loop might be. What happens when an out-of-bounds value is assigned to a signed value is up to the compiler. On many machines, this loop will appear to work, unless a character in the input matches the EOF value. While such characters are unlikely in ordinary data, presumably low-level IO is necessary only when reading binary values that do not map directly to ordinary characters and numeric values. For example, on our machine, if the input contains a character whose value is '\377' then the loop terminates prematurely. '\377' is the value on our machine to which -1 converts when used as a signed char. If the input has this value, then it will be treated as the (premature) end-of-file indicator.

Such bugs do not happen when reading and writing typed values. If you can use the more type-safe, higher-level operations supported by the library, do so.

The get and getline functions take the same parameters, and their actions are similar but not identical. In each case, sink is a char array into which the data are placed. The functions read until one of the following conditions occurs:

size - 1 characters are read

End-of-file is encountered

The delimiter character is encountered

Following any of these conditions, a null character is put in the next open position in the array. The difference between these functions is the treatment of the delimiter. get leaves the delimiter as the next character of the istream. getline reads and discards the delimiter. In either case, the delimiter is not stored in sink.

It is a common error to intend to remove the delimiter from the stream but to forget to do so.
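The delimiter difference is easy to observe by checking what is left on the stream afterward. A sketch with hypothetical helper names, reading from an istringstream into a fixed-size char array:

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: reads the first line with get, copies what was
// stored into stored, and returns the next character on the stream.
int next_after_get(const char* text, std::string& stored)
{
    std::istringstream in(text);
    char sink[16];
    in.get(sink, sizeof sink, '\n');      // stops at '\n' and leaves it on in
    stored = sink;
    return in.peek();
}

// Hypothetical helper: same, but getline reads and discards the delimiter.
int next_after_getline(const char* text, std::string& stored)
{
    std::istringstream in(text);
    char sink[16];
    in.getline(sink, sizeof sink, '\n');  // consumes '\n' without storing it
    stored = sink;
    return in.peek();
}
```

For the input "abc\ndef", both store "abc", but after get the next character is still '\n', whereas after getline it is 'd'.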

Table A.5. Multi-Byte Low-Level IO Operations

is.get(sink, size, delim)       Reads up to size - 1 bytes from is and stores them, followed by a null character, in the character array pointed to by sink. Stops on encountering the delim character, after reading size - 1 bytes, or at end-of-file. If delim is present, it is left on the input stream and is not read into sink.
is.getline(sink, size, delim)   Same behavior as three-argument version of get but reads and discards delim.
is.read(sink, size)             Reads up to size bytes into the character array sink. Returns is.
is.gcount()                     Returns number of bytes read from the stream is by the last call to an unformatted read operation.
os.write(source, size)          Writes size bytes from the character array source to os. Returns os.
is.ignore(size, delim)          Reads and ignores at most size characters, up to and including delim. By default, size is 1 and delim is end-of-file.

Determining How Many Characters Were Read

Several of the read operations read an unknown number of bytes from the input. We can call gcount to determine how many characters the last unformatted input operation read. It is essential to call gcount before any intervening unformatted input operation. In particular, the single-character operations that put characters back on the stream are also unformatted input operations. If peek, unget, or putback are called before calling gcount, then the return value will be 0!
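Because read does not null-terminate the array and may stop early at end-of-file, gcount is how we learn how much data actually arrived. A minimal sketch, with bytes_read a hypothetical helper and an 8-byte buffer chosen arbitrarily:

```cpp
#include <istream>
#include <sstream>

// Hypothetical helper: reads up to sizeof buf bytes with read and
// reports how many actually arrived.
std::streamsize bytes_read(const char* text)
{
    std::istringstream in(text);
    char buf[8];
    in.read(buf, sizeof buf);   // may stop early at end-of-file
    return in.gcount();         // checked before any other unformatted input
}
```

bytes_read("hello") returns 5; bytes_read("0123456789") returns 8, the size of the buffer.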

A.3.8. Random Access to a Stream

The various stream types generally support random access to the data in their associated stream. We can reposition the stream so that it skips around, reading first the last line, then the first, and so on. The library provides a pair of functions to seek to a given location and to tell the current location in the associated stream.

Random IO is inherently system-dependent. To understand how to use these features, you must consult your system's documentation.

Seek and Tell Functions

To support random access, the IO types maintain a marker that determines where the next read or write will happen. They also provide two functions: One repositions the marker by seeking to a given position; the second tells us the current position of the marker. The library actually defines two pairs of seek and tell functions, which are described in Table A.6. One pair is used by input streams, the other by output streams. The input and output versions are distinguished by a suffix that is either a g or a p. The g versions indicate that we are "getting" (reading) data, and the p functions indicate that we are "putting" (writing) data.

Table A.6. Seek and Tell Functions

seekg    Reposition the marker in an input stream
tellg    Return the current position of the marker in an input stream
seekp    Reposition the marker for an output stream
tellp    Return the current position of the marker in an output stream

Logically enough, we can use only the g versions on an istream or its derived types ifstream and istringstream, and we can use only the p versions on an ostream or its derived types ofstream and ostringstream. An iostream, fstream, or stringstream can both read and write the associated stream; we can use either the g or p versions on objects of these types.

There Is Only One Marker

The fact that the library distinguishes between the "putting" and "getting" versions of the seek and tell functions can be misleading. Even though the library makes this distinction, it maintains only a single marker in the file; there is not a distinct read marker and write marker.

When we're dealing with an input-only or output-only stream, the distinction isn't even apparent. We can use only the g or only the p versions on such streams. If we attempt to call tellp on an ifstream, the compiler will complain. Similarly, it will not let us call seekg on an ostringstream.

When using the fstream and stringstream types that can both read and write, there is a single buffer that holds data to be read and written and a single marker denoting the current position in the buffer. The library maps both the g and p positions to this single marker.

Because there is only a single marker, we must do a seek to reposition the marker whenever we switch between reading and writing.

Plain iostreams Usually Do Not Allow Random Access

The seek and tell functions are defined for all the stream types. Whether they do anything useful depends on the kind of object to which the stream is bound. On most systems, the streams bound to cin, cout, cerr and clog do not support random access; after all, what would it mean to jump ten places back when writing directly to cout? We can call the seek and tell functions, but these functions will fail at run time, leaving the stream in an invalid state.

Because the istream and ostream types usually do not support random access, the remainder of this section should be considered as applicable to only the fstream and sstream types.

Repositioning the Marker

The seekg and seekp functions are used to change the read and write positions in a file or a string. After a call to seekg, the read position in the stream is changed; a call to seekp sets the position at which the next write will take place.

There are two versions of the seek functions: One moves to an "absolute" address within the file; the other moves to a byte offset from a given position:


// set the indicated marker to a fixed position within a file or string
seekg(new_position); // set read marker
seekp(new_position); // set write marker
// offset some distance from the indicated position
seekg(offset, dir);  // set read marker
seekp(offset, dir);  // set write marker

The first version sets the current position to a given location. The second takes an offset and an indicator of where to offset from. The possible values for dir are listed in Table A.7.

Table A.7. Offset From Argument to seek

beg    The beginning of the stream
cur    The current position of the stream
end    The end of the stream

The argument and return types for these functions are machine-dependent types defined in both istream and ostream. The types, named pos_type and off_type, represent a file position and an offset from that position, respectively. A value of type off_type can be positive or negative; we can seek forward or backward in the file.
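A small sketch of both seek forms on an istringstream (last_then_first is a hypothetical name): a negative offset from end moves backward, and an offset of 0 from beg rewinds. The call to clear is a precaution: reading the last word sets eofbit, and clearing the state before seeking keeps the example independent of how a particular library treats that bit:

```cpp
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper: reads the last word of "hello world" by seeking
// backward from the end, then rewinds and reads the first word.
std::string last_then_first()
{
    std::istringstream in("hello world");
    std::string last, first;
    in.seekg(-5, std::ios_base::end);  // negative offset: 5 bytes before the end
    in >> last;                        // reads "world" and reaches end-of-file
    in.clear();                        // reset the condition state before seeking
    in.seekg(0, std::ios_base::beg);   // back to the start
    in >> first;                       // reads "hello"
    return last + ' ' + first;
}
```

It returns "world hello".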

Accessing the Marker

The current position is returned by either tellg or tellp, depending on whether we're looking for the read or write position. As before, the p indicates putting (writing) and the g indicates getting (reading). The tell functions are usually used to remember a location so that we can subsequently seek back to it:


// remember current write position in mark
ostringstream writeStr; // output stringstream
ostringstream::pos_type mark = writeStr.tellp();
// ...
if (cancelEntry)
    // return to marked position
    writeStr.seekp(mark);

The tell functions return a value that indicates the position in the associated stream. As with the size_type of a string or vector, we do not know the actual type of the object returned from tellg or tellp. Instead, we use the pos_type member of the appropriate stream class.
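Here is a runnable variation on the fragment above, reusing the name writeStr; the entry text is made up for illustration. One subtlety: seeking back does not erase anything already written, so this sketch overwrites the tentative text with a replacement of the same length:

```cpp
#include <sstream>
#include <string>

// Hypothetical example: writes a tentative entry, then cancels it by
// seeking back to the remembered position and writing over it.
std::string overwrite_entry()
{
    std::ostringstream writeStr;                           // output stringstream
    writeStr << "total: ";
    std::ostringstream::pos_type mark = writeStr.tellp();  // remember write position
    writeStr << "draft";                                   // tentative entry
    writeStr.seekp(mark);                                  // return to marked position
    writeStr << "final";                                   // same length as "draft"
    return writeStr.str();
}
```

overwrite_entry() returns "total: final".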

A.3.9. Reading and Writing to the Same File

Let's look at a programming example. Assume we are given a file to read. We are to write a new line at the end of the file that contains the relative position at which each line begins. For example, given the following file,


abcd
efg
hi
j

the program should produce the following modified file:


abcd
efg
hi
j
5 9 12 14

Note that our program need not write the offset for the first line; it always occurs at position 0. It should, however, write the offset just past the last line, which marks the end of the data portion of the file. That is, it should record the position after the end of the input so that we'll know where the original data ends and where our output begins.
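The offsets in the sample output are running sums of each line's length plus one for its newline: 4+1 = 5, 5+3+1 = 9, 9+2+1 = 12, and 12+1+1 = 14. Here is a minimal sketch of that arithmetic over an in-memory string, assuming every line, including the last, ends in exactly one '\n' byte. The function name line_offsets is hypothetical.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the offset just past each line of text,
// i.e. where the next line begins. Assumes every line ends in one '\n'.
std::vector<std::string::size_type> line_offsets(const std::string &text)
{
    std::istringstream in(text);
    std::vector<std::string::size_type> offsets;
    std::string::size_type cnt = 0;
    std::string line;
    while (std::getline(in, line)) {
        cnt += line.size() + 1;   // +1 for the newline getline discarded
        offsets.push_back(cnt);
    }
    return offsets;
}
```

For the sample file, line_offsets("abcd\nefg\nhi\nj\n") yields 5, 9, 12, and 14, the same numbers the program appends.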

We can write this program by writing a loop that reads a line at a time:


#include <cstdlib>     // EXIT_FAILURE
#include <fstream>
#include <iostream>
#include <string>
using std::cerr; using std::endl; using std::fstream;
using std::getline; using std::string;

int main()
{
    // open for input and output and pre-position file pointers to end of file
    fstream inOut("copyOut",
                  fstream::ate | fstream::in | fstream::out);
    if (!inOut) {
        cerr << "Unable to open file!" << endl;
        return EXIT_FAILURE;
    }
    // inOut is opened in ate mode, so it starts out positioned at the end,
    // which we must remember as it is the original end-of-file position
    fstream::pos_type end_mark = inOut.tellg();
    inOut.seekg(0, fstream::beg);   // reposition to start of the file
    string::size_type cnt = 0;      // accumulator for byte count
    string line;                    // hold each line of input
    // while we haven't hit an error and are still reading the original data
    // and successfully read another line from the file
    while (inOut && inOut.tellg() != end_mark
           && getline(inOut, line))
    {
        cnt += line.size() + 1;     // add 1 to account for the newline
        // remember current read marker
        fstream::pos_type mark = inOut.tellg();
        inOut.seekp(0, fstream::end);   // set write marker to end
        inOut << cnt;                   // write the accumulated length
        // print separator if this is not the last line
        if (mark != end_mark) inOut << " ";
        inOut.seekg(mark);              // restore read position
    }
    inOut.clear();                  // clear flags in case we hit an error
    inOut.seekp(0, fstream::end);   // seek to end
    inOut << "\n";                  // write a newline at end of file
    return 0;
}

This program opens the fstream using the in, out, and ate modes. The first two modes indicate that we intend to both read and write the same file; because the mode does not include trunc, the file must already exist. Because we also specify ate, the file starts out positioned at the end. As usual, we check that the open succeeded, and exit if it did not.

Initial Setup

The core of our program will loop through the file a line at a time, recording the relative position of each line as it does so. Our loop should read the contents of the file up to but not including the line that we are adding to hold the line offsets. Because we will be writing to the file, we can't just stop the loop when it encounters end-of-file. Instead, the loop should end when it reaches the point at which the original input ended. To do so, we must first remember the original end-of-file position.

We opened the file in ate mode, so it is already positioned at the end. We store the initial end position in end_mark. Of course, having remembered the end position, we must reposition the read marker at the beginning of the file before we attempt to read any data.
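To see the open-and-rewind steps in isolation, here is a sketch built from the same calls the program uses. The function name record_end_then_rewind and its first_line out parameter are hypothetical. It returns -1 when the open fails, which is what happens if the file does not exist, since a mode of in and out without trunc never creates a file.

```cpp
#include <fstream>
#include <ios>
#include <string>

// Hypothetical helper: opens path in in | out | ate mode, records the
// original end position, rewinds, and reads the first line.
// Returns the recorded end position, or -1 if the open failed.
std::streamoff record_end_then_rewind(const std::string &path,
                                      std::string &first_line)
{
    std::fstream inOut(path.c_str(),
                       std::fstream::ate | std::fstream::in | std::fstream::out);
    if (!inOut)
        return -1;                                  // e.g. the file is missing
    std::fstream::pos_type end_mark = inOut.tellg(); // ate: we start at the end
    inOut.seekg(0, std::fstream::beg);              // rewind before reading
    std::getline(inOut, first_line);
    return std::streamoff(end_mark);
}
```

For a file containing the four sample lines, the recorded end is 14 and the first line read after rewinding is "abcd".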

Main Processing Loop

Our while loop has a three-part condition.

We first check that the stream is valid. Assuming the first test on inOut succeeds, we then check whether we've exhausted our original input. We do this check by comparing the current read position returned from tellg with the position we remembered in end_mark. Finally, assuming that both tests succeeded, we call getline to read the next line of input. If getline succeeds, we perform the body of the loop.

The body of the while increments the counter to determine the offset at which the next line starts and writes that offset at the end of the file. Notice that the end of the file advances on each trip through the loop.

We start by remembering the current position in mark. We need to keep that value because we have to reposition the file in order to write the next relative offset. The seekp call does this repositioning, resetting the write marker to the end of the file. We write the counter value, followed by a space unless we have just read the last line of the original data, and then restore the read position to the value we remembered in mark. The effect is that we return the marker to the same place it was after the last read. Having restored the marker, we're ready to repeat the condition in the while.

Completing the File

Once we exit the loop, we have read each line and written all the offsets. All that remains is to end the line of offsets with a newline. As with the other writes, we call seekp to position the file at the end before writing. The only tricky part is remembering to clear the stream. We might exit the loop due to an end-of-file or other input error. If so, inOut would be in an error state, and both the seekp and the output expression would fail.
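Because stringstream provides the same seek and tell members, the whole algorithm can also be sketched against an in-memory buffer, which makes it easy to check against the sample file shown earlier. This is an adaptation, not the program above: the function name append_offsets is hypothetical, and it finds the original end with seekg(0, end) rather than by opening in ate mode.

```cpp
#include <sstream>
#include <string>

// Hypothetical adaptation of the file program to a stringstream: read each
// original line, append the running offset at the end, then resume reading.
std::string append_offsets(const std::string &text)
{
    std::stringstream inOut(text, std::ios_base::in | std::ios_base::out);
    inOut.seekg(0, std::ios_base::end);
    std::stringstream::pos_type end_mark = inOut.tellg(); // original end of data
    inOut.seekg(0, std::ios_base::beg);                   // rewind for reading

    std::string::size_type cnt = 0;
    std::string line;
    while (inOut && inOut.tellg() != end_mark && std::getline(inOut, line)) {
        cnt += line.size() + 1;                             // +1 for the newline
        std::stringstream::pos_type mark = inOut.tellg();   // where reading stopped
        inOut.seekp(0, std::ios_base::end);                 // append at current end
        inOut << cnt;
        if (mark != end_mark) inOut << " ";                 // separator between offsets
        inOut.seekg(mark);                                  // resume reading
    }
    inOut.clear();                       // in case the loop ended on an error
    inOut.seekp(0, std::ios_base::end);
    inOut << "\n";                       // terminate the line of offsets
    return inOut.str();
}
```

For the sample input, append_offsets("abcd\nefg\nhi\nj\n") returns the original text followed by the line "5 9 12 14".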