10.3 Algorithms
The so-called algorithms
in the standard library distinguish C++ from other programming
languages. Every major programming language has a set of containers,
but in the traditional object-oriented approach, each container
defines the operations that it permits, e.g., sorting, searching, and
modifying. C++ turns object-oriented programming on its head and
provides a set of function templates, called
algorithms, that work with iterators, and
therefore with almost any container.The advantage of the C++ approach is that the library can contain a
rich set of algorithms, and each algorithm can be written once and
work with (almost) any kind of container. And when you define a
custom container, it automatically works with the standard algorithms
(assuming you implemented the container's iterators
correctly). The set of algorithms is easily extensible without
touching the container classes. Another benefit is that the
algorithms work with iterators, not containers, so even non-container
iterators (such as the stream iterators) can participate.C++ algorithms have one disadvantage, however.
Remember that iterators, like pointers, can be unsafe. Algorithms use
iterators, and therefore are equally unsafe. Pass the wrong iterator
to an algorithm, and the algorithm cannot detect the error and
produces undefined behavior. Fortunately, most uses of algorithms
make it easy to avoid programming errors.Most of the standard algorithms are declared in the
<algorithm> header, with some numerical
algorithms in <numeric>. Refer to the
respective sections of Chapter 13 for details.
10.3.1 How Algorithms Work
The generic algorithms all work in a similar
fashion. They are all function templates, and most have one or more
iterators as template parameters. Because the algorithms are
templates, you can instantiate the function with any template
arguments that meet the basic requirements. For example,
for_each is declared as follows:
template<typename InIter, typename Function>
Function for_each(InIter first, InIter last, Function func);
The names of the template parameters tell you what is expected as
template arguments: InIter must be an input
iterator, and Function must be a function pointer
or functor. The documentation for for_each further
tells you that Function must take one argument
whose type is the value_type of
InIter. That's all. The
InIter argument can be anything that meets the
requirements of an input iterator. Notice that no container is
mentioned in the declaration or documentation of
for_each. For example, you can use an
istream_iterator.For a programmer trained in traditional object-oriented programming,
the flexibility of the standard algorithms might seem strange or
backwards. Thinking in terms of algorithms takes some adjustment.For example, some object-oriented container classes define
sort as a member function or as a function that
applies only to certain kinds of objects. (For example, Java defines
sort only on arrays and List
objects). If you have a new kind of container, you must duplicate the
implementation of sort or make sure the
implementation of your container maps to one of the standard
implementations of sort. In C++, you can invent
any kind of crazy container, and as long as it supports a random
access iterator, you can use the standard sort
function.Whenever you need to process the contents of a container, you should
think about how the standard algorithms can help you. For example,
suppose you need to read a stream of
numbers into a data array. Typically, you would set up a
while loop to read the input stream and, for each
number read, append the number to the array. Now rethink the problem
in terms of an algorithmic solution. What you are actually doing is
copying data from an input stream to an array, so you could use the
copy algorithm:
std::copy(std::istream_iterator<double>(stream),
std::istream_iterator<double>( ),
std::back_inserter(data));
The copy algorithm copies all the items from one
range to another. The input comes from an
istream_iterator, which is an iterator interface
for reading from an istream. The output range is a
back_insert_iterator (created by the
back_inserter function), which is an output
iterator that pushes items onto a container.At first glance, the algorithmic solution doesn't
seem any simpler or clearer than a straightforward loop:
double x;
while (stream >> x)
data.push_back(x);
More complex examples demonstrate the value of the C++ algorithms.
For example, all major programming languages have a type for
character strings. They typically also have a function for finding
substrings. What about the more general problem of finding a subrange
in any larger range? Suppose a researcher is looking for patterns in
a data set and wants to see if a small data pattern occurs in a
larger data set. In C++, you can use the search
algorithm:
std::vector<double> data;
...
if (std::search(data.begin( ), data.end( ), pattern.begin(),
pattern.end( )) != data.end( ))
{
// found the pattern...
}
A number of algorithms take a function pointer or functor (that is,
an object that overloads operator( )) as one of
the arguments. The algorithms call the function and possibly use the
return value. For example, count_if counts the
number of times the function returns a true (nonzero) result when
applied to each element in a range:
bool negative(double x)
{
return x < 0;
}
std::vector<double>::iterator::difference_type neg_cnt;
std::vector<double> data;
...
neg_cnt = std::count_if(data.begin( ), data.end( ), negative);
In spite of the unwieldy declaration for neg_cnt,
the application of count_if to count the number of
negative items in the data vector is easy to write
and read.If you don't want to write a function to be used
only with an algorithm, you might be able to use the standard
functors or function objects (which are declared in the
<functional> header). For example, the same
count of negative values can be obtained with the following:
std::vector<double>::iterator::difference_type neg_cnt;
std::vector<double> data;
...
neg_cnt = std::count_if(data.begin( ), data.end( ),
std::bind2nd(std::less<double>, 0.0));
The std::less class template defines
operator( ), so it takes two arguments and applies
operator< to those arguments. The
bind2nd function template takes a two-argument
functor and binds a constant value (in this case
0.0) as the second argument, returning a
one-argument function (which is what count_if
requires). The use of standard function objects can make the code
harder to read, but also helps avoid writing one-off custom
functions. (The Boost project expands and enhances the
standard library's binders. See Appendix B for information about Boost.)When using function objects, be very careful if those objects
maintain state or have global side effects. Some algorithms copy the
function objects, and you must be sure that the state is also
properly copied. The numerical algorithms do not permit function
objects that have side effects.Example 10-8 shows one use of a function object. It
accumulates statistical data for computing the mean and variance of a
data set. Pass an instance of Statistics to the
for_each algorithm to accumulate the statistics.
The copy that is returned from for_each contains
the desired results.
Example 10-8. Computing statistics with a functor
#include <algorithm>
#include <cstddef>
#include <cmath>
#include <iostream>
#include <ostream>
template<typename T>
class Statistics {
public:
typedef T value_type;
Statistics( ) : n_(0), sum_(0), sumsq_(0) {}
void operator( )(double x) {
++n_;
sum_ += x;
sumsq_ += x * x;
}
std::size_t count( ) const { return n_; }
T sum( ) const { return sum_; }
T sumsq( ) const { return sumsq_; }
T mean( ) const { return sum_ / n_; }
T variance( ) const
{ return (sumsq_ - sum_*sum_ / n_) / (n_ - 1); }
private:
std::size_t n_;
T sum_;
T sumsq_; // Sum of squares
};
int main( )
{
using namespace std;
Statistics<double> stat = for_each(
istream_iterator<double>(cin),
istream_iterator<double>( ),
Statistics<double>( ));
cout << "count=" << stat.count( ) << '\n';
cout << "mean =" << stat.mean( ) << '\n';
cout << "var =" << stat.variance( ) << '\n';
cout << "stdev=" << sqrt(stat.variance( )) << '\n';
cout << "sum =" << stat.sum( ) << '\n';
cout << "sumsq=" << stat.sumsq( ) << '\n';
}
10.3.2 Standard Algorithms
Chapter 13 describes all the algorithms
in detail. This section presents a categorized summary of the
algorithms.It is always your responsibility to ensure that the output range is
large enough to accommodate the input.If the algorithm name ends with
_if, the final argument must be a
predicate, that is, a function pointer or
function object that returns a Boolean result (a result that is
convertible to type bool).
10.3.2.1 Nonmodifying operations
The following
algorithms
examine every element of a sequence without modifying the order:
- count
Returns the number of items that match a given value- count_if
Returns the number of items for which a predicate returns true- for_each
Applies a function or functor to each item
10.3.2.2 Comparison
The following
algorithms
compare
objects or sequences (without modifying the elements):
- equal
Determines whether two ranges have equivalent contents- lexicographical_compare
Determines whether one range is considered less than another range- max
Returns the maximum of two values- max_element
Finds the maximum value in a range- min
Returns the minimum of two values- min_element
Finds the minimum value in a range- mismatch
Finds the first position where two ranges differ
10.3.2.3 Searching
The following
algorithms
search for a value or a subsequence in a
sequence (without modifying the
elements):
- adjacent_find
Finds the first position where an item is equal to its neighbor- find
Finds the first occurrence of a value in a range- find_end
Finds the last occurrence of a subsequence in a range- find_first_of
Finds the first position where a value matches any one item from a
range of values- find_if
Finds the first position where a predicate returns true- search
- search_n
Finds a subsequence in a range
10.3.2.4 Binary search
The following
algorithms apply a binary search to a
sorted sequence. The sequence typically comes
from a sequence container in which you have already sorted the
elements. You can use an associative containers, but they provide the
last three functions as member functions, which might result in
better performance.
- binary_search
Finds an item in a sorted range using a binary search- equal_range
Finds the upper and lower bounds- lower_bound
Finds the lower bound of where an item belongs in a sorted range- upper_bound
Finds the upper bound of where an item belongs in a sorted range
10.3.2.5 Modifying sequence operations
The following algorithms modify a sequence:
- copy
Copies an input range to an output range- copy_backward
Copies an input range to an output range, starting at the end of the
output range- fill
- fill_n
Fills a range with a value- generate
- generate_n
Fills a range with values returned from a function- iter_swap
Swaps the values that two iterators point to- random_shuffle
Shuffles a range into random order- remove
Reorders a range to prepare to erase all elements equal to a given
value- remove_copy
Copies a range, removing all items equal to a given value- remove_copy_if
Copies a range, removing all items for which a predicate returns true- remove_if
Reorders a range to prepare to erase all items for which a predicate
returns true- replace
Replaces items of a given value with a new value- replace_copy
Copies a range, replacing items of a given value with a new value- replace_copy_if
Copies a range, replacing items for which a predicate returns true
with a new value- replace_if
Replaces items for which a predicate returns true with a new value- reverse
Reverses a range in place- reverse_copy
Copies a range in reverse order- rotate
Rotates items from one end of a range to the other end- rotate_copy
Copies a range, rotating items from one end to the other- swap_ranges
Swaps values in two ranges- transform
Modifies every value in a range by applying a transformation function- unique
Reorders a range to prepare to erase all adjacent, duplicate items- unique_copy
Copies a range, removing adjacent, duplicate items
10.3.2.6 Sorting
The following
algorithms
are related to
sorting and
partitioning.
You can supply a comparison function or functor or rely on the
default, which uses the < operator.
- nth_element
Finds the item that belongs at the nth position
(if the range were sorted) and reorders the range to partition it
into items less than the nth item and items
greater than or equal to the nth item.- partial_sort
Reorders a range so the first part is sorted.- partial_sort_copy
Copies a range so the first part is sorted.- partition
Reorders a range so that all items for which a predicate is true come
before all items for which the predicate is false.- sort
Sorts items in ascending order.- stable_partition
Reorders a range so that all items for which a predicate is true come
before all items for which the predicate is false. The relative order
of items within a partition is maintained.- stable_sort
Sorts items in ascending order. The relative order of equal items is
maintained.
10.3.2.7 Merging
The following
algorithms
merge two sorted
sequences:
- inplace_merge
Merges two sorted, consecutive subranges in place, so the results
replace the original ranges- merge
Merges two sorted ranges, copying the results to a separate range
10.3.2.8 Set operations
The following algorithms apply standard set
operations to sorted
sequences:
- includes
Determines whether one sorted range is a subset of another- set_difference
Copies the set difference of two sorted ranges to an output range- set_intersection
Copies the intersection of two sorted ranges to an output range- set_symmetric_difference
Copies the symmetric difference of two sorted ranges to an output
range- set_union
Copies the union of two sorted ranges to an output range
10.3.2.9 Heap operations
The following algorithms treat a sequence as a
heap
data structure:
- make_heap
Reorders a range into heap order- pop_heap
Reorders a range to remove the first item in the heap- push_heap
Reorders a range to add the last item to the heap- sort_heap
Reorders a range that starts in heap order into fully sorted order
10.3.2.10 Permutations
The following reorder the elements of a sequence to generate
permutations:
- next_permutation
Reorders a range to form the next permutation- prev_permutation
Reorders a range to form the previous permutation
10.3.3 Custom Algorithms
Writing your own
algorithm
is easy. Some care is always needed when writing function templates
(as discussed in Chapter 7), but generic
algorithms do not present any special or unusual challenges. Be sure
you understand the requirements of the different categories of
iterators and write your algorithm to use the most general category
possible. You might even want to specialize your algorithm to improve
its performance with some categories.The first generic algorithm that most programmers will probably write
is copy_if, which was inexplicably omitted from
the standard. The
copy_if function copies an input range to an
output range, copying only the values for which a predicate returns
true (nonzero). Example 10-9 shows a simple
implementation of copy_if.
Example 10-9. One way to implement the copy_if function
template<typename InIter, typename OutIter, typename Pred>
OutIter copy_if(InIter first, InIter last, OutIter result, Pred pred)
{
for (; first != last; ++first)
if (pred(*first)) {
*result = *first;
++result;
}
return result;
}
You can also specialize an algorithm. For example, you might be able
to implement the algorithm more efficiently for a random access
iterator. In this case, you can write helper functions and use the
iterator_category trait to choose a specialized
implementation. (Chapter 8 has more information
about traits, including an example of how to use iterator traits to
optimize a function template.)The real trick in designing and writing algorithms is being able to
generalize the problem and then find an efficient solution. Before
running off to write your own solution, check the standard library.
Your problem might already have a solution.For example, I recently wanted to write an algorithm to find the
median value in a range. There is no median
algorithm, but there is nth_element, which solves
the more general problem of finding the element at any sorted index.
Writing median became a trivial matter of making a
temporary copy of the data, calling nth_element,
and then returning an iterator that points to the median value in the
original range. Because median makes two passes
over the input range, a forward iterator is required, as shown in
Example 10-10.
Example 10-10. Finding the median of a range
template<typename FwdIter, typename Compare>
FwdIter median(FwdIter first, FwdIter last, Compare compare)
{
typedef typename std::iterator_traits<FwdIter>::value_type value_type;
std::vector<value_type> tmp(first, last);
typename std::vector<value_type>::size_type median_pos = tmp.size( ) / 2;
std::nth_element(tmp.begin( ), tmp.begin( ) + median_pos,
tmp.end( ), compare);
return std::find(first, last, tmp[median_pos]);
}
