6.1 Finalizing the Image Components

Overall, the time we spent prototyping was very productive. This section describes the final design for the image components within the framework, including:

- Image coordinates
- Image storage
- Pixel types

6.1.1 Image Coordinates

In Section 1.1 on page 4 we discussed image coordinates, saying that the image origin (0,0) is located at the top left corner of the image. This convention, which is widely used in image processing, makes the mapping from a pixel coordinate to a physical memory location easier. We take this definition one step further and relax the restriction that (0,0) is the top left corner. This allows us to deal with image windows, where a window is a portion of another image.

User-defined image coordinates are not discussed in this book. With user-defined coordinates, the user can access images using coordinates that make sense for a particular application. This can be something as simple as redefining the coordinate system so that (0,0) is located at the lower left corner of the image, or as complicated as specifying coordinates in real units, like millimeters. We don't deny that this is a useful feature, but it is also very application-specific. If you need to define your own coordinate system, you can encapsulate it in your own code and transform it to our native coordinates.

In the final version of our framework, an image has three properties:

- origin: the x and y coordinates of the upper left corner
- width: the width (x-axis) of the image in pixels
- height: the height (y-axis) of the image in pixels

When an image is created, all three values are specified, with the origin typically being (0,0). To make working with coordinates easier, we create two generic objects to handle points and rectangles.

POINT

A point is an (x,y) pair that specifies the integer coordinates of a pixel. Our apPoint object is shown here.
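The listing itself lives on the CD-ROM, so here is a hedged sketch of what apPoint might look like. The class name comes from the text; the member names and operators are assumptions.

```cpp
// Sketch of an apPoint object: an integer (x,y) pair identifying a pixel.
// Coordinates are stored as int, permitting signed coordinate values.
class apPoint
{
public:
  apPoint () : x_ (0), y_ (0) {}
  apPoint (int x, int y) : x_ (x), y_ (y) {}

  int x () const { return x_; }
  int y () const { return y_; }

  bool operator== (const apPoint& p) const
  { return x_ == p.x_ && y_ == p.y_; }
  bool operator!= (const apPoint& p) const
  { return !operator== (p); }

private:
  int x_;
  int y_;
};
```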
We considered using the standard library object, std::pair<>, to represent a pair of coordinates, but we implemented our own point class instead so that we can add exactly the functionality we need. Coordinate information is stored as int, permitting signed coordinate values. Our application uses apPoint frequently, so we define stream insertion and extraction operators for it in terms of our apBString class, as shown.
For information on apBString, refer to our earlier discussion on page 88. By defining insertion and extraction operators for our common objects, we can later use them as a persistence mechanism. These functions are very simple, as we can see by looking at the insertion operator:
This function writes a single value, another apBString, instead of writing two separate values to the stream. By encapsulating the x and y coordinates inside an apBString, we adhere to our standard of writing a single element for each object or datatype.

RECTANGLE

A rectangle combines an origin point with a width and height. We use this to define the boundary of an image. The width_ and height_ parameters are unsigned int, since they can take on only positive values. A rectangle with an origin at (0,0), a width_ of 10, and a height_ of 10 describes a region with corners (0,0) and (9,9), because the coordinates are zero-based. A null rectangle, the degenerate case where the rectangle is nothing more than a point, occurs when width_ or height_ is zero. Our rectangle object, including inline functions, is shown here.
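The full listing is on the CD-ROM; the following is a hedged, self-contained sketch of an apRect rectangle object, including versions of the intersect(), within(), and expand() methods discussed next. The names apRect, width_, and height_ come from the text; the method signatures and implementations are assumptions.

```cpp
#include <algorithm>

// Sketch of an apRect object: an origin plus a width and height.
class apRect
{
public:
  apRect () : x0_ (0), y0_ (0), width_ (0), height_ (0) {}
  apRect (int x0, int y0, unsigned int width, unsigned int height)
    : x0_ (x0), y0_ (y0), width_ (width), height_ (height) {}

  int x0 () const { return x0_; }
  int y0 () const { return y0_; }
  // One past the last valid coordinate, since coordinates are zero-based.
  int x1 () const { return x0_ + static_cast<int> (width_); }
  int y1 () const { return y0_ + static_cast<int> (height_); }
  unsigned int width  () const { return width_; }
  unsigned int height () const { return height_; }

  // A null rectangle is the degenerate case where width_ or height_ is 0.
  bool isNull () const { return width_ == 0 || height_ == 0; }

  // true if the point is inside or on the border of the rectangle.
  bool within (int x, int y) const
  { return x >= x0_ && x < x1 () && y >= y0_ && y < y1 (); }

  // Add a quantity to the dimensions (negative values shrink the rectangle).
  void expand (int dx, int dy)
  {
    width_  = static_cast<unsigned int> (static_cast<int> (width_)  + dx);
    height_ = static_cast<unsigned int> (static_cast<int> (height_) + dy);
  }

  // Intersection of two rectangles; a null rectangle if there is no overlap.
  apRect intersect (const apRect& r) const
  {
    int ix0 = std::max (x0_, r.x0_);
    int iy0 = std::max (y0_, r.y0_);
    int ix1 = std::min (x1 (), r.x1 ());
    int iy1 = std::min (y1 (), r.y1 ());
    if (ix1 <= ix0 || iy1 <= iy0)
      return apRect ();                 // no overlap: a null rectangle
    return apRect (ix0, iy0, ix1 - ix0, iy1 - iy0);
  }

private:
  int x0_, y0_;                  // origin (may be negative)
  unsigned int width_, height_;  // dimensions can only be positive
};
```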
Let's discuss a few of the methods in our rectangle object.

INTERSECT()

The intersect() method computes the intersection of two rectangles, producing an output rectangle, or a null rectangle if there is no intersection. This method handles a number of conditions, including partial and complete overlap, as illustrated in Figure 6.2.

Figure 6.2. Intersection Conditions

The implementation of the intersect() method is shown here.
WITHIN()

The within() method tests whether or not a point is inside the rectangle. It returns true if the point is inside or on the border. The implementation of within() is shown here.
EXPAND()

The expand() method increases the size of the rectangle by adding a specified quantity to its dimensions. This method is very useful when performing image processing operations that create output images larger than the original image. Note that you can also pass in negative values to shrink the rectangle. The implementation of expand() is shown here.
6.1.2 Image Storage

Our third prototype (Section 3.2.6 on page 60) contributed the concept of separating image storage from the image class. After working with many examples, we realized that this prototype still lacked the ability to handle certain details. The details that we now address in the final design include:

Handles and rep objects. The bottom line is that they do not fit in the design. We still use reference counting by means of apAlloc<>, but another layer of abstraction is unnecessary. Our final image storage object encapsulates an apAlloc<> object along with other storage parameters. Because we aren't using handles, these storage objects get copied as they are passed around. Fortunately, the copy constructor and assignment operators are very fast, and performance is not an issue because the pixel data itself is reference counted. The complexity of the additional layer of abstraction didn't provide enough benefit to make it into the final design.

Memory alignment. The alignment issues we introduced during the discussion of apAlloc<> (Section 3.1.5 on page 31) require more refinement. apAlloc<> supports the alignment of memory on a user-specified pixel boundary, and proper alignment can be critical for the efficient performance of many image processing functions. As it turns out, it is not sufficient to align only the first pixel in the image, as apAlloc<> does. Most image processing routines process one line at a time; by forcing the first pixel in each line to have a certain alignment, many operations become more efficient. For generic algorithms, this savings can be modest or small, because the compiler may not be able to take advantage of the alignment. However, specially tuned functions can be written to take advantage of particular memory alignments. Many third-party libraries contain carefully written assembly language routines that can yield impressive savings on aligned data.
Our final design has been extended to better address memory alignment.

Image shape. We refer to the graphical properties of the storage as the image shape. For example, almost all images used by image processing packages are rectangular; that is, they describe pixels that are stored in a series of rows. Our prototypes and test application described rectangular images. In our final design, we explicitly support rectangular images so that we can optimize their storage, but we also allow the future implementation of non-rectangular images. For example, you might have valid image information for a large, circular region. If we store this information as a rectangle, many bytes are wasted, because we have to allocate space for pixels that do not contain any useful information. A more memory-efficient method for storing non-rectangular pixel data is run-length encoding, where you store the pixel data along with the (x,y) coordinates and length of each row. This allows you to store only those pixels that contain valid information. The disadvantage of run-length encoding is the difficulty of writing image processing routines that operate on one or more run-length encoded images.

Final Design

The final design partitions image storage into three pieces, as illustrated in Figure 6.3.

Figure 6.3. Image Storage Final Design

apImageStorageBase is the base class that describes the rectangular boundary of any storage object. For rectangular images, it describes the valid coordinates for pixels in the image. If you extend the framework to implement non-rectangular images, this rectangle would describe the minimum enclosing rectangle surrounding the region. apRectImageStorage extends apImageStorageBase to manage the storage for rectangular images. apRectImageStorage is not a template class; instead, it allocates storage based on the number of bytes of storage per pixel and a desired memory alignment for each row in the image.
By making this a generic definition, apRectImageStorage can handle all aspects of image storage. apImageStorage<T>, by contrast, is a template class that defines image storage for a particular data type. Most apImageStorage<> methods act as wrapper functions, calling methods inside apRectImageStorage and applying a cast. Let's look at these components in more detail in the following sections.

APIMAGESTORAGEBASE

We start by looking at apImageStorageBase. This base class only has an understanding of the boundary surrounding the image storage.
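A hedged sketch of what apImageStorageBase might look like. To keep the sketch self-contained, the rectangular boundary is held as raw coordinates; everything beyond the class name and its role is an assumption.

```cpp
// Sketch of apImageStorageBase: it only knows the rectangular boundary of
// the storage. The boundary is fixed at construction; there are no mutators.
class apImageStorageBase
{
public:
  apImageStorageBase ()
    : x0_ (0), y0_ (0), width_ (0), height_ (0) {}
  apImageStorageBase (int x0, int y0,
                      unsigned int width, unsigned int height)
    : x0_ (x0), y0_ (y0), width_ (width), height_ (height) {}
  virtual ~apImageStorageBase () {}

  int x0 () const { return x0_; }
  int y0 () const { return y0_; }
  unsigned int width  () const { return width_; }
  unsigned int height () const { return height_; }

private:
  int x0_, y0_;                  // origin of the boundary
  unsigned int width_, height_;  // dimensions of the boundary
};
```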
Once the rectangular boundary is specified in the constructor, the object is immutable and cannot be changed. It is designed this way because changing the boundary coordinates would affect how this object interacts with other images that are already defined.

APRECTIMAGESTORAGE

apRectImageStorage is the most complicated object in our hierarchy. It handles all aspects of memory management, including allocation, locking, and windowing. In this section, we describe in detail how this object works. (The full source code is found on the CD-ROM.) Reviewing the protected member data of apRectImageStorage shows us the details of the implementation:
storage_ contains the actual pixel storage as an array of bytes. apAlloc<> allows a number of objects to share the same storage, but the storage itself is fixed in memory. This allows us to create image windows. An image window is an image that reuses the storage of another image. In other words, we can have multiple apRectImageStorage objects that use identical storage, but possibly only a portion of it.

To improve the efficiency of accessing pixels in the image, the object maintains begin_ and end_, which point to the first pixel used by the object and just past the last one, respectively. Derived objects use these pointers to construct iterator objects, similar to how the standard C++ library uses them.

bytesPerPixel_ and align_ store the pixel size and alignment information passed during object construction. Instead of requiring a raw numeric alignment value, the eAlignment enumeration provides a clean way to specify alignment, as shown.
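The text says eAlignment offers the same values under two naming conventions; the exact enumerator names below are assumptions.

```cpp
// Sketch of the eAlignment enumeration: each value is the byte boundary on
// which every row of pixels will start. Two naming styles map to the same
// numeric values.
enum eAlignment
{
  eNoAlign    = 0,
  eWordAlign  = 2,  e2ByteAlign = 2,   // 2-byte (word) boundary
  eDWordAlign = 4,  e4ByteAlign = 4,   // 4-byte (double-word) boundary
  eQWordAlign = 8,  e8ByteAlign = 8    // 8-byte (quad-word) boundary
};
```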
eAlignment has entries using two different naming conventions, giving the user the flexibility of choosing between two popular ones.

rowSpacing_ contains the number of bytes from one row to the next. This is often different from the width of the image because of alignment issues. By adding rowSpacing_ to any pixel pointer, you can quickly advance to the same pixel in the next row of the image.

xoffset_ and yoffset_ are necessary for image windows. Just because two images share the same storage_ does not mean they access the same pixels. Image windowing lets an image contain a rectangular portion of another image. xoffset_ and yoffset_ are the pixel offsets from the first pixel in storage_ to the first pixel in the image. If there is no image window, both of these offsets are zero.

The only remaining protected data member is lock_. lock_ handles synchronization of access to the rest of the image storage variables, with the exception of storage_ (because it uses apAlloc<>, which has its own independent locking mechanism).

The constructor of apRectImageStorage is as follows:
Constructing the storage for an image requires the size and location of the image, the pixel depth (i.e., the number of bytes per pixel), and alignment requirements. For example:
creates a 2x3 image with an origin at (0,0). Each pixel requires 3 bytes, and the start of each line will have double-word (i.e., 4-byte) alignment, as shown in Figure 6.4.

Figure 6.4. Image Storage Alignment

Each line in the image requires 8 bytes of storage, although only 6 bytes contain pixel data. The first three bytes hold the storage for the first pixel in the line, and are followed by three more bytes that hold the next pixel. In order to begin the next row with double-word alignment, we must skip 2 bytes before storing the pixels for the next line. We dealt with memory alignment when we introduced apAlloc<>. The arithmetic is the same, except that we must apply it to each line, as shown in the implementation of apRectImageStorage():
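A sketch of the per-row arithmetic, assuming (as apAlloc<> did) that the alignment is a power of two. The function name rowSpacing is hypothetical; the real computation happens inside the apRectImageStorage constructor.

```cpp
// Round each row up to the next multiple of the alignment. For the 2x3
// image above: 2 pixels * 3 bytes = 6 bytes, rounded up to 8 bytes for
// 4-byte alignment.
unsigned int rowSpacing (unsigned int width, unsigned int bytesPerPixel,
                         unsigned int align)
{
  unsigned int bytes = width * bytesPerPixel;
  if (align <= 1)
    return bytes;                       // no alignment requested
  return (bytes + align - 1) & ~(align - 1);
}
```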
We also use a number of locking functions to synchronize access to both the image storage parameters and the image storage itself, as shown.
Locking is not a difficult feature to add to an object, but it is important to consider where to use it effectively. In our design, for example, several instances of apRectImageStorage can use the same underlying pixel storage. There is no need to lock access to this storage if we are only manipulating other member variables of apRectImageStorage. lockState() is best used when the state of apRectImageStorage changes. lockStorage() is used when the actual pixel data is accessed. lock() is a combination of the two, and is useful when all aspects of the image storage are affected. These functions are used by derived objects and non-member functions, since locking is a highly application-specific issue.
window() modifies an instance of apRectImageStorage by computing the intersection of the specified rectangle with the rectangle that defines which pixels are managed. This function is not as complicated as it sounds, because the intersect() method of apRect computes the overlap between the window and the original image. Once this is computed, the other variables can be updated, as shown.
If the intersection is null, that is, if there is no overlap between the rectangles, init() resets the object to a null state. Otherwise, the remainder of the member variables are updated to reflect the intersection. The window() function affects only the object's member variables, so we call lockState() to lock access to them; we do not also have to lock the underlying pixel storage.

Basic access to pixel data is provided by functions we have seen before:
rowAddress_() is used by derived classes to return the address of the first pixel in a specific row. Derived objects cast these pointers to their proper type. You use getPixel() and setPixel() in a similar manner. We use the underscore, _, as a suffix in rowAddress_() to indicate that it is primarily an internal function.

APROWITERATOR<>

Before we introduce the actual storage objects, we need an iterator that can simplify image processing functions. Like the iterators defined by the standard C++ library, our apRowIterator<> object allows each row in the image to be accessed, as shown.
Once you obtain an apRowIterator object from an apImageStorage<> object (presented in the next section), you can use it to access each row in the image, as follows:
Iterators don't really save us much typing, but they do hide the operation of fetching the address of each line. If we did not have an iterator, we would write something like the following, where T represents the pixel type:
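For concreteness, here is a self-contained sketch of that manual loop using Pel8 pixels, stepping a raw pointer by the row spacing to reach each new row. The buffer layout and function name are assumptions; the real code would obtain each row address from the storage object or its iterator.

```cpp
#include <vector>

typedef unsigned char Pel8;

// Sum every pixel in a rectangular image stored with padded rows: advance
// to the next row by adding rowSpacing bytes, then walk width pixels.
int sumPixels (const std::vector<Pel8>& buffer,
               unsigned int width, unsigned int height,
               unsigned int rowSpacing)
{
  int sum = 0;
  const Pel8* row = &buffer[0];
  for (unsigned int y = 0; y < height; y++, row += rowSpacing) {
    const Pel8* p = row;              // first pixel in this row
    for (unsigned int x = 0; x < width; x++)
      sum += *p++;                    // bytes beyond width are padding
  }
  return sum;
}
```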
APPIXELITERATOR

We also create an iterator suitable for accessing every pixel in an image. apPixelIterator is similar in design to apRowIterator, but it is implemented using the standard iterator traits (see [Stroustrup00]). This makes the iterator usable by the generic STL algorithms, as shown.
APIMAGESTORAGE<>

apImageStorage<> is a template object, derived from apRectImageStorage, that defines image storage for arbitrary datatypes. Its definition is shown here.
This object builds upon its base class. You can continue to access pixel data using getPixel() and setPixel(), but you can also access a row of data by using rowAddress(), or a row or pixel iterator. Our row_begin() and row_end() iterators use a typedef called row_iterator to hide direct references to apRowIterator<>. Likewise, our begin() and end() iterators use a typedef called iterator to hide direct references to apPixelIterator<>.
On page 188, we will see how to dramatically simplify getPixel() by using an exception-safe locking object.
Our apImageStorageLocker<> implementation locks only apImageStorage<> objects, although it would not be hard to create a generic version. Here is how it works: when an apImageStorageLocker<> object is created, a reference to an apImageStorage<> object is stored and the object is locked. When the apImageStorageLocker<> object is destroyed, the lock on the apImageStorage<> object is released. You can see how powerful this simple technique is when it is used within another function. For example, getPixel() explicitly handles the locking and unlocking in its implementation. This function can be greatly simplified with the use of apImageStorageLocker<>, as shown.
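A hedged sketch of the resource-acquisition idea behind apImageStorageLocker<>. The class and helper names here are hypothetical; the real object works against apImageStorage<>, while this sketch uses a tiny stand-in storage type so it can run on its own.

```cpp
// Acquire the lock in the constructor and release it in the destructor, so
// the lock is freed even during the stack unwinding of an exception.
template<class Storage>
class apImageStorageLockerSketch
{
public:
  explicit apImageStorageLockerSketch (Storage& storage)
    : storage_ (storage) { storage_.lock (); }
  ~apImageStorageLockerSketch () { storage_.unlock (); }

private:
  Storage& storage_;

  // Copying a locker would release the lock twice; forbid it.
  apImageStorageLockerSketch (const apImageStorageLockerSketch&);
  apImageStorageLockerSketch& operator= (const apImageStorageLockerSketch&);
};

// Tiny stand-in storage type used only to demonstrate the locker.
struct DemoStorage
{
  int lockCount;
  DemoStorage () : lockCount (0) {}
  void lock ()   { ++lockCount; }
  void unlock () { --lockCount; }
};

// Returns the lock count observed while the locker is alive; the lock is
// released automatically when the function returns.
int observeLock (DemoStorage& s)
{
  apImageStorageLockerSketch<DemoStorage> locker (s);
  return s.lockCount;
}
```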
As you can see, we create a temporary instance of apImageStorageLocker<> on the stack. When getPixel() goes out of scope, either through normal completion or during the stack unwinding of an exception, the lock is guaranteed to be released.

COPYING IMAGE STORAGE

In the source code, we provide two generic functions: copy() and duplicate(). copy() copies pixels between two images, while duplicate() generates an identical copy of an apImageStorage<> object. Because we are dealing with template objects, our copy() function can copy image pixels from one data type to another, as shown.
Our design of copy() has the following interesting features:

- The output storage must have the same dimensions as the input image. If it does not, a new apImageStorage<T2> object is returned. This is a low-level copy function, and we do not want to worry about image boundaries that do not match; it is better to handle this at a higher level in the code.
- If T1 and T2 are identical, memcpy() is used to duplicate pixels. This technique doesn't work for complex data types, so an optional argument, fastCopy, has been added.
- If T1 and T2 are not identical, or if fastCopy is false, a pixel-by-pixel copy occurs.

The copy() implementation is shown here.
6.1.3 Pixel Types

In addition to the standard C data types that are used in image processing applications, a robust image processing framework must also handle the following complexities:

- Support for basic data types such that they can be manipulated (i.e., added, subtracted, and so on) in the standard ways
- An RGB data type that allows a generic image processing routine to handle color pixels
- A clamping (i.e., saturation) object that is used like other data types and eliminates the undesirable pixel-wrapping behavior arising from overflow

Basic Data Types

In our image framework, the pixel type is specified as a template parameter. In reality, there are only a few common data types that most image processing applications need. Here are the basic types used in image processing:
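Pel8 and Pel16 are named in the text; the wider typedefs below follow the same pattern and are assumptions.

```cpp
// The basic pixel types. The names refer to pels (picture elements).
typedef unsigned char  Pel8;    // 8-bit monochrome pixels
typedef unsigned short Pel16;   // 10-, 12-, or 16-bit sensor output
typedef unsigned int   Pel32;   // 32-bit unsigned pixels
typedef int            Pel32s;  // 32-bit signed pixels
```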
These names are very descriptive, since they refer to pels (picture elements, or pixels), and there is no confusion when they are used in image processing applications. Most images captured from monochrome sensors are represented using the Pel8 data type. Some sensors have more sensitivity and output 10 or 12 bits of information; in that case, we would use a Pel16 to store the image.

RGB Data Type

Pixels in color images are usually represented by RGB triplets. We showed a simple implementation of an RGB triplet during the prototyping stage. The following simple structure is not sufficient for our final design:
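The prototype-stage triplet was roughly a bare struct like this (a sketch; the exact prototype listing appears in the earlier chapter):

```cpp
typedef unsigned char Pel8;

// A bare RGB triplet: it stores the components, but supports no arithmetic
// and no conversion to or from a monochrome pixel.
struct apRGB
{
  Pel8 red;
  Pel8 green;
  Pel8 blue;
};
```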
We need to have the ability to write statements like the following:
Instead of defining a separate structure for each type of RGB image, we define apRGBTmpl<>, whose template parameter is the data type of the red, green, and blue components. In apRGBTmpl<> we add basic operators, as well as conversion functions between color and monochrome pixels, as shown.
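A hedged sketch of apRGBTmpl<>; the complete implementation is in imageTypes.h on the CD-ROM. The operator apRGBTmpl<T2> conversion is mentioned in the text; the remaining operators, and the channel average used for the monochrome conversion, are assumptions.

```cpp
typedef unsigned char Pel8;

template<class T>
struct apRGBTmpl
{
  T red, green, blue;

  apRGBTmpl () : red (0), green (0), blue (0) {}
  apRGBTmpl (T r, T g, T b) : red (r), green (g), blue (b) {}

  // Color to monochrome: a simple average of the channels.
  operator T () const
  { return static_cast<T> ((red + green + blue) / 3); }

  // Conversion between RGB types with different channel sizes.
  template<class T2>
  operator apRGBTmpl<T2> () const
  {
    return apRGBTmpl<T2> (static_cast<T2> (red),
                          static_cast<T2> (green),
                          static_cast<T2> (blue));
  }

  apRGBTmpl& operator+= (const apRGBTmpl& p)
  { red += p.red; green += p.green; blue += p.blue; return *this; }
};

typedef apRGBTmpl<Pel8> apRGB;
```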
The complete implementation can be found in imageTypes.h on the CD-ROM. You will notice that we added functions, such as operator apRGBTmpl<T2>, to make it easy to convert between different RGB types.

Clamping Object for Overflow

Overflow is usually not an issue when an application uses mostly int and double data types. However, when you use smaller data types, like unsigned char, you have to be very aware of overflow. What happens to the pixels that overflow their storage? Usually the output wraps, just like any other integer operation on the computer. This behavior has never seemed correct for image processing functions: after all, if a value of 0 is black and 255 is white, 255+1 should be stored as 255 and not 0. This clamping behavior is also called saturation.
In order for our design to handle the overflow issue, our code must partition this functionality so that it can be integrated into image processing functions. If you are not careful, this can get very difficult, because the overflow check must be made prior to the conversion to the final pixel type. If you allow the compiler to make an implicit conversion before any overflow checks are made, you will not be able to clamp the output values at their limits, and your checks will be wasted.
Clamping Functions

To clamp a value at the limits of a data type, we must know the limits. With C, we used #include <limits.h> to get this functionality. With C++, we can use #include <limits>. The std::numeric_limits class gives us everything we need. We can easily determine the minimum and maximum values for a data type by querying the static functions of this class, as shown:
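For example (min() and max() are genuine static members of std::numeric_limits; the wrapper function names are just for illustration):

```cpp
#include <limits>

typedef unsigned char Pel8;

// The static min() and max() functions report a type's limits portably.
int pel8Minimum () { return std::numeric_limits<Pel8>::min (); }
int pel8Maximum () { return std::numeric_limits<Pel8>::max (); }
```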
We do not use these limits directly; rather, we encapsulate them in one of our own objects, apLimitInfo<>. We are doing this because some of our data types, such as apRGB, have no std::numeric_limits defined. Instead of defining 30 or so constants and static methods, we decided it was easier to define only what we need. If you insist on using std::numeric_limits, you can define your own implementation for any new types, and then replace our references to apLimitInfo<>. The apLimitInfo<> definition is as shown.
apLimitInfo<> gives us a common place to define limit information for any data type. The definitions for a few of the data types are shown here.
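A hedged sketch of the shape such definitions might take. The real apLimitInfo<> listing is on the CD-ROM; the static-member layout here is an assumption, with the basic types defined in terms of std::numeric_limits.

```cpp
#include <limits>

typedef unsigned char  Pel8;
typedef unsigned short Pel16;

// One common place to keep limit information for any data type, including
// types (like apRGB) that std::numeric_limits knows nothing about.
template<class T>
struct apLimitInfo
{
  static const T sMinValue;
  static const T sMaxValue;
};

// Definitions for the basic types, delegating to std::numeric_limits.
template<> const Pel8  apLimitInfo<Pel8>::sMinValue =
  std::numeric_limits<Pel8>::min ();
template<> const Pel8  apLimitInfo<Pel8>::sMaxValue =
  std::numeric_limits<Pel8>::max ();
template<> const Pel16 apLimitInfo<Pel16>::sMinValue =
  std::numeric_limits<Pel16>::min ();
template<> const Pel16 apLimitInfo<Pel16>::sMaxValue =
  std::numeric_limits<Pel16>::max ();
```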
These may look long, but they are just a machine-independent way of saying:
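That is, on a typical platform the Pel8 entries reduce to plain constants (the names here are illustrative):

```cpp
typedef unsigned char Pel8;

// What the machine-independent definitions boil down to for Pel8.
const Pel8 Pel8MinValue = 0;
const Pel8 Pel8MaxValue = 255;
```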
CLAMPING FUNCTION

We can now construct a simple clamping function to test and clamp the output to its minimum and maximum value.
You can use these functions as follows:
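A sketch of such a function and its use. The name apLimit appears later in the chapter; this particular signature, with the destination type as the first template parameter, is an assumption.

```cpp
#include <limits>

typedef unsigned char Pel8;

// Clamp src to the limits of destination type D. The comparison happens in
// double, before any conversion to D, so out-of-range values cannot wrap.
// (A sketch; assumes double can represent both types' ranges.)
template<class D, class S>
D apLimit (const S& src)
{
  double v  = static_cast<double> (src);
  double lo = static_cast<double> (std::numeric_limits<D>::min ());
  double hi = static_cast<double> (std::numeric_limits<D>::max ());
  if (v < lo) return std::numeric_limits<D>::min ();
  if (v > hi) return std::numeric_limits<D>::max ();
  return static_cast<D> (src);
}
```

Usage then makes the destination type explicit, as in `Pel8 d = apLimit<Pel8>(sum);`.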
With this syntax, you explicitly specify the type of clamping you desire. In this particular example, the benefits are well worth the added bit of typing, since the compiler will generate an error if you neglect to specify the clamping data type.

Clamping Object

We define an apClampedTmpl<> object to add our clamping behavior. apClampedTmpl<> can be used in place of Pel8, Pel16, Pel32, and the other data types to add clamping whenever a numeric quantity is used. The apClampedTmpl<> definition is shown here.
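A hedged sketch of apClampedTmpl<>. The real definition (on the CD-ROM) supports more operators; the member name, the assignment-time clamping, and the typedef shown are assumptions.

```cpp
#include <limits>

typedef unsigned char Pel8;

// A wrapper that behaves like T but clamps on assignment instead of
// wrapping.
template<class T>
class apClampedTmpl
{
public:
  T val;

  apClampedTmpl () : val (0) {}
  apClampedTmpl (T v) : val (v) {}

  // Conversion back to T, so the wrapper mixes with ordinary arithmetic.
  operator T () const { return val; }

  // Assignment from a wider type clamps at T's limits.
  apClampedTmpl& operator= (int v)
  {
    if (v < static_cast<int> (std::numeric_limits<T>::min ()))
      val = std::numeric_limits<T>::min ();
    else if (v > static_cast<int> (std::numeric_limits<T>::max ()))
      val = std::numeric_limits<T>::max ();
    else
      val = static_cast<T> (v);
    return *this;
  }
};

// A typedef makes the template look more like a plain data type.
typedef apClampedTmpl<Pel8> apClampedPel8;
```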
We have included conversions between apClampedTmpl<> and our template parameter, T, to make apClampedTmpl<> easier to work with. By providing these operators, the compiler can make whatever implicit conversions are necessary.One last step is to make apClampedTmpl<> look more like a data type by using typedefs, as shown.
Operators and Mathematical Functions

We also need to define a number of other global operators and functions that image processing routines will need.

OPERATOR-

An example of the subtraction operator, which subtracts a constant value from an apClampedTmpl<>, is as follows:
Notice that apLimit is used to apply clamping before the result is assigned to the destination pixel type. This gives us the ability to write such code as this:
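A self-contained sketch of how such a subtraction operator might look, with a minimal clamped type and the limit check written inline (in the book the check is delegated to apLimit); the details are assumptions.

```cpp
#include <limits>

typedef unsigned char Pel8;

template<class T>
struct apClampedTmpl
{
  T val;
  apClampedTmpl (T v = 0) : val (v) {}
};

// Subtract a constant from a clamped pixel; the result saturates at the
// limits of T instead of wrapping.
template<class T>
apClampedTmpl<T> operator- (const apClampedTmpl<T>& src, int n)
{
  long result = static_cast<long> (src.val) - n;
  long lo = static_cast<long> (std::numeric_limits<T>::min ());
  long hi = static_cast<long> (std::numeric_limits<T>::max ());
  if (result < lo) result = lo;
  if (result > hi) result = hi;
  return apClampedTmpl<T> (static_cast<T> (result));
}
```

With this in place, subtracting past black clamps at 0 rather than wrapping around to a large value.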
We now add a few more arithmetic operations that our image processing functions require.

ADD2()

We implement two versions of image addition: one that operates on a generic type, and one that employs clamping. These are as shown here.
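A hedged sketch of the two flavors. The generic signature is assumed, and the clamped version inlines the limit check for Pel8 destinations to keep the sketch self-contained (the helper name add2Clamped is hypothetical).

```cpp
#include <limits>

typedef unsigned char Pel8;
typedef unsigned int  Pel32;

// Generic version: no clamping. The destination d is trusted to be wide
// enough to hold the sum.
template<class T1, class T2>
void add2 (const T1& s1, const T1& s2, T2& d)
{
  d = static_cast<T2> (s1 + s2);
}

// Clamped version for Pel8 destinations: check before converting, so an
// overflowing sum saturates at 255 instead of wrapping.
void add2Clamped (Pel8 s1, Pel8 s2, Pel8& d)
{
  int sum = static_cast<int> (s1) + static_cast<int> (s2);
  int hi = std::numeric_limits<Pel8>::max ();
  d = static_cast<Pel8> (sum > hi ? hi : sum);
}
```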
We do the same thing for sub2(), mul2(), and div2(). The C++ standard is very specific regarding which template gets instantiated. The generic implementation of add2<> shown here is too similar to the versions that use apClampedTmpl<> objects, causing the compiler to generate errors. Instead, we define explicit versions to handle the cases we need and comment out the generic version of add2<>. For example, a version of add2<> that works with 32-bit pixel types is shown here.
SCALE()

scale() does a simple scaling of the source argument by a floating-point parameter, as shown here.
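A hedged sketch; the clamping of the scaled result into the destination type is an assumption about the book's implementation.

```cpp
#include <limits>

typedef unsigned char Pel8;

// Scale src by a floating-point factor, clamping the result at the limits
// of the destination type.
template<class T1, class T2>
void scale (const T1& src, T2& dst, double factor)
{
  double result = static_cast<double> (src) * factor;
  double lo = static_cast<double> (std::numeric_limits<T2>::min ());
  double hi = static_cast<double> (std::numeric_limits<T2>::max ());
  if (result < lo) result = lo;
  if (result > hi) result = hi;
  dst = static_cast<T2> (result);
}
```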
We also apply these operations to our RGB data type. Since RGB images require more processing than monochrome images, we built clamping into the RGB image type instead of defining another type. We can see this by looking at the definition shown here.
During the development of generic image processing routines, you should use these functions at every opportunity.

What's So Important About Clamping?

It may seem like we have gone to great lengths for little benefit. After all, we have mandated that image processing functions do this:
instead of this:
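Concretely, the contrast might look like this, with an apLimit-style clamping function re-sketched inline so the example stands alone (the function names are illustrative):

```cpp
#include <limits>

typedef unsigned char Pel8;

// Minimal clamping function in the style of apLimit: compare in double,
// before any conversion to the destination type.
template<class D, class S>
D apLimit (const S& src)
{
  double v  = static_cast<double> (src);
  double lo = static_cast<double> (std::numeric_limits<D>::min ());
  double hi = static_cast<double> (std::numeric_limits<D>::max ());
  if (v < lo) return std::numeric_limits<D>::min ();
  if (v > hi) return std::numeric_limits<D>::max ();
  return static_cast<D> (src);
}

// The mandated style: the destination type and clamping are explicit.
Pel8 clampedSum (Pel8 a, Pel8 b) { return apLimit<Pel8> (a + b); }

// The intuitive style: overflow wraps silently.
Pel8 wrappedSum (Pel8 a, Pel8 b) { return static_cast<Pel8> (a + b); }
```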
The latter is more intuitive, but it is also more prone to error. If we are careful, we can construct operator+ and operator= for the various data types to give the desired behavior. The problem is that the compiler can do almost too good a job of finding a way to compile this line by applying suitable conversions. Because we are dealing with basic data types, like unsigned char, the compiler often has many ways to convert from one data type to another. This makes it easy to write code that does not perform as you expect. If this happens, there is a possibility that the error may not be caught until your testing phase.