Pay Special Attention to String Usage in Your AlgorithmsStrings are special and unique data types in their ubiquity and utility in building software. Strings often represent human-readable text and just as often convey machine data such as database query strings. In short, strings are everywhere and byte for byte may be the most commonly used and abused data type in many applications. Modern programming languages make it very easy to work with strings, to create them, to split them, to copy them, and to append them. Consider the following simple statements: Each statement is allocating data and creating types in memory, and we haven't even called any functions yet! Strings are used so casually by developers that they are very easy to abuse. Unoptimized string handling is one of the largest causes of poor performance and also one of the easiest to prevent. A few facts and rules of thumb follow. (These facts are based on the .NET Framework/.NET Compact Framework. Other runtimes tend to follow similar rules. Check your specific language/runtime reference guide for specific details.)Strings are immutable. This fancy word immutable simply means that the text data for a string cannot be changed in memory. Any code operation that you may think is changing the data in the string is actually creating a new string. Immutability has some very nice characteristics. For example, because the string data itself is static, multiple variables can point to the same data; this makes assigning one string variable to another a shallow "pointer" copy and not a deep copy of the underlying data. The downside of immutability is the inability to make any changes to the data. If you want to change the data, add to the data, or truncate the data, the change will be reflected in a new copy of the string.When string data is no longer referenced by any"live"variables, it is garbage . Example: When you want to reference some subset of a string, it is often far more efficient to do so using integer indexes into the string. Because strings are data in an array of characters, it is easy to get to the data using integer indexes. There exist plenty of functions to enable you to seek out and look at data inside strings with indexes (just not change that data!).When you are building a new string in a loop, strongly consider using a StringBuilder object. All kinds of strings are built out of other variables processed in loops. A typical example of dynamic string creation would be a loop that generated a text report with each line containing the following data: Instead of concatenating the strings together and creating a new string in each iteration of the loop, a StringBuilder class could be used to build the report. The StringBuilder is a class that is very useful for working with a dynamically sized array of characters to build strings. It enables you to efficiently add onto this array, to change the contents or length of the array, and, importantly, to create a new string based on the char array. Learn to use the StringBuilder class because it is the key to writing efficient algorithms that generate string data.Measure the performance of your algorithm. When writing a string-processing algorithm, test its speed! Try a few different approaches. You will quickly learn what is efficient and what is not. An Example That Shows How to Build Strings EfficientlyListing 8.9 has two similar string-processing algorithms that produce identical computational results. Both increment a counter and at each increment append the string representation of the counter to a growing piece of text. Both run the same amount of iterations, and both algorithms are of similar complexity to write. Yet one of the algorithms is vastly better performing than the other. Listing 8.9. Comparing String Usage to StringBuilder in AlgorithmsThis sample uses the PerformanceSampling class defined earlier in this book.
String concatenations and other modifications inside loops can be very expensive operations causing a great deal of object allocation and discarding. In contrast, the StringBuilder treats the data not as an immutable string but rather as a mutable array of characters and manages its size efficiently, growing the array size in sizable chunks when necessary. An immutable string object is allocated when the application requests one by calling the ToString() method. The StringBuilder class can be used to great effect in processing text.Loops that create new strings generate a great deal of garbage. Depending on the available amount of free memory, the garbage collector may run during the algorithm's execution if needed. It may run several times, and in memory-constrained situations it may run almost continually. This results in increasingly bad performance as memory pressure mounts. Even after exiting the algorithm, there will be a lot of garbage left around that needs to be cleaned up. Everything you allocate and discard eventually has to be looked at and cleaned up.Sometimes a physical mobile device can out perform an emulator on a much faster machine. In the string-allocation case above, we see that the physical Pocket PC actually outperformed my laptop running an emulator. However, in the StringBuilder case the Pocket PC did not perform as fast as the emulator. In both cases, the StringBuilder solution greatly outperformed the string-allocation method, but the ratios were not the same for the emulator and physical device tests. As a rule of thumb, when comparing algorithms what is faster on a PC is usually faster on a mobile device; if accurate performance assessment is important, however, it is always useful to test the performance on the physical hardware. |