Micro "Algorithm-Level" Memory Management Modern programming languages, class libraries and managed runtimes greatly enhance the productivity of writing code. However by abstracting the programmer from the need to think about the low-level memory allocations that their algorithms are making they also make it easy to write inefficient code. There are two kinds of inefficiencies that can occur when writing algorithms:Unnecessary computation inefficiencies These kinds of inefficiencies occur when the algorithm you have designed performs more calculations or loop iterations to produce a result that could have been achieved using a more efficient algorithm. The classic example is sorting arrays of data. Sometimes you may have a choice between sorting algorithms that may in special cases be "Order N" (meaning they take linearly more time to sort more items), "Order N*Log(N)" (meaning they do not scale linearly with more items but are still better than exponential) and algorithms that are "Order N^2" (meaning they take exponentially more time to sort more items). Infinitely many other orders of calculation exist (for example, "N^3"). The right algorithm to choose depends on the amount of data you are working with, the memory you have available, and other factors such as the state of the data you are working with. Specific strategies such as preprocessing data before sending it down to a device, or keeping the data in a specific in-memory relational format, may be able to increase algorithmic performance significantly. A great deal of classical computer science literature exists describing efficient algorithm design and measurement, and this book does not attempt to cover this topic. Suffice it to say that the more data you need to process, the more important your choice of computational algorithm is. When it is important, carefully consider your algorithm design and look at the research out there on the topic. Often many people have treaded the same ground already and you can leverage their learning.Unnecessary memory-allocation inefficiencies After you have chosen an algorithmic strategy, your code's performance will be drastically affected by how efficiently you implement that algorithm. More than anything, this involves trying to avoid unnecessary memory allocations and particularly allocations that take place inside loops. This section of the book focuses on this topic.You should have as a goal "zero memory allocations" inside loops you write. There are cases where this may be unavoidable, such as when building a tree of objects that requires the allocation of new nodes to place onto the tree. In many other cases, there are often great efficiencies that can be achieved by carefully examining the need for each object allocation and looking for alternatives. What is generally avoidable is the allocation and discarding of objects inside algorithmic loops.Write Environmental Algorithms: Don't Litter! Just like in the physical world, litter is garbage that is needlessly generated and discarded. Litter is usually the product of carelessness and sloppiness. It is the creating of a long-term problem based on short-term convenience. Creating unnecessary garbage in your algorithms in the form of temporary objects that are not really necessary slows your application down in two ways:Directly Every time you create an object memory needs to be allocated and initialized before it can be used. This is a direct and up-front cost that your algorithm will pay.Indirectly After the object is discarded, it becomes garbage. This garbage piles up in your application until there is so much of it that it must be cleaned up before more memory can be allocated for new objects. This is, of course, known as garbage collection. This garbage collection takes time, and if there is enough garbage piled up it will cause a noticeable stall in your application's operation. The more you litter, the quicker the garbage piles up and the more often time needs to be spent cleaning it up!In all of your programming, you should seek to create as little litter as possible. There is nothing wrong with instantiating an object if it is going to serve some indispensable need for you; if the object is short-lived and the task can be easily accomplished without the object, however, you are just creating litter. Don't be a litterbug.
"Value Types" and the .NET Framework For local variables inside functions, it is often efficient to use value types rather than objects when you want to encapsulate some simple data. A value type is just a convenient way to group together a bunch of related data rather than passing it around as individual variables.Value types are more lightweight than objects but can be "boxed" inside objects and passed around like them when needed. Value types can be useful and can aid performance (versus using objects), but because they look, and in many ways act like objects and can get "boxed" inside shell objects, be careful when you use them to make sure you are not accidentally introducing overhead and creating unneeded garbage. A good rule of thumb is to test your algorithms using both individual variables (for example, basic types such as int, string, double) and value types to compare the performance to make sure it is similar.Read the .NET Framework reference information for "value types" and for the C# "struct" for more information.An example of declaring a value type and declaring a class: //NOTE: IN VB.NET this would be a 'type' not a 'struct' //This is a value type struct MyRect_Type { public int x; public int y; } //This is a class class MyRect_Class { public int x; public int y; } //Some sample code class TestClass { public void foo() { //Must be allocated as an object MyRect_Class myRectClass = new MyRect_Class(); myRectClass.x = 1; myRectClass.y = 2; //Allocates a new object myRectClass = new MyRect_Class(); //Can be declared like a scalar type MyRect_Type myRectType; myRectType.x = 1; myRectType.y = 2; //Clears the values of the type but does not allocate an object! myRectType = new MyRect_Type(); } }
|
Write Environmental Algorithms: Reduce, Reuse, and Recycle Following is an example showing several different implementations of the same basic algorithm. The purpose of the algorithm is to process an array of strings. Each string in the array consists of three parts separated by an underscore character (for example, big_shaggy_dog). The algorithm is designed to look at each item in the array and find out if its center part is blue (for example, my_blue_car). If the center segment is blue, it will be replaced with orange (for example, my_blue_car becomes my_orange_car).Each of these algorithms also uses a helper class to make it easy to dissect the string and get to the data in each of the three segments. The first algorithm (Listing 8.3, Listing 8.4) shows a reasonable first approach, and the next two algorithms (Listing 8.5, Listing 8.6 and Listing 8.7, Listing 8.8) show optimizations made to improve on the original tactic. The optimizations are designed to directly improve the performance of the algorithm as well as to decrease the amount of litter each algorithm produces.Listing 8.2. Common Code Used in All Test Cases Below
//Number of times we want to repeat the test const int LOOP_SIZE = 8000; //---------------------------------------------------- //This function resets the contents of our test //array, so we can run the test algorithm //over and over //---------------------------------------------------- private void ResetTestArray(ref string [] testArray) { if (testArray == null) { testArray = new string[6]; } testArray[0] = "big_blue_duck"; testArray[1] = "small_yellow_horse"; testArray[2] = "wide_blue_cow"; testArray[3] = "tall_green_zepplin"; testArray[4] = "short_blue_train"; testArray[5] = "short_purple_dinosaur"; }
Listing 8.3. A Test Case Showing Wasteful Allocations (A Typical First Implementation of a Function) Note: This sample uses the PerformanceSampling class defined earlier in this book. private void button2_Click(object sender, System.EventArgs e) { //Run the garbage collector so we know we're starting //from a clean slate for our test. //ONLY CALL DURING TESTING! Calling the GC manually will slow //down overall application throughput! System.GC.Collect(); string [] testArray = null; //--------------------------------------------- //Go through the items in the array and //look for ones where the middle word is //'blue'. Replace the 'blue' with 'orange' //--------------------------------------------- //Start the stopwatch for our test! PerformanceSampling.StartSample(0, "WastefulWorkerClass"); WastefulWorkerClass workerClass1; int outerLoop; for (outerLoop = 0; outerLoop < LOOP_SIZE; outerLoop++) { //Set up the data in the array we want to do our test on ResetTestArray(ref testArray); int topIndex = testArray.Length - 1; for(int idx = 0; idx <= topIndex; idx++) { //------------------------------------ //Create an instance of a helper class //that dissects our string into three pieces // //This is wasteful! //------------------------------------ workerClass1 = new WastefulWorkerClass(testArray[idx]); //If the middle word is "blue", make it "orange" if(workerClass1.MiddleSegment == "blue") { //Replace the middle item workerClass1.MiddleSegment = "orange"; //Replace the word testArray[idx] = workerClass1.getWholeString(); } } //end inner for }//end outer for //Get the time we completed our test PerformanceSampling.StopSample(0); System.Windows.Forms.MessageBox.Show( PerformanceSampling.GetSampleDurationText(0)); }
Listing 8.4. The Worker Class for Our First Test Case
using System; public class WastefulWorkerClass { private string m_beginning_segment; public string BeginSegment { get {return m_beginning_segment;} set {m_beginning_segment = value;} } private string m_middle_segment; public string MiddleSegment { get{return m_middle_segment;} set{m_middle_segment = value;} } private string m_end_segment; public string EndSegment { get{return m_end_segment;} set{m_end_segment = value;} } public WastefulWorkerClass(string in_word) { int index_segment1; //Look for a "_" in the string index_segment1 = in_word.IndexOf("_",0); //If there is no "_", the first segment is the whole thing if(index_segment1 == -1) { m_beginning_segment = in_word; m_middle_segment = "; m_end_segment = "; return; } //If there is a "_", split it else { //If the "_" is the first char, the first segment is " if(index_segment1 == 0) { m_beginning_segment = "; } else { //The first segment m_beginning_segment = in_word.Substring(0, index_segment1); } //Find the 2nd "_" int index_segment2; index_segment2 = in_word.IndexOf("_", index_segment1 + 1); //2nd "_" does not exist if(index_segment2 == -1) { m_middle_segment = "; m_end_segment = in_word.Substring(index_segment1 + 1); return; } //Set the end segment m_middle_segment = in_word.Substring(index_segment1 + 1, index_segment2 - index_segment1 -1); m_end_segment = in_word.Substring(index_segment2 + 1); } } //Returns the whole 3 segments joined by "-"s public string getWholeString() { return m_beginning_segment + "_" + m_middle_segment + "_" + m_end_segment; } } //end class
Reuse Allocated Objects Whenever Possible Creating an instance of a helper class in each iteration of the processing loop is wasteful. Even though the object is relatively small because it has few data members, it still has overhead. In addition, each time a new object instance is created, an old object instance is discarded. This creates garbage that will have to be collected later. We can easily remove this wasteful object allocation.Listing 8.5. A Test Case Showing Slightly Reduced Object Allocations (A Typical Refinement of a First Function Implementation) This sample uses the PerformanceSampling class defined earlier in this book. private void button3_Click(object sender, System.EventArgs e) { //Run the garbage collector so we start from a "clean" //state for our test //ONLY CALL DURING TESTING! Calling the GC manually will slow //down overall application throughput! System.GC.Collect(); string [] testArray = null; //--------------------------------------------- //Go through the items in the array and //look for ones where the middle word is //'blue'. Replace the 'blue' with 'orange' //--------------------------------------------- //Start the stopwatch! PerformanceSampling.StartSample(1, "LessWasteful"); //------------------------------------------------ //LESS WASTEFUL: Allocate the object before we get in the //loop //------------------------------------------------ LessWastefulWorkerClass workerClass1; workerClass1 = new LessWastefulWorkerClass(); int outerLoop; for (outerLoop = 0; outerLoop < LOOP_SIZE; outerLoop++) { //Set up the data in the array we want to do our test on ResetTestArray(ref testArray); int topIndex = testArray.Length - 1; for(int idx = 0; idx <= topIndex; idx++) { //---------------------------------------------------- //Instead of reallocating the object, let's just reuse //it //---------------------------------------------------- //workerClass1 = new WastefulWorkerClass( // testArray[topIndex]); workerClass1.ReuseClass(testArray[idx]); //If the middle word is "blue", make it "orange" if(workerClass1.MiddleSegment == "blue") { //Replace the middle item workerClass1.MiddleSegment = "orange"; //Replace the word testArray[idx ] = workerClass1.getWholeString(); } } } //Stop the stopwatch! PerformanceSampling.StopSample(1); System.Windows.Forms.MessageBox.Show( PerformanceSampling.GetSampleDurationText(1)); }
Listing 8.6. The Worker Class for Our Second Test Case
using System; public class LessWastefulWorkerClass { private string m_beginning_segment; public string BeginSegment { get {return m_beginning_segment;} set {m_beginning_segment = value;} } private string m_middle_segment; public string MiddleSegment { get {return m_middle_segment;} set {m_middle_segment = value;} } private string m_end_segment; public string EndSegment { get {return m_end_segment;} set {m_end_segment = value;} } public void ReuseClass(string in_word) { //----------------------------------------- //To reuse the class, clear all the internal state //----------------------------------------- m_beginning_segment = "; m_middle_segment = "; m_end_segment = "; int index_segment1; //Look for a "_" in the string index_segment1 = in_word.IndexOf("_",0); //If there is no "_", the first segment is the whole thing if(index_segment1 == -1) { m_beginning_segment = in_word; return; } //If there is a "_", split it else { if(index_segment1 == 0) { } else { m_beginning_segment = in_word.Substring(0, index_segment1); } int index_segment2; index_segment2 = in_word.IndexOf("_", index_segment1 + 1); if(index_segment2 == -1) { m_end_segment = in_word.Substring(index_segment1 + 1); return; } //Set the end segment m_middle_segment = in_word.Substring(index_segment1 + 1, index_segment2 - index_segment1 -1); m_end_segment = in_word.Substring(index_segment2 + 1); } } public string getWholeString() { return m_beginning_segment + "_" + m_middle_segment + "_" + m_end_segment; } }
There Is Room for Further Improvements in the Preceding Code The approach above still demonstrates significant waste because we are continually allocating and discarding strings. If we were to code for maximum performance, we would look to eliminate the creation of any unnecessary string objects. |
Reduce Unnecessary Object Allocations Notice that in the algorithm above, we are often computing a lot of values that we do not need. Specifically, we are generating at least three new strings in each iteration of the loop, one for the first segment of the original sting, one for the middle segment, and one for the end segment. For example, the string big_blue_boat is dissected into three different strings: big, blue, boat. These strings take time to create and will need to be cleaned up later as well.The algorithm checks only the middle segment of a string to see whether it matches a specific value (blue). As long as the value does not match, no other processing is needed. This means that most of the time we are wastefully allocating strings for the beginning and end segments even though these are only used by the algorithm if it puts the whole string back together again from the pieces. What if instead of creating new strings out of pieces of the old string in each iteration of the loop we just store the character index values that tell us where the underscores (_) are in the string? We can store this data in integer values, which should be much less expensive than allocating new strings. If we do this, we can use the original string and the index values to do a string comparison starting at the first underscore and going to the second underscore (for example, _blue_ in the example above). Only in the case where we find a match will we need to do any additional string creation to replace the middle segment. In most cases, we will be much better off and will not need to do any object or string allocation. Only in the case where we find a match in the middle segment will we need to do additional string operations, and in any case we will only need to do as many as we were going to do previously anyway. In no case should we be worse off.Listing 8.7. A Test Case Showing Significantly Reduced Object Allocations (Typical of Doing Significant Algorithm Optimizations on the First Implementation) This sample uses the PerformanceSampling class defined earlier in this book. private void button5_Click(object sender, System.EventArgs e) { //Run the garbage collector, so we start from a //clean slate for our test //ONLY CALL DURING TESTING! Calling the GC manually will slow //down overall application throughput! System.GC.Collect(); string [] testArray = null; //--------------------------------------------- //Go through the items in the array and //look for ones where the middle word is //'blue'. Replace the 'blue' with 'orange' //--------------------------------------------- //Start the stopwatch for the test PerformanceSampling.StartSample(2, "DeferredObjects"); //------------------------------------------------ //LESS WASTEFUL: Allocate the object before we get in the //loop //------------------------------------------------ LessAllocationsWorkerClass workerClass1; workerClass1 = new LessAllocationsWorkerClass(); int outerLoop; for (outerLoop = 0; outerLoop < LOOP_SIZE; outerLoop++) { //Set up the data in the array we want to do our test on ResetTestArray(ref testArray); int topIndex = testArray.Length - 1; for(int idx = 0; idx <= topIndex; idx++) { //------------------------------------------------------ //Less Wasteful: //Instead of reallocating the object, let's just reuse it //Also: The implementation does NOT create additional //strings //------------------------------------------------------ //workerClass1 = new WastefulWorkerClass( // testArray[topIndex]); workerClass1.ReuseClass(testArray[idx]); //If the middle word is "blue", make it "orange" //-------------------------------------------------- //Less Wasteful: //This compare does not need to create any additional //strings //-------------------------------------------------- if(workerClass1.CompareMiddleSegment("blue") == 0) { //Replace the middle item workerClass1.MiddleSegment = "orange"; //Replace the word testArray[idx ] = workerClass1.getWholeString(); } } } //Stop the stopwatch! PerformanceSampling.StopSample(2); System.Windows.Forms.MessageBox.Show( PerformanceSampling.GetSampleDurationText(2)); }
Listing 8.8. The Worker Class for Our Third Test Case
using System; public class LessAllocationsWorkerClass { public string MiddleSegment { set { m_middleSegmentNew = value; } } private string m_middleSegmentNew; private int m_index_1st_undscore; private int m_index_2nd_undscore; private string m_stringIn; public void ReuseClass(string in_word) { //----------------------------------------- //To reuse the class, clear all the internal state //----------------------------------------- m_index_1st_undscore = -1; m_index_2nd_undscore = -1; m_middleSegmentNew = null; m_stringIn = in_word; //This does not make a string copy //Look for a "_" in the string m_index_1st_undscore = in_word.IndexOf("_",0); //If there is no "_", the first segment is the whole thing if(m_index_1st_undscore == -1) { return; } //Look for the 2nd "_"; m_index_2nd_undscore = in_word.IndexOf("_", m_index_1st_undscore + 1); } public int CompareMiddleSegment(string compareTo) { //If there is no 2nd underscore, there is no middle if(m_index_2nd_undscore < 0) { //If we are comparing to an empty string, then this is a //match if((compareTo == null) || (compareTo == ")) {return 0;} return -1; } //Compare the middle segment to the 1st and second segments return System.String.Compare(m_stringIn, m_index_1st_undscore + 1, compareTo, 0, m_index_2nd_undscore - m_index_1st_undscore -1); } public string getWholeString() { //If we've been given no new middle segment, return the //original if (m_middleSegmentNew == null) { return m_stringIn; } //Build the return string return m_stringIn.Substring(0,m_index_1st_undscore + 1) + m_middleSegmentNew + m_stringIn.Substring( m_index_2nd_undscore, m_stringIn.Length - m_index_2nd_undscore); } }
Analysis of the Successive Optimizations Done Above There are usually significant savings to be achieved in tuning your application's algorithms. The most important thing to do is to measure the performance gains your tuning is intended to accomplish. Some performance "improvements" you make might inadvertently hurt performance because of memory allocations you were not aware of or perhaps for other reasons. In developing the code example above, I had several situations where I thought changes might improve results, but upon measurement and deeper consideration learned that the changes were detrimental. Always measure and compare your performance!It is also important to test on the actual hardware you intend to run on to see what the "on-device" experience is. Using an emulator can be very useful for code design and basic tuning, but the results may differ on the physical hardware due to processor differences, memory differences, and other factors. Some code may run faster on physical devices and some may run slower. The final proof is always on the actual hardware end users will use. For the same reason, it is important to run code with and without the debugger attached to measure performance. An attached debugger can dramatically slow down algorithm performance in some cases.The results of the three different algorithms tested above were as follows.Table 8.1. Test Case Results for 8,000 Loop Iterations Running in Pocket PC Emulator (Results in Seconds) Trial # | Wasteful Allocations | Slightly Reduced Allocations | Significantly Reduced Allocations |
---|
1 | 12.65 | 12.2 | 8.925 | 2 | 12.775 | 12.35 | 8.55 | 3 | 12.575 | 12.25 | 8.225 | 4 | 12.625 | 12.525 | 8.575 | Average | 12.65625 | 12.33125 | 8.56875 |
---|
% time savings vs. test baseline | 0% | 2.57% | 32.30% |
Table 8.2. Test Case Results for 2,000 Loop Iterations Running on Physical Pocket PC Device (Results in Seconds) Trial # | Wasteful Allocations | Slightly Reduced Allocations | Significantly Reduced Allocations |
---|
1 | 30.609 | 30.151 | 20.484 | 2 | 30.538 | 30.016 | 20.362 | 3 | 30.517 | 30.195 | 20.377 | 4 | 30.457 | 30.316 | 20.429 | Average | 30.53025 | 30.1695 | 20.413 |
---|
% Time savings vs. test baseline | 0% | 1.18% | 33.14% | Analysis of test results above:The first optimization, reusing an object instead of reallocating it, improved performance only marginally. Probably this is because the object itself was small in size and held little data. In retrospect, this may not be too surprising. Given that this was an easy optimization to make involving only a few lines of code, it is worth keeping. An additional benefit not reflected in the numbers above is that by removing an object allocation from a loop with high iterations we should be reducing the "object littering" of our application considerably. This should result in the garbage collector needing to run less often and give our application better overall performance. The optimization did not have a great direct impact on the execution speed but it did no harm and reduced our littering considerably.The second optimization we did, removing several string object allocations and replacing them with string indexes instead, had a significant impact on performance. The algorithm's overall performance was directly improved by more than one third by making a few changes to how our helper class was designed and deferring any string object creation until absolutely necessary. Not only does this result in direct performance improvements, but as above we have greatly reduced the amount of garbage our algorithm produces. This will mean fewer overall garbage collections and a smoother and faster running application.The ratios of performance improvement were reasonably similar for emulator and physical Pocket PC devices, but the absolute performance differed drastically for the algorithm. In this case (string allocations), optimizations produced similar improvements on the emulator and on physical devices, but the best performing algorithm ran at about 934 iterations/second on the emulator and at 122 iterations/second on the physical device. This made the emulator about 7.6 times faster than a physical device for this kind of algorithm. If this is a critical algorithm and we are going to run it on thousands of pieces of data, we need to make sure the on-device user experience is good. This may require us to use background threads to do data processing or may require us to work with smaller sets of data at any given time. The only way to get real-world measurements is to run on real-world devices with real-world data.
|