Macro-OptimizationThe vast majority of the performance improvement will come from restructuring the code to use a more efficient algorithm. This part of the chapter highlights some of the things to look for and provides alternative suggestions for doing the same thing. Whether the suggestions shown here are better or worse than your existing code will depend very much on the situation, particularly on the amount and type of data being processed.The slowest parts of a procedure will invariably involve either external data retrieval or repeatedly looping through sets of data. Large loops are an opportunity for optimization; any improvement (however minor) that can be made inside a loop is a gain many times over. The performance of external data retrieval is usually dependant on the server and network performance and is something we have little control over. One thing we can do to minimize the effect of poor database performance is to load data (such as static lookup lists and so forth) when the application starts instead of when it is required. Users often find it more acceptable to have a longer startup time than sluggish performance after the application is loaded. PreprocessBefore reading the rest of this paragraph, start Excel and write the fastest possible VBA routine you can to calculate how many 1s there are in the binary representation of the numbers 0 to 255. How did you do it? Did you use the Dec2Bin function from the Analysis Toolpak? Did you use a recursive routine? Did you repeatedly divide by 2 and check whether the result was odd or even? Or did you work out the numbers yourself, hard-code the results in a VBA array and just read them at runtime, as shown in Listing 17-4? Listing 17-4. How Many 1s Are There in a Binary Number?By doing as much processing as possible when developing the application, our applications don't need to do the processing at runtime. Check the OrderThe best routines are those whose performance doesn't vary significantly with the volume of data being processed. For example, a routine that copied a set of data from a Variant array into a worksheet range, calculated the worksheet and returned a result would take approximately the same time whether there were ten or a thousand elements in the array. Such routines are said to have an order of 1 and are very hard to achieve in practice.The next best are those that vary linearly with the volume of data, such as one or more sequential For...Next loops through the data. These routines have an order of n, so if we have ten times as much data, the routine is likely to take approximately ten times as long. With a little thought and work, most routines can be reduced to order n.Nested loops result in routines that are very sensitive to the volume of data being processed. A routine with two nested loops has an order n2 and each extra level of nesting adds an extra order to the routine. If these routines are given ten times as much data to process, they're likely to take 100 or 1,000 times as long. If such a routine normally takes 1 second to complete, it might take 15 minutes to process 10 times as much data. In most cases, nested loops are just a quick and easy way to code an algorithm that could be redesigned as multiple sequential loops through the data. Note that nested loops will often be spread over many different procedures, for example where ProcedureA loops through an array and calls ProcedureB for each element, which itself loops through another array to process the element.As an example, consider the routine in Listing 17-5, which compares two arrays and processes any items that are in both. Listing 17-5. Compare Two ArraysWithout thinking too hard about how to improve this routine, we might be tempted to just add an Exit For to jump out of the inner loop after we've found a match, but that still leaves the routine essentially of order n2. If the two arrays are sorted, we can reduce this to order n by looping through both arrays within the same loop, as shown in Listing 17-6. Listing 17-6. Process Both Arrays Within One LoopIf the arrays are not sorted, it will probably be quicker to sort them both beforehand, then use the above routine. If the output has to be in a specific order (preventing us from sorting both arrays), we could sort asArray2 and use a binary search routine to see whether the string exists, or use a Dictionary object (see later for an example of each). Tighten the LoopHaving replaced the nested loops with more efficient algorithms, the next task is to make the code within the remaining loop as tight as possible. As well as implementing all the micro-optimizations shown later in this chapter, the primary goal is to ensure that in each iteration, we only execute the minimum amount of code possible. Returning to the question above "Does B have to follow A?" it is common to see loops that contain code to calculate intermediate results, followed by some tests to check whether the intermediate result should be used (because this reflects the order in which we originally thought about the routine). If we turn the routine on its head, we can do the tests first and only calculate the intermediate results for those elements we know we'll be using. Fast VBA AlgorithmsQuickSortThe QuickSort routine is one of the fastest sorting algorithms and should be used whenever you want to sort an array. It works by doing the following:Select one element from the array, typically taken from the middleScan through the array moving everything that should come before the selected element to the bottom of the array and everything that should come after the selected element to the top of the arrayCall itself to sort the bottom halfCall itself to sort the top halfFor best performance, you should have a number of QuickSort routines for specific data types, such as that shown in Listing 17-7 for one-dimensional string arrays. Listing 17-7. A QuickSort Routine for One-Dimensional String Arrays
Binary SearchA binary search is a very quick way to locate an item within a sorted array. It works by doing the following:Compare the item to look for with the element in the middle of the array.If they match, we found it.If the item to look for is lower than the middle of the array, throw away the top half.If the item to look for is higher than the middle of the array, throw away the bottom half.Repeat 15, cutting the array in half each time until we find the item or run out of array.A binary search routine, such as that shown in Listing 17-8, is practically insensitive to the size of the array passed in, as doubling the size of the array results in only one extra iteration of the routine. Listing 17-8. A Binary Search Algorithm
Sort and ScanThe combination of a QuickSort and BinarySearch gives us a very efficient way of comparing two arrays, to process the elements in common, as shown in Listing 17-9. Listing 17-9. Combining a Sort and Binary SearchThis is not quite as efficient as the example shown previously that relied on both lists being sorted, but is a very efficient and easy to understand alternative for use when the initial array must be left in its original order. The SORTSEARCH_INDEX udtWhen dealing with large 2D arrays, arrays of objects or multiple keys, it is usually more efficient to create a new indexing array and sort and search that than to try to sort and search the original array. An index array is an array of the SORTSEARCH_INDEX udt, which is defined as follows: The key is the string used for sorting and searching, which is typically the value from the first column in a large 2D array, the name of an object or a concatenation of the values to sort an array by multiple columns. The index is the row number in the original array. When the udt array is sorted, we can loop through the elements in the sorted order, using the Index property to identify the appropriate row from the original array, as in Listing 17-10. Listing 17-10. Using the SORTSEARCH_INDEX udtQuickSortIndex is a version of the QuickSort algorithm for arrays of the SORTSEARCH_INDEX user defined type and can be found on the CD in the workbook \Concepts\Ch17Optimizing VBA Performance\Algorithms.xls. The workbook also contains a version of the binary search algorithm, BinarySearchIndex, which searches for a string in the index array and returns the row number in the original array. ![]() |