Advanced Functions

The Database Functions

We often see advanced filtering used to select a subset of data, with the result of the filter being used by a few simple worksheet functions, such as SUM, AVERAGE and so on. Depending on the complexity and number of the worksheet functions that refer to the filtered data, it can often be quicker and easier to use Excel's database functions. These are equivalent to the normal SUM, AVERAGE, MIN, MAX, COUNT, COUNTA and so forth, but instead of providing a simple range to operate over, we provide a source database, a criteria range to filter the database by and the field in the database to operate on. For example, while AVERAGE(K2:K2156) would give us the overall average discount in our sample workbook, we could use the DAVERAGE() function to calculate the average discount of our UK and USA Beverage sales, ignoring those with zero discount. The criteria range for the database functions follows exactly the same structure and rules as for advanced filtering, so we could use the range shown in Figure 14-17 in this case.

Figure 14-17. Criteria Range for Discounted UK and USA Beverage Sales

The average nonzero discount for our UK and USA Beverage sales could then be calculated using a DAVERAGE() worksheet formula that is given the source database, the discount field and this criteria range.

If we use these functions within the advanced filter criteria range, we can perform some extremely powerful filtering. For example, the criteria range shown in formula view in Figure 14-18 will extract all the UK or USA Beverages sales that have a discount greater than the average discount for UK or USA Beverage sales, ignoring those sales where no discount was applied.

Figure 14-18. Using a Database Function Within an Advanced Filter Criteria Range

Here, the first three columns of the criteria range are being used by the DAVERAGE() function to calculate the average discount.
The average discount figure is then used to populate the fourth column in the criteria range, which is used by the Advanced Filter.

Array Formulas

The standard worksheet functions that we use every day typically accept one or more parameters and return a result. A few, such as SUM() and AVERAGE(), accept ranges or arrays in their parameters and will return the sum, average and so on of all the data they're given. Most worksheet functions and mathematical operators, however, are given a single number for each of their parameters and return a single number as the result.

Even though a function normally accepts single-figure parameters, we can usually give it a multicell range reference and enter the function using Ctrl+Shift+Enter instead of just pressing the Enter key. Doing this tells Excel to calculate the function as an array formula, whereby the function performs its calculation multiple times, iterating over each cell in the range. The result is an array of numbers, with each element in the array corresponding to one of the cells in the original reference. These results can in turn be fed into the parameters of other functions and so on until they are eventually aggregated (usually summed) to give a final answer. All of the array calculation is done inside Excel and does not usually appear on the worksheet.

The most common use of array formulas is to count and sum lists, using multiple criteria. Excel provides the COUNTIF() and SUMIF() functions that accept a single filter criterion, so we could sum our UK orders using the following function:

=SUMIF($E$2:$E$2156,"UK",$I$2:$I$2156)

where column E contains the country names and column I contains the order quantities. If we want the total of our UK Beverages sales, we can no longer use SUMIF, because we need two criteria.
We could create a pivot table for it, or use DSUM with a criteria range, but both of those can be overkill if we only have a relatively small list and simple criteria, and cannot be used if we don't have column headers.

Array formulas occupy the middle ground between the simplicity of a worksheet function and the complexity of criteria ranges. If we only have a series of conditions AND'ed together, we can use an array formula that multiplies the data to be summed by each condition in turn and sums the result. To get the total orders of UK Beverages from our example data, we could use the following formula:

=SUM($I$2:$I$2156*N($E$2:$E$2156="UK")*N($G$2:$G$2156="Beverages"))

Remember to enter it using Ctrl+Shift+Enter. Let's look at the sample data shown in Figure 14-19 to see how it works.

Figure 14-19. Sample Data for an Array Formula

To explain how the array formula works, we need to break it up and explain each part of the formula, starting from the middle and working outward:

$E$2:$E$2156="UK"
Excel scans through each of the cells in the range E2:E2156 in turn, checking whether each one is equal to UK. The result is an array of True or False values. In our case, it is the array {F, F, T, T, T, T, T, T, F, F}.

N($E$2:$E$2156="UK")
The N() function converts its parameter to a number. When given an array of True and False values, it converts each True to 1 and each False to 0. In our case, this is the array {1, 0, 1, 1, 1, 1, 1, 1, 0, 0} with the first two elements 0, that is {0, 0, 1, 1, 1, 1, 1, 1, 0, 0}. You might see a double-minus being used instead of the N() function, such as --($E$2:$E$2156="UK"), which has the same effect and is preferred by some people. You might also see the N() function omitted from complex array formulas, as Excel will often (but not always) do the conversion without being told.

$G$2:$G$2156="Beverages"
Like the test for UK, Excel scans each cell in the range G2:G2156, checking whether each one is equal to Beverages.
In our case, the result is the array {T, F, T, F, F, F, F, T, F, F}.

N($G$2:$G$2156="Beverages")
Converts the Beverages True/False array to 1s and 0s, giving the array {1, 0, 1, 0, 0, 0, 0, 1, 0, 0}.

$I$2:$I$2156
A standard range reference, which is directly translated into the array {21, 15, 25, 25, 15, 20, 20, 25, 2, 20}.

SUM($I$2:$I$2156*N($E$2:$E$2156="UK")*N($G$2:$G$2156="Beverages"))
Multiplies the matching elements from each of the intermediate arrays and totals the result, as shown in Figure 14-20.

Figure 14-20. The Inner Workings of an Array Formula

For these situations, the decision to use an array formula instead of a pivot table, advanced filter or database function is largely dependent on the size of the data set and the number of such formulas required. For one or two totals, array formulas are often the most efficient, but as the number of totals increases, it becomes more efficient to perform the filtering before calculating them.

After you've grasped the concept of array formulas, you will probably identify more and more situations where they can be used. A common requirement for many array formulas is to be able to generate a number sequence such as the array {1, 2, 3, 4, 5}. This can be achieved using the awkward-looking formula =ROW(INDIRECT("A1:A5")). The INDIRECT("A1:A5") part returns the range reference A1:A5, and is insensitive to rows being moved, added or deleted. The ROW() part returns an array of the row number of each row in the range, being the array of rows 1 to 5, {1, 2, 3, 4, 5}. The classic use of such a sequence is in the "sum of digits" calculation often used in credit card checksum formulas. Given an arbitrary number, 672435, what is the sum of each of the digits in the number? In this case it's 6+7+2+4+3+5=27.
To calculate it using a formula, we start off with a sequence from 1 to the number of digits, use the sequence in the MID() function to extract each digit in turn (as text), convert it to a number and then sum the resultant array. The complete function is as follows:

=SUM(VALUE(MID(B7,ROW(INDIRECT("A1:A"&LEN(B7))),1)))

where B7 contains the number for which we want to calculate the sum of the digits. To understand how it works, let's break it down again:

LEN(B7) gives the length of the number (that is, the count of its digits; 6 in our case).
INDIRECT("A1:A"&LEN(B7)) returns the range A1:A6 in our case.
ROW(INDIRECT("A1:A"&LEN(B7))) returns the row of each cell in the range, giving the array {1, 2, 3, 4, 5, 6}.
MID(B7,ROW(INDIRECT("A1:A"&LEN(B7))),1) applies the sequence to the start_num parameter of the MID() function, which returns the nth digit from the number as text. In this case, it's the array {"6", "7", "2", "4", "3", "5"}.
VALUE(MID(B7,ROW(INDIRECT("A1:A"&LEN(B7))),1)) converts the array of text items to numbers, giving the array {6, 7, 2, 4, 3, 5}.
SUM(VALUE(MID(B7,ROW(INDIRECT("A1:A"&LEN(B7))),1))) sums the numbers in the array, giving 6+7+2+4+3+5=27.

Despite their definite power, array formulas have three main problems: They're relatively slow to calculate, particularly when operating on large data sets; they're relatively difficult to understand, when compared to normal worksheet functions; and they're difficult to test, debug and maintain. If you're using Excel 2002 or later, the Tools > Formula Auditing > Evaluate Formula feature can be very useful for analyzing and debugging array formulas.

Our advice is to use array formulas when absolutely necessary, but don't use them just to save a few cells. It is often quicker to create and much easier to understand if intermediate cells are used for extra calculations, instead of trying to perform everything in a single array formula.
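
For readers who find the element-by-element behavior easier to follow in a conventional language, the two array formulas from this section can be traced in a few lines of Python. This is only a sketch of the arithmetic, not of how Excel evaluates formulas internally. The function names conditional_sum and digit_sum are our own, and the ten-row arrays are the ones quoted in the breakdown of the UK Beverages formula.

```python
def conditional_sum(values, *conditions):
    """Sum values where every condition is True, like SUM(values*N(c1)*N(c2))."""
    total = 0
    for i, value in enumerate(values):
        weight = 1
        for condition in conditions:
            weight *= int(condition[i])  # N(): True -> 1, False -> 0
        total += value * weight
    return total


def digit_sum(number):
    """Sum the digits of a whole number, like SUM(VALUE(MID(B7, sequence, 1)))."""
    text = str(number)
    positions = range(1, len(text) + 1)          # ROW(INDIRECT("A1:A"&LEN(B7)))
    digits = [text[p - 1:p] for p in positions]  # MID(B7, p, 1), still text
    return sum(int(d) for d in digits)           # VALUE() then SUM()


T, F = True, False
is_uk = [F, F, T, T, T, T, T, T, F, F]
is_beverages = [T, F, T, F, F, F, F, T, F, F]
quantities = [21, 15, 25, 25, 15, 20, 20, 25, 2, 20]

uk_beverage_total = conditional_sum(quantities, is_uk, is_beverages)
checksum = digit_sum(672435)
print(uk_beverage_total, checksum)  # 50 27
```

Only the third and eighth rows pass both tests, so the ten sample rows total 25 + 25 = 50. The multiplication does the work of AND here because any False contributes a zero to the product.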
Circular References

Excel's online help file and most books mention circular references in terms of "circular reference errors," where you've accidentally created a circular reference by mistyping a range reference. This is, indeed, one of many potential symptoms of spreadsheet errors, which can be quite difficult to track down. If you find yourself in that situation, the findcirc.xla add-in, available for download from www.oaltd.co.uk/Excel, might come in useful. This add-in scans a workbook, trying to locate a circular reference chain and provides the full list of cells involved in the circle. With any luck, you should be able to identify the erroneous references and break the chain.

Much more interesting, though, is the intentional use of circular references to tidily solve business problems. A great many problems in the world of finance are circular in nature. A typical example is to determine the repayments of a long-term loan. A company may have decided to devote 40 percent of their after-tax profits to repay a loan. The problem is that both the loan repayment and the interest charge can usually be offset against the tax liability, thereby increasing the after-tax profits and allowing the company to repay more of the loan. The problem can be expressed using the following equation:

R = (P - R - (B - R) x I) x (1 - T) x 0.4

where R is the amount of the loan to repay, P is the profit before financing and tax, B is the balance of the loan, I is the interest rate and T is the tax rate. In this extremely simple example, it is possible to solve for R algebraically, giving the following:

R = (P - I x B) / (1 - I + 1 / 0.4 / (1 - T))

In most real-life examples, however, the interest rate may be stepped depending on the outstanding balance, and the tax calculation is unlikely to be as simple as just multiplying by the tax rate. In these situations, we can intentionally use circular references to iterate to a solution.
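
To see the kind of iteration that such a circular reference sets up, the repayment equation can be fed back into itself until the result stops changing. The following Python sketch is an illustration only: the figures used for P, B, I and T are invented, the function names are ours, and the stopping rule (at most 100 passes, stopping once the change falls below 0.001) is an assumption standing in for Excel's iteration settings, not a description of its calculation engine.

```python
def closed_form(profit, balance, rate, tax, share=0.4):
    """Algebraic solution: R = (P - I x B) / (1 - I + 1 / share / (1 - T))."""
    return (profit - rate * balance) / (1 - rate + 1 / share / (1 - tax))


def iterate(profit, balance, rate, tax, share=0.4,
            guess=0.0, max_iterations=100, max_change=0.001):
    """Repeatedly apply R = (P - R - (B - R) x I) x (1 - T) x share."""
    r = guess
    for _ in range(max_iterations):
        new_r = (profit - r - (balance - r) * rate) * (1 - tax) * share
        if abs(new_r - r) < max_change:  # converged closely enough
            return new_r
        r = new_r
    return r  # gave up after max_iterations passes


# Hypothetical inputs: profit 1,000, loan balance 5,000, 5% interest, 30% tax.
exact = closed_form(1000.0, 5000.0, 0.05, 0.30)
approx = iterate(1000.0, 5000.0, 0.05, 0.30)
print(round(exact, 2), round(approx, 2))
```

With these inputs each pass shrinks the error to roughly a quarter of its previous size, so the loop settles within a handful of passes and agrees with the algebraic answer to well within the chosen tolerance.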
Figure 14-21 shows a worksheet to solve this simple problem using circular references.

Figure 14-21. Using Circular References to Calculate Loan Repayments

When we created the sheet, we initially put a guessed value in cell B12. After entering the remaining formulas, we added the forward reference in B12, to refer to B17.

By default, Excel disables the calculation of circular references. To enable them, put a tick in the Tools > Options > Calculation > Iteration box. The Max Iterations and Max Change settings can be left as their defaults; they have little impact on most circular-reference problems. They come into play if the calculations are particularly slow at converging to a result. The Max Change determines when Excel considers a circular reference to have converged correctly (the new result must be within the given value of the previous iteration), whereas the Max Iterations provides a cut-off point to tell Excel to stop trying. In slowly converging calculations, the Max Iterations may need to be increased to allow the iterations to run until completion.
Such situations should be examined to see whether the calculations can be reworked to give a solution that converges within fewer iterations.

The worksheet shown in Figure 14-21 adopts a number of best practices when designing worksheets in general, and specifically when using circular references:

The title of the worksheet makes it clear that intentional circular references exist on the sheet.
The input ranges are clearly identified, with a light-colored background.
Each formula is clearly identified with a label stating what is being calculated.
All the formulas except the circular reference refer to cells above them; the cell containing the circular reference is the only one with a reference to a cell below it.
The circular reference is clearly identified by including (circ) in the cell label.
Both ends of the circular reference have the same label.
The circular reference in cell B12 refers to the single cell holding the value to be fed back into the circular calculation, and only that cell.

After you've used circular references for a while, you'll notice two common issues. First, if any of the functions within the circle results in an error value, it will propagate to every function in the circle. Second, the ability of the formulas to iterate to a solution can be quite sensitive to the initial guess for the feedback value. Both of these issues can be resolved by including a kill switch to control whether the circular reference is calculated and an extra cell to provide a seed value for the initial guess. When the kill switch is FALSE, the feedback cell(s) take on the seed value, which should also clear out any residual error values. When the kill switch is TRUE, the feedback cell(s) complete the circle. Figure 14-22 shows the same loan repayment problem with the addition of a kill switch in cell B4 and all other changes highlighted in bold.

Figure 14-22. Using a Kill Switch to Control the Circular Reference Feedback

Unfortunately, including circular references in our worksheets prevents us from using some of Excel's features. Specifically, the Goal Seek, Data Table and Solver features will only calculate a single iteration of the sheet for each step in their processing, so will never return correct results.
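
The effect of the kill switch and seed value described earlier can be traced with a small simulation. In this Python sketch, NaN stands in for an Excel error value, the feedback cell is assumed to behave like IF(kill switch, calculated value, seed), and the loan figures are invented. None of these details are taken from Figure 14-22, and the function names are ours.

```python
import math


def repayment(r, profit=1000.0, balance=5000.0, rate=0.05, tax=0.30, share=0.4):
    """One pass of R = (P - R - (B - R) x I) x (1 - T) x share (hypothetical inputs)."""
    return (profit - r - (balance - r) * rate) * (1 - tax) * share


def recalculate(feedback, kill_switch, seed, passes=100):
    """Run the circle for a number of passes, honoring the kill switch."""
    for _ in range(passes):
        calculated = repayment(feedback)
        # Assumed feedback rule: complete the circle only when the switch is TRUE.
        feedback = calculated if kill_switch else seed
    return feedback


# An error value in the loop never clears by itself while the circle is live.
stuck = recalculate(math.nan, kill_switch=True, seed=100.0)

# Switching the kill switch to FALSE replaces the feedback with the seed.
reset = recalculate(stuck, kill_switch=False, seed=100.0)

# Switching it back to TRUE lets the circle iterate from the clean seed.
solved = recalculate(reset, kill_switch=True, seed=100.0)
print(stuck, reset, round(solved, 2))
```

The middle step is the important one: because the feedback cell ignores the calculated value while the switch is off, the error has nowhere to re-enter the circle.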