Professional Excel Development [Electronic resources] : The Definitive Guide to Developing Applications Using Microsoft® Excel and VBA® نسخه متنی

اینجــــا یک کتابخانه دیجیتالی است

با بیش از 100000 منبع الکترونیکی رایگان به زبان فارسی ، عربی و انگلیسی

Professional Excel Development [Electronic resources] : The Definitive Guide to Developing Applications Using Microsoft® Excel and VBA® - نسخه متنی

Stephen Bullen, Rob Bovey, John Green

| نمايش فراداده ، افزودن یک نقد و بررسی
افزودن به کتابخانه شخصی
ارسال به دوستان
جستجو در متن کتاب
بیشتر
تنظیمات قلم

فونت

اندازه قلم

+ - پیش فرض

حالت نمایش

روز نیمروز شب
جستجو در لغت نامه
بیشتر
لیست موضوعات
توضیحات
افزودن یادداشت جدید











Data Processing Features


After we've arranged our data and/or calculations in a tabular form, we can use Excel's data processing features to manipulate it. Obviously, the actual manipulation that needs to be done will depend totally on the problem being solved, so this section describes some of the techniques available and how they can be linked together. It is up to the reader to decide which techniques are most appropriate for their situation!

It Doesn't Have to Be Data


Even though we've only considered data so far, Excel is not a database; it's designed to manipulate numbers. We are not forced to only include raw data in our structured ranges. Some of the most powerful number crunching comes from organizing our formulas in a structured, tabular form, then using Excel's data processing features on the results of those calculations. The only caveat with using formulas in our data tables is that the data processing (consolidation, pivot table, filter and so on) will not be updated or refreshed when the source formulas are recalculated. We can easily work around this by including some VBA code to trigger the processing from within the Worksheet_Calculate event (which conveniently occurs after the sheet has been calculated).

Pivot Caches


When Microsoft introduced pivot tables in Excel 95, they realized that it would be much more efficient to store the source data for the pivot tables in a hidden form within the workbook file than within worksheet cells. The hidden data stores are called pivot caches and are just like query tables, but without the visual representation and can only be used by pivot tables. They suffer from the same problem of including hard-coded paths to the database within their connection and SQL query information, and the same solution of using VBA to define the connection string and query text applies.

Pivot Tables


Pivot tables are Excel's premier data processing feature. Using either a table in a worksheet as the data source or a pivot cache for direct connection to a database, pivot tables enable us to filter, group, sort, total, count and drill down into our data. Most books about using Excel include a section explaining how to set up and manipulate pivot tables, so we're assuming you already know the basics. With pivot tables, there is only one level of difficulty, so once the basics are understood, the skill is in knowing how to most efficiently use pivot tables to analyze our data and integrate them with the rest of our data processing.

The best way to use pivot tables for data processing in most applications is to create all the pivot tables beforehand, on individual worksheets. If the pivot tables are set up to connect directly to a database, it is usually more efficient to have a single, large pivot cache that feeds multiple pivot tables, than having separate queries for each pivot table. The easiest way to do this is to create the first pivot table normally, then base subsequent pivot tables on the first (in step one of the Pivot Table Wizard). By using the same pivot cache, Excel will only need to store one copy of the data within the workbook and all the pivot tables will be updated when the cache is refreshed. Although it is possible to use the Excel object model to create and modify pivot tables, this should be kept to a minimum, because Excel refreshes and redraws the table with every modification. With anything other than a trivial amount of data, this quickly becomes extremely slow.

In the NWindOrders.xls example workbook, found on the CD in the \Concepts\Ch14Data Manipulation Techniques folder, the OrderData worksheet contains a query table which retrieves information about each order from the NorthWind sample Access database, shown in Figure 14-4.

Figure 14-4. The Query Table for NorthWind Order Details

[View full size image]

As well as retrieving the specific data for the order information, we've included extra information such as the company name, country and product category. Adding this extra information will usually have negligible impact on the query execution time or extra data storage requirements, but enables us to perform more diverse analysis using the same raw data. For example, Figure 14-5 shows the PivotTable worksheet from the example workbook, which includes both a breakdown of order quantities by country and product category and a list of our UK customers.

Figure 14-5. Two Diverse PivotTables, Derived from the Same Pivot Cache

[View full size image]

Unfortunately, this technique is limited by the lack of a "distinct count" function to total the data. The "count" function gives the number of records, which in our case is the total number of order detail line items. If we had a "distinct count" function, we would be able to identify the number of orders placed by each customer (by counting the number of distinct order IDs) or the number of customers in each country (by counting the number of distinct customer IDs).

Calculated Pivot Fields


Excel enables us to add extra fields and data to our pivot caches, in the form of calculated fields and calculated items. A calculated field is an extra column, derived from one or more other fields, such as defining a calculated Profit field as Revenue Cost, where Revenue and Cost are fields in the data set. These are of very limited use, because Excel always does the Sum of the individual fields before performing the calculation, so we get the following:

Sum of Profit = Sum of Revenue Sum of Cost


This is okay and marginally useful for the simple cases, but is useless and dangerous if a more complex formula is required. Looking at the NorthWind data in Figure 14-4, we have fields for the Quantity, UnitPrice and Discount, so we might be tempted to add a calculated Revenue field as Quantity x (UnitPrice Discount). Unfortunately, as Excel sums the individual fields before doing the calculation, we end up multiplying the total quantity sold by the sum of all the prices minus the sum of all the discounts! Unless that is what you really require, it is far better and much safer to add the additional fields at the raw data level, either by including calculated fields in the SQL query, or by adding extra columns alongside the query table, as shown in Figure 14-6.

Figure 14-6. Adding a Calculated Field Alongside a Query Table

When using columns alongside the query table, be sure to tick the Fill down formulas in columns adjacent to data check box in the query table Properties dialog to make sure the formulas are copied to new rows. We should also use a defined name to link the pivot table to the query table, which can be adjusted to ensure the pivot tables always refer to the correct data range, including the additional formulas. We create the name to refer to the full range of data and formulas and use that name instead of a direct range reference in Step 2 of the Pivot Table Wizard. The defined name can be updated using the QueryTable_AfterRefresh event shown in Listing 14-3, which also refreshes any pivot caches that use it.

Listing 14-3. Updating Defined Names and Refreshing Pivot Caches When a Query Table Is Refreshed



'Code contained within the OrderData worksheet code module
'Variable to hook the Query Table events
Private WithEvents mqtData As QueryTable
'Called from the start of Workbook_Open()
Public Sub Initialise()
'Set up the event hook for the query table
Set mqtData = Me.QueryTables(1)
End Sub
'Update dependent data when the QueryTable is refreshed
Private Sub mqtData_AfterRefresh(ByVal Success As Boolean)
Dim sRangeName As String
Dim pcCache As PivotCache
If Success Then
'Update the defined name
sRangeName = Me.Name & "!pdPivotDataRange"
mqtData.ResultRange.CurrentRegion.Name = sRangeName
'Refresh any dependent pivot caches
For Each pcCache In ThisWorkbook.PivotCaches
If pcCache.SourceData = sRangeName Then
pcCache.Refresh
End If
Next
End If
End Sub

Data Consolidation


Probably the most little-known of Excel's data processing features is its ability to consolidate numeric data from multiple ranges into a single table, matching the data using the labels in both the first row and first column of each range. If a single cell is selected, Excel first creates a unique list of all the column headers and a unique list of all the row headers (that is, the labels in the first column) to create the result table. If a range is already selected, Excel uses the row and column headers that are already there. It then adds (or counts, averages, max, min and so on) all the items of data that share the same row and column header.

This proves extremely useful when consolidating data and calculations that occur over a time series. For example, imagine a project to analyze whether to build and run a new theme park. You might have a workbook for the construction planning, another for the ongoing operations, another for concessions and retail planning and so forth. Each workbook contains a summary table showing the costs, revenue and cash flow for each year. A greatly simplified version is shown in Figure 14-7, but imagine the Construction Planning and Operations tables actually exist in different workbooks and each has been given a defined name.

Figure 14-7. Simplified Project Planning

To consolidate these ranges into a single table, select the top-left cell in the target consolidation area, A15 in this example, and click the Data > Consolidate menu to access the Consolidate dialog shown in Figure 14-8.

Figure 14-8. The Data Consolidation Dialog

The All references list shows all the source data ranges that will be consolidated. The ranges can be from the same worksheet, a different worksheet, different workbook or even a closed workbook! Yes, this is one of the few Excel features that works as well with closed workbooks as with open ones. To add a source data range, type the reference in the Reference: refedit and click the Add button. Make sure the two Use labels in check boxes are both ticked, to ensure Excel matches both row and column headers. If they're not ticked, Excel matches by position, which is rarely what is required.

When we click the OK button, Excel matches all the labels, adds up all the similar data and gives us the table shown in Figure 14-9.

Figure 14-9. The Consolidated Results

Advanced Filtering


The ability to extract specific records from a large data set is often the key to successful and efficient data processing. Pivot tables provide some rudimentary filtering capability, but only by hiding individual items of data. Excel's Advanced Filter feature enables us to filter the data using much more complex expressions, either just by hiding records in the original table, or more commonly by copying the resulting records to a new location for further processing. The Advanced Filter dialog is accessed by clicking the Data > Filter > Advanced Filter menu and is shown in Figure 14-10.

Figure 14-10. The Advanced Filter Dialog

As can be seen from Figure 14-10, when copying the filtered data to a new location, an advanced filter requires three ranges:

List range is the range containing the original data to be filtered.

Criteria range is a worksheet range used to define the criteria to use when filtering the data. Understanding how to get the most from the criteria range is the key to using advanced filtering and is the focus of the rest of this section.

Copy to is the destination range for the filtered data to be copied to.


When the OK button is clicked, Excel scans through the source data range, checks each record against the criteria specified in the criteria range and copies the matching records to the next row in the Copy to range. The result is a subset of the original data, arranged as a simple structured data areathat is, not as a list or query table.

Unfortunately, every time the Advanced Filter dialog is shown, the List range is either blanked out or guessed and the Action defaults to Filter in place. It would be much more helpful if Excel remembered the source range and action, which would be possible if only Excel created the filter as a query table. If that were the case, we would also have a one-click Refresh option and be able to tell Excel to automatically copy down adjacent formulas. The best we can do in current versions is to give our ranges some specific names. If the workbook contains the defined names Database, Criteria and/or Extract, Excel will populate the dialog using the ranges pointed to by those names.

To save you some frustration if you're working through these examples, we've included the routine shown in Listing 14-14 in the example workbook to refresh the filter without showing the Advanced Filter dialog. Note that in VBA, we use the AdvancedFilter method on the source data range and specify the range to copy the filtered data to.

Listing 14-4. Advanced Filtering with VBA



Private Sub cmdRefresh_Click()
Static rngCriteria As Range
Dim rngNewCriteria As Range
'Provide a default initial selection
If rngCriteria Is Nothing Then
Set rngCriteria = Me.Range("A1")
End If
'Use error trapping to handle a cancel
On Error GoTo ErrNoRangeSelected
'Allow the user to select the criteria range to use
'Type:=8 allows for selection of ranges.
Set rngNewCriteria = Application.InputBox( _
"Select the criteria range to use and click OK.", _
"Refresh Advanced Filter Extract", _
rngCriteria.Address, Type:=8)
'Remember the criteria range for next time
Set rngCriteria = rngNewCriteria
'Perform the autofilter
wksData.Range("pdPivotDataRange").AdvancedFilter _
xlFilterCopy, rngCriteria, _
Me.Range("rngAFExtract"), False
ErrNoRangeSelected:
Exit Sub
End Sub

Criteria Ranges


The criteria range is used to specify the equivalent of a SQL WHERE clause, telling Excel which records to return. Figure 14-11 shows an example of a criteria range, in A1:B3.

Figure 14-11. An Advanced Filter Criteria Range

The first row of the criteria range contains field names that must match the field names used in the source data table, but can be in any order. Subsequent rows contain the data to match for each field. All the items in a row are joined with an AND operation, while separate rows are joined with an OR operation. Blank cells in the criteria range match to anything. The criteria range shown in Figure 14-11 should be read as "(Country="UK") OR (Country="USA" AND CategoryName="Beverages")," so that will return all orders from the UK and all orders from the USA for Beverages. If we only want the Beverage orders from the UK or USA, we have to include the Beverages filter in both lines, as shown in Figure 14-12, which reads as "(Country="UK" AND CategoryName="Beverages") OR (Country="USA" AND CategoryName="Beverages")."

Figure 14-12. Beverages from the UK or USA

By combining the AND and OR logic in this way, Excel enables us to create extremely complex criteria.

We're not limited to filtering using "equals" relationships. In fact, the default filter for text items is "starts with" so the criteria range shown in Figure 14-12 will also return Beverages orders from the Ukraine! To specify an exact (case insensitive) match, we use an = sign, as shown in Figure 14-13. When typing these in, it's a good idea to start with a quote mark, '=UK, to tell Excel this is text and not a formula, or to format the criteria cells as text before typing the values.

Figure 14-13. Beverages from Only the UK or USA, and Not Ukraine

As well as using the = sign to specify an exact match, we can include the ? and x wildcard characters to match any one character or any range of characters respectively and use the > and < symbols to match ranges. To specify both a lower and upper limit, we can include the field name multiple times in the criteria range, such as the criteria shown in Figure 14-14 to select the orders from UK customers whose names start with G to N.

Figure 14-14. Specifying a Range of Matches by Repeating the Field Name

We can also, of course, filter on numeric and date fields in exactly the same way, although we have to be careful if our workbooks will be used in multiple countries with different date orders. For example, if you're American, you might expect the criteria range in Figure 14-15 to return all the records for 2004.

Figure 14-15. Filtering Between to Dates in the USA

When the filter is applied in the UK, it doesn't return any records, as 12/31/2004 is not recognized as a dateit should be 31/12/2004 instead. To avoid these issues, it is a very good idea to use formulas to construct the date criteria. In this example we should replace the hard-coded date in B2 with the formula ="<="&DATE(2004,12,31), which displays as the less-readable date number <=38352, but works in all locations. Similarly when filtering for a range of numbers, it is safest to create the criteria entry as a formula such as =">="&1.23, thereby allowing Excel to use the correct decimal separators for the location.

As well as specifying that individual fields must have certain values, we can also filter on relationships between the data in multiple fields. To do this, we use a dummy field name that doesn't exist in the source data, such as Calc1, Calc2 and so forth and create a formula using the cells from the first data row of the table (that is, not the header row). The formula must evaluate to TRUE or FALSE and must use relative referencing when referring to the data in the table. As Excel scans through the source table, it increments all the relative row references in the formula, evaluates the formula for that row and matches on a TRUE result. For example, the formula shown in Figure 14-16 will return any orders where the discount is more than 5 percent of the unit price. Note that this is entered as an Excel formula, not as a text string, so you should see the result of the formula (TRUE or FALSE) displayed in the cell.

Figure 14-16. Filtering Using a Formula

Instead of using cell references, which can be hard to read when the referenced range is on a separate sheet (as in this case), Excel enables us to use the field names in the formula, such as =Discount/UnitPrice>=0.05 in this case. Doing so usually results in the cell displaying a #NAME! error, but that can be safely ignored.


/ 225