Windows Component Technology
As you saw in Chapter 20 when we examined DLLs, one of the most important evolutionary steps in software development has been the move to component architectures. Component-based development makes software production more manageable. By dividing an application into components, you can isolate issues and find problems much more quickly. Let's take a look at component technology from a historical perspective.
Some Component History
As you saw in Chapter 20, dynamic linking uses a single copy of a library available on disk. The library is loaded on demand at run time by the client applications. If a copy of the library is already loaded, Windows simply maps the code pages into the client's memory space. Either way, only one copy of the DLL remains loaded at run time.
DLLs contain loadable, executable code and Windows resources. Remember from Chapter 20 that Windows supports two forms of dynamic linking: implicit linking and explicit linking. Implicit linking involves linking your client application to the DLL's import library. When the linker links to an import library, it inserts a little bit of fix-up code for each function exported by the DLL. When the client application finally runs, the first code that is executed by the application is the set of address fix-ups stipulated by the import library. By contrast, explicit linking involves a good deal more work for the client developer. Clients link to DLL entry points by explicitly calling LoadLibrary, GetProcAddress, and FreeLibrary.
While DLLs provide the promise of real components (separate binaries that can be linked at run time), they don't quite make good on the promise.
What's Wrong with DLLs
One of the main challenges with plain-vanilla DLLs is keeping all the clients and DLLs synchronized as far as the exported functions are concerned. For applications that are developed and deployed atomically, this is not a problem. A great example is Windows itself, which is a set of atomically developed and deployed DLLs: user32.dll, gdi32.dll, kernel32.dll, and so on. However, dynamic linking does pose problems for software whose modules are developed and deployed independently.
The promise of DLLs as a component technology lies in the ability it gives you to dynamically compose software, that is, to change components at run time. As long as all the function signatures within the DLL remain the same as the ones expected by the client, there's no problem. However, if any of the function signatures change (perhaps with the addition of a parameter or the removal of a function) and the client application is not recompiled, the client application might fail to load (at best) or might crash (in the worst case). In another scenario, an older version of a popular library might be copied over a newer one, resulting in some sort of function mismatch between the client and the DLL.
The basic problem is that the notion of typing is missing from the normal DLL loading process. Type signatures are contained in the header files shared between the client and the DLL, but they're found nowhere else. If the DLL and its clients are compiled with different versions of those header files, the application won't work. This is one of the advantages COM provides: adding formal type checking to the loading process.
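
To make the missing type check concrete, here is a minimal, portable sketch rather than real Windows code. FakeDll, RawProc, AddV1, AddFn, and CallAdd are hypothetical names invented for this illustration. FakeDll only imitates the way GetProcAddress hands back an untyped pointer for a name, so the sketch runs without the Windows headers.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the untyped pointer that GetProcAddress returns.
using RawProc = void (*)();

// A function that the simulated "DLL" exports.
int AddV1(int a, int b) { return a + b; }

// Hypothetical stand-in for a DLL's export table: names map to raw pointers.
class FakeDll {
public:
    void Export(const std::string& name, RawProc proc) { exports_[name] = proc; }

    // Returns nullptr when the name is not exported.
    RawProc Find(const std::string& name) const {
        auto it = exports_.find(name);
        return it == exports_.end() ? nullptr : it->second;
    }

private:
    std::map<std::string, RawProc> exports_;
};

// The signature the client believes "Add" has. It comes from a header file,
// not from the DLL, so nothing at load time can confirm that it is right.
using AddFn = int (*)(int, int);

// Looks up "Add" by name and calls it. *found reports whether the name existed.
int CallAdd(const FakeDll& dll, int a, int b, bool* found) {
    RawProc raw = dll.Find("Add");
    *found = (raw != nullptr);
    if (raw == nullptr) {
        return 0;  // the export was removed or renamed
    }
    // The lookup checked only the name. This cast is pure trust: if the
    // export had gained a parameter, the call below would misbehave.
    AddFn add = reinterpret_cast<AddFn>(raw);
    return add(a, b);
}
```

The only failure the lookup can detect is a missing name. A changed parameter list passes through the cast unnoticed, which is the gap that the discussion of typing above describes.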
The COM Technology
We looked at COM in detail in Chapter 22. By applying the discipline of interface-based programming, COM introduces a layer of indirection between the client and the actual component code. A COM interface is a collection of functions named by a GUID. Interfaces are predictable and don't vary in the way that raw DLL entry points can. In fact, COM programming stipulates a rule that interfaces not change once they've been published. A normal Windows DLL might have a multitude of entry points, but a COM DLL has only four standard DLL entry points: DllGetClassObject, DllCanUnloadNow, DllRegisterServer, and DllUnregisterServer. The functionality of the DLL is described by one or more COM interfaces. COM turns DLL loading into a typed operation. Code is loaded based on type, and that type is an interface.
For now, just recall these points:
COM interfaces are collections of function signatures, usually described in Microsoft Visual C++ as a struct. All COM interfaces include the same three function signatures at the top: QueryInterface, AddRef, and Release. These three functions comprise the IUnknown interface. Interfaces have unique names called GUIDs. Once an interface is published and used widely, it should never change.
COM implementations give life and behavior to these interfaces.
COM class objects, or class factories, expose COM implementations to the system. COM class objects are named using GUIDs and appear in the registry.
COM DLLs often include type information as a resource. This provides a level of reflection so clients of the DLL can understand what's inside the DLL.
COM clients use API functions to instantiate the COM object (CoCreateInstance, CoGetClassObject/IClassFactory::CreateInstance, or CoCreateInstanceEx). Visual Basic clients simply need to use the New keyword. However, Visual Basic uses the API functions under the hood.
COM clients are responsible for managing the interface pointers they acquire. That is, they must call AddRef through an interface pointer when they duplicate the pointer, and they must call Release through an interface pointer when they discard the interface. Visual Basic developers don't need to pay attention to this rule because the runtime manages the interface pointers.
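
The interface and reference-counting rules in the list above can be sketched in portable C++. This is a simplified illustration and not the real COM headers: IUnknownLike, IGreeter, Greeter, CreateGreeter, and g_liveObjects are hypothetical names, small integers stand in for GUIDs, and QueryInterface returns a bool here where real COM returns an HRESULT.

```cpp
#include <cassert>

// Small integers stand in for GUIDs in this sketch.
enum InterfaceId { IID_Unknown = 1, IID_Greeter = 2, IID_Printer = 3 };

// Simplified stand-in for IUnknown: the three functions every interface shares.
struct IUnknownLike {
    virtual bool QueryInterface(InterfaceId iid, void** out) = 0;
    virtual unsigned long AddRef() = 0;
    virtual unsigned long Release() = 0;
protected:
    ~IUnknownLike() = default;  // clients end an object's life only via Release
};

// A hypothetical interface built on top of the three shared functions.
struct IGreeter : IUnknownLike {
    virtual int GreetingLength() = 0;
protected:
    ~IGreeter() = default;
};

int g_liveObjects = 0;  // lets the caller observe when the object is destroyed

// An implementation that gives behavior to the interface.
class Greeter final : public IGreeter {
public:
    Greeter() { ++g_liveObjects; }

    bool QueryInterface(InterfaceId iid, void** out) override {
        if (iid == IID_Unknown) {
            *out = static_cast<IUnknownLike*>(this);
        } else if (iid == IID_Greeter) {
            *out = static_cast<IGreeter*>(this);
        } else {
            *out = nullptr;  // interface not supported
            return false;
        }
        AddRef();  // every pointer handed out carries its own reference
        return true;
    }

    unsigned long AddRef() override { return ++refs_; }

    unsigned long Release() override {
        const unsigned long remaining = --refs_;
        if (remaining == 0) {
            delete this;  // the last reference is gone
        }
        return remaining;
    }

    int GreetingLength() override { return 5; }

private:
    ~Greeter() { --g_liveObjects; }
    unsigned long refs_ = 1;  // the creator holds the first reference
};

// Hypothetical creation function; real clients would go through the COM API.
IGreeter* CreateGreeter() { return new Greeter(); }
```

A client that obtains a second pointer through QueryInterface owes one Release for each pointer it holds. The object destroys itself only when the count returns to zero.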
The Benefits of COM
COM is vastly superior to plain DLLs for composing software from components. In fact, many enterprises have built their core systems using COM. For example, the back end to Nasdaq.com is written using COM. COM works so well in so many cases for a number of reasons.
One key to COM's success is its emphasis on interfaces. Decoupling clients from the implementations encourages component-based architectures. When your program accesses services through interfaces instead of classes, it's possible to change implementations easily without breaking the client. This allows separate parties to develop software independently of each other.
COM loads services using named types. The name is the GUID, and the type is the interface definition. As you saw in Chapter 22, COM applications call CoCreateInstance, passing in the GUIDs that name the types (the class ID and the interface ID), and get back an instance of the class along with a pointer to the requested interface. In addition, you can widen your connection at run time and get even more types by using QueryInterface. COM replaces the plain-vanilla LoadLibrary/GetProcAddress API calls with the single function CoCreateInstance and well-defined extensible interfaces to the code in the DLL. In a nutshell, COM introduced the notion of type into the DLL loading mechanism.
In addition to enforcing interface-based programming, COM adds the notion of reflection: the DLL's ability to describe itself. Think about how standard DLL functionality is exposed: The only way you can learn about the contents of a plain-vanilla DLL is by reading some documentation or a header file. COM DLLs include binary type information embedded within the DLL. This type information advertises the types (data types and interface types) and implementations (class IDs) contained within. Visual Basic and Visual C++ use this type information to implement IntelliSense, and the COM runtime uses type information to set up the proxy-stub pairs at run time.
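
The idea of loading code by named type can be sketched with a small in-memory registry. This is an illustration under stated assumptions and not the COM API: CreateInstanceLike, Registry, CreateSquare, IShape, Square, and the Result values are hypothetical names, integers stand in for GUIDs, and a map stands in for the system registry that CoCreateInstance consults.

```cpp
#include <cassert>
#include <map>

// Integers stand in for GUIDs in this sketch.
using ClassId = int;
using InterfaceId = int;
const InterfaceId IID_Shape = 1;
const InterfaceId IID_Printer = 2;
const ClassId CLSID_Square = 100;

// Hypothetical result codes; real COM reports failures through HRESULT values.
enum class Result { Ok, ClassNotRegistered, NoInterface };

// A hypothetical interface. Destroy stands in for Release to keep this short.
struct IShape {
    virtual double Area() const = 0;
    virtual void Destroy() = 0;
protected:
    ~IShape() = default;
};

class Square final : public IShape {
public:
    explicit Square(double side) : side_(side) {}
    double Area() const override { return side_ * side_; }
    void Destroy() override { delete this; }
private:
    ~Square() = default;
    double side_;
};

// A factory receives the requested interface ID and fills in the pointer.
using Factory = Result (*)(InterfaceId iid, void** out);

Result CreateSquare(InterfaceId iid, void** out) {
    *out = nullptr;
    if (iid != IID_Shape) {
        return Result::NoInterface;  // the class exists but lacks this type
    }
    *out = static_cast<IShape*>(new Square(3.0));
    return Result::Ok;
}

// Stand-in for the registry: class IDs map to factories.
std::map<ClassId, Factory>& Registry() {
    static std::map<ClassId, Factory> registry;
    return registry;
}

// Both the class and the interface are requested by ID, so a mismatch
// surfaces as an error code at creation time, not as a crash later.
Result CreateInstanceLike(ClassId clsid, InterfaceId iid, void** out) {
    *out = nullptr;
    auto it = Registry().find(clsid);
    if (it == Registry().end()) {
        return Result::ClassNotRegistered;
    }
    return it->second(iid, out);
}
```

Compared with looking up a function by name, the request here names both the implementation and the interface the client expects. A mismatch is therefore reported before any function in the component is called.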
The Drawbacks of COM
In Chapter 23, we looked at IDispatch and scriptable components, which allow you to target the Web with your software. Using IDispatch limits your data type selection to only those types that fit in a VARIANT. So if you target your component for scripting, the set of data types available to you shrinks dramatically. Finally, the contents of a DLL's type library do not completely reflect the contents of the DLL in some cases.
Ultimately, COM-based applications are still built out of DLLs (often written using different development environments), and there will always be a boundary between the client and the object. That boundary is bridged using function signatures. COM adds the notion of type to the loader, adding some consistency and reliability to the loading process. However, COM supports disparate type systems and imposes some complex rules. These issues are spelling the end of COM's reign as the premier component technology for Windows. The goal of the common language runtime is to fix these issues.