In our experience, people sometimes have one of two extreme reactions upon learning about directories. One reaction is negative: "What good is this directory really going to do? Why can't I just use X to fill that same need? I don't see how this directory is going to be as reliable and perform as well as you say, and I am hesitant to make my application depend on it!" This reaction can usually be overcome by additional education, helping the skeptic see a directory in action, and better explaining all the great things a directory can do.
The other reaction is equally misguided, but it can be more difficult to overcome: "This directory is great! I bet there's nothing it can't do. I can now use the directory instead of my file system, Web server, FTP server, and a host of other things. Finally, a handy, all-in-one tool that chops, slices, dices, makes homemade desserts, but best of all, cleans itself!" The enthusiasm of this reaction is to be admired, but there is danger here too. An old saying states that if the only tool you have is a hammer, every problem begins to look like a nail. Needless to say, not all problems are nails, and trying to treat them as such leads to poor carpentry.
A directory is no different; like any other technology, it is best suited to solving one set of problems. A directory can be usefully applied to many other problems as well, though with somewhat less satisfaction. You can abuse a directory by trying to make it solve problems that it was not intended to solve.
In this section we examine this area more closely, explaining some of the many applications for which directories are not well suited. We will explain why directories are different from the following network services that may look similar on the surface:
Databases
File systems
Web servers
FTP servers
DNS servers
We will explain how your directory can complement all these services and why each one fills a valuable niche in your computing infrastructure. We will conclude this section with a summary of when to use a directory to store information and when to use something else.
In the first section of this chapter we described what a directory is, and we compared directories to databases. We will not repeat that comparison here, but we will highlight a few important points. First, recall that directories are typically read-focused rather than write-focused. So if your application is writing large volumes of data (say, for recording merchandise transactions), you should choose a database over a directory.
Second, directories support a relatively simple transaction model. Directory transactions involve only a single operation and a single directory entry. Databases, on the other hand, are designed to handle large and diverse transactions, spanning multiple data items and many operations. Databases also support join operations that allow complex result sets to be produced efficiently. If your application requires this kind of support, again a database is a better answer than a directory. On the other hand, if you have an application that does not have these requirements but instead wants to read and occasionally write information to a network-accessible database in simple transactions, a directory server may well be a good, cost-effective, much simpler choice than something like an Oracle relational database.
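To make the contrast concrete, here is a minimal sketch of the kind of work a database transaction handles and a directory operation does not: one atomic unit that touches several data items, followed by a join. It uses Python's built-in sqlite3 module purely as a convenient stand-in for a relational database; the table names, the sample data, and the functions record_sale, sales_report, and make_demo_db are all invented for illustration.

```python
import sqlite3


def make_demo_db():
    """Create a small in-memory database with hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE stock (item_id TEXT PRIMARY KEY, on_hand INTEGER);
        CREATE TABLE sales (customer_id INTEGER, item_id TEXT, quantity INTEGER);
        INSERT INTO customers VALUES (1, 'Babs Jensen');
        INSERT INTO stock VALUES ('widget', 5);
    """)
    return conn


def record_sale(conn, customer_id, item_id, quantity):
    """Insert a sale and decrement stock as one atomic transaction.

    Returns True on success. If stock is insufficient, the whole
    transaction (including the already inserted sale row) is rolled back.
    """
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute(
                "INSERT INTO sales (customer_id, item_id, quantity) VALUES (?, ?, ?)",
                (customer_id, item_id, quantity),
            )
            cur = conn.execute(
                "UPDATE stock SET on_hand = on_hand - ? "
                "WHERE item_id = ? AND on_hand >= ?",
                (quantity, item_id, quantity),
            )
            if cur.rowcount != 1:
                raise ValueError("insufficient stock")
    except ValueError:
        return False
    return True


def sales_report(conn):
    """Join sales to customers to build a combined result set."""
    return conn.execute(
        "SELECT c.name, s.item_id, s.quantity "
        "FROM sales s JOIN customers c ON c.id = s.customer_id "
        "ORDER BY s.rowid"
    ).fetchall()
```

A directory update, by comparison, would change one entry in one operation, with no way to tie the stock change and the sale record together.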
A directory makes a poor file system. Files have characteristics different from directory information. Files are often large, containing many megabytes or even gigabytes of information, whereas directories are optimized for storing and retrieving relatively small pieces of information.
Although there is often no restriction on the size of a directory entry or attribute, the question is one of design center: what the system was designed and tuned to do well. Some files are written far more often than they are read, such as log files and files used to hold a database. Directories, as you'll recall, are optimized for read over write. Some files, of course, are read far more often than they are written. Application binaries are a good example of files in this category, but these files are often larger than anything that should be stored in a directory.
Applications often access a file in chunks, especially if the file is large. File system APIs provide functions for this very purpose, such as seek(), read(), and write(), which can be used to access only a portion of a larger file. Directories do not provide support for this kind of random access. Instead, a directory entry is split up into data items called attributes (for example, telephone number and name). You can retrieve each attribute separately, but you usually have to retrieve all of any attribute you ask for. Unlike a file system, there is no way to retrieve only part of an attribute, starting at a particular byte offset. File systems, on the other hand, are not good at storing attribute-based information, and they do not typically have the general-purpose search capabilities that directories have.
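The difference in access patterns can be sketched in a few lines of Python. The file half uses the standard seek() and read() calls; the directory half is only a hypothetical in-memory model (the ENTRY dictionary and get_attribute function are assumptions made for illustration, not a real directory API), meant to show that the unit of retrieval is a whole attribute.

```python
import os
import tempfile


def read_chunk(path, offset, length):
    """Read `length` bytes starting at byte `offset`, leaving the rest unread."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


# Hypothetical stand-in for a directory entry: attribute name -> list of values.
ENTRY = {
    "cn": ["Babs Jensen"],
    "telephoneNumber": ["+1 408 555 1212"],
}


def get_attribute(entry, name):
    """Return every value of one attribute; there is no offset/length variant."""
    return list(entry.get(name, []))


def demo():
    """Write a 10,000-byte file, then fetch 10 bytes from the middle of it."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"0123456789" * 1000)
        path = tmp.name
    try:
        chunk = read_chunk(path, 5000, 10)
        phone = get_attribute(ENTRY, "telephoneNumber")
        return chunk, phone
    finally:
        os.remove(path)
```

The file reader touches only the 10 bytes it asked for, whereas the attribute reader always hands back the complete value or values.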
Since the World Wide Web burst onto the computing scene in the mid-1990s, Web servers have become ubiquitous. Chances are good that your organization, depending on its size, runs anywhere from one to several hundred Web servers. Web servers have certain similarities to file systems. They are made to serve clients accessing documents or files that can vary greatly in size. Although Web server documents usually share the typical "read many, write few" characteristic of directory information, directories are not well suited to the task of delivering to clients multimegabyte JPEG images or Java applications.
Web servers also often serve as a springboard to a development platform for Web applications. These platforms range from simple CGI (Common Gateway Interface) to more complex platforms, such as the one found in BEA's WebLogic Application Server or Sun's ONE Application Server. Directories typically do not provide this kind of flexibility in application development, although some directory implementations, such as the Netscape Directory Server, do provide a platform and a set of services that can be used for directory application development.
Directories are optimized for sophisticated searching of the data they hold. Web servers can be used to develop similar search applications, but such applications are not based on standards. Web servers are tuned to provide GUI-style interfaces to applications; they are not tuned to provide generic application access to directory data. If you have a specific database of information you want to make available to users, a Web server might be a good choice. If you have information you want to make available to a wide variety of applications, a directory server is a better choice.
An argument similar to the previous one for Web servers applies to File Transfer Protocol (FTP) servers. One could argue that the days of FTP servers are numbered now that Web servers are so ubiquitous, but they are not threatened by the arrival of directory services. Again, the main differentiating factor is the size of the data and the type of client that needs access to it. Another important point is that FTP is a simple protocol, tuned to do one thing and do it well. If all you want to do is create a means to transfer files from one place to another, the extra directory infrastructure needed to perform replication, searching, updating, and so on is overkill.
On the other hand, if your application involves more than simple retrieval and storage of information, a directory is a more appropriate choice. Unlike directories, FTP provides no search capability, no attribute-based information model, and no incremental update capability.
DNS, the Internet's Domain Name System, translates host names, such as home.netscape.com, into IP addresses. Host names are good for users who want to remember how to connect to their favorite Internet service. IP addresses are required by the Internet networking infrastructure that is responsible for making the user's connection happen. The DNS and a typical directory have certain similarities, such as providing access to a hierarchical distributed database. But there are some important differences that set the two apart.
The DNS is highly optimized for its main purpose; most directories are meant to be more general-purpose. The DNS has a specialized, fixed set of schemas; directories allow schemas to be extended. DNS servers typically do not allow update of their information; directories do. The DNS can be accessed over a connectionless transport; directories are usually accessed through connection-based transport.
Note
The Internet Engineering Task Force (IETF) has developed standards to add update capabilities to the DNS, and systems such as Microsoft Windows 2000/XP and recent versions of the Berkeley Internet Name Domain (BIND) support dynamic DNS updates. Proposed IETF work on connectionless LDAP is aimed at providing LDAP access over User Datagram Protocol (UDP) in addition to TCP. These efforts and others may eventually bring DNS and LDAP closer together. But for now, the best argument for not trying to replace the DNS with LDAP is that the DNS is working just fine, and it would be disruptive and expensive to rewrite all Internet applications to use anything other than DNS for mapping from host name to IP address.
Directories share certain similarities with many of the services just listed. Directories tend to complement most of these services. We will now take a moment to explore this notion of a complementary directory, what it means, and how it can create synergy among all the services in your enterprise.
A good example of the complementary directory is how it relates to a Web server. Here the directory has some important supporting roles to play.
First, the directory can serve as an authentication database. When clients authenticate to the Web server, their credentials (for example, a user ID and password or a certificate) are checked against the directory. When the Web server needs to make an access control decision, the directory can again be consulted to determine group membership and other information pertinent to the decision. The value of the directory in this role is especially apparent when you consider an environment that is running many Web servers, all of which need access to the same authentication database. By sharing a directory, the Web servers reduce the user management problem to a manageable level. Other services can benefit from this type of use as well.
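The authentication role just described might look roughly like the following sketch. Everything here is a hypothetical in-memory model written for illustration: ToyDirectory stands in for a real directory service, WebServer for a real server, and the salted PBKDF2 password hashing is simply one reasonable choice, not something the text prescribes.

```python
import hashlib
import hmac
import os


class ToyDirectory:
    """Hypothetical in-memory stand-in for a shared directory service."""

    def __init__(self):
        self._users = {}   # uid -> (salt, password hash)
        self._groups = {}  # group name -> set of member uids

    @staticmethod
    def _hash(password, salt):
        return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 50_000)

    def add_user(self, uid, password, groups=()):
        salt = os.urandom(16)
        self._users[uid] = (salt, self._hash(password, salt))
        for group in groups:
            self._groups.setdefault(group, set()).add(uid)

    def check_password(self, uid, password):
        record = self._users.get(uid)
        if record is None:
            return False
        salt, expected = record
        return hmac.compare_digest(expected, self._hash(password, salt))

    def is_member(self, uid, group):
        return uid in self._groups.get(group, set())


class WebServer:
    """A server that keeps no user database of its own."""

    def __init__(self, name, directory):
        self.name = name
        self.directory = directory

    def handle(self, uid, password, required_group):
        """Return an HTTP-style status code for an access attempt."""
        if not self.directory.check_password(uid, password):
            return 401  # authentication failed
        if not self.directory.is_member(uid, required_group):
            return 403  # authenticated, but not authorized
        return 200
```

Because every WebServer instance consults the same ToyDirectory, adding a user or changing a group membership once takes effect on all of them.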
Second, the directory can serve as a network-accessible storage device for information about configuration, access control, user preferences and profiles, and other things. The value of the directory in this role is twofold:
If any of this information is to be shared across servers, the directory can act as a central repository, eliminating redundant administration much as it does for user and group information. As mentioned earlier in this chapter, it is much more convenient to change a shared configuration item on 100 Web servers via a single update to the directory. The alternatives are to visit each Web server and make the change redundantly or to maintain a separate, ad hoc system to manage configuration.
The directory can provide a standard, network-accessible way to administer all this information. This opens up a whole new set of possibilities for standardized management tools and common administration frameworks.
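The shared-configuration case can be sketched the same way. In this hypothetical model, ConfigDirectory stands in for a directory entry that holds settings, ConfiguredServer for a Web server that reads them, and sessionTimeout is an invented setting name; none of these come from a real product.

```python
class ConfigDirectory:
    """Hypothetical stand-in for a directory entry holding shared settings."""

    def __init__(self, **settings):
        self._settings = dict(settings)

    def get(self, name):
        return self._settings[name]

    def modify(self, name, value):
        self._settings[name] = value


class ConfiguredServer:
    """A server that reads shared settings from the directory when needed."""

    def __init__(self, directory):
        self._directory = directory

    def session_timeout(self):
        # Looked up at use time rather than copied into a local file.
        return self._directory.get("sessionTimeout")


directory = ConfigDirectory(sessionTimeout=900)
servers = [ConfiguredServer(directory) for _ in range(100)]

# One update to the directory; no per-server visits are needed.
directory.modify("sessionTimeout", 600)
```

A real deployment would probably cache values and refresh them periodically rather than query on every use, but the administrative point is the same: there is one place to make the change.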
As a third example of its complementary nature, the directory can be used to help organize and access information contained in the Web server itself. Today, Web search engines exist to catalog and organize Web-based content. But, as anyone who has spent much time using these services can attest, you often spend more time wading through irrelevant matches to your query than reading the actual information desired. The problem is simply that Web content lacks structure and therefore is difficult to organize and search in an automated way. With the advent of XML (eXtensible Markup Language), this lack of structure is beginning to change. This is where directories come in.
Directories are great at organizing and providing subsequent access to information. Imagine today's Web search engines driven by directories: If you used the directory query language and typed information structure to specify the information you were looking for more accurately, the directory could return a much more focused set of results. Keep in mind that in this scenario the content itself is still stored in a Web server and that free-text searches are still conducted as they are today. The directory's value is to provide some structure on top of this arrangement and also to provide a precise, yet flexible, query mechanism.
Another good example of a directory complementing existing services is found in the examination of file systems and FTP servers. In this case a directory can be used not to hold the contents of files, but rather to hold metainformation about those files, their locations, who owns them, and other things that might be useful in locating them. Most importantly, the directory can hold the information that a file system or FTP client needs to access files contained in that service. An FTP server could also use a directory server for authentication purposes.
This idea of using directories to organize and search for information that you want to access is a common theme. Directories often don't hold the content you seek, but they can hold the location of that information along with other attributes that can help you find what you're looking for.
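As a rough illustration of this pointer pattern, the sketch below keeps only small descriptive attributes and a location for each file, then searches on those attributes. The entries, the attribute names (including contentType and location), the example.com URLs, and the search function are all hypothetical; a real directory would express the same query as a search filter such as (&(owner=bjensen)(contentType=application/pdf)).

```python
# Hypothetical entries: metainformation about files, not the files themselves.
ENTRIES = [
    {
        "cn": "Q3 sales report",
        "owner": "bjensen",
        "contentType": "application/pdf",
        "location": "ftp://ftp.example.com/reports/q3.pdf",
    },
    {
        "cn": "Campus map",
        "owner": "bjensen",
        "contentType": "image/jpeg",
        "location": "http://www.example.com/maps/campus.jpg",
    },
    {
        "cn": "Q3 forecast",
        "owner": "scarter",
        "contentType": "application/pdf",
        "location": "ftp://ftp.example.com/reports/forecast.pdf",
    },
]


def search(entries, **criteria):
    """Return entries whose attributes equal every criterion, ignoring case."""
    def matches(entry):
        return all(
            str(entry.get(attr, "")).lower() == str(value).lower()
            for attr, value in criteria.items()
        )
    return [entry for entry in entries if matches(entry)]


def locations(entries, **criteria):
    """Return just the pointers a file system or FTP client would follow."""
    return [entry["location"] for entry in search(entries, **criteria)]
```

The client uses the directory to find out where the content lives, then fetches the content itself from the FTP or Web server named in the location value.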
The dividing line between what should be held in a directory and what should be held elsewhere is not always clear. As a general guideline, the larger a piece of information is, the less likely it is to belong in a directory. The more frequently the information changes, the less suited it is to being maintained in a directory. The less structured a piece of information is, the less benefit you will likely derive from placing it in a directory. However, the more widely a piece of information is shared, the more benefit you will likely derive from placing it in a directory.
Here's a brief summary of things to think about when deciding whether to use a directory or another piece of technology for storing information:
Size of the information. Directories are best at storing relatively small pieces of information, not multimegabyte files. Directories are good at storing pointers to large things, but not the large things themselves.
Character of the information. Directories typically have an attribute-based information model where information is broken up into a set of name-value pairs (see Chapter 2, Introduction to LDAP, for a detailed discussion of the LDAP information model). If you can express your information naturally in this form, a directory might be a good choice. If you can't, consider using a database, file system, or other approach.
Read-to-write ratio. Directories are best for information that is read far more often than it is written. If the information is to be written more frequently, a database or file system might be a more appropriate choice.
Search capability. Directories are made to search the information they contain. If your application has this requirement, a directory might be a good choice.
Standards-based access. If you need standards-based access to your information, a directory is a good choice.
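The checklist above can be turned into a small helper, sketched here under loud assumptions: the DataProfile fields, the function names, and especially the two numeric cutoffs are invented for illustration. The guidelines themselves give no numbers, so treat the thresholds as placeholders to adjust for your own environment.

```python
from dataclasses import dataclass


@dataclass
class DataProfile:
    """Hypothetical description of the information an application stores."""
    typical_size_bytes: int
    reads_per_write: float
    fits_name_value_pairs: bool
    needs_search: bool
    needs_standard_access: bool


# Assumed cutoffs, for illustration only; the guidelines do not give figures.
MAX_SIZE_BYTES = 100 * 1024
MIN_READS_PER_WRITE = 10.0


def directory_concerns(profile):
    """List the reasons a directory may be a poor fit for this information."""
    concerns = []
    if profile.typical_size_bytes > MAX_SIZE_BYTES:
        concerns.append("size: store a pointer in the directory, not the data")
    if not profile.fits_name_value_pairs:
        concerns.append("character: not naturally a set of name-value pairs")
    if profile.reads_per_write < MIN_READS_PER_WRITE:
        concerns.append("read-to-write ratio: written too often")
    return concerns


def directory_benefits(profile):
    """List the reasons a directory may be a good fit for this information."""
    benefits = []
    if profile.needs_search:
        benefits.append("search capability")
    if profile.needs_standard_access:
        benefits.append("standards-based access")
    return benefits


phone_book = DataProfile(500, 1000.0, True, True, True)
transaction_log = DataProfile(50_000_000, 0.01, False, False, False)
```

With these assumed cutoffs, the phone_book profile raises no concerns and gains both benefits, while the transaction_log profile raises all three concerns, which matches the guidance to use a database or file system for large, write-heavy data.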
Keeping these principles in mind when you're deciding what storage mechanism to use for your application will keep you out of trouble. By this time, you should have a good understanding of what a directory is, what a directory is not, and how a directory relates to other services in your network. Hopefully this knowledge will help you avoid the "hammer syndrome," in which every problem looks like a nail that you can solve using your directory hammer.