Appendix B and the URLs in Appendix C.
If you are using a cluster that someone else is operating, you need only learn how to program and run applications.
Part II covers programming clusters. Even if you do not intend to develop your own parallel applications, we recommend reading Chapter 7, which provides an overview of the technologies. For a deeper understanding of the parallel programming technologies, read the chapters on MPI (Chapters 8 and 9) and PVM (Chapters 10 and 11). Even if you plan to write your own parallel software, you should read Chapter 12 on parallel software and libraries. You may find that what you need has already been written!
Once you have your application, you will need to run your program. Part III covers tools for managing and using a cluster. Many clusters will use some kind of workload management system to mediate use of the cluster among the user community. Chapter 14 provides an overview of the concepts and capabilities of these systems. You should also read the chapter that corresponds to the workload system that is used on your cluster: Condor (Chapter 15), Maui (Chapter 16), PBS (Chapter 17), or Scyld (Chapter 18). If your application requires a high-performance, parallel I/O system, read Chapter 19 on the Parallel Virtual File System. These chapters cover information of interest to both the system administrator and the cluster user, so skip over material that doesn't apply to you.
First, re-read this chapter and pay close attention to the discussion of application requirements. These requirements will guide you in your choice of cluster components. Chapters 2 and 4 describe the choices of processor, network, and other hardware. Even if you plan to buy a preassembled cluster, these chapters will help you understand the various choices of components and aid you in understanding the specifications of a cluster. Chapter 2 also covers some of the issues of assembling your own cluster.
Operating a cluster requires an understanding of the operating system. Chapter 3 provides a brief introduction along with a discussion of cluster-specific issues. Chapter 6 describes tools for setting up a cluster. An introduction to managing a cluster from the point of view of the system administrator is presented in Chapter 13. Chapter 14 provides an overview of the concepts and capabilities of workload management systems; you should also read the chapter that corresponds to the system you plan to run: Condor (Chapter 15), Maui (Chapter 16), PBS (Chapter 17), or Scyld (Chapter 18). Once the cluster is up and running, you may need to tune the network and operating system. Chapter 3 provides some information on tuning the OS; Chapter 5 discusses techniques for tuning the network and communication systems. Finally, Chapter 20 provides a case study of two generations of a major cluster system, illustrating particular choices and best practices.