Beowulf Cluster Computing with Linux, Second Edition

Table of Contents

Series Foreword
Foreword
Preface to the Second Edition
Acknowledgments for the Second Edition
Preface to the First Edition

Chapter 1: So You Want to Use a Cluster
  1.1 What Is a Cluster?
  1.2 Why Use a Cluster?
  1.3 Understanding Application Requirements
  1.4 Building and Using a Cluster
  1.5 Another Way to Read This Book

Part I: Enabling Technologies

Chapter 2: Node Hardware
  2.1 Node Hardware Overview
  2.2 Microprocessor
  2.3 Memory
  2.4 I/O Channels
  2.5 Motherboard
  2.6 Persistent Storage
  2.7 Video
  2.8 Peripherals
  2.9 Packaging
  2.10 Node Choice and Cluster Construction

Chapter 3: Linux
  3.2 The Linux Kernel
  3.3 Pruning Your Beowulf Node
  3.4 Scalable Services
  3.5 Other Considerations
  3.6 Final Tuning with /proc
  3.7 Conclusions

Chapter 4: System Area Networks
  4.1 Network Hardware
  4.2 Example Networks
  4.3 Network Software
  4.4 Performance
  4.5 Network Choice

Chapter 5: Configuring and Tuning Cluster Networks
  5.1 Cluster Network Designs
  5.2 Internet Protocol Stack
  5.3 Networking Concepts and Services
  5.4 Simple Cluster Configuration Walkthrough
  5.5 Improving Performance
  5.6 Protecting Your Cluster
  5.7 Troubleshooting

Chapter 6: Setting Up Clusters
  6.1 Challenges
  6.2 Hardware Provisioning Challenges and Best Practices
  6.3 Different Types of Installation Management
  6.4 The Basic Steps
  6.5 NPACI Rocks
  6.6 The OSCAR Toolkit
  6.7 Other Important Toolkits
  6.8 When Things Go Wrong
  6.9 Summary

Part II: Parallel Programming

Chapter 7: An Introduction to Writing Parallel Programs for Clusters
  7.1 Creating Task Parallelism
  7.2 Operating System Support for Parallelism
  7.3 Parameter Studies
  7.4 Sequence Matching in Computational Biology
  7.5 Decomposing Programs into Communicating Processes

Chapter 8: Parallel Programming with MPI
  8.1 Hello World in MPI
  8.2 Manager/Worker Example
  8.3 Two-Dimensional Jacobi Example with One-Dimensional Decomposition
  8.4 Collective Operations
  8.5 Parallel Monte Carlo Computation
  8.6 MPI Programming without MPI
  8.7 Installing MPICH2 under Linux
  8.8 Tools for MPI Programs
  8.9 MPI Implementations for Clusters

Chapter 9: Advanced Topics in MPI Programming
  9.2 Fault Tolerance
  9.3 Revisiting Mesh Exchanges
  9.4 Motivation for Communicators
  9.5 More on Collective Operations
  9.6 Parallel I/O
  9.7 Remote Memory Access
  9.8 Using C++ and Fortran 90
  9.9 MPI, OpenMP, and Threads
  9.10 Measuring MPI Performance
  9.11 MPI-2 Status

Chapter 10: Parallel Virtual Machine
  10.1 The PVM System
  10.2 Writing PVM Applications
  10.3 Installing PVM

Chapter 11: Fault-Tolerant and Adaptive Programs with PVM
  11.1 Considerations for Fault Tolerance
  11.2 Building Fault-Tolerant Parallel Applications
  11.3 Adaptive Programs

Chapter 12: Numerical and Scientific Software for Clusters
  12.1 Dense Linear System Solving
  12.2 Sparse Linear System Solving
  12.3 Eigenvalue Problems
  12.4 FFTW
  12.5 Load Balancing
  12.6 Support Libraries
  12.7 Scientific Applications
  12.8 Freely Available Software for Linear Algebra on the Web

Part III: Managing Clusters

Chapter 13: Cluster Management
  13.1 Logging
  13.2 Monitoring, or Measuring Cluster Health
  13.3 Hardware Failure and Recovery
  13.4 Software Failure
  13.5 File System Failure and Recovery
  13.6 Account Management
  13.7 Workload Management
  13.8 Software Upgrades
  13.9 Configuration Management
  13.10 Conclusion

Chapter 14: Cluster Workload Management
  14.1 Goal of Workload Management Software
  14.2 Workload Management Activities
  14.3 Conclusions

Chapter 15: Condor: A Distributed Job Scheduler
  15.1 Introduction to Condor
  15.2 Using Condor
  15.3 Condor Architecture
  15.4 Configuring Condor
  15.5 Administration Tools
  15.6 Cluster Setup Scenarios
  15.7 Conclusion

Chapter 16: Maui Scheduler: A High Performance Cluster Scheduler
  16.2 Installation and Initial Configuration
  16.3 Advanced Configuration
  16.4 Steering Workload and Improving Quality of Information
  16.5 Troubleshooting
  16.6 Conclusions

Chapter 17: PBS: Portable Batch System
  17.1 History of PBS
  17.2 Using PBS
  17.3 Installing PBS
  17.4 Configuring PBS
  17.5 Managing PBS
  17.6 Troubleshooting

Chapter 18: Scyld Beowulf
  18.2 Using Scyld Beowulf
  18.3 Administration
  18.4 Features in Upcoming Releases
  18.5 Conclusion

Chapter 19: Parallel I/O and the Parallel Virtual File System
  19.1 Parallel I/O Systems
  19.2 Parallel File System Architectures
  19.3 File System Access Semantics
  19.4 Using PVFS
  19.5 Parallel I/O in the Future
  19.6 Conclusions

Chapter 20: A Tale of Two Clusters: Chiba City and Jazz
  20.1 Chiba City
  20.2 Jazz - A New Production Cluster

Chapter 21: Conclusions
  21.2 Future Directions for Clusters
  21.3 Learning More

Appendix A: Glossary of Terms
Appendix B: Annotated Reading List
Appendix C: Annotated URLs
  C.2 Node and Network Hardware
  C.3 Network Security
  C.4 Performance Tools
  C.5 Parallel Programming and Software
  C.6 Scheduling and Management

References
Index
List of Figures
List of Tables