Beowulf Cluster Computing with Linux, Second Edition [Electronic resources]

William Gropp; Ewing Lusk; Thomas Sterling


Table of Contents

Beowulf Cluster Computing with Linux, Second Edition

Series Foreword

Foreword

Preface to the Second Edition

Acknowledgments for the Second Edition

Preface to the First Edition

Chapter 1: So You Want to Use a Cluster

1.1 What Is a Cluster?

1.2 Why Use a Cluster?

1.3 Understanding Application Requirements

1.4 Building and Using a Cluster

1.5 Another Way to Read This Book

Part I: Enabling Technologies

Chapter 2: Node Hardware

2.1 Node Hardware Overview

2.2 Microprocessor

2.3 Memory

2.4 I/O Channels

2.5 Motherboard

2.6 Persistent Storage

2.7 Video

2.8 Peripherals

2.9 Packaging

2.10 Node Choice and Cluster Construction

Chapter 3: Linux

3.2 The Linux Kernel

3.3 Pruning Your Beowulf Node

3.4 Scalable Services

3.5 Other Considerations

3.6 Final Tuning with /proc

3.7 Conclusions

Chapter 4: System Area Networks

4.1 Network Hardware

4.2 Example Networks

4.3 Network Software

4.4 Performance

4.5 Network Choice

Chapter 5: Configuring and Tuning Cluster Networks

5.1 Cluster Network Designs

5.2 Internet Protocol Stack

5.3 Networking Concepts and Services

5.4 Simple Cluster Configuration Walkthrough

5.5 Improving Performance

5.6 Protecting Your Cluster

5.7 Troubleshooting

Chapter 6: Setting Up Clusters

6.1 Challenges

6.2 Hardware Provisioning Challenges and Best Practices

6.3 Different Types of Installation Management

6.4 The Basic Steps

6.5 NPACI Rocks

6.6 The OSCAR Toolkit

6.7 Other Important Toolkits

6.8 When Things Go Wrong

6.9 Summary

Part II: Parallel Programming

Chapter 7: An Introduction to Writing Parallel Programs for Clusters

7.1 Creating Task Parallelism

7.2 Operating System Support for Parallelism

7.3 Parameter Studies

7.4 Sequence Matching in Computational Biology

7.5 Decomposing Programs Into Communicating Processes

Chapter 8: Parallel Programming with MPI

8.1 Hello World in MPI

8.2 Manager/Worker Example

8.3 Two-Dimensional Jacobi Example with One-Dimensional Decomposition

8.4 Collective Operations

8.5 Parallel Monte Carlo Computation

8.6 MPI Programming without MPI

8.7 Installing MPICH2 under Linux

8.8 Tools for MPI Programs

8.9 MPI Implementations for Clusters

Chapter 9: Advanced Topics in MPI Programming

9.2 Fault Tolerance

9.3 Revisiting Mesh Exchanges

9.4 Motivation for Communicators

9.5 More on Collective Operations

9.6 Parallel I/O

9.7 Remote Memory Access

9.8 Using C++ and Fortran 90

9.9 MPI, OpenMP, and Threads

9.10 Measuring MPI Performance

9.11 MPI-2 Status

Chapter 10: Parallel Virtual Machine

10.1 The PVM System

10.2 Writing PVM Applications

10.3 Installing PVM

Chapter 11: Fault-Tolerant and Adaptive Programs with PVM

11.1 Considerations for Fault Tolerance

11.2 Building Fault-Tolerant Parallel Applications

11.3 Adaptive Programs

Chapter 12: Numerical and Scientific Software for Clusters

12.1 Dense Linear System Solving

12.2 Sparse Linear System Solving

12.3 Eigenvalue Problems

12.4 FFTW

12.5 Load Balancing

12.6 Support Libraries

12.7 Scientific Applications

12.8 Freely Available Software for Linear Algebra on the Web

Part III: Managing Clusters

Chapter 13: Cluster Management

13.1 Logging

13.2 Monitoring, or Measuring Cluster Health

13.3 Hardware Failure and Recovery

13.4 Software Failure

13.5 File System Failure and Recovery

13.6 Account Management

13.7 Workload Management

13.8 Software Upgrades

13.9 Configuration Management

13.10 Conclusion

Chapter 14: Cluster Workload Management

14.1 Goal of Workload Management Software

14.2 Workload Management Activities

14.3 Conclusions

Chapter 15: Condor: A Distributed Job Scheduler

15.1 Introduction to Condor

15.2 Using Condor

15.3 Condor Architecture

15.4 Configuring Condor

15.5 Administration Tools

15.6 Cluster Setup Scenarios

15.7 Conclusion

Chapter 16: Maui Scheduler: A High Performance Cluster Scheduler

16.2 Installation and Initial Configuration

16.3 Advanced Configuration

16.4 Steering Workload and Improving Quality of Information

16.5 Troubleshooting

16.6 Conclusions

Chapter 17: PBS: Portable Batch System

17.1 History of PBS

17.2 Using PBS

17.3 Installing PBS

17.4 Configuring PBS

17.5 Managing PBS

17.6 Troubleshooting

Chapter 18: Scyld Beowulf

18.2 Using Scyld Beowulf

18.3 Administration

18.4 Features in Upcoming Releases

18.5 Conclusion

Chapter 19: Parallel I/O and the Parallel Virtual File System

19.1 Parallel I/O Systems

19.2 Parallel File System Architectures

19.3 File System Access Semantics

19.4 Using PVFS

19.5 Parallel I/O in the Future

19.6 Conclusions

Chapter 20: A Tale of Two Clusters: Chiba City and Jazz

20.1 Chiba City

20.2 Jazz: A New Production Cluster

Chapter 21: Conclusions

21.2 Future Directions for Clusters

21.3 Learning More

Appendix A: Glossary of Terms

Appendix B: Annotated Reading List

Appendix C: Annotated URLs

C.2 Node and Network Hardware

C.3 Network Security

C.4 Performance Tools

C.5 Parallel Programming and Software

C.6 Scheduling and Management

References

Index

List of Figures

List of Tables