Introduction to Parallel Programming de Subodh Kumar

Introduction to Parallel Programming

Subodh Kumar

"Traditionally computers have been machines that operate processes in parallel. Even the earliest computers were made from parts that worked together, collaboratively and simultaneously. Processors today consist of more than a billion transistors, many of which are active simultaneously while executing a program. In modern computer science, there exists no truly sequential computing system. While sequential programming can still be used due to the parallelism built into the compilers and OS, for most advanced programming, parallel programming is rapidly becoming essential. This is most evident in modern application domains like scientific computation, data science, machine intelligence, etc. As a result, parallel programming is increasingly being offered as an elective course in most undergraduate computer science and engineering programs. This book intends to introduce a beginner to the gamut of parallel programming and will be useful for undergraduate students of computer science and engineering"--

Citește tot Restrânge

Cuprins

List of Figures; Introduction; Concurrency and Parallelism; Why Study Parallel Programming; What is in this Book; 1. An Introduction to Parallel Computer Architecture; 1.1 Parallel Organization; SISD: Single Instruction, Single Data; SIMD: Single Instruction, Multiple Data; MIMD: Multiple Instruction, Multiple Data; MISD: Multiple Instruction, Single Data; 1.2 System Architecture; 1.3 CPU Architecture; 1.4 Memory and Cache; 1.5 GPU Architecture; 1.6 Interconnect Architecture; Routing; Links; Types and Quality of Networks; Torus Network; Hypercube Network; Cross-Bar Network; Shuffle-Exchange Network; Clos Network; Tree Network; Network Comparison; 1.7 Summary; 2. Parallel Programming Models; 2.1 Distributed-Memory Programming Model; 2.2 Shared-Memory Programming Model; 2.3 Task Graph Model; 2.4 Variants of Task Parallelism; 2.5 Summary; 3. Parallel Performance Analysis; 3.1 Simple Parallel Model; 3.2 Bulk-Synchronous Parallel Model; BSP Computation Time; BSP Example; 3.3 PRAM Model; PRAM Computation Time; PRAM Example; 3.4 Parallel Performance Evaluation; Latency and Throughput; Speed-up; Cost; Efficiency; Scalability; Iso-efficiency; 3.5 Parallel Work; Brent's Work-Time Scheduling Principle; 3.6 Amdahl's Law; 3.7 Gustafson's Law; 3.8 Karp–Flatt Metric; 3.9 Summary; 4. Synchronization and Communication Primitives; 4.1 Threads and Processes; 4.2 Race Condition and Consistency of State; Sequential Consistency; Causal Consistency; FIFO and Processor Consistency; Weak Consistency; Linearizability; 4.3 Synchronization; Synchronization Condition; Protocol Control; Progress; Synchronization Hazards; 4.4 Mutual Exclusion; Lock; Peterson's Algorithm; Bakery Algorithm; Compare and Swap; Transactional Memory; Barrier and Consensus; 4.5 Communication; Point-to-Point Communication; RPC; Collective Communication; 4.6 Summary; 5. Parallel Program Design; 5.1 Design Steps; Granularity; Communication; Synchronization; Load Balance; 5.2 Task Decomposition; Domain Decomposition; Functional Decomposition; Task Graph Metrics; 5.3 Task Execution; Preliminary Task Mapping; Task Scheduling Framework; Centralized Push Scheduling Strategy; Distributed Push Scheduling; Pull Scheduling; 5.4 Input/Output; 5.5 Debugging and Profiling; 5.6 Summary; 6. Middleware: The Practice of Parallel Programming; 6.1 OpenMP; Preliminaries; OpenMP Thread Creation; OpenMP Memory Model; OpenMP Reduction; OpenMP Synchronization; Sharing a Loop's Work; Other Work-Sharing Pragmas; SIMD Pragma; Tasks; 6.2 MPI; MPI Send and Receive; Message-Passing Synchronization; MPI Data Types; MPI Collective Communication; MPI Barrier; MPI Reduction; One-Sided Communication; MPI File IO; MPI Groups and Communicators; MPI Dynamic Parallelism; MPI Process Topology; 6.3 Chapel; Partitioned Global Address Space; Chapel Tasks; Chapel Variable Scope; 6.4 Map-Reduce; Parallel Implementation; Hadoop; 6.5 GPU Programming; OpenMP GPU Off-Load; Data and Function on Device; Thread Blocks in OpenMP; CUDA; CUDA Programming Model; CPU–GPU Memory Transfer; Concurrent Kernels; CUDA Synchronization; CUDA Shared Memory; CUDA Parallel Memory Access; False Sharing; 6.6 Summary; 7. Parallel Algorithms and Techniques; 7.1 Divide and Conquer: Prefix-Sum; Parallel Prefix-Sum: Method 1; Parallel Prefix-Sum: Method 2; Parallel Prefix-Sum: Method 3; 7.2 Divide and Conquer: Merge Two Sorted Lists; Parallel Merge: Method 1; Parallel Merge: Method 2; Parallel Merge: Method 3; Parallel Merge: Method 4; 7.3 Accelerated Cascading: Find Minima; 7.4 Recursive Doubling: List Ranking; 7.5 Recursive Doubling: Euler Tour; 7.6 Recursive Doubling: Connected Components; 7.7 Pipelining: Merge-Sort; Basic Merge-Sort; Pipelined Merges; 4-Cover Property Analysis; Merge Operation per Tick; 7.8 Application of Prefix-Sum: Radix-Sort; 7.9 Exploiting Parallelism: Quick-Sort; 7.10 Fixing Processor Count: Sample-Sort; 7.11 Exploiting Pa