Computer Architecture: Parallelism and Locality
Instructor Mattan Erez
Description: Two major challenges facing computer architects today are dealing with tight power budgets and achieving high performance as off-chip bandwidth diminishes in comparison with available on-chip compute resources. In this course we will explore how the fundamental properties of locality and parallelism can be utilized in both hardware and software to overcome these challenges of power and bandwidth constraints. We will develop hardware cost models and hardware and software techniques through a combination of structured lectures, paper reading, discussions, homework assignments, programming labs, and a collaborative project. Examples of architectures and methods that will be covered include traditional general-purpose processors, massively parallel processors, parallel memory systems, parallel programming and execution models, shared memory systems, distributed shared memory systems, domain decomposition techniques, and cache-aware and cache-oblivious algorithms (tentative syllabus below).
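As a small illustration of the cache-aware techniques mentioned in the description (a sketch for intuition, not course material), the following compares a naive matrix transpose with a tiled one. Processing the matrix in small blocks keeps each block's rows and columns resident in cache, improving locality over the naive column-strided traversal. The matrix representation (list of lists) and the tile size `TILE` are illustrative choices, not anything specified by the course:

```python
# Sketch: cache-aware (tiled) vs. naive matrix transpose.
# TILE is a hypothetical block size chosen so a tile fits in cache.

TILE = 4

def naive_transpose(a):
    n, m = len(a), len(a[0])
    out = [[0] * n for _ in range(m)]
    for i in range(n):
        for j in range(m):
            out[j][i] = a[i][j]  # column-strided writes: poor locality
    return out

def tiled_transpose(a):
    n, m = len(a), len(a[0])
    out = [[0] * n for _ in range(m)]
    for ii in range(0, n, TILE):              # iterate over tiles
        for jj in range(0, m, TILE):
            for i in range(ii, min(ii + TILE, n)):
                for j in range(jj, min(jj + TILE, m)):
                    out[j][i] = a[i][j]       # accesses stay within one tile
    return out
```

Both functions compute the same result; the tiled version simply reorders the memory accesses, which is the essence of cache-aware restructuring (cache-oblivious variants achieve a similar effect by recursing on submatrices without an explicit tile size).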
Text:
There is no required textbook for this class; however, you may find the following useful:
- Timothy G. Mattson, Beverly A. Sanders, Berna L. Massingill, “Patterns for parallel programming”, 2005, Addison-Wesley Boston.
- David B. Kirk and Wen-Mei Hwu, “Programming Massively Parallel Processors: A Hands-on Approach”, 2010, Morgan Kaufmann.
Download Slides:
Lecture | Topic (notes) |
---|---|
1 | Introduction |
2 | Locality in CPUs |
3 | Locality + Cache aware |
4 | Locality + Cache oblivious |
5 | Wires/Interconnect |
6 | Wires II |
7 | Wire Alternatives + HW Parallelism |
8 | HW Parallelism (I) (pptx/pdf) |
9 | HW Parallelism (II) (pptx/pdf) |
10 | SW Parallelism (I) (pptx/pdf) |
11 | SW Parallelism (II) (pptx/pdf) |
12 | SW Parallelism (III) (pptx/pdf) |
13 | SW Parallelism (IV) (pptx/pdf) |
14 | SW Parallelism (V) (pptx/pdf) |
15 | GPU and Graphics Intro |
16 | NVIDIA GPUs (I) (pptx/pdf) |
17 | NVIDIA GPUs (II) (pptx/pdf) |
18 | NVIDIA GPUs (III) (pptx/pdf) |
19 | NVIDIA GPUs (IV) (pptx/pdf) |
20 | NVIDIA GPUs (V) (pptx/pdf) |
21 | NVIDIA GPUs (VI) + CUDA (I) (pptx/pdf) |
22 | CUDA (pptx/pdf) |
23 | Memory Systems (pptx/pdf) |
24 | Quiz |
25 | Memory (II) + Heterogeneous (I) |
26 | Heterogeneous (II) |
27 | Conclusions (and reliability) |