Program Optimization for Multicore Architectures

Course Contents

The course will cover the following:
  • What are multi-core architectures
  • Issues involved in writing code for multi-core architectures
  • How to develop programs for these architectures
  • What are the program optimization techniques
  • How to build some of these techniques in compilers
  • OpenMP, message-passing libraries, threads, mutexes, etc.
Details:
Introduction to parallel computers: Instruction Level Parallelism (ILP) vs. Thread Level Parallelism (TLP); performance issues: brief introduction to cache hierarchy and communication latency; shared memory multiprocessors: general architecture and the problem of cache coherence; synchronization primitives: atomic primitives; locks: TTS, ticket, array; barriers: central and tree; performance implications in shared memory programs; chip multiprocessors: why CMP (Moore's law, wire delay); shared L2 vs. tiled CMP; core complexity; power/performance; snoopy coherence: invalidate vs. update, MSI, MESI, MOESI, MOSI; performance trade-offs; pipelined snoopy bus design; memory consistency models: SC, PC, TSO, PSO, WO/WC, RC; chip multiprocessor case studies: Intel Montecito and dual-core Pentium 4, IBM POWER4, Sun Niagara.
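As an illustration of the "atomic primitives" and "locks" topics above, here is a minimal sketch of a ticket lock in C++. It is not taken from the course slides: the class name `TicketLock`, the helper `count_with_ticket_lock`, and the choice of `std::atomic` with a yielding spin loop are all assumptions made for this example.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Ticket lock sketch: each thread takes a ticket and spins until that
// ticket is being served, so the lock is granted in FIFO order.
class TicketLock {
public:
    void lock() {
        // fetch_add is the atomic primitive: every caller gets a unique ticket.
        const unsigned my_ticket =
            next_ticket_.fetch_add(1, std::memory_order_relaxed);
        // Spin with plain loads until our ticket comes up.
        while (now_serving_.load(std::memory_order_acquire) != my_ticket) {
            std::this_thread::yield();  // avoid starving the holder when oversubscribed
        }
    }

    void unlock() {
        // Only the current holder writes now_serving_, so load + store suffices.
        now_serving_.store(now_serving_.load(std::memory_order_relaxed) + 1,
                           std::memory_order_release);
    }

private:
    std::atomic<unsigned> next_ticket_{0};
    std::atomic<unsigned> now_serving_{0};
};

// Hypothetical helper: several threads increment a plain (non-atomic)
// counter, relying on the lock alone for mutual exclusion.
long count_with_ticket_lock(int num_threads, int iters_per_thread) {
    TicketLock lock;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iters_per_thread; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter;
}
```

Calling `count_with_ticket_lock(4, 10000)` should return 40000 if the lock provides mutual exclusion. Compared with a test-and-test-and-set (TTS) lock, the ticket lock trades a second shared variable for fairness: waiters are served in arrival order instead of racing on release.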
Introduction to optimization: overview of parallelism, shared memory programming; introduction to OpenMP; data flow analysis, pointer analysis, alias analysis, data dependence analysis, solving data dependence equations (integer linear programming problem); loop optimizations; memory hierarchy issues in code optimization
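To make "solving data dependence equations" concrete, here is a small sketch of the classical GCD test, one of the simplest dependence tests. It assumes a loop that writes `A[a*i + b]` and reads `A[c*i + d]`; the function names and the restriction to single-index affine subscripts are assumptions for this example, not the course's formulation.

```cpp
#include <cstdlib>

// Greatest common divisor of |x| and |y| (Euclid's algorithm).
long gcd_nonneg(long x, long y) {
    x = std::labs(x);
    y = std::labs(y);
    while (y != 0) {
        const long r = x % y;
        x = y;
        y = r;
    }
    return x;
}

// GCD test for a write to A[a*i1 + b] and a read of A[c*i2 + d].
// The dependence equation a*i1 - c*i2 = d - b has an integer solution
// only if gcd(a, c) divides (d - b).
// Returns false when independence is proven, true when a dependence is
// still possible. Loop bounds are ignored, so "true" is conservative.
bool gcd_test_may_depend(long a, long b, long c, long d) {
    const long g = gcd_nonneg(a, c);
    if (g == 0) {
        return b == d;  // both subscripts are constants
    }
    return (d - b) % g == 0;
}
```

For example, a write to `A[2*i]` and a read of `A[2*i + 1]` give `gcd_test_may_depend(2, 0, 2, 1) == false`: even and odd elements never overlap, so the iterations can run in parallel. Because the test ignores loop bounds and direction, a result of `true` only means that a more precise analysis is needed.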
Operating system issues for multiprocessing: need for pre-emptive OS, scheduling techniques: usual OS scheduling techniques, threads, distributed scheduler, multiprocessor scheduling, gang scheduling; communication between processes, message boxes, shared memory; sharing issues and synchronization, sharing memory and other structures, sharing I/O devices, distributed semaphores, monitors, spin locks, implementation techniques for multicores; case studies from applications: digital signal processing, image processing, speech processing.
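The "monitors" and "shared memory" topics above can be illustrated with a monitor-style bounded buffer. This is only a sketch under assumptions: the names `BoundedBuffer` and `produce_and_sum` are hypothetical, the buffer holds `int` values, and the capacity is assumed to be at least 1.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>

// Monitor-style bounded buffer: one mutex guards all state, and two
// condition variables let producers wait for space and consumers for data.
class BoundedBuffer {
public:
    explicit BoundedBuffer(std::size_t capacity) : capacity_(capacity) {}

    void put(int value) {
        std::unique_lock<std::mutex> guard(mutex_);
        // Block while the buffer is full; the predicate handles spurious wakeups.
        not_full_.wait(guard, [this] { return items_.size() < capacity_; });
        items_.push(value);
        not_empty_.notify_one();
    }

    int take() {
        std::unique_lock<std::mutex> guard(mutex_);
        // Block while the buffer is empty.
        not_empty_.wait(guard, [this] { return !items_.empty(); });
        const int value = items_.front();
        items_.pop();
        not_full_.notify_one();
        return value;
    }

private:
    const std::size_t capacity_;  // assumed >= 1
    std::mutex mutex_;
    std::condition_variable not_full_;
    std::condition_variable not_empty_;
    std::queue<int> items_;
};

// Hypothetical driver: a producer thread sends 1..n through the buffer
// while the calling thread consumes the values and sums them.
long produce_and_sum(int n, std::size_t capacity) {
    BoundedBuffer buffer(capacity);
    std::thread producer([&] {
        for (int i = 1; i <= n; ++i) buffer.put(i);
    });
    long sum = 0;
    for (int i = 0; i < n; ++i) sum += buffer.take();
    producer.join();
    return sum;
}
```

Unlike a spin lock, a waiting thread here sleeps inside `wait` and releases the mutex, which suits waits that may be long relative to a context switch.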

Lectures

  • Introduction to the course and logistics (Slides)
  • Introduction to multi-core architectures (Slides)
  • OpenMP Tutorials (Slides)
  • Intel Tools (Slides)
  • Virtual memory and caches, Parallel programming, Coherence and consistency, Synchronization, Case studies of CMP (Slides)
  • Shared Memory Multiprocessors (Slides)
  • Introduction to Optimization, Control flow Analysis, Dataflow Analysis, Compilers for High Performance Architectures, Data Dependence Analysis (Slides)
  • Loop Optimizations (Slides)
  • CPU Scheduling, Synchronization, Multi-processor Scheduling, Security issues (Slides)
