
Program Optimization for Multicore Architectures

Instructors

Sanjeev K Aggarwal ska AT cse DOT iitk DOT ac DOT in
Mainak Chaudhuri mainakc AT cse DOT iitk DOT ac DOT in
Rajat Moona moona AT cse DOT iitk DOT ac DOT in


Course Contents




The course will cover the following:
  • What are multi-core architectures
  • Issues involved in writing code for multi-core architectures
  • How to develop programs for these architectures
  • What are the program optimization techniques
  • How to build some of these techniques into compilers
  • OpenMP, message-passing libraries, threads, mutexes, etc. (a small OpenMP example follows this list)
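
To give a feel for the shared-memory programming style that OpenMP supports, here is a minimal sketch of a parallel reduction in C; the array size and its contents are placeholders rather than anything from the course material. It needs an OpenMP-capable compiler, e.g. gcc -fopenmp.

    #include <stdio.h>
    #include <omp.h>

    /* Parallel sum of an array using an OpenMP work-sharing loop.
     * The reduction clause gives each thread a private partial sum
     * and combines the partial sums when the loop finishes. */
    int main(void) {
        enum { N = 1000000 };
        static double a[N];
        double sum = 0.0;

        for (int i = 0; i < N; ++i)
            a[i] = 1.0;                      /* placeholder data */

        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; ++i)
            sum += a[i];

        printf("sum = %f (threads available: %d)\n",
               sum, omp_get_max_threads());
        return 0;
    }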
Details:
Introduction to parallel computers: Instruction Level Parallelism (ILP) vs. Thread Level Parallelism (TLP); performance issues: brief introduction to cache hierarchy and communication latency; shared memory multiprocessors: general architecture and the problem of cache coherence; synchronization primitives: atomic primitives; locks: TTS, ticket, array; barriers: centralized and tree; performance implications in shared memory programs; chip multiprocessors: why CMP (Moore's law, wire delay); shared L2 vs. tiled CMP; core complexity; power/performance; snoopy coherence: invalidate vs. update, MSI, MESI, MOESI, MOSI; performance trade-offs; pipelined snoopy bus design; memory consistency models: SC, PC, TSO, PSO, WO/WC, RC; chip multiprocessor case studies: Intel Montecito and dual-core Pentium 4, IBM POWER4, Sun Niagara.
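
To make the lock portion of this module concrete, the following is a minimal sketch of the test-and-test-and-set (TTS) lock named above, written with C11 atomics; the type and function names (tts_lock, tts_acquire, tts_release) are our own illustration, not taken from the lectures.

    #include <stdatomic.h>
    #include <stdbool.h>

    /* Test-and-test-and-set spin lock: spin on a plain load first so that
     * waiting cores hit in their caches, and attempt the expensive atomic
     * exchange only when the lock looks free. */
    typedef struct { atomic_bool held; } tts_lock;

    static void tts_init(tts_lock *l) {
        atomic_init(&l->held, false);
    }

    static void tts_acquire(tts_lock *l) {
        for (;;) {
            /* Local spin: read-only, keeps the line in shared state. */
            while (atomic_load_explicit(&l->held, memory_order_relaxed))
                ;
            /* Try to grab the lock; acquire ordering on success. */
            if (!atomic_exchange_explicit(&l->held, true, memory_order_acquire))
                return;
        }
    }

    static void tts_release(tts_lock *l) {
        atomic_store_explicit(&l->held, false, memory_order_release);
    }

The read-only inner loop is what distinguishes TTS from plain test-and-set: waiting cores spin in their own caches and generate coherence traffic only when the lock is released.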
Introduction to optimization: overview of parallelism and shared memory programming; introduction to OpenMP; data flow analysis, pointer analysis, alias analysis; data dependence analysis and solving data dependence equations (an integer linear programming problem); loop optimizations; memory hierarchy issues in code optimization.
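
As an example of the loop and memory-hierarchy optimizations listed above, the sketch below shows loop interchange on a row-major C array; the array size and function names are illustrative only, and the legality of the transformation is exactly what data dependence analysis establishes.

    /* Loop interchange for better spatial locality. C stores arrays
     * row-major, so the j-loop should be innermost when walking b[i][j]. */
    #define N 1024
    double b[N][N], row_sum[N];

    /* Before: the inner loop strides by N doubles -> poor cache behaviour. */
    void sum_columns_first(void) {
        for (int j = 0; j < N; ++j)
            for (int i = 0; i < N; ++i)
                row_sum[i] += b[i][j];
    }

    /* After interchange: the inner loop touches consecutive elements.
     * The interchange is legal because no dependence is violated by
     * swapping the two loops. */
    void sum_rows_first(void) {
        for (int i = 0; i < N; ++i)
            for (int j = 0; j < N; ++j)
                row_sum[i] += b[i][j];
    }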
Operating system issues for multiprocessing: need for a pre-emptive OS; scheduling techniques: usual OS scheduling techniques, threads, distributed schedulers, multiprocessor scheduling, gang scheduling; communication between processes: message boxes, shared memory; sharing issues and synchronization: sharing memory and other structures, sharing I/O devices, distributed semaphores, monitors, spin locks, implementation techniques for multicores; case studies from applications: digital signal processing, image processing, speech processing.
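
In the spirit of the synchronization topics above, here is a minimal sketch of mutex-protected sharing between threads using POSIX threads; the thread count and iteration count are arbitrary. Build with -pthread.

    #include <pthread.h>
    #include <stdio.h>

    /* Shared counter protected by a mutex: each increment is a critical
     * section, so the final value is deterministic even when the threads
     * run on different cores. */
    #define NTHREADS 4
    #define ITERS    100000

    static long counter = 0;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg) {
        (void)arg;
        for (int i = 0; i < ITERS; ++i) {
            pthread_mutex_lock(&lock);
            ++counter;                       /* critical section */
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void) {
        pthread_t t[NTHREADS];
        for (int i = 0; i < NTHREADS; ++i)
            pthread_create(&t[i], NULL, worker, NULL);
        for (int i = 0; i < NTHREADS; ++i)
            pthread_join(t[i], NULL);
        printf("counter = %ld (expected %d)\n", counter, NTHREADS * ITERS);
        return 0;
    }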

Text references




  • J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach, Morgan Kaufmann, 3rd Edition.
  • D. E. Culler and J. P. Singh, with A. Gupta, Parallel Computer Architecture: A Hardware/Software Approach, Morgan Kaufmann.
  • S. S. Muchnick, Advanced Compiler Design and Implementation, Morgan Kaufmann.
  • M. Wolfe, Optimizing Supercompilers for Supercomputers, Addison-Wesley.
  • R. Allen and K. Kennedy, Optimizing Compilers for Modern Architectures, Morgan Kaufmann.
  • A. S. Tanenbaum, Distributed Operating Systems, Prentice Hall.
  • G. Coulouris, J. Dollimore and T. Kindberg, Distributed Systems: Concepts and Design, Addison-Wesley.
  • A. Silberschatz and P. B. Galvin, Operating System Principles, Addison-Wesley.

Lectures

  • Introduction to the course and logistics (Slides)
  • Introduction to multi-core architectures (Slides)
  • OpenMP Tutorials (Slides)
  • Intel Tools (Slides)
  • Virtual memory and caches, Parallel programming, Coherence and consistency, Synchronization, Case studies of CMP (Slides)
  • Shared Memory Multiprocessors (Slides)
  • Introduction to Optimization, Control Flow Analysis, Data Flow Analysis, Compilers for High Performance Architectures, Data Dependence Analysis (Slides)
  • Loop Optimizations (Slides)
  • CPU Scheduling, Synchronization, Multi-processor Scheduling, Security issues (Slides)