CS4745 Lecture Notes

Models of Parallel Computing

  • SISD – Single Instruction, Single Data (e.g. a single x86 core)
  • SIMD – Single Instruction, Multiple Data (e.g. AVX vector instructions)
  • MISD – Multiple Instruction, Single Data (rarely realized in practice)
  • MIMD – Multiple Instruction, Multiple Data (e.g. GPU)

History of parallel computing

  • Prior to the 1990s, computers had specialized modules for parallel computation.
  • Beowulf clusters ("Beowulf Revolution"): commodity computers connected by a network that could be dispatched commands and perform computation together.
  • Grid computing: the idea of connecting "all" computers into a grid, similar to the power grid, and distributing computing resources among connected peers.
  • Cloud computing: a model where computing resources are rented from a provider that offers managed computing.
  • The Hadoop file system (2004) enabled computation directly on files, with no need to load the entire contents of a file into memory; MapReduce was the main programming framework built on top of it.
  • GPU computing naturally lends itself to linear algebra operations (transformations of pixels and triangles), which makes it suitable for massively parallelizing such tasks, even outside of graphics.
  • FPGA devices can be used to parallelize specific/specialized tasks, given a logic design that lends itself to parallelization.
  • Quantum computers are highly parallel accelerators, still in development; the algorithms have been known since the 1980s, but the hardware is still behind.
  • Multicore systems are designed for improved latency and can fulfill general-purpose computation (e.g. a CPU with multiple cores).
  • Manycore systems have a much higher core count and focus on throughput (the amount of computation performed per unit of time), e.g. a GPU.

Parallel Programming

Implicit

  • MapReduce
  • fork-join, Executor services with thread pools
  • Allows shifting attention from implementation to task description

Semi-implicit

  • Parallel for
  • OpenMP
  • Allows you to use pre-defined directives to achieve parallel execution without focusing on how it works (see the OpenMP sketch below)

Explicit

  • Scatter, Gather
  • pthreads
  • Developers have the most control over the computation, but must manage it themselves and ensure the results are correct

Compilers are good at producing optimal sequential code, but compiler optimizations, including instruction reordering, may prevent a parallel algorithm from working as expected.
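A minimal sketch of the semi-implicit style, assuming a C compiler with OpenMP support (compiled with something like `gcc -fopenmp`): a single directive parallelizes the loop, and the runtime decides how to split the iterations among threads.

```c
#include <stdio.h>
#include <omp.h>

#define N 1000000

int main(void) {
    static double a[N], b[N], c[N];

    /* Initialize the input arrays sequentially. */
    for (int i = 0; i < N; i++) {
        a[i] = i;
        b[i] = 2.0 * i;
    }

    /* Semi-implicit parallelism: the directive asks OpenMP to split the
       loop iterations across threads; we never create threads ourselves. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        c[i] = a[i] + b[i];
    }

    printf("c[42] = %f (up to %d threads)\n", c[42], omp_get_max_threads());
    return 0;
}
```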

Patterns

As in OOP, there are recurring patterns for parallel processing. They can be classified as "structural, computational or ..." TODO: look at notes

Structural Pattern: Pipe and Filter

Stream of Messages -> Language Identification -> English Messages -> Metadata Removal -> Plain Text -> Tokenization -> Words -> ...

The time and processing requirements of each stage may differ, which introduces the need for load balancing and the concept of the bottleneck: the slowest stage in the chain. Embarrassingly parallel: a problem whose parts are independent, so it can be parallelized simply by allocating more hardware resources. A two-stage pipeline sketch follows.
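A minimal pipe-and-filter sketch in C using pthreads, assuming two filters (an upper-casing stage and a printing stage) connected by a bounded queue; the stage names, queue size, and messages are made up for illustration.

```c
#include <pthread.h>
#include <stdio.h>
#include <ctype.h>
#include <string.h>

#define QUEUE_SIZE 4
#define NUM_MSGS   8

/* Bounded queue acting as the "pipe" between the two filters. */
static char queue[QUEUE_SIZE][64];
static int head = 0, tail = 0, count = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t not_full = PTHREAD_COND_INITIALIZER;
static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;

static void enqueue(const char *msg) {
    pthread_mutex_lock(&lock);
    while (count == QUEUE_SIZE)
        pthread_cond_wait(&not_full, &lock);
    strncpy(queue[tail], msg, 63);
    queue[tail][63] = '\0';
    tail = (tail + 1) % QUEUE_SIZE;
    count++;
    pthread_cond_signal(&not_empty);
    pthread_mutex_unlock(&lock);
}

static void dequeue(char *out) {
    pthread_mutex_lock(&lock);
    while (count == 0)
        pthread_cond_wait(&not_empty, &lock);
    strcpy(out, queue[head]);
    head = (head + 1) % QUEUE_SIZE;
    count--;
    pthread_cond_signal(&not_full);
    pthread_mutex_unlock(&lock);
}

/* Filter 1: "process" each message (here, upper-case it) and pass it on. */
static void *stage_one(void *arg) {
    (void)arg;
    for (int i = 0; i < NUM_MSGS; i++) {
        char msg[64];
        snprintf(msg, sizeof msg, "message %d", i);
        for (char *p = msg; *p; p++)
            *p = (char)toupper((unsigned char)*p);
        enqueue(msg);
    }
    enqueue("DONE");            /* sentinel telling the next filter to stop */
    return NULL;
}

/* Filter 2: consume messages until the sentinel arrives. */
static void *stage_two(void *arg) {
    (void)arg;
    char msg[64];
    for (;;) {
        dequeue(msg);
        if (strcmp(msg, "DONE") == 0)
            break;
        printf("stage 2 received: %s\n", msg);
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, stage_one, NULL);
    pthread_create(&t2, NULL, stage_two, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```

If stage 2 were much slower than stage 1 it would become the bottleneck, and the fix would be to run more instances of that filter, which is exactly the load-balancing concern described above.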

Word Count with Map Reduce

Each thread maps the words of its document (or chunk of a document) to counts; after the map phase completes, the per-thread maps are reduced into a single map, usually with a parallel/concurrent hash table. A simplified sketch follows.
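A simplified word-count sketch using pthreads, assuming a small fixed vocabulary in place of a real concurrent hash table: each thread counts words in its own chunk of text (the map phase) and the main thread merges the per-thread counts (the reduce phase). The text, vocabulary, and chunking are made up for illustration.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define NUM_THREADS 2
#define VOCAB_SIZE  3

/* Tiny fixed vocabulary standing in for a real (concurrent) hash table. */
static const char *vocab[VOCAB_SIZE] = { "map", "reduce", "thread" };

/* One "document chunk" per thread (illustrative data). */
static const char *chunks[NUM_THREADS] = {
    "map reduce map thread map",
    "thread thread reduce map"
};

/* Per-thread local counts: the output of the map phase. */
static int local_counts[NUM_THREADS][VOCAB_SIZE];

static void *map_phase(void *arg) {
    int id = (int)(long)arg;
    char buf[128];
    strncpy(buf, chunks[id], sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';

    /* Tokenize the chunk and count occurrences of each vocabulary word. */
    char *saveptr;
    for (char *tok = strtok_r(buf, " ", &saveptr); tok != NULL;
         tok = strtok_r(NULL, " ", &saveptr)) {
        for (int w = 0; w < VOCAB_SIZE; w++)
            if (strcmp(tok, vocab[w]) == 0)
                local_counts[id][w]++;
    }
    return NULL;
}

int main(void) {
    pthread_t threads[NUM_THREADS];

    for (long i = 0; i < NUM_THREADS; i++)
        pthread_create(&threads[i], NULL, map_phase, (void *)i);
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(threads[i], NULL);

    /* Reduce phase: merge the per-thread counts into one global count. */
    for (int w = 0; w < VOCAB_SIZE; w++) {
        int total = 0;
        for (int t = 0; t < NUM_THREADS; t++)
            total += local_counts[t][w];
        printf("%s: %d\n", vocab[w], total);
    }
    return 0;
}
```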

Memory Schemes

Multicore systems, which can compute multiple tasks at the same time in a shared-memory environment, are often latency oriented: the time to start and finish each result, and the time between results, is minimized.

Manycore systems, which have a large number of cores often tasked with the same problem in a shared-memory environment, are throughput oriented: the amount of computation performed, amortized over a period of time, is maximized, with a focus on longer-running tasks/computations.

Distributed Memory

The tasks are not within the same memory space and do not share memory addresses, so any data exchange between nodes must happen through explicit communication (message passing, e.g. the scatter/gather operations mentioned above).
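A minimal distributed-memory sketch, assuming MPI is available (compiled with mpicc, launched with mpirun): the root process scatters a chunk of an array to each rank, each rank works in its own private memory, and the results are gathered back. The array contents and chunk size are made up for illustration.

```c
#include <mpi.h>
#include <stdio.h>

#define CHUNK 4   /* elements per process (illustrative) */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* The root builds the full input; every rank has only its own address space. */
    int full[CHUNK * 64];          /* large enough for up to 64 ranks */
    if (rank == 0)
        for (int i = 0; i < CHUNK * size; i++)
            full[i] = i;

    int local[CHUNK];
    /* Scatter: each rank receives its private chunk over the network. */
    MPI_Scatter(full, CHUNK, MPI_INT, local, CHUNK, MPI_INT, 0, MPI_COMM_WORLD);

    /* Each rank computes on its own memory; no shared addresses involved. */
    for (int i = 0; i < CHUNK; i++)
        local[i] *= 2;

    /* Gather: explicit communication brings the results back to the root. */
    MPI_Gather(local, CHUNK, MPI_INT, full, CHUNK, MPI_INT, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("first result on root: %d\n", full[0]);

    MPI_Finalize();
    return 0;
}
```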

SIMD

TODO: Go and insert tables/data from slides
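While the slide tables are still to be added, here is a small SIMD illustration, assuming an x86 CPU with AVX and a compiler flag such as -mavx: one AVX instruction adds eight single-precision floats at once, which is what the single-instruction, multiple-data model refers to.

```c
#include <immintrin.h>
#include <stdio.h>

int main(void) {
    float a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    float b[8] = {10, 20, 30, 40, 50, 60, 70, 80};
    float c[8];

    /* Load eight floats into 256-bit vector registers. */
    __m256 va = _mm256_loadu_ps(a);
    __m256 vb = _mm256_loadu_ps(b);

    /* One instruction performs eight additions in parallel. */
    __m256 vc = _mm256_add_ps(va, vb);

    _mm256_storeu_ps(c, vc);

    for (int i = 0; i < 8; i++)
        printf("%.0f ", c[i]);
    printf("\n");
    return 0;
}
```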

Threads and Processes

Threads are the fundamental units of execution: each has its own program counter, and in most operating systems threads are the schedulable entities.

Processes are instances of a running program. Each process has its own memory space; memory is not shared between processes.

Starting a thread is a bit simpler than starting a process: a new process must be registered with the OS under its own PID, and the OS must also create a new virtual memory space for it.
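A small sketch contrasting the two on a POSIX system, assuming pthreads and fork are available: a new thread shares its parent's address space, while fork creates a new process with its own PID and its own (copy-on-write) memory space.

```c
#include <pthread.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

static int shared_value = 0;   /* visible to all threads in one process */

static void *thread_body(void *arg) {
    (void)arg;
    shared_value = 42;         /* threads share the same address space */
    return NULL;
}

int main(void) {
    /* Thread: lightweight, same PID, same memory. */
    pthread_t t;
    pthread_create(&t, NULL, thread_body, NULL);
    pthread_join(t, NULL);
    printf("after thread: shared_value = %d (pid %d)\n",
           shared_value, (int)getpid());

    /* Process: new PID, separate (copy-on-write) memory space. */
    pid_t pid = fork();
    if (pid == 0) {
        shared_value = 7;      /* only changes the child's copy */
        printf("child: shared_value = %d (pid %d)\n",
               shared_value, (int)getpid());
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    printf("parent: shared_value is still %d (pid %d)\n",
           shared_value, (int)getpid());
    return 0;
}
```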

Multiprocessor Architecture Design

Software Considerations

Programming languages build programs out of building blocks and their order of execution, based on functions with inputs and outputs.

JIT compilers can use runtime data to optimize certain sub-tasks. Julia is JIT-compiled to LLVM IR, which is then translated to machine code by LLVM.

Task Graph Model

The main motivation for using this model is data dependencies: we cannot use data that has not been loaded or produced yet (see the OpenMP task sketch below).
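A minimal sketch of expressing a task graph, assuming OpenMP 4.0+ task dependencies: the load, process, and print steps (hypothetical names) are tasks whose in/out clauses tell the runtime that processing cannot start before the data has been produced.

```c
#include <stdio.h>

int main(void) {
    int data = 0, result = 0;

    #pragma omp parallel
    #pragma omp single
    {
        /* Task 1: "load" the data (produces `data`). */
        #pragma omp task depend(out: data)
        data = 10;

        /* Task 2: process the data; may not start before task 1 finishes. */
        #pragma omp task depend(in: data) depend(out: result)
        result = data * 2;

        /* Task 3: consume the result; depends on task 2. */
        #pragma omp task depend(in: result)
        printf("result = %d\n", result);
    }
    return 0;
}
```

The depend clauses form the edges of the task graph, so the runtime is free to schedule independent tasks in parallel while still respecting the data dependencies.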