layout: true .footer.venue[Many Core Workshop 2018 - York, UK - June 15
th
2018] --- class: center, middle
# Solving Combinatorial Problems on Adaptive Many-Core Architectures Ghaith Tarawneh.red[*] and Andrey Mokhov
School of Engineering, Newcastle University, UK
--- layout: true
.footer.venue[Many Core Workshop 2018 - York, UK - June 15
th
2018] --- ## Outline .sep10px[ ] .outline-item[1] .big[The POETS Project] .outline-item[2] .big[Programming Model] .outline-item[3] .big[Use Case: SAT Solver] .outline-item[4] .big[Conclusion] --- class: center, middle # The POETS Project --- ## POETS Project Team
--- ## What is POETS? .sep10px[ ] .emph[Partially Ordered Event Triggered Systems (POETS):] Massively-parallel architecture consisting of an extremely large number (>10.sup[6]) of very simple cores that communicate by exchanging short messages .block[(few bytes)]
.sep10px[ ] * Cores are embedded in a fast parallel communications fabric (the .emph[core mesh]). .sep10px[ ] * Emphasis is on achieving high scalability by taking computation to the data, instead of the other way around. --- ## Scientific Computing .subslide[(1/2)] POETS is not a general purpose architecture but it provides .emph[significant speedup] for certain classes of problems, particularly in scientific computing.
.sep10px[ ] .emph[Examples:] * Finite element analysis * Neural modeling * Particle simulations * Image processing * Drug screening * Weather modeling --- ## Scientific Computing .subslide[(2/2)] Problems must be decomposed and mapped to the core mesh
.sep10px[ ] .emph[Static mapping (scientific computing)]: * Problem decomposition is known apriori * Pre-compute mapping between problem domain and core mesh --- ## Combinatorial Problems .subslide[(1/2)] POETS is also well suited for discrete problems.
.sep10px[ ] .emph[Examples:] * State space exploration * Linear programming * Boolean satisfiability * Constraint-based optimization --- ## Combinatorial Problems .subslide[(2/2)] .sep10px[ ] .emph[Dynamic mapping (combinatorial problems)]: * The problem domain is discovered during execution * Decomposed sub-problems are mapped to the core mesh at runtime .sep10px[ ] .center[
] --- class: center, middle # Programming Model --- ## Motivating Example .subslide[(1/2)] Partial code snippet from an MPI program to approximate π .squeeze-code[ ```C #include "mpi.h" #include
#include
int main (int argc, char *argv[]) { MPI_Init(&argc,&argv); MPI_Comm_size(MPI_COMM_WORLD,&numtasks); MPI_Comm_rank(MPI_COMM_WORLD,&taskid); printf ("MPI task %d has started...\n", taskid); avepi = 0; for (i = 0; i < ROUNDS; i++) { homepi = dboard(DARTS); if (taskid == MASTER) { pi = pisum/numtasks; avepi = ((avepi * i) + pi)/(i + 1); printf("After %8d throws, average value of pi = %10.8f\n", (DARTS * (i + 1)),avepi); } } MPI_Finalize(); return 0; } ``` ] --- ## Motivating Example .subslide[(2/2)] - Message passing is a very popular programming/execution model for massively-parallel machines - Message passing programs can be .emph[reliable and performant] but they tend to mix .emph[problem logic] with .emph[non-problem-related concerns] .block[(e.g. node discovery], load balancing and synchronization) - This is .emph[non-ideal from a software engineering perspective] because it makes programs and algorithms: * difficult to read, refactor and maintain * problem-specific (difficult to re-use) * architecture-specific (difficult to port) - Can we do better? --- ## Proposed Programming Model We propose a multi-layer software stack to develop combinatorial solvers for massively-parallel POETS-like architectures.
--- ## Layers 1 & 2 .emph[Message Passing and Scheduling] * Expose `init` and `receive` functions to initialize core states then update them on message arrival * Example algorithm (mesh traversal): ```Python def init(node): state = {visited: False} return state def receive(node, state, sender, msg, send, neighbours): if state[visited] == False then state[visited] = True for n in neighbours do send(n, EMPTY_MSG) ``` --- ## Layer 3 .emph[Problem Mapping] Handles sub-problem distribution and load balancing. Example (summing 0 to n): ```Python def receive(state, ticket, msg, send): if msg == Call(n) then if n<1 then send(Result(0), ticket) else ticket = send(Call(n-1)) state = Continue(ticket, n) else if msg == Result(total) then if state == Continue(ticket, n) then send(Result(total + n), ticket) else state = Done(total) else if msg == Trigger then send(Call(10)) ``` --- ## Layer 4 .emph[Recursion] Implement a generalized form of the "ticket pattern" in Layer 3 so that users won't have to do call bookkeeping themselves. This layers abstracts away message passing and exposes a different programming model to the user (e.g. imperative, functional). --- ## Layer 5 .emph[Application Logic] An alternative implementation to sum the numbers 0 to n: ```Python def calculate_sum(n): if n<1 then yield Result(0) else yield Call(n-1) total = yield Sync() yield Result(total + n) ``` --- class: center, middle # Use Case: SAT Solver --- ## POETS Simulator - We created a model POETS machine in Python and used it to develop and evaluate a concrete implementation of the model. --- ## Boolean Satisfiability (SAT) SAT is a classic NP-complete problem: - Given a list of Boolean expression in Conjunctive Normal Form (CNF), determine if an assignment exists such that all statements evaluate to true. - There are many clever heuristics to prune the search space but all solutions are essentially a form of state space exploration. .sep10px[ ] We wanted to find out how much coding effort is needed to: 1. Port an existing SAT solver algorithm (DPLL) to our architecture 2. Experiment with architectural features and their impact on performance 3. Produce code that we can re-use to develop other combinatorial solvers --- ## Ported Solver Algorithm (DPLL) ```Python def solve_sat(problem): if consistent(problem) then yield Result(SAT) if exist_empty_clause(problem) then yield Result(UNSAT) for clause in problem[clauses] do if unit_clause(clause) then unit_propagate(problem, clause) for literal in problem[literals] do if literal_pure(literal) then assign_pure(problem, literal) L = select_literal(problem) subp1 = assign(problem, L, True) subp2 = assign(problem, L, False) yield [is_SAT, Call(subp1), Call(subp2)] result = yield Sync() yield result ``` --- ## Simulation Results - We ran experiments to test several mesh topologies and mapping algorithms. - Our main insight is that we did not have to modify the application code for running on different mesh dimensions, topologies or mapping algorithms.
--- ## Conclusion 1. We presented a programming model to develop combinatorial solvers for .block[a novel] supercomputer architecture (POETS) 2. Using the model gave us the best of two worlds: - The convenience and portability of a familiar programming style at the application layer (layer 5) - An underlying hierarchy of solutions to common problems (layers 1-4), which we can maintain, optimize and re-use for developing other solvers 3. Abstraction is an age-old idea but supercomputer programming models and abstractions tend to gravitate towards being either too thin or too deep. 4. We need more hierarchical models with building blocks (at different levels of organization) that we can maintain, port and re-use.