## Introduction

Numerical relaxation is a class of techniques for solving systems of equations over some discretization of the state space. These techniques are especially useful in engineering and physics, where solving differential equations subject to boundary conditions, such as the heat equation or the Navier-Stokes equations, is a recurring problem.

\[\frac{\partial u}{\partial t} - h^2 \nabla^2 u = 0\]

These types of problems typically benefit immensely from parallelization on the GPU, since they consist of repeatedly applying an operator over the whole state space until convergence. However, programming in CUDA or OpenCL can be a labor-intensive process that requires knowledge of GPU memory management along with some C++ expertise. The Python library Theano makes this process much simpler if you are willing to trade some raw speed for a large reduction in development time.

## Simple Case Study

Laplace's equation is the steady-state form of the heat equation. The Laplacian operator \( \nabla^2 \) appears as a component in many of the fundamental physical laws and can be thought of as representing either curvature or diffusion (of something) over space.

\[ \nabla^2 u = 0\]

This equation can be solved numerically using the Jacobi update rule, which repeatedly replaces each interior grid value with the average of its four neighbors:

\[ u_{i,j}^{(k+1)} = \tfrac{1}{4}\left(u_{i-1,j}^{(k)} + u_{i+1,j}^{(k)} + u_{i,j-1}^{(k)} + u_{i,j+1}^{(k)}\right) \]
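Before bringing in any GPU machinery, the update rule is easy to sketch in plain NumPy (a minimal illustration, not the code used later; the grid size, boundary values, and tolerance are arbitrary choices for the example):

```python
import numpy as np

def jacobi_step(u):
    """One Jacobi relaxation step: each interior point becomes the
    average of its four neighbors; boundary values stay fixed."""
    out = u.copy()
    out[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1]
                              + u[1:-1, :-2] + u[1:-1, 2:])
    return out

# Hot left edge, cold everywhere else; iterate until the update stalls.
u = np.zeros((50, 50))
u[:, 0] = 1.0
for _ in range(5000):
    new = jacobi_step(u)
    if np.max(np.abs(new - u)) < 1e-6:
        break
    u = new
```

Each step only reads the previous grid and writes a new one, which is exactly why the method parallelizes so well: every interior point can be updated independently.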

Theano is a Python library that constructs a symbolic graph of your function and then compiles it to C. Theano can run all of its functionality on either the CPU or the GPU simply by changing a configuration option. Because it is symbolic, your code can be written in a way that is very close to the mathematical formulation. However, Theano is built around large, multidimensional arrays, so we must reformulate the update rule above as a matrix operation.
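That configuration switch happens outside the script. For example, the same program can be pointed at the CPU or a CUDA device via `THEANO_FLAGS` at launch time (flag names per Theano's configuration options; `relax.py` is a hypothetical script name):

```shell
# Run on the CPU (the default device)
THEANO_FLAGS=device=cpu python relax.py

# Run the identical script on the GPU in single precision
THEANO_FLAGS=device=gpu,floatX=float32 python relax.py
```

No source changes are required; Theano compiles the same symbolic graph for whichever backend the flags select.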

We can turn the update rule above into a matrix operation by realizing that the update step is equivalent to a convolution, for some window function (or filter), over the state space. For the two-dimensional case shown above, the convolution kernel looks like the following

\[ W = \begin{bmatrix} 0 & 0.25 & 0\\ 0.25 & 0 & 0.25 \\ 0 & 0.25 & 0 \end{bmatrix} \]

Theano has a convolution operation (theano.tensor.signal.conv.conv2d) which can readily handle W and some two-dimensional grid.

The following Python code computes this convolution and updates the grid, illustrating how simple it is to work with Theano. In just a few lines of code we have a function that can be compiled and parallelized onto the GPU.

```python
import theano
import theano.tensor as T
from theano.tensor.signal import conv

# grid and window are Theano shared variables holding the state
# space and the kernel W, defined earlier

# Perform convolution
R = conv.conv2d(grid, window)

# Create the output by updating the values on the original grid
out = T.set_subtensor(grid[0, 0, 1:-1, 1:-1], R[0, 0, :, :])

# Compute max change to test for convergence
diff = T.flatten(abs(grid - out))
max_change = T.max(diff)

# Compile from symbolic to C
relax = theano.function(inputs=[],
                        outputs=[max_change],
                        updates=[(grid, out)])
```
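Because of `updates=[(grid, out)]`, each call to `relax()` advances the stored grid one step and returns the maximum change, so the driver is just a loop. As a hedged sketch of that calling pattern, here is a pure-NumPy stand-in for the compiled function (the real `relax` from the snippet above would be used in its place; the grid size and tolerance are arbitrary):

```python
import numpy as np

# Stand-in state: like a Theano shared variable, the grid persists
# between calls. Hot boundary along the top edge, cold elsewhere.
grid = np.zeros((100, 100))
grid[0, :] = 1.0

def relax():
    """Mimics the compiled Theano function: one relaxation step,
    updating the stored grid in place and returning the max change."""
    global grid
    out = grid.copy()
    out[1:-1, 1:-1] = 0.25 * (grid[:-2, 1:-1] + grid[2:, 1:-1]
                              + grid[1:-1, :-2] + grid[1:-1, 2:])
    max_change = np.max(np.abs(grid - out))
    grid = out
    return [max_change]

# Iterate until convergence, exactly as with the compiled function.
tolerance = 1e-5
for step in range(50_000):
    max_change, = relax()
    if max_change < tolerance:
        break
```

The same loop drives the GPU version unchanged; only the body of `relax` moves from NumPy to compiled Theano code.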

## Performance Comparison

So what kind of performance boost do we get using the GPU? My system has an Intel Xeon E5-1620 v2 quad-core processor at 3.7 GHz with an NVIDIA GeForce GTX 780 Ti GPU. Running over a state space of size 6000 x 6000 for 100 iterations, I found the following timings:

GPU total time: 15.16 s

CPU total time: 49.08 s

That amounts to roughly a 3x speedup. I would have liked to see a 10x-20x improvement, but considering the ease of use Theano provides, the results are still impressive. You can find the full code I used here.