5. Interoperability

Futhark is a purely functional high-performance language incapable of interacting with the outside world except through function parameters. This makes it impossible to write full applications in Futhark, except via the limited standard input-based interface that we used in the preceding chapters. In practice, this interface is too slow and too inflexible to be useful. Instead, the Futhark compiler is designed to generate libraries, which can then be invoked by general-purpose languages. In this chapter we will see how to call Futhark from Python and C, with particular attention paid to the former.

5.1. Calling Futhark from Python

Python is a language with many qualities, but few would claim that performance is among them. While libraries such as NumPy can be used, they are not as flexible as being able to write code directly in a high-performance language. Unfortunately, writing the performance-critical parts of a Python program in (say) C is not always a good experience, and the interfacing between the Python code and the C code can be awkward and inelegant (although to be fair, it is still nicer in Python than in many other languages). It would be more convenient if we could compile a high-performance language directly to a Python module that we could then import like any other piece of Python code. Of course, this entire exercise is only worthwhile if the code in the resulting Python module executes much faster than manually written Python. Fortunately, when most of the computation can be offloaded to the GPU via OpenCL, the Futhark compiler is capable of this feat.

OpenCL works by having an ordinary program running on the CPU that transmits code and data to the GPU (or any other accelerator, but we’ll stick to GPUs). In the ideal case, the CPU-code is mostly glue that performs bookkeeping and making API calls - in other words, not resource-intensive, and exactly what Python is good at. No matter the language the CPU code is written in, the GPU code will be written in OpenCL C and translated at program initialisation to whatever machine code is needed by the concrete GPU.

This is what is exploited by the PyOpenCL backend in the Futhark compiler. Certainly, the CPU-level code is written in pure Python and quite slow, but all it does is use the PyOpenCL library to offload work to the GPU. The fact that this offloading takes place is hidden from the user of the generated code, who is provided a module with functions that accept and produce ordinary NumPy arrays.

Consider our usual dot product program:

def main (x: []i32) (y: []i32): i32 =
  reduce (+) 0 (map2 (*) x y)

We can compile this to a Python module:

$ futhark pyopencl --library dotprod.fut

The result is a file dotprod.py that we can import from within Python:

$ python
>>> import dotprod

The dotprod.py module defines a class dotprod that we must instantiate. The class maintains various bits of bookkeeping information, and exposes a method for every entry point in our program (here just main):

>>> o = dotprod.dotprod()

We will get an error if we try to pass Python lists to the entry point, as lists are not arrays:

>>> o.main([1,2,3], [4,5,6])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "dotprod.py", line 2416, in main
    x_mem_3884_ext))
TypeError: Argument #0 has invalid value
Futhark type: []i32
Argument has Python type <type 'list'> and value: [1, 2, 3]

Instead, we have to construct a properly typed NumPy array:

>>> import numpy as np
>>> o.main(np.array([1,2,3], dtype=np.int32),
           np.array([4,5,6], dtype=np.int32))
32

The integer that is returned is a normal Python object of an appropriate type (in this case it will have type np.int32). If an array is returned, it is in the form of a PyOpenCL array, which is mostly compatible with NumPy arrays, except that the backing memory still resides on the GPU, and is not copied over to the CPU unless necessary. This makes it efficient to take the output of one entry point and pass it as the input to another. PyOpenCL arrays contain a .get() method that can be used to construct an equivalent NumPy array, if desired.

5.2. Calling Futhark from C

Let us once again consider dotprod.fut:

def main (x: []i32) (y: []i32): i32 =
  reduce (+) 0 (map2 (*) x y)

We can compile it with the futhark opencl compiler:

$ futhark opencl --library dotprod.fut

This produces two files in the current directory: dotprod.c and dotprod.h. We can compile dotprod.c to a shared library like this:

$ gcc dotprod.c -o libdotprod.so -fPIC -shared

We can now link to libdotprod.so the same way we link with any other shared library. But before we get that far, let’s take a look at (parts of) the generated dotprod.h file. We have written the code generator to produce as simple header files as possible, with no superfluous crud, in order to make them human-readable. This is particularly useful at the moment, since few explanatory comments are inserted in the header file.

The first declarations are related to initialisation, which is based on first constructing a configuration object, which can then be used to obtain a context. The context is used in all subsequent calls, and contains GPU state and the like. We elide most of the functions for setting configuration properties, as they are not very interesting:

/*
 * Initialisation
*/

struct futhark_context_config ;

struct futhark_context_config *futhark_context_config_new();

void futhark_context_config_free(struct futhark_context_config *cfg);

void futhark_context_config_set_device(struct futhark_context_config *cfg,
                                       const char *s);

...

struct futhark_context ;

struct futhark_context *futhark_context_new(struct futhark_context_config *cfg);

void futhark_context_free(struct futhark_context *ctx);

int futhark_context_sync(struct futhark_context *ctx);

The above demonstrates a pervasive design decision in the API: the use of pointers to opaque structs. The struct futhark_context is not given a definition, and the only way to construct it is via the function futhark_context_new(). This means that we cannot allocate it statically, which is contrary to how one would normally design a C library. The motivation behind this design is twofold:

  1. It keeps the header file readable, as it elides implementation details like struct members.

  2. It is easier to use from FFIs. Most FFIs make it very easy to work with functions that only accept and produce pointers (and primitive types), but accessing and allocating structs is a little more involved.

The disadvantage is a little more boilerplate, and a little more dynamic allocation. However, relatively few objects of this kind are used, so the performance impact should be nil.

The next part of the header file concerns itself with arrays - how they are created and accessed:

/*
 * Arrays
*/

struct futhark_i32_1d ;

struct futhark_i32_1d *futhark_new_i32_1d(struct futhark_context *ctx,
                                          int32_t *data,
                                          int dim0);

int futhark_free_i32_1d(struct futhark_context *ctx,
                        struct futhark_i32_1d *arr);

int futhark_values_i32_1d(struct futhark_context *ctx,
                          struct futhark_i32_1d *arr,
                          int32_t *data);

int64_t *futhark_shape_i32_1d(struct futhark_context *ctx,
                              struct futhark_i32_1d *arr);

Again we see the use of pointers to opaque structs. We can use futhark_new_i32_1d to construct a Futhark array from a C array, and we can use futhark_values_i32_1d to read all elements from a Futhark array. The representation used by the Futhark array is intentionally hidden from us - we do not even know (or care) whether it is resident in CPU or GPU memory. The code generator automatically generates a struct and accessor functions for every distinct array type used in the entry points of the Futhark program.

The single entry point is declared like this:

int futhark_entry_main(struct futhark_context *ctx,
                       int32_t *out0,
                       const struct futhark_i32_1d *in0,
                       const struct futhark_i32_1d *in1);

As the original Futhark program accepted two parameters and returned one value, the corresponding C function takes one out parameter and two in parameters (as well as a context parameter).

We have now seen enough to write a small C program (with no error handling) that calls our generated library:

#include <stdio.h>

#include "dotprod.h"

int main() {
  int x[] = { 1, 2, 3, 4 };
  int y[] = { 2, 3, 4, 1 };

  struct futhark_context_config *cfg = futhark_context_config_new();
  struct futhark_context *ctx = futhark_context_new(cfg);

  struct futhark_i32_1d *x_arr = futhark_new_i32_1d(ctx, x, 4);
  struct futhark_i32_1d *y_arr = futhark_new_i32_1d(ctx, y, 4);

  int res;
  futhark_entry_main(ctx, &res, x_arr, y_arr);
  futhark_context_sync(ctx);

  printf("Result: %d\n", res);

  futhark_free_i32_1d(ctx, x_arr);
  futhark_free_i32_1d(ctx, y_arr);

  futhark_context_free(ctx);
  futhark_context_config_free(cfg);
}

We hard-code the input data here, but we could just as well have read it from somewhere. The call to futhark_context_new() is where the GPU is initialised (is applicable) and OpenCL kernel code is compiled and uploaded to the device. This call might be relatively slow. However, subsequent calls to entry point functions (futhark_dotprod()) will be efficient, as they re-use the already initialised context.

Note the use of futhark_context_sync() after calling the entry point: Futhark does not guarantee that the final results have been written until we synchronise explicitly. Note also that we free the two arrays x_arr and y_arr once we are done with them - memory management is entirely manual.

If we save this program as luser.c, we can compile and run it like this:

$ gcc luser.c -o luser -lOpenCL -lm -ldotprod
$ ./luser
Result: 24

You may need to set LD_LIBRARY_PATH=. before the dynamic linker can find libdotprod.so. Also, this program will only work if the default OpenCL device is usable on your system, since we did not request any specific device. For testing on a system that does not support OpenCL, simply use futhark c instead of futhark opencl. The generated API will be the same.

5.3. Handling Awkward Futhark Types

Our dot product function uses only types that map easily to NumPy and C: primitives and arrays of primitives. But what happens if we have an entry point that involves abstract types with hidden definitions, or types with no clear analogue in C, such as records or arrays of tuples? In this case, the generated API defines structs for opaque types that support very few operations.

Consider the following contrived program, pack.fut, which contains two entry points:

entry pack (xs: []i32) (ys: []i32): [](i32,i32) = zip xs ys

entry unpack (zs: [](i32,i32)): ([]i32,[]i32) = unzip zs

The pack function turns two arrays into one array of pairs, and the unpack function reverses the operation. If compiled to Python, the pack function will return a special “opaque” object whose contents cannot be inspected. If compiled to C, pack.h contains the following definitions:

struct futhark_opaque_z31U814583239044437263 ;

int futhark_free_opaque_z31U814583239044437263(struct futhark_context *ctx,
                                               struct futhark_opaque_z31U814583239044437263 *obj);

int futhark_pack(struct futhark_context *ctx,
                 struct futhark_opaque_z31U814583239044437263 **out0,
                 struct futhark_i32_1d *in0,
                 struct futhark_i32_1d *in1);

int futhark_unpack(struct futhark_context *ctx,
                   struct futhark_i32_1d **out0,
                   struct futhark_i32_1d **out1,
                   struct futhark_opaque_z31U814583239044437263 *in0);

The unfortunately named struct, futhark_opaque_z31U814583239044437263, represents an array of tuples. There is nothing we can do with it except for freeing it, or passing it back to an entry point. In fact, the name is not even stable - it’s a hash of the internal representation. If you try the above example, you may see a different name.

Opaque types typically occur when you are writing a Futhark program that keeps some kind of state that you don’t want the user modifying or reading directly, but you need access to for each call to an entry point. Since Futhark programs are purely functional (and therefore stateless), having the user to manually pass back the state returned by the previous call is the only way to accomplish this. Fortunately, we can assign these opaque types somewhat more readable names by type abbreviations:

type~ array_of_pairs = [](i32,i32)

entry pack (xs: []i32) (ys: []i32): array_of_pairs = zip xs ys

entry unpack (zs: array_of_pairs): ([]i32,[]i32) = unzip zs

Now, when compiled to C, we obtain a somewhat more readable name for the opaque type:

struct futhark_opaque_array_of_pairs ;

int futhark_free_opaque_array_of_pairs(struct futhark_context *ctx,
                                       struct futhark_opaque_array_of_pairs *obj);

int futhark_entry_pack(struct futhark_context *ctx,
                       struct futhark_opaque_array_of_pairs **out0, const
                       struct futhark_i32_1d *in0, const
                       struct futhark_i32_1d *in1);

int futhark_entry_unpack(struct futhark_context *ctx,
                         struct futhark_i32_1d **out0,
                         struct futhark_i32_1d **out1, const
                         struct futhark_opaque_array_of_pairs *in0);

We have to be careful to use the type abbreviation everywhere, as the compiler will generate the hash-named opaque in any place that we miss.