5. Interoperability
Futhark is a purely functional high-performance language incapable of interacting with the outside world except through function parameters. This makes it impossible to write full applications in Futhark, except via the limited standard input-based interface that we used in the preceding chapters. In practice, this interface is too slow and too inflexible to be useful. Instead, the Futhark compiler is designed to generate libraries, which can then be invoked by general-purpose languages. In this chapter we will see how to call Futhark from Python and C, with particular attention paid to the former.
5.1. Calling Futhark from Python
Python is a language with many qualities, but few would claim that
performance is among them. While libraries such as NumPy can be used,
they are not as flexible as being able to write code directly in a
high-performance language. Unfortunately, writing the
performance-critical parts of a Python program in (say) C is not
always a good experience, and the interfacing between the Python code
and the C code can be awkward and inelegant (although to be fair, it
is still nicer in Python than in many other languages). It would be
more convenient if we could compile a high-performance language
directly to a Python module that we could then import like any
other piece of Python code. Of course, this entire exercise is only
worthwhile if the code in the resulting Python module executes much
faster than manually written Python. Fortunately, when most of the
computation can be offloaded to the GPU via OpenCL, the Futhark
compiler is capable of this feat.
OpenCL works by having an ordinary program running on the CPU that transmits code and data to the GPU (or any other accelerator, but we’ll stick to GPUs). In the ideal case, the CPU code is mostly glue that performs bookkeeping and makes API calls - in other words, not resource-intensive, and exactly what Python is good at. No matter the language the CPU code is written in, the GPU code will be written in OpenCL C and translated at program initialisation to whatever machine code is needed by the concrete GPU.
This is what is exploited by the PyOpenCL backend in the Futhark compiler. Admittedly, the CPU-level code is written in pure Python and is quite slow, but all it does is use the PyOpenCL library to offload work to the GPU. The fact that this offloading takes place is hidden from the user of the generated code, who is provided with a module whose functions accept and produce ordinary NumPy arrays.
Consider our usual dot product program:
def main (x: []i32) (y: []i32): i32 =
  reduce (+) 0 (map2 (*) x y)
We can compile this to a Python module:
$ futhark pyopencl --library dotprod.fut
The result is a file dotprod.py that we can import from within
Python:
$ python
>>> import dotprod
The dotprod.py module defines a class dotprod that we must
instantiate. The class maintains various bits of bookkeeping
information, and exposes a method for every entry point in our program
(here just main):
>>> o = dotprod.dotprod()
We will get an error if we try to pass Python lists to the entry point, as lists are not arrays:
>>> o.main([1,2,3], [4,5,6])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "dotprod.py", line 2416, in main
    x_mem_3884_ext))
TypeError: Argument #0 has invalid value
Futhark type: []i32
Argument has Python type <type 'list'> and value: [1, 2, 3]
Instead, we have to construct a properly typed NumPy array:
>>> import numpy as np
>>> o.main(np.array([1,2,3], dtype=np.int32),
           np.array([4,5,6], dtype=np.int32))
32
The integer that is returned is a normal Python object of an
appropriate type (in this case it will have type np.int32). If an
array is returned, it is in the form of a PyOpenCL array, which is
mostly compatible with NumPy arrays, except that the backing memory
still resides on the GPU, and is not copied over to the CPU unless
necessary. This makes it efficient to take the output of one entry
point and pass it as the input to another. PyOpenCL arrays have a
.get() method that can be used to construct an equivalent NumPy
array, if desired.
5.2. Calling Futhark from C
Let us once again consider dotprod.fut:
def main (x: []i32) (y: []i32): i32 =
  reduce (+) 0 (map2 (*) x y)
We can compile it with the futhark opencl compiler:
$ futhark opencl --library dotprod.fut
This produces two files in the current directory: dotprod.c and
dotprod.h. We can compile dotprod.c to a shared library like this:
$ gcc dotprod.c -o libdotprod.so -fPIC -shared
We can now link to libdotprod.so the same way we link with any
other shared library. But before we get that far, let’s take a look
at (parts of) the generated dotprod.h file. We have written the
code generator to produce header files that are as simple as possible,
with no superfluous crud, in order to make them human-readable. This is
particularly useful at the moment, since few explanatory comments are
inserted in the header file.
The first declarations are related to initialisation, which is based on first constructing a configuration object, which can then be used to obtain a context. The context is used in all subsequent calls, and contains GPU state and the like. We elide most of the functions for setting configuration properties, as they are not very interesting:
/*
* Initialisation
*/
struct futhark_context_config ;
struct futhark_context_config *futhark_context_config_new();
void futhark_context_config_free(struct futhark_context_config *cfg);
void futhark_context_config_set_device(struct futhark_context_config *cfg,
                                       const char *s);
...
struct futhark_context ;
struct futhark_context *futhark_context_new(struct futhark_context_config *cfg);
void futhark_context_free(struct futhark_context *ctx);
int futhark_context_sync(struct futhark_context *ctx);
The above demonstrates a pervasive design decision in the API: the use
of pointers to opaque structs. The struct futhark_context is
not given a definition, and the only way to construct it is via the
function futhark_context_new(). This means that we cannot
allocate it statically, which is contrary to how one would normally
design a C library. The motivation behind this design is twofold:
It keeps the header file readable, as it elides implementation details like struct members.
It is easier to use from FFIs. Most FFIs make it very easy to work with functions that only accept and produce pointers (and primitive types), but accessing and allocating structs is a little more involved.
The disadvantage is a little more boilerplate, and a little more dynamic allocation. However, relatively few objects of this kind are used, so the performance impact should be negligible.
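The opaque-struct pattern itself is easy to sketch in plain C. The following is a minimal illustration, not Futhark's generated code: the demo_context type and its functions are hypothetical names invented for this example. In a real library the declarations would live in the header and the struct definition in the .c file; here both are shown in one block so that it compiles on its own.

```c
#include <assert.h>
#include <stdlib.h>

/* What the header would expose (hypothetical names). Callers see only
 * the struct's name, never its members. */
struct demo_context;
struct demo_context *demo_context_new(int device_id);
int demo_context_device(const struct demo_context *ctx);
void demo_context_free(struct demo_context *ctx);

/* What the implementation file would contain. */
struct demo_context {
    int device_id; /* stand-in for real state such as GPU handles */
};

struct demo_context *demo_context_new(int device_id) {
    struct demo_context *ctx = malloc(sizeof *ctx);
    if (ctx != NULL) {
        ctx->device_id = device_id;
    }
    return ctx; /* NULL if the allocation failed */
}

int demo_context_device(const struct demo_context *ctx) {
    assert(ctx != NULL);
    return ctx->device_id;
}

void demo_context_free(struct demo_context *ctx) {
    free(ctx);
}
```

Because callers only ever hold a pointer, the members of the struct can change between versions without breaking code that uses the library, and an FFI needs to know nothing beyond "this is a pointer".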
The next part of the header file concerns itself with arrays - how they are created and accessed:
/*
* Arrays
*/
struct futhark_i32_1d ;
struct futhark_i32_1d *futhark_new_i32_1d(struct futhark_context *ctx,
                                          int32_t *data,
                                          int dim0);
int futhark_free_i32_1d(struct futhark_context *ctx,
                        struct futhark_i32_1d *arr);
int futhark_values_i32_1d(struct futhark_context *ctx,
                          struct futhark_i32_1d *arr,
                          int32_t *data);
int64_t *futhark_shape_i32_1d(struct futhark_context *ctx,
                              struct futhark_i32_1d *arr);
Again we see the use of pointers to opaque structs. We can use
futhark_new_i32_1d to construct a Futhark array from a C array,
and we can use futhark_values_i32_1d to read all elements from a
Futhark array. The representation used by the Futhark array is
intentionally hidden from us - we do not even know (or care) whether
it is resident in CPU or GPU memory. The code generator automatically
generates a struct and accessor functions for every distinct array
type used in the entry points of the Futhark program.
The single entry point is declared like this:
int futhark_entry_main(struct futhark_context *ctx,
                       int32_t *out0,
                       const struct futhark_i32_1d *in0,
                       const struct futhark_i32_1d *in1);
As the original Futhark program accepted two parameters and returned one value, the corresponding C function takes one out parameter and two in parameters (as well as a context parameter).
We have now seen enough to write a small C program (with no error handling) that calls our generated library:
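#include <stdint.h>
#include <stdio.h>
#include "dotprod.h"

int main(void) {
  /* Input data; the element type must match the Futhark type i32. */
  int32_t x[] = { 1, 2, 3, 4 };
  int32_t y[] = { 2, 3, 4, 1 };

  /* Create a configuration, then a context from it. */
  struct futhark_context_config *cfg = futhark_context_config_new();
  struct futhark_context *ctx = futhark_context_new(cfg);

  /* Construct Futhark arrays from the C arrays. */
  struct futhark_i32_1d *x_arr = futhark_new_i32_1d(ctx, x, 4);
  struct futhark_i32_1d *y_arr = futhark_new_i32_1d(ctx, y, 4);

  /* Call the entry point, then synchronise before reading the result. */
  int32_t res;
  futhark_entry_main(ctx, &res, x_arr, y_arr);
  futhark_context_sync(ctx);

  printf("Result: %d\n", (int)res);

  /* Memory management is manual: free everything we created. */
  futhark_free_i32_1d(ctx, x_arr);
  futhark_free_i32_1d(ctx, y_arr);
  futhark_context_free(ctx);
  futhark_context_config_free(cfg);
  return 0;
}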
We hard-code the input data here, but we could just as well have read
it from somewhere. The call to futhark_context_new() is where the
GPU is initialised (if applicable) and OpenCL kernel code is compiled
and uploaded to the device. This call might be relatively slow.
However, subsequent calls to entry point functions
(here futhark_entry_main()) will be efficient, as they re-use the
already initialised context.
Note the use of futhark_context_sync() after calling the entry
point: Futhark does not guarantee that the final results have been
written until we synchronise explicitly. Note also that we free the
two arrays x_arr and y_arr once we are done with them - memory
management is entirely manual.
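The program above ignores every return value. The entry point and most array functions in the header return int, and a real program would want to check these. The helper below is a small sketch of one way to do so; check_status is a hypothetical name, and the sketch assumes the usual C convention that zero means success, which the header excerpts shown here do not state explicitly.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: report a failed call on stderr and pass the
 * status through unchanged. Assumes that 0 means success. */
static int check_status(int status, const char *what) {
    assert(what != NULL);
    if (status != 0) {
        fprintf(stderr, "%s failed with status %d\n", what, status);
    }
    return status;
}
```

With this in place, a call such as futhark_entry_main(ctx, &res, x_arr, y_arr) could be wrapped as check_status(futhark_entry_main(ctx, &res, x_arr, y_arr), "futhark_entry_main"), with the caller skipping straight to the clean-up code when the result is non-zero.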
If we save this program as luser.c, we can compile and run it like
this:
$ gcc luser.c -o luser -L. -lOpenCL -lm -ldotprod
$ ./luser
Result: 24
You may need to set LD_LIBRARY_PATH=. before the dynamic linker
can find libdotprod.so. Also, this program will only work if the
default OpenCL device is usable on your system, since we did not
request any specific device. For testing on a system that does not
support OpenCL, simply use futhark c instead of futhark opencl.
The generated API will be the same.
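When testing a generated library, it can be handy to have a plain C reference to compare against. The function below is such a sketch for the dot product; dotprod_reference is a hypothetical name and is not part of anything Futhark generates. It ignores the possibility of integer overflow, which is fine for small test inputs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sequential reference for: reduce (+) 0 (map2 (*) x y) */
int32_t dotprod_reference(const int32_t *x, const int32_t *y, size_t n) {
    assert(x != NULL && y != NULL);
    int32_t acc = 0; /* the neutral element passed to reduce */
    for (size_t i = 0; i < n; i++) {
        acc += x[i] * y[i]; /* one step of map2 (*) folded into the sum */
    }
    return acc;
}
```

For the inputs used in luser.c this computes 1*2 + 2*3 + 3*4 + 4*1 = 24, matching the output above, and for the arrays from the Python session it gives 1*4 + 2*5 + 3*6 = 32.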
5.3. Handling Awkward Futhark Types
Our dot product function uses only types that map easily to NumPy and C: primitives and arrays of primitives. But what happens if we have an entry point that involves abstract types with hidden definitions, or types with no clear analogue in C, such as records or arrays of tuples? In this case, the generated API defines structs for opaque types that support very few operations.
Consider the following contrived program, pack.fut, which contains
two entry points:
entry pack (xs: []i32) (ys: []i32): [](i32,i32) = zip xs ys
entry unpack (zs: [](i32,i32)): ([]i32,[]i32) = unzip zs
The pack function turns two arrays into one array of pairs, and
the unpack function reverses the operation. If compiled to
Python, the pack function will return a special “opaque” object
whose contents cannot be inspected. If compiled to C, pack.h
contains the following definitions:
struct futhark_opaque_z31U814583239044437263 ;
int futhark_free_opaque_z31U814583239044437263(struct futhark_context *ctx,
    struct futhark_opaque_z31U814583239044437263 *obj);
int futhark_entry_pack(struct futhark_context *ctx,
    struct futhark_opaque_z31U814583239044437263 **out0,
    const struct futhark_i32_1d *in0,
    const struct futhark_i32_1d *in1);
int futhark_entry_unpack(struct futhark_context *ctx,
    struct futhark_i32_1d **out0,
    struct futhark_i32_1d **out1,
    const struct futhark_opaque_z31U814583239044437263 *in0);
The unfortunately named struct,
futhark_opaque_z31U814583239044437263, represents an array of
tuples. There is nothing we can do with it except for freeing it, or
passing it back to an entry point. In fact, the name is not even
stable - it’s a hash of the internal representation. If you try the
above example, you may see a different name.
Opaque types typically occur when you are writing a Futhark program that keeps some kind of state that you don’t want the user modifying or reading directly, but that you need access to on each call to an entry point. Since Futhark programs are purely functional (and therefore stateless), having the user manually pass back the state returned by the previous call is the only way to accomplish this. Fortunately, we can give these opaque types somewhat more readable names by using type abbreviations:
type~ array_of_pairs = [](i32,i32)
entry pack (xs: []i32) (ys: []i32): array_of_pairs = zip xs ys
entry unpack (zs: array_of_pairs): ([]i32,[]i32) = unzip zs
Now, when compiled to C, we obtain a somewhat more readable name for the opaque type:
struct futhark_opaque_array_of_pairs ;
int futhark_free_opaque_array_of_pairs(struct futhark_context *ctx,
    struct futhark_opaque_array_of_pairs *obj);
int futhark_entry_pack(struct futhark_context *ctx,
    struct futhark_opaque_array_of_pairs **out0,
    const struct futhark_i32_1d *in0,
    const struct futhark_i32_1d *in1);
int futhark_entry_unpack(struct futhark_context *ctx,
    struct futhark_i32_1d **out0,
    struct futhark_i32_1d **out1,
    const struct futhark_opaque_array_of_pairs *in0);
We have to be careful to use the type abbreviation everywhere, as the compiler will generate a hash-named opaque type in any place that we miss.
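The state-passing style described above, where each call takes the previous opaque state and produces a new one, looks roughly as follows from the caller's side. This is a self-contained sketch with hypothetical names (demo_state, demo_init, demo_step and so on); it imitates the shape of the generated API, with an out parameter first and a const input, but it is not generated by Futhark. The struct definition is included only so that the block compiles on its own; a real caller would see just the declaration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical opaque state, standing in for a futhark_opaque_* struct. */
struct demo_state {
    int steps_taken;
};

/* Create the initial state. Returns 0 on success. */
int demo_init(struct demo_state **out) {
    struct demo_state *s = malloc(sizeof *s);
    if (s == NULL) return 1;
    s->steps_taken = 0;
    *out = s;
    return 0;
}

/* Read the old state and write a fresh one to *out. The old state is
 * left untouched and must still be freed by the caller. */
int demo_step(struct demo_state **out, const struct demo_state *in) {
    assert(in != NULL);
    struct demo_state *s = malloc(sizeof *s);
    if (s == NULL) return 1;
    s->steps_taken = in->steps_taken + 1;
    *out = s;
    return 0;
}

int demo_steps_taken(const struct demo_state *s) {
    assert(s != NULL);
    return s->steps_taken;
}

void demo_free(struct demo_state *s) {
    free(s);
}

/* Caller-side loop: thread the state through n calls, freeing each
 * old state as soon as its successor exists. */
int demo_run(int n, int *steps_out) {
    assert(steps_out != NULL);
    struct demo_state *cur = NULL;
    if (demo_init(&cur) != 0) return 1;
    for (int i = 0; i < n; i++) {
        struct demo_state *next = NULL;
        if (demo_step(&next, cur) != 0) {
            demo_free(cur);
            return 1;
        }
        demo_free(cur); /* the previous state is no longer needed */
        cur = next;
    }
    *steps_out = demo_steps_taken(cur);
    demo_free(cur);
    return 0;
}
```

The important part is the loop in demo_run: since memory management is manual, every intermediate state has to be freed explicitly, or the program leaks one object per call.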