# 6. Fusion and List Homomorphisms¶

In this chapter, we outline the general SOAC reasoning principles that lie behind both the philosophy of programming with arrays in Futhark and the techniques used for allowing certain programs to have efficient parallel implementations. We shall discuss the reasoning principles in terms of Futhark constructs but introduce a few higher-order concepts that are important for the reasoning.

We first discuss the concept of *fusion*, which aims at eliminating
intermediate arrays while still allowing the Futhark programmer to
express an algorithm using simple SOACs and their associated reasoning
principles.

We then introduce the concept of list homomorphism through a few examples.

## 6.1. Fusion¶

Fusion aims at reducing the overhead of unnecessary repeated
control-flow or unnecessary temporary storage. In essence, fusion is
defined in terms of a number of *fusion rules*, which specify how a
Futhark (intermediate) expression can be transformed into a
semantically equivalent expression.

The rules make use of the auxiliary higher-order functions for, for instance, function composition, presented in Section 2.6.

The *first fusion rule*, \(F1\), which says that the result of
mapping an arbitrary function `f`

over the result of mapping another
arbitrary function `g`

over some array `a`

is identical to mapping
the composed function `f <-< g`

over the array `a`

. The first
fusion rule is also called map-map fusion and can simply be written

`map f <-< map g`

=`map (f <-< g)`

Given that `f`

and `g`

denote the Futhark functions `\x -> e`

and `\y -> e'`

, respectively (possibly after renaming of bound
variables), the *function product* of `f`

and `g`

, written ```
f <*>
g
```

, is defined as `\(x,y) -> (f x, g y)`

.

Now, given functions `f:a->b`

and `g:a->c`

, the *second fusion
rule*, \(F2\), which denotes horizontal fusion, is given by the
following equation:

`(map f <*> map g) <-< dup`

=`map ((f <*> g) <-< dup)`

Here `dup`

is the Futhark function `\x -> (x,x)`

.

The fusion rules that we have presented here generalise to functions that take multiple arguments by applying zipping, unzipping, currying, and uncurrying strategically. Notice that due to Futhark’s strategy of automatically transforming arrays of tuples into tuples of arrays, the applications of zipping, unzipping, currying, and uncurrying have no effect at runtime.

Futhark applies a number of other fusion rules, which are based on the
fundamental property that Futhark’s internal representation is based
on a number of composed constructs (e.g., named `scanomap`

and
`redomap`

). These constructs turn out to fuse well with `map`

.

## 6.2. Parallel Utility Functions¶

For use by other algorithms, a set of utility functions for manipulating and managing arrays is an important part of the tool box. We present a number of utility functions here, ranging from finding elements in an array to finding the maximum element and its index in an array.

### 6.2.1. Finding the Index of an Element in an Array¶

We device two different functions for finding an index in an array for
which the content is identical to some given value. The first
function, `find_idx_first`

, takes a value `e`

and an array `xs`

and returns the smallest index `i`

into `xs`

for which ```
xs[i] =
e
```

:

```
-- Return the first index i into xs for which xs[i] == e
def find_idx_first [n] (e:i32) (xs:[n]i32) : i32 =
let es = map2 (\x i -> if x==e then i else n) xs (iota n)
let res = reduce i32.min n es
in if res == n then -1 else res
```

The second function, `find_idx_last`

, also takes a value and an
array but returns the largest index `i`

into `xs`

for which
`xs[i] = e`

:

```
-- Return the last index i into xs for which xs[i] == e
def find_idx_last [n] (e:i32) (xs:[n]i32) : i32 =
let es = map2 (\x i -> if x==e then i else -1) xs (iota n)
in reduce i32.max (-1) es
```

The above two functions make use of the auxiliary functions
`i32.max`

and `i32.min`

.

### 6.2.2. Finding the Largest Element and its Index in an Array¶

Futhark allows for reduction operators to take tuples as arguments. This feature is exploited in the following function, which implements a homomorphism for finding the largest element and its index in an array:

```
-- Find the largest integer and its index in an array
def MININT : i32 = -10000000
def mx (m1:i32,i1:i32) (m2:i32,i2:i32) : (i32,i32) =
if m1 > m2 then (m1,i1) else (m2,i2)
def maxidx [n] (xs: [n]i32) : (i32,i32) =
reduce mx (MININT,-1) (zip xs (iota n))
```

The function is a *homomorphism* [Bir87]: For any \(x\)
and \(y\), and with \(++\) denoting array concatenation, there
exists an associative operator \(\oplus\) such that

The operator \(\oplus = \kw{mx}\). We will leave it up to the
reader to verify that the `maxidx`

function will operate efficiently
on large inputs.

## 6.3. Radix Sort Revisited¶

A simple radix sort algorithm was presented already in Section 5.6.1. In this section, we present two generalized versions of radix sort, one for ascending sorting and one for descending sorting. As a bonus, the sorting routines return both the sorted array and an index array that can be used to sort an array with respect to a permutation obtained by sorting another array. The generalised ascending radix sort is as follows:

```
-- Store elements for which bitn is not set first
def rs_step_asc [n] ((xs:[n]u32,is:[n]i32),bitn:i32) : ([n]u32,[n]i32) =
let bits1 = map (\x -> (i32.u32 (x >> u32.i32 bitn)) & 1) xs
let bits0 = map (1-) bits1
let idxs0 = map2 (*) bits0 (scan (+) 0 bits0)
let idxs1 = scan (+) 0 bits1
let offs = reduce (+) 0 bits0 -- store idxs1 last
let idxs1 = map2 (*) bits1 (map (+offs) idxs1)
let idxs = map (\x->x-1) (map2 (+) idxs0 idxs1)
in (scatter (copy xs) idxs xs,
scatter (copy is) idxs is)
-- Radix sort - ascending
def rsort_asc [n] (xs: [n]u32) : ([n]u32,[n]i32) =
let is = iota n
in loop (p : ([n]u32,[n]i32)) = (xs,is) for i < 32 do
rs_step_asc(p,i)
```

And the descending version as follows:

```
-- Store elements for which bitn is set first
def rs_step_desc [n] ((xs:[n]u32,is:[n]i32),bitn:i32) : ([n]u32,[n]i32) =
let bits1 = map (\x -> (i32.u32 (x >> u32.i32 bitn)) & 1) xs
let bits0 = map (1-) bits1
let idxs1 = map2 (*) bits1 (scan (+) 0 bits1)
let idxs0 = scan (+) 0 bits0
let offs = reduce (+) 0 bits1 -- store idxs0 last
let idxs0 = map2 (*) bits0 (map (+offs) idxs0)
let idxs = map (\x->x-1) (map2 (+) idxs1 idxs0)
in (scatter (copy xs) idxs xs,
scatter (copy is) idxs is)
-- Radix sort - descending
def rsort_desc [n] (xs: [n]u32) : ([n]u32,[n]i32) =
loop (p : ([n]u32,[n]i32)) = (xs,iota n) for i < 32 do
rs_step_desc(p,i)
```

Notice that in case of identical elements in the source vector, one cannot simply implement the ascending version by reversing the arrays resulting from calling the descending version.

## 6.4. Finding the Longest Streak¶

In this section, we shall demonstrate how to write a function for finding the longest streak of increasing numbers. Here is one possible implementation of the function:

```
-- Longest streak of increasing numbers
def streak [n] (xs: [n]i32) : i32 =
-- find increments
let ys = rotate 1 xs
let is = (map2 (\x y -> if x < y then 1 else 0) xs ys)[0:n-1]
-- scan increments
let ss = scan (+) 0 is
-- nullify where there is no increment
let ss1 = map2 (\s i -> s*(1-i)) ss is
let ss2 = scan max 0 ss1
-- subtract from increment scan
let ss3 = map2 (-) ss ss2
let res = reduce max 0 ss3
in res
```

The following derivation shows how the algorithm works for a
particular input, namely when `stream`

is given the argument array
`[1,5,3,4,2,6,7,8]`

, in which case the algorithm should return the
value 3:

Variable | |||||||||
---|---|---|---|---|---|---|---|---|---|

`xs` |
= | 1 | 5 | 3 | 4 | 2 | 6 | 7 | 8 |

`ys` |
= | 5 | 3 | 4 | 2 | 6 | 7 | 8 | 1 |

`is` |
= | 1 | 0 | 1 | 0 | 1 | 1 | 1 | |

`ss` |
= | 1 | 1 | 2 | 2 | 3 | 4 | 5 | |

`ss` |
= | 0 | 1 | 0 | 2 | 0 | 0 | 0 | |

`ss2` |
= | 0 | 1 | 1 | 2 | 2 | 2 | 2 | |

`ss3` |
= | 1 | 0 | 1 | 0 | 1 | 2 | 3 | |

`res` |
= | 3 |

In Section 7.1.1 we present a simpler algorithm, which builds directly on the concept of a so-called segmented scan.