#### Clojure and Reduce

*(Note: this post discusses and compares the use of loop and reduce in Clojure. It’s very basic, but if you don’t know Clojure, it’s doubtful that you’ll be able to follow. If you would like to roll up your sleeves and learn the language, check out Clojure for the Brave and True.) *

The Clojure coding samples in this article can be found here.

When you learn Clojure, especially if you are new to functional languages, then the *loop* function is a lifeline: after tussling with the constraints of immutable data, suddenly you have a way to execute the moral equivalent of a for-loop. Life is good again!

But there’s almost always a better way, and in Clojure, *reduce* is often a better way than *loop*.

###### Reduce What?

The action of the reduce function is to take a list of values and collapse them into a single value. For example, computing a sum could be said to reduce a list of numbers to a single number.

And what is perhaps the simplest example of *reduce* does just that:

> (reduce + '(3 2 1 5 4)) ;; 15

What’s going on here?

Well, the arguments to Clojure’s *reduce* are:

- a function that accepts two arguments, the
*plus*function in our example, - an optional initial value, not used in our example,
- a list of values.

*(If an initial value is not supplied, then reduce generates it from executing the function against the first two items in the list.)*

In the example, *reduce* repeatedly executes the *plus* function against the members of list ‘(3 2 1 5 4) as follows:

(+ 3 2) to get 5 (+ 5 1) to get 6, (+ 6 5) to get 11, (+ 11 4) to get 15, which is returned.

Note that when *reduce* calls the *plus* function, the first argument to the plus function is the output from its previous execution, and the second argument is from the current value in the list of values. *E.g.*, after the first execution gave us 5, we plugged that into the *plus* function along with the value 1, which was the next in line from our list of integers.

More generally, when the function is handling argument , it is also given .

Our example does not use the optional *initial value* argument. In all but the simplest cases, however, you ought to supply one, the primary reason being that this allows you to use different types for input and output. That is, the types of and can be different. For example, the function may be given a list of scalars and output a map.

###### Demonstration: converting loop to reduce

To demonstrate replacing *loop* with *reduce*, let’s write a few Clojure functions that calculate the variance of a list of integers. From statistics, the definition of variance is:

In English, the variance is the sum of squares of the difference between each sample value and the mean, divided by the number of samples.

By simple algebra, and remembering that the mean is the sum of divided by , the above can be shown to be equivalent to

The advantage of the second form for calculating variance is that it only requires a single pass through the sample, in which it tallies the sums of and . The original form requires one pass to compute the mean and then a second to compute the sum of squares of differences.

Here is a Clojure function that is shared across our other examples. It returns the variance when given a sample’s sum of squares, sum, and count.

(defn var-from-sums [ss s cnt] (double (/ (- ss (/ (* s s) cnt)) cnt)))

###### First cut: naive recursion

Here is a straight-forward implementation for calculating sum of squares and sum of a sample:

(defn calculate-variance1 [sample] (let [sums (loop [[n & the-rest] sample sumsq 0 sum 0] (if (nil? n) [sumsq sum] (recur the-rest (+ sumsq (* n n)) (+ sum n))))] (var-from-sums (sums 0) (sums 1) (count sample))))

Pretty simple: iterate through the sample, keeping a tally of the sum of squares and the sum of the sample. The *loop* function return vector *[sumsq sum]*, which is bound to *sums*.

###### Take Two: Return a map rather than a vector

Probably the ugliest thing about our first cut is that the loop returns sum-of-squares and sum in a vector. “Magic numbers” are then used to access the sum of squares and the sum. There’s no semantic clue to link the position with the desired value.

A better practice would be to return the result in a map. Clojure maps are inherently self-labeling. And, as we will see in version 3, this allows our function to use two arguments, one holding cumulative results, the other for items from the sample. This will make replacing *loop* with *reduce* very simple.

So here is a slightly modified loop.

(defn calculate-variance2 [sample] (let [sums (loop [[n & the-rest] sample tallies {:sumsq 0 :sum 0}] (if (nil? n) tallies (recur the-rest {:sumsq (+ (:sumsq tallies) (* n n)) :sum (+ (:sum tallies) n)})))] (var-from-sums (:sumsq sums) (:sum sums) (count sample))))

At the expense of some visual clutter in the calculations, the results are more clearly labeled. And by using two bindings, one for iterating, one for accumulating, we have set the stage for using *reduce*.

###### Take Three: Reduce

Now we are ready to replace *loop* with *reduce*. The conversion is essentially a cut-and-paste job: the initial bindings of the *loop* become the second and third arguments to *reduce*.

(defn calculate-variance3 [sample] (let [tallies (reduce (fn [tallies n] {:sumsq (+ (:sumsq tallies) (* n n)) :sum (+ (:sum tallies) n)}) {:sumsq 0 :sum 0} sample)] (var-from-sums (:sumsq tallies) (:sum tallies) (count sample))))

I think you will agree that the intent of this code is more clear than in the earlier examples. And we lose the visual clutter of doing our own iteration across the sample.

###### Wrapping up

The last version of our function is the clearest, with less language mechanism obscuring what is of interest. And it seems to me that it ought to be the most natural approach for someone coming to Clojure from a procedural language: you start with initial values, and you operate on them as you iterate across some range of values.

Icing on the cake is that *reduce* is often faster than *loop*, because of internal support for reduce that some Clojure data structures implement.

Terser, cleaner, faster, more clear. What’s not to like?

I’ve demonstrated an approach for converting your *loop* code to use *reduce* instead. Ideally, you will start using *reduce* in the first place.

In regard to reduce being faster than loop “because of internal support”, see http://kukuruku.co/hub/funcprog/clojure-transducers-reducers-and-other-stuff

“Clojure uses sequences to traverse collections. If we traverse a vector, a hash table, or a simple iterator, a fair amount of temporary objects will be created.

An obvious optimization here is to implement a specialized reduce variant for the collections needing it. If the collection is not suitable for such optimization, it’s better to use the standard implementation, similar to that provided at the beginning of the article. There’s a special clojure.core.protocol/CollReduce protocol for this”

LikeLike