Clojure: prefer reduce to loop

Clojure and Reduce

(Note: this post discusses and compares the use of loop and reduce in Clojure. It’s very basic, but if you don’t know Clojure, it’s doubtful that you’ll be able to follow. If you would like to roll up your sleeves and learn the language, check out Clojure for the Brave and True.) 

The Clojure coding samples in this article can be found here.

When you learn Clojure, especially if you are new to functional languages, then the loop function is a lifeline: after tussling with the constraints of immutable data, suddenly you have a way to execute the moral equivalent of a for-loop.  Life is good again!

But there’s almost always a better way, and in Clojure, reduce is often a better way than loop.

Reduce What?

The action of the reduce function is to take a list of values and collapse them into a single value. For example, computing a sum could be said to reduce a list of numbers to a single number.

And what is perhaps the simplest example of reduce does just that:

> (reduce + '(3 2 1 5 4))
;; 15

What’s going on here?

Well, the arguments to Clojure’s reduce are:

  • a function that accepts two arguments, the plus function in our example,
  • an optional initial value, not used in our example,
  • a list of values.

(If an initial value is not supplied, then reduce generates it from executing the function against the first two items in the list.)

In the example, reduce repeatedly executes the plus function against the members of list ‘(3 2 1 5 4) as follows:

(+ 3  2) to get 5
(+ 5  1) to get 6,
(+ 6  5) to get 11,
(+ 11 4) to get 15, which is returned.

Note that when reduce calls the plus function, the first argument to the plus function is the output from its previous execution, and the second argument is from the current value in the list of values. E.g., after the first execution gave us 5, we plugged that into the plus function along with the value 1, which was the next in line from our list of integers.

More generally, when the function f is handling argument x_n, it is also given f(x_{n - 1}).

Our example does not use the optional initial value argument. In all but the simplest cases, however, you ought to supply one, the primary reason being that this allows you to use different types for input and output. That is, the types of x_i and f(x_i) can be different. For example, the function may be given a list of scalars and output a map.

Demonstration: converting loop to reduce

To demonstrate replacing loop with reduce, let’s write a few Clojure functions that calculate the variance of a list of integers.  From statistics, the definition of variance is:

\frac{\Sigma(x - \mu)^2}{N}

In English, the variance is the sum of squares of the difference between each sample value and the mean, divided by the number of samples.

By simple algebra, and remembering that the mean is the sum of x divided by N, the above can be shown to be equivalent to

\frac{\Sigma(x^2)}{N} - \frac{(\Sigma(x))^2}{N^2}

The advantage of the second form for calculating variance is that it only requires a single pass through the sample, in which  it tallies the sums of x and x^2.  The original form requires one pass to compute the mean and then a second to compute the sum of squares of differences.

Here is a Clojure function that is shared across our other examples. It returns the variance when given a sample’s sum of squares, sum, and count.

(defn var-from-sums
  [ss s cnt]
  (double (/ (- ss (/ (* s s) cnt)) cnt)))
First cut: naive recursion

Here is a straight-forward implementation for calculating sum of squares and sum of a sample:

(defn calculate-variance1
  [sample]
  (let [sums (loop [[n & the-rest] sample
                    sumsq  0 
                    sum    0]
               (if (nil? n)
                 [sumsq sum]
                 (recur the-rest
                        (+ sumsq (* n n))
                        (+ sum   n))))]
    (var-from-sums (sums 0) (sums 1) (count sample))))

Pretty simple: iterate through the sample, keeping a tally of the sum of squares and the sum of the sample. The loop function return vector [sumsq sum], which is bound to sums.

Take Two: Return a map rather than a vector

Probably the ugliest thing about our first cut is that the loop returns sum-of-squares and sum in a vector. “Magic numbers” are then used to access the sum of squares and the sum. There’s no semantic clue to link the position with the desired value.

A better practice would be to return the result in a map. Clojure maps are inherently self-labeling. And, as we will see in version 3, this allows our function to use two arguments, one holding cumulative results, the other for items from the sample.  This will make replacing loop with reduce very simple.

So here is a slightly modified loop.

(defn calculate-variance2
  [sample]
  (let [sums  (loop [[n & the-rest] sample
                     tallies {:sumsq 0 :sum 0}]
                (if (nil? n)
                  tallies
                  (recur the-rest 
                         {:sumsq 
                             (+ (:sumsq tallies) (* n n)) 
                          :sum   
                             (+ (:sum   tallies) n)})))]
    (var-from-sums (:sumsq sums) 
                   (:sum sums) 
                   (count sample))))

At the expense of some visual clutter in the calculations, the results are more clearly labeled. And by using two bindings, one for iterating, one for accumulating, we have set the stage for using reduce.

Take Three: Reduce

Now we are ready to replace loop with reduce. The conversion is essentially a cut-and-paste job: the initial bindings of the loop become the second and third arguments to reduce.

(defn calculate-variance3
  [sample]
  (let [tallies 
          (reduce 
            (fn [tallies n] 
              {:sumsq (+ (:sumsq tallies) (* n n)) 
                :sum   (+ (:sum   tallies) n)})
             {:sumsq 0 :sum 0} 
             sample)]
    (var-from-sums (:sumsq tallies) 
                   (:sum tallies) 
                   (count sample))))

I think you will agree that the intent of this code is more clear than in the earlier examples. And we lose the visual clutter of doing our own iteration across the sample.

Wrapping up

The last version of our function is the clearest, with less language mechanism obscuring what is of interest. And it seems to me that it ought to be the most natural approach for someone coming to Clojure from a procedural language: you start with initial values, and you operate on them as you iterate across some range of values.

Icing on the cake is that reduce is often faster than loop, because of internal support for reduce that some Clojure data structures implement.

Terser, cleaner, faster, more clear. What’s not to like?

I’ve demonstrated an approach for converting your loop code to use reduce instead. Ideally, you will start using reduce in the first place.

Advertisements

One thought on “Clojure: prefer reduce to loop

  1. rstinejr says:

    In regard to reduce being faster than loop “because of internal support”, see http://kukuruku.co/hub/funcprog/clojure-transducers-reducers-and-other-stuff

    “Clojure uses sequences to traverse collections. If we traverse a vector, a hash table, or a simple iterator, a fair amount of temporary objects will be created.

    An obvious optimization here is to implement a specialized reduce variant for the collections needing it. If the collection is not suitable for such optimization, it’s better to use the standard implementation, similar to that provided at the beginning of the article. There’s a special clojure.core.protocol/CollReduce protocol for this”

    Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s