Be Mindful of Clojure’s binding

Clojure’s binding form is amazingly useful, but as with any very long length of rope, you can hang yourself in a cinch with it. So, let’s review a couple of traps I’ve personally fallen into while using binding; you should be aware of both.

Binding is thread-local

This one is super-simple, and it’s the first thing most people learn about binding, but you can still get bitten by sloppily assuming that an established binding will migrate to another thread, or by not understanding the concurrency semantics of a function you’re calling within your binding form. Consider:

user=> (def *foo* 5)
#'user/*foo*
user=> (defn adder
         [param]
         (+ *foo* param))
#'user/adder
user=> (binding [*foo* 10]
         (doseq [v (pmap adder (repeat 3 5))]
           (println v)))
10
10
10
nil

So, we have a var *foo* holding a default value, and a function adder that just adds its argument to the current thread-local value of *foo*, returning the result. This is obviously just illustrative; imagine instead that adder is a call into an opaque library that takes some arguments and pulls configuration or other data from a var that the library documents as part of its API.

The problem here is that adder is being invoked in threads other than the thread that is establishing the binding on *foo*; therefore, the value of *foo* within adder is always the default, 5.

The lesson? Bindings do not migrate across thread boundaries. One of the great things about Clojure is that you can “do concurrency” using a variety of easy-to-use primitives (e.g. pmap is absolutely the cat’s nuts: it’s a dead-simple way to almost-transparently parallelize computation over a dataset). The irony is that, whereas thread boundaries are painfully obvious in other languages because of all the ceremony needed to get results back from another thread, things like pmap involve so little ceremony that it’s easy to forget the basics.
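The transcript above reflects the Clojure version current when this was written. Since Clojure 1.3, a var must be declared ^:dynamic before binding can rebind it, and pmap, future, send and send-off convey the caller’s bindings to the threads they use (which is why the first commenter below sees 15). The rule itself still holds for any thread that Clojure doesn’t start for you. The following is a minimal sketch assuming Clojure 1.3 or later; run-on-new-thread is a hypothetical helper that calls a function on a plain java.lang.Thread, which receives none of the caller’s bindings:

```clojure
;; Assumes Clojure 1.3+, where rebindable vars must be marked ^:dynamic.
(def ^:dynamic *foo* 5)

(defn adder
  [param]
  (+ *foo* param))

(defn run-on-new-thread
  "Hypothetical helper: calls f on a brand-new java.lang.Thread and
  returns its result. Unlike future or pmap, a raw Thread does not
  receive the caller's dynamic bindings."
  [f]
  (let [result (promise)
        t      (Thread. (fn [] (deliver result (f))))]
    (.start t)
    (.join t)
    @result))

(binding [*foo* 10]
  [(adder 5)                          ; same thread: sees the binding
   (run-on-new-thread #(adder 5))])   ; new thread: sees the root value
;; => [15 10]
```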

One solution to the problem illustrated above is to change adder so that it explicitly captures the bound value of *foo* and returns a new function that does the adding using that captured value:

user=> (defn make-adder
         []
         (let [foo-value *foo*]
           #(+ foo-value %)))
#'user/make-adder
user=> (binding [*foo* 10]
         (doseq [v (pmap (make-adder) (repeat 3 5))]
           (println v)))
15
15
15
nil
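Another option, which the transcript above doesn’t use, is clojure.core/bound-fn: it wraps a function so that, wherever it’s called, it runs with the dynamic bindings that were in effect when bound-fn was evaluated. The sketch below assumes Clojure 1.3 or later (hence ^:dynamic) and compares it with make-adder; run-on-new-thread is a hypothetical helper that runs a function on a plain java.lang.Thread, which receives none of the caller’s bindings:

```clojure
;; Assumes Clojure 1.3+; *foo*, adder and make-adder mirror the transcripts above.
(def ^:dynamic *foo* 5)

(defn adder [param] (+ *foo* param))

(defn make-adder
  "Reads *foo* once, on the calling thread, and closes over the value."
  []
  (let [foo-value *foo*]
    #(+ foo-value %)))

(defn run-on-new-thread
  "Hypothetical helper: calls f on a plain java.lang.Thread, which does
  not inherit the caller's dynamic bindings, and returns the result."
  [f]
  (let [result (promise)
        t      (Thread. (fn [] (deliver result (f))))]
    (.start t)
    (.join t)
    @result))

(binding [*foo* 10]
  (let [captured-adder (make-adder)]   ; *foo* is read here, inside the binding
    {:plain      (run-on-new-thread #(adder 5))                 ; root value used
     :make-adder (run-on-new-thread #(captured-adder 5))        ; captured value used
     :bound-fn   (run-on-new-thread (bound-fn [] (adder 5)))})) ; bindings re-established
;; => {:plain 10, :make-adder 15, :bound-fn 15}
```

The trade-off: make-adder changes adder’s API so that it captures just the value it needs, while bound-fn leaves adder alone but carries every dynamic binding in effect at the point where it was evaluated.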

Parenthetically, it’s very much worth noting that all of the wonderful ref/transaction machinery in Clojure also depends on thread-local state: the running transaction is tracked per thread. That means that if you pmap a function across some set of refs in the course of a transaction (or otherwise poke at refs from other threads), those threads see no running transaction at all, so alter and ref-set will throw, and reads won’t see the transaction’s in-progress changes. There are ways around this, but they (last I checked) involve manually copying the thread-local state associated with any running transaction across thread boundaries; in general, it’s not worth the hassle.
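A minimal sketch of that failure, assuming Clojure 1.3 or later; the accounts refs and the try-deposit function are hypothetical. Inside dosync, pmap runs try-deposit on pool threads that have no running transaction, so alter throws IllegalStateException there (caught and reported as :no-transaction below), and the refs are left untouched. The binding conveyance added in 1.3 doesn’t help, because the running transaction is not a var binding:

```clojure
(def accounts [(ref 100) (ref 200) (ref 300)])

(defn try-deposit
  "Hypothetical: adds 10 to ref r. alter only works on a thread with a
  running transaction; otherwise it throws IllegalStateException."
  [r]
  (try
    (alter r + 10)
    (catch IllegalStateException _ :no-transaction)))

(def results
  (dosync
    ;; doall forces every pmap call to run before the transaction ends.
    (doall (pmap try-deposit accounts))))

results                 ;; => (:no-transaction :no-transaction :no-transaction)
(mapv deref accounts)   ;; => [100 200 300]
```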

Lazy seqs often escape the scope of binding forms, so capture the value of any bound vars you care about explicitly

As wonderful as lazy sequences are, when they dereference bound vars isn’t always obvious: it depends entirely on how and when those lazy sequences are consumed and realized. Consider the following, assuming *foo* still has its default value of 5 from the first example:

user=> (defn some-fn
         []
         (lazy-seq [*foo*]))
#'user/some-fn
user=> (binding [*foo* 10]
         (some-fn))
(5)

What’s going on here? The lazy-seq macro returns a lazy sequence that evaluates the sequence-producing form given to it on demand. In this case, that happens when the REPL prints the result, which is after the binding form has returned and *foo* has reverted to its default value.

This may become clearer with this example:

user=> (binding [*foo* 10]
         (doall (some-fn)))
(10)

doall forces the full evaluation of a lazy sequence. Here, because that evaluation happens within the binding form, *foo* has the value we bound, and the returned sequence contains 10 as expected.
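The same thing happens with the lazy seqs returned by core functions such as map, which is the more common way to get bitten. A minimal sketch, assuming Clojure 1.3 or later (hence ^:dynamic); add-foo-lazily is a hypothetical function:

```clojure
(def ^:dynamic *foo* 5)

(defn add-foo-lazily
  "Hypothetical: returns a lazy seq adding *foo* to each element.
  *foo* is read only when each element is realized."
  [xs]
  (map #(+ *foo* %) xs))

;; The lazy seq leaves the binding form unrealized...
(def escaped (binding [*foo* 10] (add-foo-lazily [1 2 3])))

;; ...so it is realized here, where *foo* is back to its root value.
(doall escaped)   ;; => (6 7 8)
```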

These are obviously simplistic examples. The real-world scenario where this bites is library code, where part of a library’s public API is a set of bindable vars that callers use to configure the behaviour of the library’s functions. This is super-useful, especially for libraries with a ton of knobs and levers: rather than forcing callers to provide a configuration object on every function call (and therefore forcing you to thread that configuration through all of your helper functions), bindings let callers change only the defaults they care about, and let you implement the library in a straightforward way.
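A sketch of that pattern, assuming Clojure 1.3 or later; the vars *separator* and *upper-case?* and the functions render-item and render are hypothetical stand-ins for a library’s documented knobs and API. Callers override only what they need, and internal helpers read the knobs directly instead of having a configuration argument threaded through them:

```clojure
(require '[clojure.string :as str])

(def ^:dynamic *separator*
  "Hypothetical library knob: text placed between rendered items."
  ", ")

(def ^:dynamic *upper-case?*
  "Hypothetical library knob: whether items are upper-cased."
  false)

(defn- render-item
  "Internal helper; reads the knob directly rather than taking it as an argument."
  [item]
  (if *upper-case?* (str/upper-case (str item)) (str item)))

(defn render
  "Hypothetical public function. str/join consumes the lazy map eagerly,
  so every knob is read before render returns."
  [items]
  (str/join *separator* (map render-item items)))

(render [:a :b])                  ;; => ":a, :b"
(binding [*separator* " | "]
  (render [:a :b]))               ;; => ":a | :b"
```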

The lesson? If you are going to use bound values of vars, you need to make sure you capture those bindings before returning any lazy seqs that use those bound values. Aside from using doall as shown above (which defeats the point of using lazy seqs), the solution looks a lot like the make-adder function from the first section (notice a trend?):

user=> (defn some-fn
         []
         (let [foo-val *foo*]
           (lazy-seq [foo-val])))
#'user/some-fn
user=> (binding [*foo* 10]
         (some-fn))
(10)

Notice that some-fn now explicitly captures the bound value of *foo*; this ensures that, regardless of when or on which thread the lazy seq is realized, the values it contains are the ones bound by the caller of some-fn. This is almost always what you want.
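To check the “on which thread” part, here is a sketch assuming Clojure 1.3 or later; add-foo-captured is a hypothetical map-based function that captures *foo* the same way some-fn does. The seq is realized outside the binding, on a plain java.lang.Thread, so that binding conveyance can’t mask the problem:

```clojure
(def ^:dynamic *foo* 5)

(defn add-foo-captured
  "Hypothetical: reads *foo* once, eagerly, then closes over the value."
  [xs]
  (let [foo-val *foo*]
    (map #(+ foo-val %) xs)))

(def captured (binding [*foo* 10] (add-foo-captured [1 2 3])))

;; Realize the seq outside the binding, on a new thread with no bindings.
(def realized
  (let [result (promise)
        t      (Thread. (fn [] (deliver result (doall captured))))]
    (.start t)
    (.join t)
    @result))

realized   ;; => (11 12 13)
```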

Too many people don’t fully realize how much flexibility vars and binding give the capable programmer. As is often the case, though, power comes with responsibility: whether you’re writing libraries, using them, or casually using binding in localized ways in application code, it needs to be handled with care.

This entry was posted in Clojure.

7 Responses to Be Mindful of Clojure’s binding

  1. I just tried this in 1.4-SNAPSHOT and I get 15:

    user> (def ^:dynamic *foo* 5)
    #'user/*foo*

    user> (defn adder [param]
            (+ *foo* param))
    #'user/adder

    user> (binding [*foo* 10]
            (doseq [v (pmap adder (repeat 3 5))]
              (println v)))
    15
    15
    15
    nil
    user>

    I’m not sure when this changed, or if something else is going on here.

  2. FWIW In 1.4 I still see the behavior Chas mentioned with lazy-seqs

  3. Matthew W says:

    Thanks, this was helpful after I spent way too long tracking down a bug relating to lazy sequences and with-bindings.

    Supports my gut feeling that dynamic scope is a nasty hack which should be reserved for top-level convenience bindings only, and that library authors should always provide a version of their API which doesn’t rely on dynamic scope under the hood. As we’ve seen, the abstraction can easily leak and cause subtle and tricky-to-track-down problems for callers who use the API in combination with Clojure features like laziness, pmap etc. As a consumer of your API I shouldn’t have to worry about which bits of your internal state I need to capture in a lexical scope in order for things to play nice in these cases.
