À la carte configuration in Clojure APIs

There are two dominant configuration patterns in Clojure libraries. The first is where configuration is provided explicitly via the first argument; here, in Rummage, accessing Amazon's SimpleDB:

[code language="Clojure"]
(require '[cemerick.rummage :as sdb])

(def config (sdb/create-client "aws id" "aws secret-key"))

(sdb/put-attrs config "demo" {::sdb/id "foo" :name "value" :key #{50 60 65}})
[/code]

The other is where the configuration is defined implicitly, usually using binding and dynamic scope (and sometimes via a set!-style setter for REPL usage convenience); here, in Clutch, accessing CouchDB:

[code language="Clojure"]
(require '[com.ashafa.clutch :as clutch])

(clutch/with-db "localhost"
  (clutch/put-document {:a 5 :b 6} :id "foo"))
[/code]

The latter is arguably more common, especially in database libraries; in addition to Clutch, you can see the dynamic pattern in play in java.jdbc's with-connection and congomongo's with-mongo.
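If you haven't looked inside one of these libraries, the dynamic pattern usually comes down to three pieces: a dynamic var, a macro that binds it, and perhaps a setter that changes the var's root value for REPL use. Here is a minimal sketch of that shape. The names *db*, with-db, set-db!, and put-document are hypothetical and chosen for illustration; this is not Clutch's (or any other library's) actual implementation:

[code language="Clojure"]
;; Hypothetical sketch of the dynamic-configuration pattern.
;; *db*, with-db, set-db! and put-document are illustrative names only.
(def ^:dynamic *db* nil)

(defmacro with-db
  "Evaluates body with *db* bound to db."
  [db & body]
  `(binding [*db* ~db] ~@body))

(defn set-db!
  "REPL convenience: changes the root value of *db*."
  [db]
  (alter-var-root #'*db* (constantly db)))

(defn put-document
  "Pretends to store doc; just reports which *db* it would have used."
  [doc]
  {:db *db* :doc doc})

(with-db "localhost"
  (put-document {:a 5 :b 6}))
[/code]

The setter uses alter-var-root because set! only works on a var that is already thread-bound (e.g. inside a binding form).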

From the perspective of a user (they are us!), I sometimes prefer dynamic scope to avoid verbosity. At other times I like to be explicit about configuration (and therefore, usually, about the target of my code's activity), especially when dynamic scope is inappropriate or downright dangerous. My preferences vacillate depending on what I'm doing, where I'm doing it, and what tools I'm using. In any case, each library that requires configuration almost always requires that you work with it the way its author intended, so I am left with no joy half the time.

As an author of and contributor to such libraries (including the two mentioned above), perhaps I'm in a position to resolve this dilemma.

Irreconcilable differences

Consider any function that needs to make use of configuration, be it a session, a database connection, etc. As we've seen, there seem to be only two implementation strategies: either take the configuration as an explicit argument, or assume the configuration has been bound dynamically elsewhere (as, e.g., with-db does in the Clutch example above):

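[code language="Clojure"]
;; Explicit: configuration is the first argument.
(defn explicit-save [config data]
  ;; …do something with data in/at/with thing described by config…
  nil)

;; Dynamic: configuration comes from a dynamically bound var.
(def ^:dynamic *config* nil)

(defn dynamic-save [data]
  ;; …do something with data in/at/with thing described by *config*…
  nil)
[/code]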

There is no way to unify these two idioms. The only option is to manually provide an additional arity for every single function in my API, delegating to the explicit arity with the dynamically bound configuration when the caller omits it:

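[code language="Clojure"]
(def ^:dynamic *config* nil)

(defn broken-save
  ;; Extra arity: fall back to the dynamically bound configuration.
  ([data] (broken-save *config* data))
  ;; The real implementation takes configuration explicitly.
  ([config data]
   ;; …do something with data in/at/with thing described by config…
   nil))
[/code]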

In my opinion, this is a no-go from an implementor's perspective. Each library function needs an additional arity, and each of those arities is extra maintenance. The alternative is to rig up a replacement defn macro that adds the additional arity "automatically", but that doesn't work if the API is to support rest or keyword args anywhere.
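To make the objection concrete, here is a minimal sketch of such a defn-wrapping macro. It assumes a single, fixed-arity parameter vector whose first parameter is the configuration. The names defn-with-config and *config* are hypothetical, and this is not code from any of the libraries mentioned. The sketch also shows why rest args break it: a vector like [config & more] would expand into two variadic arities, and Clojure refuses to compile that.

[code language="Clojure"]
;; Hypothetical sketch: a defn replacement that adds a config-less arity.
(def ^:dynamic *config* nil)

(defmacro defn-with-config
  "Defines name with the arity as written ([config-param & params]) plus
  an arity without the config param that delegates using *config*.
  Only handles one fixed-arity parameter vector: with rest args both
  generated arities would be variadic, which does not compile."
  [name [config-param & params] & body]
  `(defn ~name
     ([~@params] (~name *config* ~@params))
     ([~config-param ~@params] ~@body)))

(defn-with-config save [config data]
  (str "Saved " data " with " config))

(save :explicit 1)                    ; explicit configuration
(binding [*config* :dynamic] (save 1)) ; dynamically bound configuration
[/code]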

Not one to be disheartened, I've been working on an alternative that is either a proper solution, or a hack potentially even more ill-conceived than said defn macro.

Tasty brew: ns-publics + binding + partial

First, let's implement our library as simply as possible, which means working with explicit arguments everywhere (you can build dynamic scope on top of simple functions, but it's damn hard to make functions that depend on dynamic scope appear to do otherwise):

[code language="Clojure"]
(defn save [config data]
  ;; …do something with data in/at/with thing described by config…
  nil)
[/code]

Now, let's think about what *config* really represents in prior examples: it's an implicit indication of the scope of an operation.  We can get a similar effect using partial, which returns a new function that will use the provided arguments as the first arguments to the original function; using it, we can call (a derivative of) our save function with a single argument (our data), making the configuration "implicit" again:

[code language="Clojure"] ((partial save {:some :configuration}) {:some :data}) [/code]

That's hardly a syntactic improvement over passing our configuration value explicitly. However, what if we had a with-config macro that performed this partial application for us? It would supply the configuration value to each of our library's functions so that, within the with-config macro's scope, each of those functions could be called sans configuration. Well, we have macros and a reified dynamic environment, so let's have at it:

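[code language="Clojure"]
;; All public vars interned in this namespace so far (unfiltered).
(def public-api (vals (ns-publics *ns*)))

(defmacro with-config [config & body]
  ;; Evaluate the config expression once, then bind each public var
  ;; to its function with config partially applied.
  `(let [config# ~config]
     (with-bindings (into {} (for [v# @#'your.library.ns/public-api]
                               [v# (partial @v# config#)]))
       ~@body)))
[/code]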

Explanation is surely in order. First, we need to define our public API. This is just a seq of the public vars in our library's namespace. Those vars need to be dynamic, since we're going to be rebinding all of them, so make sure to use ^:dynamic metadata on them if you're using Clojure 1.3.0+.

(It seems sane to me that this seq should be filtered based on other metadata to ensure that only those functions that take configuration as their first argument are included.  An example of this is below.)

Second, all our with-config macro does is set up a dynamic scope, binding to each of our library's vars new functions with the provided configuration partially applied.  Within that scope, we can omit any further reference to the configuration value, even though the foundational implementations of our library's functions require explicit configuration.
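For an API with a single function, the dynamic scope that with-config sets up amounts to one with-bindings call written out by hand. Here is a minimal sketch. It assumes a toy save that returns a string rather than touching any real datastore:

[code language="Clojure"]
;; Toy one-function "library"; save must be ^:dynamic so it can be rebound.
(defn ^:dynamic save [config data]
  (str "Saved " data " with " config))

;; Roughly what with-config does for this one-var public API:
;; rebind save to a version with the configuration partially applied.
(with-bindings {#'save (partial save {:some :config})}
  (save {:a 5 :b 6}))
[/code]

Outside that scope, save still requires its configuration argument.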

Here's a complete example. It requires Clojure 1.3.0 because of the ^:keyword metadata notation; porting to Clojure 1.2.0 is simple enough, and left as an exercise:

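[code language="Clojure"]
(ns example)

;; ^:api marks functions that take configuration as their first argument;
;; ^:dynamic lets with-config rebind the var (Clojure 1.3.0+).
(defn ^:api ^:dynamic save [config data]
  (println (format "Saved %s with %s" data config)))

;; Only the ^:api vars are rebound by with-config.
(def public-api
  (->> (ns-publics *ns*) vals (filter (comp :api meta)) doall))

(defmacro with-config [config & body]
  `(let [config# ~config]
     (with-bindings (into {} (for [v# @#'example/public-api]
                               [v# (partial @v# config#)]))
       ~@body)))
[/code]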

The save function takes configuration explicitly. I've also added ^:api to its var's metadata so that vars that shouldn't be affected by with-config's dynamic scope can be filtered out of our public-api seq. Now our library supports both explicit and dynamic specification of configuration, yet we never really thought about the dynamic case at all when implementing the library:

[code language="Clojure"]
=> (save {:some :config} {:a 5 :b 6})
Saved {:a 5, :b 6} with {:some :config}
nil
=> (with-config {:some :config} (save {:a 5 :b 6}))
Saved {:a 5, :b 6} with {:some :config}
nil
[/code]
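partial has an advantage over the extra-arity defn macro dismissed earlier: it only supplies the leading argument, so rest and keyword args pass through untouched. Here is a self-contained sketch along the lines of the example above. The kwargs-demo namespace and the save-doc function with its :id option are hypothetical:

[code language="Clojure"]
(ns kwargs-demo)

;; Hypothetical API function with keyword args after data.
(defn ^:api ^:dynamic save-doc [config data & {:keys [id]}]
  {:config config :data data :id id})

(def public-api
  (->> (ns-publics 'kwargs-demo) vals (filter (comp :api meta)) doall))

(defmacro with-config [config & body]
  `(let [config# ~config]
     (with-bindings (into {} (for [v# @#'kwargs-demo/public-api]
                               [v# (partial @v# config#)]))
       ~@body)))

;; partial fills in only the configuration; data and :id "foo"
;; are passed through unchanged.
(with-config {:some :config}
  (save-doc {:a 5} :id "foo"))
[/code]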

Fin?

I love the flexibility this approach affords the user (usually me!), with, in relative terms, minor effort on the part of the library author.  I'm enough of a fan of it that I'm using it in Clutch (hopefully to be released soon as part of v0.3.0).

However, I should say that I'm not yet entirely at ease:

  1. If your implementation of one public API function calls another, and the vars of both are being rebound by with-config (or its equivalent), then that intra-library function call is going to route through the var. It will get the function that already has the configuration value partially applied, so the configuration gets passed twice. My solution to this at the moment is to (ack!) use a defn-wrapping macro that defines each public function in Clutch inside a closure containing all of the already-defined functions. This keeps intra-library calls from getting mixed up in the dynamic scope that Clutch's version of with-config might set up.
  2. Except for the results from one (likely errant) REPL session, I believe that self-calls never route back through the var named in function position. However, if there is any case where that's not true (i.e. if self-calls do route through the named var), then self-calls would have the same problem as (1), but not the same solution, or any solution at all.
  3. Using something like with-config, you're looking at N partial invocations, N function instantiations, and N dynamic bindings for your library with N public functions, versus 0, 0, and 1 for dynamic configuration APIs that bind a single *config* var.  Insofar as all of the libraries that use this pattern that I know of are database and/or IO-related, this "overhead" can likely be discounted.  In any case, if you don't want any overhead, with-config gives you the option of no dynamic scope at all.
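The problem described in item 1 can be reproduced with a small, self-contained sketch. The namespace, get-doc, update-doc, update-doc-safe, and fetch-doc are all hypothetical. The workaround shown, routing intra-library calls through a private helper that ns-publics never returns, is a simpler alternative for illustration. It is not the closure-based macro Clutch uses.

[code language="Clojure"]
(ns config-pitfall-demo)

;; Naive: update-doc calls the public (and rebindable) get-doc var.
(defn ^:api ^:dynamic get-doc [config id]
  {:config config :id id})

(defn ^:api ^:dynamic update-doc [config id]
  (get-doc config id))

;; Workaround sketch: intra-library calls go through a private helper,
;; which ns-publics skips, so with-config never rebinds it.
(defn- fetch-doc [config id]
  {:config config :id id})

(defn ^:api ^:dynamic update-doc-safe [config id]
  (fetch-doc config id))

(def public-api
  (->> (ns-publics 'config-pitfall-demo) vals (filter (comp :api meta)) doall))

(defmacro with-config [config & body]
  `(let [config# ~config]
     (with-bindings (into {} (for [v# @#'config-pitfall-demo/public-api]
                               [v# (partial @v# config#)]))
       ~@body)))

;; Inside with-config, update-doc's call to get-doc hits the rebound
;; var, so get-doc receives config twice and throws an arity exception.
(try
  (with-config :cfg (update-doc "x"))
  (catch IllegalArgumentException e
    :arity-error))

;; The private-helper route is unaffected by the dynamic scope.
(with-config :cfg (update-doc-safe "x"))
[/code]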

If you have any comments, warnings, or rants about how this is evil, do share.