À la carte configuration in Clojure APIs

There are two dominant configuration patterns in Clojure libraries. The first is where configuration is provided explicitly via the first argument; here, in Rummage, accessing Amazon’s SimpleDB:

(require '[cemerick.rummage :as sdb])
(def config (sdb/create-client "aws id" "aws secret-key"))
(sdb/put-attrs config "demo"
  {::sdb/id "foo" :name "value" :key #{50 60 65}})

The other is where the configuration is defined implicitly, usually using binding and dynamic scope (and sometimes via a set!-style setter for REPL usage convenience); here, in Clutch, accessing CouchDB:

(require '[com.ashafa.clutch :as clutch])
(clutch/with-db "localhost"
  (clutch/put-document {:a 5 :b 6} :id "foo"))

The latter is arguably more common, especially in database libraries; in addition to Clutch, you can see the dynamic pattern in play in java.jdbc’s with-connection and congomongo’s with-mongo.
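
For readers who haven't seen how such a with-* form is usually put together, here is a minimal sketch of the dynamic pattern. The names `*db*`, `with-db`, `set-db!`, and `put-document` are hypothetical stand-ins chosen for illustration; this is not Clutch's actual implementation, only the general shape of binding plus a set!-style setter for REPL use.

```clojure
(ns dynamic-config-sketch)

;; Hypothetical dynamic var holding the "current" database configuration.
(def ^:dynamic *db* nil)

(defmacro with-db
  "Evaluates body with *db* bound to db on the current thread."
  [db & body]
  `(binding [*db* ~db]
     ~@body))

(defn set-db!
  "REPL convenience: replaces the root value of *db*."
  [db]
  (alter-var-root #'*db* (constantly db)))

(defn put-document
  "Stand-in for real I/O: reports which database the document would go to."
  [doc]
  (str "put " doc " into " *db*))
```

Inside `(with-db "localhost" ...)`, `put-document` sees "localhost"; outside of any `with-db`, it sees whatever `set-db!` last installed (or nil).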

From the perspective of a user (they are us!), I sometimes prefer dynamic scope to avoid verbosity, yet I often like to be explicit about configuration (and therefore, usually the target of my code’s activity) at other times, especially when dynamic scope isn’t appropriate or downright dangerous.  My preferences vacillate depending on what I’m doing, where I’m doing it, and what tools I’m using.  In any case, each library that requires configuration almost always requires that you work with it the way its author intended, so I am left with no joy half the time.

As an author of and contributor to such libraries (including the two mentioned above), perhaps I’m in a position to resolve this dilemma.

Irreconcilable differences

Consider any function that needs to make use of configuration, be it a session, a database connection, etc.  As we’ve seen, there seem to be only two implementation strategies: either take the configuration as an explicit argument, or assume the configuration has been bound dynamically elsewhere (as e.g. with-db does in the Clutch example above):

(defn explicit-save
  [config data]
  ...do something with `data` in/at/with thing described by `config`...)

(def ^:dynamic *config* nil)
(defn dynamic-save
  [data]
  ...do something with `data` in/at/with thing described by `*config*`...)

There is no way to unify these two idioms. The only option is to manually provide additional arities of every single function in my API, delegating as necessary when it is suspected that dynamic scope is being used:

(def ^:dynamic *config* nil)
(defn broken-save
  ([data] (broken-save *config* data))
  ([config data]
    ...do something with `data` in/at/with thing described by `config`...))

In my opinion, this is a no-go from an implementor’s perspective: each additional library function implies extra maintenance for each additional arity, or the prospect of rigging up an alternative defn macro that adds the additional arity “automatically”…which doesn’t work if the API is to support rest or keyword args anywhere.
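
To make the "alternative defn macro" concrete, here is a rough sketch of what one might look like. `defn-configurable` is a hypothetical name, and this version assumes the parameters are plain symbols with the configuration first. It simply refuses rest args, which is exactly the limitation described above.

```clojure
(ns arity-macro-sketch)

(def ^:dynamic *config* nil)

(defmacro defn-configurable
  "Hypothetical helper: defines fn-name with the given explicit-config
  arity, plus a shorter arity that fills in *config*. Handles only fixed
  arities whose parameters are plain symbols."
  [fn-name [_config & params :as args] & body]
  ;; Clojure allows only one variadic arity per function, so a signature
  ;; with rest args cannot be given a second, shorter variadic arity.
  (when (some #{'&} args)
    (throw (IllegalArgumentException. "rest args are not supported")))
  `(defn ~fn-name
     ;; Short arity: delegate, supplying the dynamically bound config.
     (~(vec params) (~fn-name *config* ~@params))
     ;; Full arity: the body exactly as the author wrote it.
     (~args ~@body)))

(defn-configurable save
  [config data]
  (str "Saved " data " with " config))
```

With this in place, `(save {:some :config} {:a 5})` and `(binding [*config* {:some :config}] (save {:a 5}))` return the same string, but any function that wants `& opts` or keyword args is out of luck.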

Not one to be disheartened, I’ve been working on an alternative that is either a proper solution, or a hack potentially even more ill-conceived than said defn macro.

Tasty brew: ns-publics + binding + partial

First, let’s implement our library as simply as possible, which means working with explicit arguments everywhere (you can build dynamic scope on top of simple functions, but it’s damn hard to make functions that depend on dynamic scope appear to do otherwise):

(defn save
  [config data]
  ...do something with `data` in/at/with thing described by `config`...)

Now, let’s think about what *config* really represents in prior examples: it’s an implicit indication of the scope of an operation.  We can get a similar effect using partial, which returns a new function that will use the provided arguments as the first arguments to the original function; using it, we can call (a derivative of) our save function with a single argument (our data), making the configuration “implicit” again:

((partial save {:some :configuration})
 {:some :data})

That’s hardly a syntactic improvement over passing our configuration value explicitly.  However, what if we had a with-config macro that performed this partial evaluation for us, supplying the configuration value to each of our library’s functions so that, within the with-config macro’s scope, each of those functions could be called sans configuration?  Well, we have macros and a reified dynamic environment, so let’s have at it:

(def public-api (vals (ns-publics *ns*)))

(defmacro with-config
  [config & body]
  `(with-bindings (into {} (for [var @#'your.library.ns/public-api]
                           [var (partial @var ~config)]))
     ~@body))

Explanation is surely in order.  First, we need to define our public API; this is just a seq of the public vars in our library’s namespace (which need to be dynamic since we’re going to be rebinding all of them; make sure to use ^:dynamic metadata on them if you’re using Clojure 1.3.0+).

(It seems sane to me that this seq should be filtered based on other metadata to ensure that only those functions that take configuration as their first argument are included.  An example of this is below.)

Second, all our with-config macro does is set up a dynamic scope, binding to each of our library’s vars new functions with the provided configuration partially applied.  Within that scope, we can omit any further reference to the configuration value, even though the foundational implementations of our library’s functions require explicit configuration.
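
If the two building blocks are unfamiliar, it may help to see them in isolation before they are wrapped in a macro. The sketch below uses a hypothetical `greet` function and namespace name: `ns-publics` hands back a map of symbols to vars, and `with-bindings` accepts a map of vars to the values they should have within its body.

```clojure
(ns building-blocks-sketch)

;; A function that takes its configuration explicitly. The var must be
;; dynamic so that it can be rebound.
(defn ^:dynamic greet
  [config who]
  (str (:greeting config) ", " who))

;; ns-publics returns a map of symbol -> var for a namespace's public vars.
(def greet-var (get (ns-publics 'building-blocks-sketch) 'greet))

;; with-bindings takes a map of var -> value. Here the var is rebound to a
;; partially applied version of its own current value, so the call in the
;; body can leave the configuration out.
(def greeting
  (with-bindings {greet-var (partial @greet-var {:greeting "Hello"})}
    (greet "world")))
```

After loading this, `greeting` holds "Hello, world", while outside the `with-bindings` form `greet` still requires both arguments.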

Here’s a complete example (which requires Clojure 1.3.0 because of the ^: metadata notation — porting to Clojure 1.2.0 is simple enough, and left as an exercise):

(ns example)
(defn ^:api ^:dynamic save
  [config data]
  (println (format "Saved %s with %s" data config)))

(def public-api (->> (ns-publics *ns*)
                  vals
                  (filter (comp :api meta))
                  doall))
(defmacro with-config
  [config & body]
  `(with-bindings (into {} (for [var @#'example/public-api]
                             [var (partial @var ~config)]))
     ~@body))

The save function takes configuration explicitly; also, I’ve added ^:api to its var’s metadata so our public-api seq of vars can be filtered of vars that shouldn’t be affected by with-config’s dynamic scope.  Now our library can support both explicit and dynamic specification of configuration, yet we never really thought at all about the dynamic case when implementing the library:

=> (save {:some :config} {:a 5 :b 6})
Saved {:a 5, :b 6} with {:some :config}
nil
=> (with-config {:some :config}
     (save {:a 5 :b 6}))
Saved {:a 5, :b 6} with {:some :config}
nil

Fin?

I love the flexibility this approach affords the user (usually me!), with, in relative terms, minor effort on the part of the library author.  I’m enough of a fan of it that I’m using it in Clutch (hopefully to be released soon as part of v0.3.0).

However, I should say that I’m not yet entirely at ease:

  1. If your implementation of one public API function calls another, and the vars of both are being rebound by with-config (or its equivalent), then that intra-library function call is going to route through the var and get the function that already has the configuration value partially applied. My solution to this at the moment is to (ack!) use a defn-wrapping macro to define each public function in Clutch that pushes each definition into a closure containing all of the already-defined functions.  This keeps intra-library calls from getting mixed up in the dynamic scope that Clutch’s version of with-config might set up.
  2. Except for the results from one (likely errant) REPL session, I believe that self-calls never route back through the var named in function position.  However, if there is any case where that’s not true (i.e. if self-calls do route through the named var), then self-calls would have the same problem (but not the same — or any — solution) as (1).
  3. Using something like with-config, you’re looking at N partial invocations, N function instantiations, and N dynamic bindings for your library with N public functions, versus 0, 0, and 1 for dynamic configuration APIs that bind a single *config* var.  Insofar as all of the libraries that use this pattern that I know of are database and/or IO-related, this “overhead” can likely be discounted.  In any case, if you don’t want any overhead, with-config gives you the option of no dynamic scope at all.
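
Concern (1) is easy to reproduce. The following sketch is a hypothetical two-function library (the namespace and `save-all` are made up for illustration) that uses the same with-config shape as the complete example above. The explicit call works, but the dynamic one fails because the inner call to `save` receives the configuration twice.

```clojure
(ns pitfall-sketch)

(defn ^:api ^:dynamic save
  [config data]
  (str "saved " data " with " config))

(defn ^:api ^:dynamic save-all
  [config items]
  ;; This call goes through the `save` var. Inside with-config that var
  ;; holds the partially applied version, so `config` is supplied twice.
  (doall (map #(save config %) items)))

(def public-api (->> (ns-publics *ns*)
                  vals
                  (filter (comp :api meta))
                  doall))

(defmacro with-config
  [config & body]
  `(with-bindings (into {} (for [v# public-api]
                             [v# (partial @v# ~config)]))
     ~@body))

(defn try-dynamic-save-all
  "Returns :arity-error when the intra-library call blows up."
  []
  (try
    (with-config {:some :config}
      (save-all [1 2]))
    (catch IllegalArgumentException _
      :arity-error)))
```

Here `(save-all {:some :config} [1 2])` returns the two saved strings, whereas `(try-dynamic-save-all)` returns `:arity-error`, since the original two-argument `save` ends up being called with three arguments.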

If you have any comments, warnings, or rants about how this is evil, do share.


26 Responses to À la carte configuration in Clojure APIs

  1. Jim says:

    Hey Chas. Are you familiar with the state monad?

    • Alex Miller says:

      Indeed, we have experimented with using monads to solve this problem in some places. We did not like it better enough to keep it though.

    • Chas Emerick says:

      Vaguely. Insofar as each time I’ve learned about monads I’ve thought “oh, is that all there is to them?”, I’ve not really internalized what each one is.

      A slight googling and some prodding in irc reminds though…presumably, we’re talking about immutable state, in which case only the reader monad is applicable, correct? (Leaving aside the state of the database or other thing we’re providing configuration for.) Unless I’m really off-track, that would make things somewhat more onerous for consumers of the API (insofar as either higher-level library authors or direct users would end up having to grapple with the monadic underbelly of things).

      Do a dullard a favor and let me know if I’m in the weeds.

  2. Slightly evil, in particular points (1) and (3). If you can do it without rebinding any functions, I’ll like it. I think I’d prefer to have two distinct namespaces that provide the same API, where all the functions in one use dynamic config and wrap the functions in the other. Using an approach similar to this one, you could generate the “dynamic” ns automatically.

  3. llasram says:

    For “unease” (3) I agree that in many cases the overhead this would introduce doesn’t matter, but you do still end up paying plenty of overhead even *without* calling with-config — both the performance cost of calling functions through :dynamic vars and the cost of argument wrapping/unwrapping by the closure introduced by unease (1).

    Your unease (1) by itself is enough to put me pretty off this approach personally. This boils down to changing calling conventions with dynamic scope. The def-wrapping macro provides a consistent worldview within the module, but any attempts to build other APIs on top of base APIs defined this way have exactly the same problem, and would need a similar solution. Additionally, users of APIs defined this way have no way of accessing both calling conventions at once, in particular no way of accessing the config-as-argument convention once within the dynamic scope of a with-config.

    Instead of dynamically overriding the API function vars, why not simply automatically define parallel versions which bind their first argument to *config*? The alternate versions could even live in a different namespace, keeping their local names the same. Users could import whichever version of the API they needed, or both if necessary.

    • Chas Emerick says:

      Good point about the “solution” to (1) being a local one only. binding does introduce a similar weakness, insofar as an external library (or its users) would need to properly manage dynamic scope — but at least it’s a known weakness, as opposed to (1)’s novel weakness.

      The separate-dynamic-API-namespace idea seems to have some popular currency (see Stuart Sierra’s comment elsewhere).

      • llasram says:

        I really should learn to reload pages I’ve had open for a while before commenting :-). But yeah, total +1 to Sierra’s separate-namespace proposal.

        However, I don’t think it’s fair to say that other uses of binding have this same weakness. My mental model for thinking about dynamic vars is as extra function arguments which are implicitly carried from where provided to where used. Just like any normal function argument, there needs to be some sort of contract (implicit or explicit) about what the value of that argument is. If the contract is “this argument is a URI object,” then passing in an integer obviously leads to undefined behavior, but that’s the problem of whoever passed in an integer. The approach you’ve described “passes in” a function which must be called one of two entirely different ways, but makes no provision for distinguishing between the two ways. Setting another var to hold the current API-calling convention would at least get rid of that ambiguity, but then using the API differently based on the state doesn’t sound so pretty either…

  4. Alex Miller says:

    Interesting idea Chas. I was not sure whether this technique would make it harder to traverse function references in something like Emacs/slime and indeed it does not – because it’s the same function, everything works just as you’d like.

  5. Gary says:

    A twist on the defn-macro idea you initially dismissed could get around the vararg limitation by using a wrapper-fn that checks if *config* has been bound, and if so, pushes it into the args. The messy point is for inter-library calls where you have to avoid accidentally pushing a second config into the args. The messy solution is to check with identical? if it’s already there. Sorta like this:

    Is there anything I’m overlooking?

    • Chas Emerick says:

      Thanks for this, Gary. I was this || close to exactly that approach early on, but didn’t go all the way with it — and then I sank more time than I’d like to admit on all of the above.

      I’ve implemented your suggestion in clutch HEAD; we’ll see how it pans out.

      • Gary says:

        That’s interesting because I assumed that Laurent Petit’s comment also applied to my idea — i.e., if (with-config) is used in a nested manner then you’ll end up with 2 configs pushed in (since identical? will return false). I haven’t put too much thought into whether or not there’s a simple enough fix.

        • Chas Emerick says:

          No, your example doesn’t have the same problem. What I blogged was unconditionally jamming the provided configuration as the first (next, in the nested case!) argument on all of the API’s functions. Your example only conditionally uses the dynamically-bound configuration, essentially on the first call into the API, so configuration from nested contexts won’t “stack up”.

  6. Correct me if I’m wrong, but doesn’t your solution prevent using with-config more than once in the same call stack? (e.g. if that happens, then the second time, *config* will be bound to the second argument, if any, of the initial var’s value?)

    • Chas Emerick says:

      You are correct — good eye. This is definitely the most damning critique of what I described IMO, since I’m flagrantly breaking expectations of how dynamic scoping works in general.

      It’s fixable (give partial the root value of each var instead of the current deref‘d value), but I think that might be a bridge too far.

    • Kevin Downey says:

      the solution is to have named configs and have the configured functions have two arities, one that uses the most recent with-config and another that takes the name of a config to use

  7. Maybe I am just partial to optional named parameters but my, perhaps naive, solution would have been to move the config parameter to the end and do something like this:

    (def ^:dynamic *config* nil)

    (defn save [data & {:keys [config] :or {config *config*}}]
      (println (format "Saved %s with %s" data config)))

    => (save {:a 5 :b 6} :config {:some :config})
    Saved {:a 5, :b 6} with {:some :config}
    nil
    => (binding [*config* {:some :config}] (save {:a 5 :b 6}))
    Saved {:a 5, :b 6} with {:some :config}
    nil

    You could then address some of the boiler plate with macros as you saw fit.

    • You beat me to it. I was about to propose the same.

      Although there are some questions about the API design ([& rest] etc.), I would also think, that keyword arguments are quite suitable to convey optional configuration parameters.

      Furthermore this approach does not share any of the mentioned drawbacks, AFAICT.

  8. *Sorry wordpress ate my previous code example, here is a gist.

    Maybe I am just partial to optional named parameters but my, perhaps naive, solution would have been to move the config parameter to the end and do something like this:

    You could then address some of the boiler plate with macros as you saw fit.

    • I like that the most. I suppose the non-config parameters of a function are the more interesting ones, thus having the config at the end improves legibility.

      • Gary says:

        But like chas said, that prohibits using varargs for anything else.

        • Which might or might not be a problem for a given API.

          Putting up different namespaces with the same functionality and putting the load on the user to know when to use which (+ added noise of additional namespace aliases in user code) is not to be taken lightly, IMHO. In the general case I would try very hard to get a reasonable API with keyword args before being too clever with namespace generation.

          But maybe this is just a straw man and you go with only one style anyway. YMMV.

          • jedahu says:

            To avoid the verbosity of constantly providing a config parameter you could create a new function in the current namespace:

            (def foo (partial library/foo config))
