Writing CouchDB Views using ClojureScript

UPDATE [2012-05-07]: clutch-clojurescript is now deprecated, as its functionality has been rolled into Clutch proper in toto.  Documentation for the feature can be found here.

While I was in San Francisco for JavaOne, I was lucky enough to be invited to speak at the Bay Area Clojure User Group (thanks, Sean and Toni!).  It was a great time, and gave me the kick in the pants I needed to finish hacking away at my first project involving ClojureScript: using it to write view functions for CouchDB.

The result is clutch-clojurescript, which naturally builds on top of the Clutch library that I’ve collaborated on with Tunde Ashafa for some time now.

My motivations for doing this were manifold:

  1. I quite enjoy using CouchDB, as its model and general philosophy mesh very naturally with my (and my tools’) disposition and the data I work with most often.
  2. Maintaining a Clojure view server configuration (Clutch provides the view server) alongside my CouchDB installs was always an operational hassle.
  3. I’ve been wanting to do more and more with Cloudant, but a Clojure view server is just not an option with a hosted database-as-a-service like that.
  4. I can’t stand writing JavaScript.  Give me the reach of JavaScript, but with sane abstractions, homoiconicity (macros!), and data structures that aren’t braindead? Sign me up.

Feel free to go check out clutch-clojurescript: beat on it some, and let me know if it breaks on you.  I would eventually like to fold it into Clutch proper.  Beware some limitations, though; they’re repeated here in part from the README to draw attention to them:

  • ClojureScript is not yet available as a proper library. This forces me to include a binary build of it in this git repo (a hefty 8.3MB!…which includes various Google JavaScript UI bits that I’d hope would be broken out eventually), and to bundle the necessary bits into the clutch-clojurescript jar. I would very much like to roll clutch-clojurescript’s functionality into Clutch proper, but I’ll not do so until the latter can rely upon a ClojureScript dependency.
  • ClojureScript / Google Closure produces a very large code footprint, even for the simplest of view functions. This is apparently an item of active development in ClojureScript. In any case, the code size of a view function string should have little to no impact on the runtime performance of that view. The only penalty should be paid at view server initialization, which should be relatively infrequent. Further, view runtime is dominated by IO and actual document processing, not by the loading of a handful of JavaScript functions.
  • To my surprise (and shock/horror), the version of SpiderMonkey that is used by CouchDB (and Couchbase Single, and Cloudant) does not treat regular expression literals properly: they work fine as arguments, e.g. string.match(/foo/), but e.g. /foo/.exec("string") fails.  Using the RegExp() function with a string argument *does* work.  This was reported a long time ago, but has had little attention (though I’m trying to stir it up a bit). I’m hoping to get to the bottom of this sooner or later, but I wonder if it’d be worthwhile to change the ClojureScript reader to emit (js/RegExp "foo") calls instead of /foo/ literals (and hope that gClosure doesn’t optimize the former into the latter)?  After all, there are lots of CouchDB deployments out there with apparently broken SpiderMonkey installs/configurations, and likely lots of other apps/servers/environments in similarly dire straits.

Finally, here are the slides from my talk at the BACUG (download/view PDF):

Provisioning, administration, and deployment of CouchDB, Java, Tomcat, etc., made easy with Pallet

Note: there may be relevant bits in here still, but usage of Pallet and jclouds has changed since this was originally published.  See this post for links to an up-to-date, comprehensive example project, a screencast, and other goodies.

As I briefly mentioned in my last post, I’ve been working with Pallet to enable automated administration of, among other things, CouchDB. If you’re wondering why I’m using Pallet instead of, say, Puppet or Chef, you can read the “Why Write Another Tool?” section of Hugo Duncan’s recent post on Pallet. My own answer to that question is that I wanted a tool that would provide automated:

  • Provisioning,
  • Administration & configuration, and
  • Application deployment

…all in one piece of kit that would neatly interoperate with the rest of our development stack (JVM, Clojure, Maven, Hudson, etc., etc.). Pallet is the only option I found that threads that needle.

From bare metal to ready-for-production app deployment in 5 minutes or 5 paragraphs…

Using Pallet, we can automate everything necessary to provision and configure the resources needed to run our application. The following code defines, spins up, and configures an EC2 node; the steps listed below correspond almost exactly with each line of the defnode configuration that forms the majority of the code:

  1. Use a specific Ubuntu AMI on a particular instance size
  2. Use a standard firewall / security group configuration
  3. Configure an “admin user” with a specific username that has only one authorized key (mine).
  4. Tweak apt so that it’s “sane”. <snark>I like being able to install useful software, so multiverse it is.</snark>
  5. Install the Sun JDK
  6. Install the Tomcat application server
  7. Install CouchDB and set two properties in its local.ini file (one to disable the JavaScript view server reduce limit – don’t ape that if you don’t know what you’re doing – and one to change its default storage location to a different directory).
  8. Create the aforementioned CouchDB storage directory.
  9. Deploy our application as the ROOT application in Tomcat and restart it (I’ve omitted the part that sets security policy in the same block, which is what actually necessitates the app server restart).

(I’ve simplified certain things in this rendition, but what I’ve elided are details that are pretty esoteric and/or miscellaneous – i.e. installing unlimited-strength crypto policy files in the installed JDK, setting VM parameters for Tomcat, etc.)

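(require 'pallet.core
         'pallet.crate.admin
         'pallet.crate.java
         'pallet.crate.tomcat
         'pallet.crate.couchdb
         'pallet.resource.package
         'pallet.resource.directory
         'pallet.resource.service
         '[org.jclouds.compute :as jcompute])

;; hypothetical placeholder values; substitute your own
(def +admin-username+ "admin")
(def +couchdb-root+ "/mnt/couchdb")

;; enable the universe and multiverse repositories, then refresh the package lists
(defn- sane-package-manager
  []
  (pallet.resource.package/package-manager :universe)
  (pallet.resource.package/package-manager :multiverse)
  (pallet.resource.package/package-manager :update))

(pallet.core/defnode master
  [:ubuntu :X86_32 :size-id "m1.small"
   :image-id "ami-bb709dd2"
   :inbound-ports [22 80 443]]
  :bootstrap [(pallet.crate.admin/automated-admin-user +admin-username+)
              (sane-package-manager)]
  :configure [(pallet.crate.java/java :sun)
              (pallet.crate.tomcat/tomcat)
              (pallet.crate.couchdb/couchdb
                [:query_server_config :reduce_limit] "false"
                [:couchdb :database_dir] +couchdb-root+)
              ;; a directory needs the execute bit to be traversable, hence 700 rather than 600
              (pallet.resource.directory/directory +couchdb-root+
                :owner "couchdb:couchdb" :mode 700)]
  :deploy [(pallet.resource.service/with-restart "tomcat*"
             (pallet.crate.tomcat/deploy-local-file "/path/to/my/warfile.war" "ROOT"))])

(def service (jcompute/compute-service "ec2" "AWS_ID" "AWS_SECRET_KEY" :ssh :log4j))

(pallet.core/with-admin-user [+admin-username+]
  (jcompute/with-compute-service [service]
    (pallet.core/converge {master 1} :configure :deploy)))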

(Note that jcompute is an alias for the compute namespace provided by the excellent jclouds library, which Pallet uses for cloud-agnostic infrastructure provisioning as well as cloud-specific stuff, like EBS volume and snapshot management, elastic IP management, etc.)

Want to spin up 10 nodes instead of one? Change {master 1} to {master 10}. Other changes are similarly straightforward. Want to deploy an application update to existing nodes instead of creating new nodes? Instead of using converge, execute (pallet.core/lift master :deploy).

There’s obviously a lot going on behind the scenes, but this is what the day-to-day configuration and usage of Pallet looks like. Using it means that I never have to use a command line or fiddly manual AWS tooling like their console or ElasticFox, or cobble together some combination of Chef/Puppet with Capistrano/Fabric and a pile of shell scripts to get a complete provision/configure/deploy solution.

Huge thanks to Hugo (who let me play in his sandbox) and Adrian Cole (the crazy man behind jclouds) for making this all possible.

Clearing some hurdles automating CouchDB administration

I ran into a couple of administration issues with CouchDB while working on support for it in the excellent Pallet project[1], so I thought I’d leave some breadcrumbs for those that follow.

(Note that these issues were experienced with CouchDB 0.10.0 on Ubuntu Karmic. They may be resolved in later versions of CouchDB or Ubuntu, but those are the versions we’re targeting for now.)

Broken Directory Permissions

First, Karmic’s couchdb package is broken, insofar as key directories that CouchDB uses don’t have the right ownership or mode. The symptom of this is that CouchDB will not stop properly when one invokes /etc/init.d/couchdb stop. This is a known issue, and will hopefully be resolved for Ubuntu Lucid. Rumor has it that some versions of CentOS have the same issue.

The fix is simple:

chown -R couchdb:couchdb /var/log/couchdb /var/run/couchdb /var/lib/couchdb /etc/couchdb
chmod 0770 /var/log/couchdb /var/run/couchdb /var/lib/couchdb /etc/couchdb

That’s a bit of a carpet-bombing, but certainly won’t do any harm, and does the trick (adjust for the install dir you have, e.g. perhaps prefixing everything with /usr/local).
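The same fix can be wrapped in a small function that takes the install prefix as a parameter and skips any directory a given install doesn’t have. This is a minimal sketch: `fix_couchdb_perms` is a hypothetical helper name, and its optional second argument (owner:group, defaulting to couchdb:couchdb) is an assumption added so it can be exercised without root. Against a real install it needs to run as root, just like the two commands above.

```shell
# Hypothetical helper (not part of any package): apply the chown/chmod fix above
# to CouchDB's directories under an optional install prefix.
#   $1: install prefix, e.g. "" for the Ubuntu package or "/usr/local" for a source build
#   $2: owner:group to assign (assumed default: couchdb:couchdb)
fix_couchdb_perms() {
  prefix="${1:-}"
  owner="${2:-couchdb:couchdb}"
  for dir in var/log/couchdb var/run/couchdb var/lib/couchdb etc/couchdb; do
    target="$prefix/$dir"
    # Skip directories this particular install doesn't have instead of erroring out.
    [ -d "$target" ] || continue
    chown -R "$owner" "$target" || return 1
    chmod 0770 "$target" || return 1
  done
}
```

Called with no arguments, `fix_couchdb_perms` touches the same four directories as the commands above; `fix_couchdb_perms /usr/local` targets a source build installed under that prefix.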

CouchDB only detaches when started from a full shell

This is where the world will learn that I’m mostly an idiot when it comes to shell stuff and sysadmin in general. Thanks go to Hugo Duncan for giving me a key hint that allowed me to get past this one.

In short, Pallet was doing the equivalent of this in order to invoke the scripts it generates for configuration management, etc. (assuming here that your user has NOPASSWD set in /etc/sudoers):

ssh -t user@host 'sudo /etc/init.d/couchdb start'

So, we’re allocating a tty, which many services need in order to fork and detach properly (Tomcat via jsvc, for example). However, the CouchDB server that is started with this command dies along with the ssh session. Go ahead, give it a shot. If you really want proof, you can do this to see that the server is running before the session is closed out:

ssh -t user@host 'sudo /etc/init.d/couchdb start; sleep 1; curl http://localhost:5984'

Of course, if you log into an environment with a full interactive session, starting CouchDB and then logging out will leave the server running as one would expect.

The solution is painfully simple in this case – just don’t invoke /etc/init.d/couchdb start as an ssh exec command. Whatever you’re using for configuration management, have it run in a full interactive shell session. That’s exactly what Pallet is now doing for all of its configuration executions.
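One way to approximate a full interactive session from a plain command line is to give ssh no remote command at all, so that sshd starts the user’s login shell, force a pseudo-terminal with `-tt` so that shell runs interactively, and feed it the commands on stdin, ending with an explicit `exit`. This is a sketch under those assumptions, not a description of how Pallet does it internally: `start_couchdb_interactive` is a hypothetical helper, `user@host` is a placeholder destination, and NOPASSWD sudo is assumed as above.

```shell
# Hypothetical helper: run the CouchDB init script inside an interactive remote
# shell session instead of passing it to ssh as an exec command.
#   $1: ssh destination, e.g. user@host (placeholder)
start_couchdb_interactive() {
  # No remote command is given, so sshd starts the user's login shell; -tt forces a
  # pty even though our stdin is a pipe, which makes that shell interactive. It reads
  # the lines below from its terminal, and the trailing exit ends the shell explicitly
  # rather than relying on how EOF on our stdin is handled once a pty is involved.
  printf '%s\n' 'sudo /etc/init.d/couchdb start' 'exit' | ssh -tt "$1"
}
```

A fresh, separate session (for example `ssh user@host 'curl -s http://localhost:5984/'`) is then a reasonable way to check that the server outlived the session that started it.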


The CouchDB crate in pallet is now pretty well bullet-proofed…or so I hope. :-)

[1] Pallet is a tool/framework for compute node provisioning as well as configuration management and general sysadmin automation. I’m not aware of any similar provisioning automation frontends (except for jclouds, which Pallet wraps/uses); otherwise, I’d characterize Pallet as a mashup of Chef + Capistrano, written in Clojure (yay!).