Results of the 2012 State of Clojure survey

A few weeks ago, I opened the 2012 State of Clojure survey.  Per usual, I wanted to take the Clojure community’s collective temperature, see where we all came from, what we’re using Clojure for, what weaknesses and problems exist, and hopefully provide some results that would allow everyone to calibrate priorities and such.  This was the third such survey I’ve conducted, and it was a little different than prior efforts:

  • Now that there are a bunch of rapidly-maturing alternative Clojure implementations targeting non-JVM platforms, I wanted to see how those alternatives were faring in terms of uptake and usage.  So, the survey asked participants to characterize their usage of each of the Clojure implementations that I knew of that have matured to some minimal level.
  • A FAQ among language and library implementers has always been, “Which version of the JVM should I target?” I can’t say why I didn’t include a question relevant to that topic from the beginning, but it was included in this one — and the answers were surprising, at least to me.  A similar question was included that deals with which environment(s) people are targeting with ClojureScript.
  • As Clojure is used more widely and it becomes more and more clear that the language itself is stable, effective, and reliable, the next most pressing concern is the state of Clojure libraries.  Survey participants were asked to grade Clojure libraries, in general, on a number of different scores, from effectiveness to ease of discovery to documentation.

I hope you find the results interesting and useful.

Vitals

First, some facts about the survey and how it was conducted. It was available for approximately 7 days (Wednesday night through Monday afternoon), during which time results were not available.  The survey was announced primarily via Twitter and two messages to the main Clojure mailing list, which has approximately 6,700 subscribers. 1,372 responses were received, a surprising 2x increase in participation over last year’s 672 responses. As I’ve said before, I’m not a statistician, nor am I a professional when it comes to surveying, polling, or data collection of this sort in general, but this would seem to be a very fair sampling of the Clojure community.

(I’m not sure what to make of the huge increase in participation.  I don’t think there’s any question that more people are using Clojure now than there were last year, but a 2x increase seems overoptimistic, especially given my bias.)

Now, on to the survey results.  See the links at the end to the raw data to satisfy all your statistical urges.

(Note: Any question with results that sum to more than 100% allowed respondents to select more than one response.)

Q: How long have you been using Clojure?

[Chart: How long have you been using Clojure?]

Good things to see: more new people trying Clojure, and those that were around last year and the year before sticking around.

New this year is the addition of the “I’ve stopped using Clojure” option.  Now that Clojure’s been around for long enough that some may have used it, and eventually stopped using it, I thought there might be lessons to be learned in what prompted such decisions.  I was surprised that only ten participants indicated that they’d stopped using Clojure; of course, people that have checked out are less likely to fill out a survey, but I was expecting more than ten nonetheless.  Only three of those left explanatory comments, so there’s hardly anything to generalize over.  But, there is one comment I’d like to highlight:

Clojure’s greatest strength is also it’s greatest barrier to acceptance into the mainstream corporate IT world – namely, it’s LISP syntax (which I personally like quite a bit).  Scala appears to be easier to pitch to corporate IT.  Thus, (for now) I’ve been working on pitching functional programming via Scala to our IT department.  The LISP syntax is to difficult a promote in a world dominated by Java and .Net.

Ah, the overwhelming power and ease of curly braces and monadic contravariant type classes? ;-)  (That phrase may be meaningless; my Scala vernacular is rusty these days.)  In the end, I don’t know that pitching Scala as a “better Java” because it can be made to look like Java is doing anyone any favors; as far as I can tell, idiomatic Scala is as different from Java as Clojure is from Java, in both syntax and semantics.

If you’re involved in corporate IT environments and have an interest in Clojure, you would do well to take in Neal Ford’s master plan for Clojure enterprise mindshare domination from last year’s Conj if you haven’t yet.  There’s no silver bullet therein, but I think it’s quite reasonable to expect “engaged” development organizations to be receptive to anything that will allow them to raise their game, regardless of parens vs. braces or similar silliness.

Q: What language did you use just prior to adopting Clojure — or, if Clojure is not your primary language now, what is that primary language?

Per usual, the majority of Clojure programmers “come from” Java, Ruby, and Python.  This is the third year of identical results, so this question will likely not return.

Q: If Clojure disappeared tomorrow, what language(s) might you use as a “replacement”?

Again, nearly identical to prior years’ results: relative to where people came from prior to using Clojure, functional programming (Haskell, Scala, Erlang) and lisps (Scheme, Common Lisp) are preferred.  Also mirroring prior years’ results, a number of people left “Other” responses like:

too depressing to contemplate

I agree.

Q: How would you characterize your use of Clojure today?

[Chart: How would you characterize your use of Clojure today?]

The proportion of people reporting at-work usage of Clojure is up again, 3 points higher than last year, and 11 points higher than in 2010. As important and promising as this is, I’m actually more heartened by the fact that the proportion of people that are using Clojure for hobby projects has remained constant: I think it would be a great loss if Clojure ever became one of those languages that is used at work, but shunned when one wants to do some hacking for fun.  That capacity for enabling play is a huge source of the creativity and liveliness that make using Clojure not only sensible, but satisfying.

Q: In which domain(s) are you using Clojure?

[Chart: In which domain(s) are you using Clojure?]

This chart is identical to last year’s results.  Web development and math/data analysis remain ubiquitous and lots of people are contributing to open source Clojure projects.

Most of the “Other” domains are scientific in nature, interestingly enough:

  • scientific computing
  • biological research
  • bioinformatics
  • healthcare
  • medical (mass spec) data analysis

…and so on.  There’s also a solid chunk of people doing AI, NLP, and semantic web stuff.

Q: While Clojure started as a JVM-only language, there are now multiple implementations targeting different runtimes and environments. To what degree are you using each of these implementations?

The last year has seen an explosion in the number of runtimes targeted by various Clojure ports, reimplementations, and adaptations.  This question was answered via a grid of options, but a better visualization than a set of typical bar charts is not ready at hand; so, first the charts, and then a couple of words:

To what degree are you using Clojure?

To what degree are you using Clojure.CLR?

To what degree are you using ClojureScript?

To what degree are you using clojure-py?

To what degree are you using clojure-scheme?

To what degree are you using clojurec?

ClojureScript clearly has the most traction and attention of all of the alternative implementations (nearly 20% of respondents are using it in some non-investigatory capacity), but clojure-py, clojure-scheme, and clojurec are also each being evaluated by nontrivial numbers of programmers, despite their relative youth.  Targeting JavaScript, Python, Scheme, and C (and thus all sorts of platforms, including iOS, embedded devices, and so on) has long been of interest to Clojure programmers, so this immediate attraction is not surprising.

This is in contrast with ClojureCLR, and demonstrates how critical the targeted platform is to the success of a language implementation: despite ClojureCLR being complete and well-supported by the tireless David Miller, 70% of respondents just don’t have any interest in it, and only 17 respondents are using it in any capacity.  My best guess at explaining this is that those targeting .NET are fundamentally unwilling to consider languages not officially supported by Microsoft…witness the stasis of efforts like IronRuby and IronPython compared to JRuby and Jython and the other dozens of stable, active languages on the JVM.

Aside from any technical matters, this dynamic alone is enough to justify Rich Hickey’s prehistoric decision to focus Clojure on the JVM rather than the .NET CLR, and should be a factor for anyone considering working on or adopting alternative Clojure implementations: the target platform cannot be a footnote or secondary consideration.

Finally: a couple of comments elsewhere either pined for a Clojure implementation that targeted Lua, or helpfully pointed out the existence of clojurescript-lua.  Clojure / Lua fans, check it out! (If a couple someones let me know it’s up to snuff, I’ll definitely include it next year.)

Q: If you are using Clojure on the JVM, which JRE/JDK version do you target?

[Chart: Which JRE/JDK do you target?]

Only 5 respondents indicated that they target Java 1.5.  I honestly expected that number to be much, much higher.  I concede that there are surely plenty of Clojure programmers who aren’t floating around on the mailing list and Twitter and who aren’t likely to participate in surveys, and such programmers might be more likely to be targeting older JVMs.  But, all caveats aside, this is fantastic news.

This means that, in general, library authors can safely target Java 1.6.  And, if it yields any benefits, Clojure moving to require Java 1.6 would appear to have few downsides (apologies to the few still targeting 1.5).

Q: If you are using ClojureScript, which environments do you target?

[Chart: Which environments do you target with ClojureScript?]

What surprised me here was that there’s a bunch of people using ClojureScript to write node.js apps!  That surely exposes my personal biases.

Notable “Other” selections here include Android and iOS (presumably through embedded web views?), as well as usage of various mobile-deployment toolkits like PhoneGap, Cordova, and so on.

In hindsight, I probably should have added “Mobile devices” and “Databases” (e.g. CouchDB, PostgreSQL) to the available options.

Q: How true are the following statements when applied to the Clojure libraries available within your domain(s)?

The JVM and other targeted platforms are known, stable quantities; and, especially with the release of Clojure 1.3 and 1.4 in the last year and the ongoing maturation of ClojureScript, the language is stable as well.  This leaves libraries as a certain unending “last frontier” of usability, in terms of programmers being able to get things done using Clojure and Co.  So, the survey asked if Clojure libraries, in general, are aptly described by a series of statements.

First, the charts, and their corresponding statements:

“They implement core functionality well.”

“They are more effective than analogous libraries in other languages.”

“They are easy to find.”

“Their maintainers are receptive to feedback, patches, etc.”

“They are accurately and adequately documented.”

Sweeping conclusions made from this data are probably suspect, but I’ll make a stab at it:

Clojure libraries are generally of high quality, and more effective than libraries people have used in other languages.  Further, library maintainers are generally collaborative and easy to work with.

On the other hand, the right Clojure library isn’t always easy to find, and many libraries are poorly or inaccurately documented.

There have been a couple of stabs at easing discovery, but not yet anything that’s caught fire (or, been regularly maintained, unfortunately).  At the very least, everyone should keep search.maven.org bookmarked for those libraries that are deployed to Maven central.  But, in the end, it may be that there’s no solution to this problem, insofar as it’s damn hard to track distributed development efforts with a centrally-planned solution; just look at how essentially rudimentary search.maven.org is, and that’s servicing one of the largest software development ecosystems in the world.  It may be that time and internalized community knowledge (e.g. “What is the best library for X?” “Y, of course!”) are the only things we’ll be able to count on.

(Those shouting “CPAN, CPAN!” at their screens now should really calm down.)

The documentation issue is, as we’ve seen in previous surveys and as we’ll see again later here, the most common thread of complaint amongst Clojure programmers.  More on this in a bit.

In hindsight, I should have offered only two options (Yes/No), which would have pushed people off the fence of “Some are”, and thus made it easier to get a clear picture of the state of things.

Q: What have been the biggest wins for you in using Clojure?

Compared to last year, mentions of protocols, records, and types went up by 50%, probably due to people starting to grok how to best utilize them.  The proportion that called out multimethods doubled, which I’m even happier to see.  Beyond that, functional programming, JVM interop, and the joys of the REPL continue to be people’s favorite benefits of Clojure.

Notably, very, very few people reported that the extensible reader introduced in Clojure 1.4.0 was a big win.  This is understandable given how new 1.4.0 is (released just four months ago), and so it may take some time for people to learn how and when to use tagged literals well.

Q: Which development environment(s) do you use to work with Clojure?

There’s been some movement in the area of development environments:

  • Emacs usage dropped 10%
  • Eclipse + Counterclockwise gained 4%
  • Usage of “Command-line REPL”s gained 5%

…but, I think most of the shifts from last year are noise.  What isn’t noise is the total collapse of usage of NetBeans + Enclojure; it was used by 13% of respondents in 2010, and now is used by just 1%.

Popular mentions in the “Other” column include:

I’ll talk a bit more about development environments later.

Q: Which toolchain(s) do you use?

Everyone uses Leiningen.  The other 5% just wish they could.  ;-)

There’s no data from previous years to compare this to, but whatever the curve of growth of Leiningen, technomancy & co. deserve a ton of credit for shaving the yaks and herding the cats to build it up into what it has become today.  People’s everyday Clojure experience is greatly enhanced by the get-out-of-the-way approach taken by Leiningen, and I think we’re all better off for it.

Q: Describe how you have participated in the development of the Clojure implementation or “contrib” libraries.

This is a bit of inside baseball, but I was curious to see how widespread participation is in the development of Clojure itself and the surrounding “contrib” libraries, and what might be standing in the way of people that might otherwise participate.

A couple of things jump out at me:

66% of respondents haven’t had a need to contribute to Clojure, to the point of not even needing to report bugs.  I think that’s a damn good number of people that have presumably had a distinctly positive experience re: language quality and completeness.

There’s rarely a week (or, day, perhaps?) that goes by on Twitter or in IRC without some griping about Clojure’s contributions process and policies. However, just 85 respondents (~6%) cite that process or the Clojure contributor agreement (the most contentious part of those policies) as barriers to their contributing to the language and surrounding libraries; this makes me think that these issues are the favorites of a loud minority more than anything else.  On the other hand, 85 respondents represent 28% of the group that have the desire and need to contribute (i.e. excluding those that “haven’t needed” to contribute and those that are “unsure of their ability” to contribute).  It does seem worrisome that more than a quarter of potential contributors are opting out at least some of the time for reasons orthogonal to the code and documentation and work involved.
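To make the two percentages above concrete, here is a minimal Python sketch of the arithmetic. The respondent counts (1,372 and 85) and the 28% figure are taken from this post; the size of the "desire and need to contribute" group is not reported here, so the value computed below is only an inference from the rounded 28%, and all names in the snippet are illustrative.

```python
# Figures quoted in the post.
TOTAL_RESPONDENTS = 1372
CITED_PROCESS_OR_CA = 85   # cite the contribution process or contributor agreement
SHARE_OF_WILLING = 0.28    # quoted share of those with the desire and need to contribute

# Share of *all* respondents who named the process or agreement as a barrier.
share_of_all = CITED_PROCESS_OR_CA / TOTAL_RESPONDENTS

# ASSUMPTION: the post does not give the size of the "desire and need" group;
# this back-calculates it from the rounded 28%, so treat it as approximate.
implied_willing_group = CITED_PROCESS_OR_CA / SHARE_OF_WILLING

print(f"{share_of_all:.1%} of all respondents")
print(f"roughly {implied_willing_group:.0f} respondents in the implied group")
```

The same 85 people look like a small minority against the whole sample and like a sizeable bloc against the much smaller set of would-be contributors, which is why both readings in the paragraph above can hold at once.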

Q: What has been most frustrating for you in your use of Clojure; or, what has kept you from using Clojure more than you do now?

A lot of the most common and most frustrating problems were expanded upon in the free-form responses, so I talk about them a bit there (see below). Otherwise, just a couple thoughts:

The Clojure community remains a nice place to be, with only 2% of respondents indicating that they’ve had unpleasant interactions with others, the same proportion as last year.

I didn’t think it possible, but an even larger proportion of respondents than last year — now up to a third — indicate that documentation is a key problem.  This ties into the feedback seen earlier that library documentation is generally not what it should be.  To a large degree, this is a self-inflicted wound: both Clojure and its libraries are technologically sufficient and effective and useful, but many, many people are tripping on their way towards using them.

Given that the only other problems within 10 percentage points of the documentation issue are largely out of everyone’s direct control (issues using Clojure for scripting, likely hampered by the JVM’s startup time; and future staffing concerns, more of a communications and psychological issue than a technical or content problem), the best thing anyone can do to help Clojure succeed is to help make documentation and tutorials better for every skill level and domain, anywhere. I do what I can in my projects and elsewhere; please do what you can, too.

Free-form topics

Per usual, I included two free-form fields where people could write whatever they like: one dedicated to the biggest problems people think Clojure has (which allowed people to point out topics I didn’t offer as options in the prior question, or to further emphasize their choices there), the other for “general comments”.  Responses from these fields are included in the raw data linked below, as well as in separate HTML files for easier reading (spreadsheets are really poor for reading long-form text).  I encourage you to take a look, as I only provide a selective, slanted, biased, self-interested, unscientific overview here…

“What do you think is Clojure’s most glaring weakness / blind spot / problem?”

Predictably, the topics named in the responses to this question ranged all over the place.  Here’s a selection of common complaints, or tidbits I found interesting.

Documentation & getting started

I already flogged this above, but it’s worth noting the level of “enthusiasm” (a.k.a. “vitriol”) many people have with regard to the documentation for and around Clojure.  This is easily the most common topic in the “weak spot” comments.  There are few areas that are immune from such criticism, but much of it inevitably lands on clojure.org, its (lack of) connection to the development wiki (flawed as it is), and so on.

Specifically, there are a number of comments along the lines of:

http://clojure.org/getting_started should be replaced with https://github.com/technomancy/leiningen/blob/master/README.md

There’s been a lot of griping over the years about the “getting started” section of clojure.org, insofar as it leads newcomers through bootstrapping Clojure through a direct java -cp ... invocation on the command line, leaving them with the default, not-particularly-nice (sorry) REPL.  I’ve previously thought that the neutral stance taken on clojure.org is reasonable, insofar as a language shouldn’t be dictating (explicitly or otherwise) the use of a particular toolchain.  However, given the clearly universal acceptance of Leiningen into people’s hearts, and relatively high degree of difficulty that newcomers clearly feel with regard to the official first steps, I think it may be time to revisit that neutrality.  Leiningen, especially its v2.x incarnation, makes Clojure look and feel good “right out of the box”; since so many of the difficulties people report are around the immediate getting-started process, this simple recommendation could alleviate a lot of pain people are having.

Development environments

Many remain unhappy with the available development environments, and many express frustration with Emacs and the learning curve it entails (this not necessarily being an indictment of Emacs; if another editor were the most popular, it might receive the majority of scorn as well). However, I’ll go out on a limb and say that the number and strength of complaints on this topic were down somewhat from last year, which would correspond well with the 8% drop in reports of editors and IDEs being a leading frustration in the previous question compared to 2011.

Along these lines, a number of respondents wrote something similar to:

…every time counter clockwise gets another release, the world gets better.

I wholeheartedly agree.  Laurent has made a ton of progress with Counterclockwise in the past year, with Leiningen 2 integration that’s never failed me.  By all means check out the latest betas for absolutely excellent, dead easy-to-use Clojure code completion.  If you’re not happy with what you use today, give Counterclockwise a shot. (Disclaimer: I am — though not lately :-( — a Counterclockwise contributor.)

And, thanks to Relevance as well, for their ongoing sponsorship of Laurent’s work.

Hubris

I found this complaint interesting:

The hubris it’s champions have about functional programming.

I might be a good prototype of this commenter’s claim: I champion Clojure, am convinced that functional programming is superior to nearly all alternatives for most applications that I care about and that I see friends and peers building, and I share that conviction broadly.  Perhaps “hubristic Clojure goon” is the new “smug lisp weenie”?

The notion of “hubris” — excessive pride and arrogance — is interesting though. Should we scrape along with caveats and disclaimers, giving equal time to procedural programming and object oriented programming and the power of pointer arithmetic in order to prove our humility? One of the great things about the Clojure community in general is that it expects people to be adults capable of independent and self-motivated thought.  So, if someone — anyone — says something confidently, you should be happy and proud to disagree as long as you have good reason to do so.  There is nothing more cherished among Clojure programmers than well-reasoned discussion, and admitting error, mistake, or fault isn’t a taboo sign of weakness.

To me, this sounds like vitality and honesty, not hubris.

Development process snark

Whatever its merits or demerits, people have grown accustomed to laissez-faire collaboration styles (exemplified currently by the typical GitHub project), so comments like this:

contrib doesn’t seem very contrib-y.

and

Clojure/core’s apparent unfriendliness to community contributions.

aren’t uncommon, and underline the sentiment some expressed in the contribution-related question earlier.  Fair or not, there is definitely an undercurrent of frustration among some.

Android pain

An example comment:

One would expect clojure to have a better Android story.

Many people lamented Clojure’s apparent difficulty in being used to program Android apps.  I confess I’ve never tried myself, but I know that some people have gone to great lengths to build and publish Android apps written in Clojure, apparently needing to produce custom builds of the language in order to work around particularities in the Dalvik VM (stack size, JIT limitations, and classloader idiosyncrasies being some of the problems I’m vaguely aware of).  Making Clojure work well on Android is an effort being tracked, but one that I don’t really know the status of.

It’s surprising how many Android-related complaints showed up, even though only 3% of respondents reported using Clojure for mobile development earlier.  Perhaps that figure is so low because Clojure’s Android support is apparently just not up to snuff (and thus driving people to build apps using ClojureScript + e.g. PhoneGap, or punting entirely and using Java or Scala or …)?

“General comments?”

Thankfully, the “general comments” field immediately followed the dedicated bitch-box ;-), so nearly all of the responses here were filled with whimsy and joy about how great Clojure is, e.g.:

Thanks to RH.

It’s a lovely language.

I love it. Programming became fun again like it was 15 years ago.

Missing question: “Do you like the direction Clojure’s going in?”  – Yes! :)

Clojurescript is the shit

<3

Clojure put the fun back in software development for me at a point where I was disillusioned and close to a career change (even though I started programming when I was 10yrs old). Now I’m programming Clojure 12hrs a day for the tech startup I founded and I absolutely love it.

…and, my personal favourite:

I love Clojure so much, I want to give it a kitten.

Me too! So, Clojure, have a kitten; her name is Mittens:

Mittens!

She’s seen enough Clojure at this point that she should be able to get a job writing the stuff by now…

Raw Data

All of the data gathered by the survey form are available:

If you are particularly interested in the Clojure community, are involved in Clojure projects, or write about, teach, or promote Clojure, you would be well-served to browse around the survey data to draw your own conclusions.


I’ll wrap this up with one final thought that came while combing through the survey data and compiling this post: all while Clojure and its community have stayed healthy and continued growing like a weed, and while ambitious hackers have helped the language bloom into a family of languages targeting different platforms, no fatal risk has yet shown itself.  Five years on, there’s been no crucial flaw found in the principled underpinnings of the language, and no poisonous dynamic has twisted its way into the community.  While problems do exist, they are tractable, and many will yield through time and goodwill, a commodity for which we thankfully do not want.

For anyone considering placing a bet on Clojure, and for those of us that already have, these are all very, very good signs.

Thus ends the third State of Clojure survey.  I hope you’ve found the above interesting, thought-provoking, and perhaps useful.


PDFTextStream now available free (as in beer)

PDFTextStream v2.6.0 was released today with a variety of small new features and a couple of bugfixes.  The bigger change is that PDFTextStream is now available free for use in single-threaded applications.

Because of the realities of the economics around developing and maintaining a product like PDFTextStream, its pricing has often been out of reach of many projects and very small organizations that really need high-quality PDF content extraction functionality.  That’s not to say that PDFTextStream is overpriced — it’s actually less expensive than other options — but that is small comfort to many that simply cannot afford or cannot justify the expenditure yet.

This change should fix that: if you have a smaller project, are working on a startup, are involved in information research, etc., you can now benefit from all that PDFTextStream has to offer.  And, if and when your architecture requires concurrent PDF processing, or your PDF content extraction workload is large enough to need to worry about properly utilizing your hardware and compute resources, you can easily upgrade to the unlimited, licensed “edition” of PDFTextStream to parallelize that workload.

It will be fun to see what people build now that PDFTextStream is gratis.  Try it out!


2012 State of Clojure survey

I’ve run “State of Clojure” surveys for each of the last two years (see results from 2010 and 2011), and the time has come for the 2012 edition.

The survey itself is embedded below.  It will remain open for input for approximately a week, until ~Thursday, July 26th.

A lot has happened over the past year: the first two non-Conj Clojure conferences (Clojure/West and EuroClojure) went off to great popular acclaim, Clojure 1.4.0 was released, more Clojure books have been published, and more and more people continue to be drawn into the language’s orbit.  The apparent vector of progress and activity seems to be approximately the same as it was when I wrote up the first State of Clojure survey:

The Clojure community is larger than it ever has been, and shows no sign of slackening its growth.  It seems like now would be a good time to take stock of where the community is, how people came to use Clojure, and how it’s being used in the world.

Hopefully enough responses will come through that we’ll be able to get a good picture of the current state of affairs, and maybe a little insight into where Clojure can and should make headway in the future.

As before, I will post again sometime shortly after the survey closes with all of the captured data, some pretty charts, and whatever attempts at witty, (un?)biased commentary I can come up with.  ;-) It would be great to see some follow-on analyses using the raw data: e.g., people doing game development are really unhappy with the state of libraries in their chosen domain; or, maybe people doing mobile development are starting to seriously look at some particular Clojure implementation. Who knows what interesting tidbits might rise to the surface if someone really dug into the data…

Finally, please do what you can to spread around this survey to those that you know of that are working with Clojure — really, in any capacity.  You’ll find various social media chicklets at the bottom of this post if you want the lazy way out.

The survey is now closed.  I’ll be posting the results and my pithy analysis sometime next week…


On the stewardship of mature software

I just flipped the switch on v2.5.0 of PDFTextStream.  It’s a fairly significant release, representing hundreds of distinct improvements and bugfixes, most in response to feedback and experiences reported by Snowtide customers.  If you find yourself needing to get data out of some PDF documents, you might want to give it a look…especially if existing open source libraries are falling down on certain documents or aren’t cutting it performance-wise.

But, this piece isn’t about PDFTextStream, not really.  After prepping the release last night, I realized that PDFTextStream is ten years old, by at least one reckoning: though the first public release was in early 2004, I started the project two years prior, in early 2002, ten years ago. Ten years.

It’s interesting to contemplate that I’m chiefly responsible for something that is ten years old, that is relied upon by lots of organizations internally, and by lots of companies as part of their own products.  Aside from the odd personal retrospectives that can be had by someone in my situation (e.g. friends of mine have children that are around the same age as PDFTextStream; am I better or worse off having “had” the latter when I did instead of a son or daughter?), some thought has to be given to what the longevity and particular role of PDFTextStream (or, really, any other piece of long-lived software) implies and requires.

I don’t know if there are any formal models for determining the maturity of a piece of software, but it seems that PDFTextStream should qualify by at least some measures, in addition to its vintage.  So, for your consideration, some observations and opinions from someone that has their hand in a piece of mature software:

Mature software transcends platforms and runtimes

PDFTextStream is in production on three different classes of runtimes: all flavours of the JVM, both Microsoft and Mono varieties of the .NET CLR, and the CPython implementation of Python.  This all flows from a single codebase, which reminds me of many kinds of mature systems (sometimes referred to as “legacy” once they’re purely in maintenance mode, a stage of life that PDFTextStream certainly hasn’t entered yet) that, once constructed, are often lifted out of their original runtime/platform/architecture to sit on top of whatever happens to be the flavour of the month, without touching the source tree.

Often, the effort required to make this happen simply isn’t worth it; the less mature a piece of software is, the easier it is at any point to port it by brute force, e.g. rewriting something in C# or Haskell that was originally written in Java.  This is how lots of libraries made the crossing from the JVM to .NET (NAnt and NHibernate are two examples off the top of my head).

However, the more mature a codebase, and the more challenging the domain, the more unthinkable such a plan becomes. For example, the prospect of rewriting PDFTextStream in C# to target .NET — or, if I had my druthers, rewriting PDFTextStream in Clojure to satisfy my geek id — is absolutely terrifying.  All those years of fixes and tweaks in the PDFTextStream sources…trying to port all of them to a new implementation would constitute both technical and business suicide.

In PDFTextStream’s case, going from its Java sources to a .NET assembly is fairly straightforward given the excellent IKVM cross-compiler.  However, there’s no easy Java->Python transpiler to reach for, and a bytecode cross-compiler wasn’t available either.  The best solution was to invest in making it possible to efficiently load and use a JVM from within CPython (via JNI).  With that, PDFTextStream, derived from Java sources, ran without a hitch in production CPython environments. Maybe it was a hack, but it was, in relative terms, easier and safer than any alternative, and had no downsides in terms of performance or capabilities.

(I eventually nixed the CPython option a few years ago due to a lack of broad commercial interest.)

Thou shalt not break mature APIs

When I first started programming in Java, I sat aghast in the ominous glow of java.util.Date. It was a horror then, and remains so. Most of its constructors and methods have been marked as deprecated since 1997 (JDK 1.1); and, despite the availability of all sorts of better options, they have not been removed from the standard library.  Similar examples abound throughout the JRE, and all sorts of decidedly mature libraries.

For some time, I attributed this to sloth, or pointy-haired corporate policies, or accommodation of such characteristics amongst the broad userbase, or…god, I dunno, what are those guys thinking? In the abstract, if the physician’s creed is to “do no harm”, it seems that the engineer’s should be “fix what’s broken”; so, continual improvement should be the law of the land, API compatibility be damned.

Of course, it was naïve for me to think so.  Brokenness is often in the eye of the beholder, and formal correctness is a rare thing outside of mathematics.  Thus, the urge one has to “make things better” must be tempered by an understanding of the knock-on effects for whoever is living downstream of you.  In particular, while making “fixes” to APIs that manifest breaking changes — either in terms of signatures or semantics — might make you feel better, there are repercussions:

  • You’ll absolutely piss off all of your customers and users.  They had working code that now doesn’t work. Whether you are charging them money or benefiting from their trust, you are now asking them to take time out of their day to help you feel better about yourself.
  • Since their code is broken already, your customers and users might see this as the perfect opportunity to make their own changes to not have to cope with your self-interested “fixes” anymore.  Surely you can imagine the scene:

    Sarah: “Hey Gene, the new version of FooLib changes the semantics of the Bar(string) function. Do you want me to fix it now?”

    Gene: “Sheesh, again? Well, weren’t you looking at BazLib before?”

    Sarah: “Yeah; BazLib isn’t quite as slick, but Pete over in Accounts said he’s not had any troubles with it.”

    Gene: “I’m sold. Stick with the current version of FooLib for now, but next time you’re in that area of the code, swap it out for BazLib instead.”

This is why semantic versioning is so important: when used and understood properly, it allows you to communicate a great deal of information in a single token.  It’s also why I can often be found urging people to make good breaking changes in v0.0.X releases of libraries, and why PDFTextStream hasn’t had a breaking change in 6 years.
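To make the "single token" point concrete, here is a minimal sketch in Python of how a downstream user might read a version bump. It assumes plain MAJOR.MINOR.PATCH strings (no pre-release or build tags), and the names `parse` and `upgrade_may_break` are made up for illustration; they don't come from any particular library. It leans on two semantic versioning rules: a 0.y.z release promises nothing about compatibility, and from 1.0.0 onwards a breaking change requires a new major version.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int


def parse(text: str) -> Version:
    """Parse a plain MAJOR.MINOR.PATCH string (hypothetical helper)."""
    major, minor, patch = (int(part) for part in text.split("."))
    return Version(major, minor, patch)


def upgrade_may_break(current: str, candidate: str) -> bool:
    """Return True if moving from `current` to `candidate` may break callers."""
    old, new = parse(current), parse(candidate)
    if new == old:
        return False  # same release, nothing changes
    if old.major == 0:
        return True  # initial development: anything may change at any time
    return new.major != old.major  # stable line: only a major bump may break


if __name__ == "__main__":
    for current, candidate in [("2.4.1", "2.5.0"), ("2.5.0", "3.0.0"), ("0.0.3", "0.0.4")]:
        print(current, "->", candidate, "may break:", upgrade_may_break(current, candidate))
```

Read this way, a 0.0.X release is exactly where breaking changes are cheap, and a stable library that never bumps its major version is telling its users that their working code will keep working.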

Of course there are parts of PDFTextStream’s API that I’m not super proud of; I’ve learned a ton over the course of its ten-year existence, and there are a lot of things I’d do differently if I knew then what I know now.  However, overall, it works, and it works very well, and it would be selfish (not to mention a bad business decision) to start whacking away at changes that make the API aesthetically more pleasant, or of marginally higher quality, but which make customers miss a beat.
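One common way to improve an API without making anyone miss a beat is to add the better function alongside the old one, and leave the old one working exactly as before with a deprecation notice. The sketch below does that in Python for the imaginary FooLib from the dialogue above; `bar`, `bar_normalized`, and their behaviour are invented for illustration and say nothing about how PDFTextStream handles this.

```python
import warnings


def bar_normalized(text: str) -> str:
    """New behaviour (hypothetical): trim whitespace and lower-case the result."""
    return text.strip().lower()


def bar(text: str) -> str:
    """Original behaviour, kept intact: trim whitespace only.

    Existing callers keep getting the same results; they are merely told
    that a newer alternative exists.
    """
    warnings.warn(
        "bar() is deprecated; consider bar_normalized()",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not to this module
    )
    return text.strip()


if __name__ == "__main__":
    warnings.simplefilter("always")  # show the deprecation notice when run directly
    print(repr(bar("  Hello  ")))
    print(repr(bar_normalized("  Hello  ")))
```

Sarah and Gene never have to have their conversation: the old signature and semantics survive, and moving to the new function happens on their schedule rather than the library author's.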

It seems to me that a good guideline might be that any breaking change needs to be accompanied by a corresponding 10x improvement in capability in order to be justifiable.  This ties up well with the notion that a product new to the market must be 10x better than its competition in order to win; insofar as a new version of the same product with API breakage can potentially be considered as foreign as competing products, that new version is a new product.

Managing risk is Job #1

If your hand is on the tiller of some mature software — or, some software that you would like to see live long enough to qualify as mature — your first priority at all times is to manage, a.k.a. minimize, risk for your users and customers.

As Prof. Christensen might say, software is hired to do a job.  Now, “managing risk” isn’t generally the job your software is hired to do, e.g. PDFTextStream’s job is to efficiently extract content from any PDF document that is thrown at it, and do so faster and more accurately than the other alternatives.  But, implicit in being hired for a job is not only that the task at hand will be completed appropriately, but that the thing being hired to do that job doesn’t itself introduce risk.

The scope of software as risk management is huge, and goes way beyond technical considerations:

  • API risk, as discussed above in the “breakage” section
  • Platform risk. Aside from doubling the potential market for PDFTextStream, offering it on .NET in addition to the JVM serves a purpose in mitigating platform risk for our customers on the JVM: they know that, if they end up having to migrate to .NET, they won’t have to go find, license, and learn a new PDF content extraction library.  In fact, because PDFTextStream licenses are sold in a platform-agnostic way, such a migration won’t cost a customer of ours a penny.  Of course, the same risk mitigation applies to our .NET customers, too.
  • Purchasing risk. Buying commercial software outside of the consumer realm can be a minefield: tricky licensing, shady sales tactics, pricing jumping all over the map (generally up), and so on.  PDFTextStream has had one price increase in eight years, and its licensing and support model hasn’t changed in six.  Our pricing is always public, as is our discount schedule.  When one of our customers needs to expand their installation, they know what they’re getting, how much it’s going to cost, and how much it’ll cost next time, too.

Even if one is selling a component library (which PDFTextStream essentially is), managing risk effectively for customers and users can be a key way to offer a sort of a whole product.  Indeed, for many customers, managing risk is something that you must do, or you will simply never be hired for that job, no matter how well you fulfill the explicit requirements.

Posted in Business, Craftsmanship, PDFTextStream | Leave a comment

What sucks about Clojure…and why you’ll love it anyway

I gave a talk with this title at the wonderful Clojure/West conference back in March.  The video and slides for it are now available on InfoQ here.

You can go watch the talk itself, so I won’t repeat its preface.  I’ll just say that, having seen the talk for the first time myself, I’m very happy with its content and delivery.  I’m hoping that others — especially those new to Clojure or looking at Clojure for the first time — will find the talk helpful in directing their attention to the tricksier bits of the language.

Posted in Clojure | 1 Comment

Starting Clojure (mk. 2)

I’ve wanted to put together a long-form introductory Clojure screencast for some time.  I had an opportunity to do this in grand style yesterday in a live O’Reilly webcast, but, for various reasons, I wasn’t fond of how that came together.  So, I cut another live coding screencast that introduces, in various levels of detail:

A complete (though not pretty!) URL shortener webapp is built from scratch, with discussions of immutable data structures, function composition, and mild concurrency topics scattered throughout.  It’s a fair bit more than I was planning on covering in the O’Reilly webcast, but I think the additional material blended in well.

Without further ado, Starting Clojure, mk. 2:

(You may want to watch full screen in HD to see various details)

Once you’re done with the screencast, you may want to continue your Clojure explorations with the help of Clojure Programming, and maybe Clojure Atlas (which, conveniently enough, is available at a hefty discount with your copy of  Clojure Programming).

Backstory on the O’Reilly Starting Clojure webcast

Ever had a bad day? Sure, of course. Ever had a really bad day prior to presenting a live-coding webcast to what turned out to be ~700 internet attendees?  Yeah, that was me yesterday. If you’re brave (or want to wince, laugh, and then cry at my performance), you will likely be able to see the video of it eventually. But seriously, don’t bother.

I probably should have postponed the whole thing, but that seemed unreasonable at the time: it had been planned for a couple of weeks and had a bunch of registered attendees, and my own stubbornness urged me on to commit programmer seppuku.  I was existentially distracted the whole time, and the more I tried to hold it together, the worse things got.  (Of course, that’s not an excuse, but an explanation.)  Honestly, after finishing the webcast, I was absolutely horrified; I had a great opportunity to represent Clojure well to a large body of programmers new to the language, and I utterly failed.  I felt like I had done a disservice to O’Reilly and, most of all, my coauthors.

Fight or flight kicked in, and for 5 minutes, I harbored thoughts of giving up doing screencasts and public speaking permanently, to save everyone involved. Thankfully, I relaxed, had a couple glasses of wine, and woke up early this morning with a clear head to record a live-coding screencast, in proper single-take style, which you see above.  It is epically better than the O’Reilly webcast, covers the material better than I planned, and was marred only by a couple of minor hiccups that were more funny than sad.

That is to say, mk. 2 is entirely in keeping with my usual baseline, and I’m happy to have it out there.  In the end, I hope more people see it than the first webcast I did.  In any case, I’m glad to have gotten back on the horse and hopefully redeemed myself by some measure.

Posted in Clojure, Clojure Programming (book) | 28 Comments

A refresh of Clojure Atlas

I’m sorry to admit that I let the Clojure Atlas wilt a bit over the past year or so. (I was a little busy!)  However, I am conversely quite happy to say that that’s over now; Clojure Atlas has been refreshed to add editions for Clojure v1.3.0 and v1.4.0.

(If you don’t know what Clojure Atlas is, head on over and check out the snazzy new demo/tour video.)

Other highlights include:

Pricing changes

I think the previous pricing was too high.  (You never know until you try.)  Pricing has been lowered, and I’ve added a fun option whereby you can get any edition of Clojure Atlas for just $5.  I don’t quite know what I’ll end up doing for upgrades going forward, but you will definitely be able to stay current without paying the full boat each time.

Free upgrades

Between the too-high pricing and the far-too-long period between the initial release of Clojure Atlas and now, those that prepaid for access to the Clojure v1.3.0 Atlas now have access to all of them, up to and including v1.4.0.  Those early significant supporters will also get free upgrades to all future Clojure Atlas revisions.  Thanks, guys and gals.

If you only purchased the Atlas for Clojure v1.2.0 previously, your account has been upgraded to include the Atlas for v1.3.0.

Ontology improvements

Aside from the obvious additions that needed to go in to reflect changes in Clojure v1.3.0 and v1.4.0, the ontology has been improved significantly to be more comprehensive and more accurate.  In addition, I’ve started adding detailed documentation (for example) to subjects/nodes within the ontology that I’ve added (in contrast to vars, which in general already have documentation of their own).

Visualization improvements

The graph visualization is certainly far from perfect, but I’ve tweaked it a fair bit to get it to “settle” faster than it did before.  I’m also pondering a complete reworking of the visualization to make it deterministic (rather than using a particle simulation as it does now).

No more PayPal

Many people balked at using PayPal — and believe me, no one is happier than I to be rid of it at this point.  Payments are now all handled courtesy of Stripe, which has been a dream to work with.

Posted in Announcements, Clojure, Clojure Atlas | Leave a comment

Friend: an extensible authentication and authorization library for Clojure Ring webapps and services

Say hello to my little Friend.

There’s plenty of technical stuff in the README to chew on if you like.  In short, I’m hoping this can eventually be a warden/spring-security/everyauth/omniauth for Clojure; that is, a common abstraction for authentication and authorization mechanisms.  Clojure has been around long enough that adding pedestrian things like form and HTTP Basic and $AUTH_METHOD_HERE to a Ring application should be easy.  Right now, it’s not: either you’re pasting together a bunch of different libraries that don’t necessarily compose well together, or you get drawn into shaving the authentication and authorization yaks for the fifth time in your life so you can sleep well at night.

Hopefully Friend will make this a solved problem, or at least push things in that direction.  It plays nice with all of the best principles of Ring, and includes support for:

  • form, HTTP Basic, and OpenID authentication
  • role-based authorization (optionally using hierarchical roles via Clojure’s derive and isa?)
  • su capabilities (multiple login support / a.k.a. “log in as”)
  • channel security (i.e. HTTPS-only for certain Ring routes)
  • …and more
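
If hierarchical roles are new to you, the idea behind the derive/isa? item above is that a role can stand in for every role it derives from, so an admin passes any check that merely requires a user. What follows is a rough Python analogy of that idea only. It is not Friend's API and not Clojure's implementation; `RoleHierarchy`, `authorized`, and the role names are all hypothetical.

```python
from collections import defaultdict


class RoleHierarchy:
    """Toy role hierarchy, loosely modelled on the idea of derive and isa?."""

    def __init__(self):
        self._parents = defaultdict(set)  # role -> roles it directly derives from

    def derive(self, child, parent):
        """Declare that `child` is a kind of `parent`."""
        if self.isa(parent, child):
            raise ValueError(f"{child!r} -> {parent!r} would create a cycle")
        self._parents[child].add(parent)

    def isa(self, role, ancestor):
        """True if `role` equals `ancestor` or derives from it, directly or not."""
        if role == ancestor:
            return True
        seen, stack = set(), [role]
        while stack:
            current = stack.pop()
            for parent in self._parents.get(current, ()):
                if parent == ancestor:
                    return True
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return False


def authorized(hierarchy, held_roles, required_roles):
    """True if any held role satisfies any required role."""
    return any(
        hierarchy.isa(held, needed)
        for held in held_roles
        for needed in required_roles
    )


if __name__ == "__main__":
    roles = RoleHierarchy()
    roles.derive("admin", "user")
    roles.derive("owner", "admin")
    print(authorized(roles, {"owner"}, {"user"}))
    print(authorized(roles, {"user"}, {"admin"}))
```

The payoff is that route definitions only need to name the least-privileged role that should get in; the hierarchy takes care of everyone above it.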

Most importantly, it takes a stab at a couple of core abstractions for others to drop in other authentication workflows, e.g. OAuth in all of its incarnations, NTLM, BrowserID, etc. etc. etc.  There are already plenty of Clojure implementations for all sorts of authentication methods; hopefully someone (you?!) will step up and bring one of them to the party, so anyone’s Friend-empowered Clojure webapp can easily offer any or all of them with a minimum of suffering.

Finally: frankly, it’s absurd that I’m writing security-related stuffs.  (I know it hardly ever works out that way, but it seems like some experts somewhere should be taking care of this.)  It would be a great thing if you were to beat on Friend and try to find exploits, general breakage, etc., especially if you have prior experience in this area.

Posted in Announcements, Clojure, Open Source | 4 Comments

Originally posted on Mostly λazy…a Clojure podcast:

Recorded November 12th, 2011, the fourth and final recording in a series of conversations from Clojure Conj 2011.

Chris Houser (usually known as chouser online) has been working with Clojure longer than nearly anyone else; he started tinkering with the language in early 2008, and was a fixture in #clojure irc and on the mailing list for years.  His contributions to the language, early libraries, and community through his always genial and insightful presence are hard to overstate.  More recently, he has coauthored the excellent Joy of Clojure along with Michael Fogus, and is now working with Clojure daily over at Lonocloud.

It’s been my privilege to know and work with Chris a bit over the years, and, as always, it was great to talk with him in person.

Enjoy!

Listen:

Or, download the mp3 directly.

Discrete Topics

  • “Everything I learned, I [learned] on irc?!”
  • Macros…

Posted in Uncategorized | Leave a comment

‘Clojure Programming’ book finished

Yes — it’s finished! :-D

Early last month, after writing 190,000 words, editing away scads more, assembling and testing more than 1,000 code snippets and 20 full sample projects, and conceptualizing dozens of illustrations, Christophe, Brian, and I declared Clojure Programming done.  It’s been writhing its way through O’Reilly’s editorial process ever since.

I’d hoped that the book would be published before Clojure/West in mid-March, but alas, it was not to be.  It looks like it’ll drop in mid-April.

However, fret not! If you want to dig into Clojure Programming right away, you can read the final draft of it online.  Of course, you can preorder the dead-tree version of it as well; easy links to both options are available at clojurebook.com.  There, you’ll also find a full table of contents, some basic info on the book, and a way to join the clojurebook.com mailing list and a pointer to the book’s Twitter account.  We’ll be pushing various Clojure tips and links to useful tools and resources and announcing the availability of all sorts of book-related content on the site through the mailing list and Twitter feed; and, if things work out as I hope, some early access to and/or special offers for things that will help you get the most out of your Clojure experience in general.

So, thanks for your patience.  I think the book will end up being worth it.  Of course, I have to thank my coauthors; without Brian and Christophe, it simply would never have been finished, nor would it be as good as it is.  There’s a ton of other people that deserve credit too, but you’ll have to buy the book and read the acknowledgements to learn about them…

Posted in Books, Clojure, Clojure Programming (book) | 5 Comments