In the past month, I’ve read no fewer than eight articles and blog posts trying to spin a story around what is apparently the “big” question these days: how can software companies make money in an open source world? Well, we do, and quite well, thank-you-very-much. Here’s how and why.
Our primary product is PDFTextStream. It was released a year ago into a market (Java libraries that can extract content from PDF documents) that was dominated by open source (or dual-licensed) offerings that are generally well-liked by the broader community.
OK, so why are we still here, thriving and growing?
- Positioning: When I decided to enter this market three years ago, I knew we would have a good chance simply because it has characteristics that are uniquely suited to a strong, specialized commercial vendor. While generating PDF documents is generally quite easy (thereby leading to a glut of report-generating libraries), extracting content from PDF documents is not. There are numerous file-format ambiguities to address, as well as the work of achieving the document-understanding accuracy that corporate and government customers demand. Anyone not dedicated to serving this market with 100% of their effort will not meet the market’s true demands.
- Execution: Anyone who strives to innovate eventually experiences some anxiety about sharing ideas with colleagues, with the irrational fear that those ideas might be misappropriated, leading to unnecessary competition. The thing is, dozens or hundreds of other people in the same field are likely having the same ideas simultaneously, so the only thing that will ever ensure business success is superior execution.
Likewise, there are at least four open source Java libraries that extract content out of PDF documents. It’s not arrogant or smug to say that we’ll out-execute the teams or individuals that work on those libraries. We’re in this for the long haul, and this is all we do, 14 hours a day.
- Serving a Niche: Very closely related to product positioning was the decision to enter a very demanding niche. We’re not trying to build yet another HTTP server, EJB container, etc. We’re not working on a commodity, and therefore we are much less likely to see competition from an open source library staffed by developers from IBM (for example). Beyond this market-centric reality is the fact that PDF content extraction is a much more difficult game than writing an HTTP server (again, for example): there are no standards, there are no RFCs, there’s no easy way to tell if you’re doing things the right way. So, if someone wants to go head to head with PDFTextStream, they’ll have to grab their machete and start slicing through the same jungle of PDF specs, mangled documents (which nevertheless open in Acrobat without a hitch), and all of the other fun that goes into building a PDF extraction library.
I’m not saying that this formula we’ve worked out is simple, or that it can be easily replicated with a different product in a different market. However, at least from where I’m sitting, “living in an open source world” is pretty pleasant.