Monday, April 30, 2007

MarkLogic Named Best-of-Show Award Finalist at Bio-IT World

Just a quick post to announce that our product, MarkLogic Server, has been named a finalist in the Bio-IT World Conference's best-of-show awards, in the life sciences software and informatics category.

The conference's press release announcing all the finalists is here. I should note that we are joined by sister Sequoia portfolio company, Isilon Systems, on the overall finalists list.

The other finalists in the life sciences software and informatics category are Ingenuity Systems and Linguamatics.

Thursday, April 26, 2007

Valleywag's Hype Cycle

I've always been a fan of Gartner's hype cycle, and have always loved Gartner's names for the milestones along the way (e.g., the peak of inflated expectations, the trough of disillusionment).

So it was fun to discover that Valleywag, the People Magazine of Silicon Valley, had adapted the idea and made their own version for Web 2.0 companies.

The full Valleywag post is here.

Scribd: YouTube for Documents

Just a quick post to introduce readers to Scribd, a YouTube for documents.

Let's explore that analogy. Before YouTube, videos were hard to share because they were too big to email and because there were various yucky formatting issues to resolve. YouTube solved this practical problem by letting you share videos via their site (instead of email) and they transparently dealt with all the yucky formatting issues.

I must admit that, while not an active video person at all, I have used YouTube to overcome a simple problem. We use PCs at home, while at school the kids use Macs. We needed to get a video that we produced on our home PC to my son's teacher. After failing about 5 times to get a CD/DVD with the right formatting, I just uploaded it to YouTube and let the teacher watch it from there. Nice. (This was a while back when YouTube was still pretty new and the idea wasn't so obvious.)

In general, these assertions are not true about documents. Size-wise, I can email about 95% of the documents that I use every day. Formatting-wise, about 95%+ of what I use is either Office or PDF.

And there is less, for lack of a better term, voyeuristic tendency to want to read other people's documents than there is to see their images on Flickr or watch their videos on YouTube. And documents, much as I love them, aren't on the brink of changing the entire television industry, either.

But there are some documents -- the really big ones -- that bounce on email. Scribd solves that practical problem. And once you've uploaded documents, why not do all the various and sundry social networking things to them (e.g., tag, digg) and why not make new document friends, just like we have work friends (LinkedIn), friend friends (MySpace), tweet friends (Twitter), photo friends (Flickr), and blog friends (blogrolls).

Will Scribd sell for $1.6B like YouTube did? It's hard to believe. But it looks like they're about to raise money at a valuation north of $10M. See this TechCrunch article for more.

Addition: VentureBeat has good coverage of Scribd here.

Wednesday, April 25, 2007

Twitter: Call Me A Skeptic

You may have heard about a startup called Twitter which has been generating quite some buzz lately.

Twitter is a real-time blogging platform, a la Blogger, but it seems to be predicated on the notion that blog posts are too heavy. So instead of "posts," you have "tweets" -- typically single-line thoughts that say what you're doing (e.g., "I just took Jimmy's temperature") or how you're feeling (e.g., "It's beautiful here in Paris today") at any moment.

Who would read such drivel? I don't know. But I just spent over an hour tweeting and reading tweets.

It's interesting to look at the progression of personal publishing over time. Since the early days of the web, you could always make your own website. But it was hard work. You had to learn to use web publishing tools, pay for a URL, and pay for hosting. Some people did it, but it wasn't mainstream. What's more, making a "site" felt heavy -- a whole site of web content ... just about me?

Then blogging platforms came along. Suddenly anyone, for free, could start publishing their thoughts to the world. It wasn't a whole site, it was just a blog. A series of thoughts, a profile, and a blogroll. No big deal. Blogging, as we all know, exploded -- there are some 70M blogs out there today.

I think that blog platforms enabled personal publishing for a number of reasons:
  • They provided easy publishing tools
  • They eliminated cost for basic services
  • They allowed you to merchandise your blog by easily adding advertising
  • They provided an easy way to comment on someone's work (via permalinks, comments, and cross-posts)
  • And blogs lowered the bar on implied professionalism
The last point is critical because blogs established a new mentality, a new social contract if you will, in publishing. That new contract became roughly:
Hey, it's a blog. It's a stream-of-consciousness. It's not a newspaper article that needs fact checking. It's not a data sheet where you're printing 10,000 copies. It's a quick thought. If there's a mistake you can fix it, or people will correct you in the comments. If you're not sure about something, then just ask the audience. They can answer in the comments. Just get your thoughts out and get them out fast. We'll trade timeliness for some degree of perfection.
That psychology change was huge. I can do a blog post in 30 to 60 minutes, about as long as it might take to do the first draft of an article or data sheet of equivalent length. With the blog post, I then hit "publish" and I'm done. With an article or data sheet, I instead begin a review and layout process that takes weeks to complete. For some types of content that process adds value and is necessary. What blogs have taught me, however, is that for many types of content, it's not.

In short, for many publishing situations, the one-shot nature of print and the desire to avoid mistakes in print caused society to make a huge investment along a very flat diminishing marginal returns curve.

So, as my 150+ posts to this blog and its thousands of readers will attest, I'm a convert to blogging.

But tweeting? I'm not so sure.

For example, I just scraped the following off the "public timeline" on Twitter. Let's see what we got.
cmack007 Went Jogging Around The Block Earlier Today! There Are Stray Cats Everywhere In My Neighborhood! less than 5 seconds ago from web
Tenric whewww, had to retract a bid on ebay. Guy wasn't answering my emails and I was worried he was going to screw me. less than 10 seconds ago from web
jeffsand stuck on hold less than 10 seconds ago from web
dhinus crea hype http://twittermi.tumblr.com/ less than 10 seconds ago from web
elboby ya me voy a ir faltan 15 minutios...! less than 10 seconds ago from web
brlamb qwertyuiopasdfghjklzxcvbnm less than 10 seconds ago from web
head_s ありゃ、結構早くランキングをブックマークしたのにランキング入りしていない。 less than 10 seconds ago from im
amiry خسته ام مثل همیشه و کارهای انجام نشده را می شمارم less than 20 seconds ago from web
jeffmarshall @ nfl network presentation - prime time coming on stage in a min less than 20 seconds ago from txt
MacMakz いつもより1本早い電車に乗れた。今日も電車のなかは暑い。 less than 20 seconds ago from web
Soraya_av Noche "insomnia" less than 20 seconds ago from web
anniemole Batty Tube Boots: Hanging around for the London Underground If ever you feel the need to hang upside down.. http://tinyurl.com/2bv6o2 less than 20 seconds ago from web
loekessers nieuwe blog, wilde plannen voor overkoepelende site SvJ, oude stempel docenten omlullen... en verder dromen over belangrijkere zaken less than 20 seconds ago from web
The first problem is that people tweet in their own language. Perhaps I was unlucky, but it looks like I picked up some Arabic, Dutch, Spanish, what's probably Japanese (I didn't have the character set), some just plain gibberish, and a few posts in English. And that was in 13 tweets.

The other problem is that the content is just not inherently that interesting. Some woman can't sleep. Some guy is on the Tube in London. Someone else went jogging today. Someone else had to retract a bid on eBay.

OK. But ... so what? If you think that blogs are only read by their authors' mothers and friends ... well, let's not even talk about tweets. But, that said, there are lots of friends and mothers out there, as any of the now-innumerable social networking startups will attest. (Read Mashable or look at the EconSM conference if you want to feel just buried in social networking companies.)

So I can certainly see tweeting as an IM alternative useful for real-time friend and family group communication, and I can see tweets easily and gratuitously added as panes on a blog or a MySpace site (which Twitter seems to make quite easy to do). Basically, I think that while Twitter mashes well, it strikes me as a feature, not a company.

Oh, and just as I say this, I discover my first Twitter-based mashup: Twittervision. Check it out.

And if I'm lukewarm on Twitter thus far, let me refer you to this post for a more negative viewpoint (RIP Twitter 2007-2007) and to this post (Why Twitter is Kicking So Much Ass) for a more positive one, and to this post (The Asymptomatic Twitter Curve) for an interesting piece on Twitter, interruptions, and the general problem of continuous partial attention.

Monday, April 23, 2007

Autonomy Hires Fast's Convera Guys. Huh?

Things are sure crazy in enterprise search.
  • First, Verity forgets to invest in product innovation for several years, leaving themselves open to a general market-share assault and subsequent acquisition by Autonomy -- a company less than one-half Verity's size at the time they acquired them. That's rare. (See here for more.)
  • Then Convera decides that the only thing that it knows how to do (sell search inside government) isn't worth doing and, in response, amazingly sells off the part of their business that accounts for 93% of their revenues. That's rarer. (See this post: Honey, I Shrunk the Company).
  • Then Fast announces its intention to acquire Convera's $2.6M/quarter Retrievalware business for $23M. Paying 2.3x revenues for a business shrinking at nearly 30% is pretty rare, too. Normally, using my rules of thumb, a flat $2.6M/quarter business might be worth $10M (i.e., 1x revenues). A shrinking one might be worth half that.
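As a rough check on the revenue multiple in the last bullet, here's a minimal sketch using only the two figures quoted there. Annualizing by multiplying one quarter by four is my assumption, and the variable names are just for illustration.

```python
# Figures quoted in the post; annualizing as 4x one quarter is an assumption.
quarterly_revenue_m = 2.6   # Retrievalware revenue, $M per quarter
purchase_price_m = 23.0     # announced purchase price, $M

# Annual run rate and the implied price-to-revenue multiple.
annual_run_rate_m = quarterly_revenue_m * 4
multiple = purchase_price_m / annual_run_rate_m

print(f"Run rate: ${annual_run_rate_m:.1f}M")
print(f"Price / revenue: {multiple:.1f}x")
```

Using the unrounded $10.4M run rate, the multiple comes out a touch lower than 2.3x; rounding the run rate down to $10M gives the 2.3x figure.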
If things work out as they appear:
  • Fast will end up with Convera's technology
  • Autonomy will end up with Convera's people
Since it's hard to support the technology without the people (see my post on the Oracle/SAP lawsuit here), and since neither company is US-owned, that should make Convera's largely defense and intelligence customers pretty uneasy about the whole affair.

Combine this chaos with:
  • The Government's desire to use XML as an open and standard format
  • The Intelligence Community's desire to use XML enrichment technologies to create richer and richer markup
  • XQuery's ability to express powerful queries in a high-level fashion against that markup
  • MarkLogic's ability to process complex XQueries against large contentbases with high performance
And all roads seem to point in MarkLogic's direction.

Tuesday, April 17, 2007

The CMS of Reference

One of the nice things about living in content-land but coming from data-land is that I can spot interesting parallels between the two worlds.

For example, in data-land one of the big, seemingly simple, questions was always: how do we keep a customer's contact information (e.g., name, address, phone, email) correct across multiple redundant systems, such as finance, technical support, and marketing?

Should we:
  • Allow updates in only one system and propagate those each night to the other systems?
  • Keep the information in one system and pull it into the others in real time?
  • Keep redundant synchronized copies in all systems and replicate changes in real time?
  • Keep a copy-of-reference in none of the relevant operational systems, but instead in a separate system (e.g., the data warehouse) and pull it from there to all the operational systems?
Obviously, there are trade-offs involved in all such approaches. For example, the first approach means a call-center operator would need login credentials to the marketing system to make updates (if it were the system of reference) and there would be a lag in getting updates across all systems. The third approach generates a lot of network traffic and the possibility of conflicting updates hitting from different systems. The last creates yet another system to serve as the master rather than simply picking one of the existing ones.

In my experience, pragmatists picked the first approach and visionaries picked either approach 3 or 4.

Carry this dilemma over to content-land and you have a very similar situation. Instead of "where's my copy of reference" for data, it's "where's my copy of reference" for content?

Historically, it was clear that print ruled the publishing world and that production processes were designed for print and everything else was secondary or derivative. For example, many newspapers produce their online product by scanning and converting a PDF of the print product. That is, so many changes are made downstream of the editorial process that the only way to get the final version of an article is to scan it in off the paper.

Lisa Bos at Really Strategies recently wrote an interesting post on this topic that you can find here. Lisa talks about print CMS, web CMS, and single-source CMS as the three primary options. In data-land, that maps roughly to making the SFA system the copy of reference, making the call center system the copy of reference, or making the data warehouse the copy of reference.

My belief going forward is that pragmatic publishers will be "web first" -- i.e., the system of reference will be web delivery and other delivery channels will be secondary. This is still a massive mind-shift for many publishers, particularly in layout-intensive segments like newspapers, magazines, and textbooks. MarkLogic can play an important role here because even with a "web first" mentality, I think there is a strong argument for delivering content to the website from an XML content server, instead of an HTML-based web CMS.

Visionaries will make a "content warehouse" in an XML repository (managed by an XML content server like MarkLogic) and do single-source delivery from that repository.

I'll try to post more on the "why build your content-driven website directly in XQuery on MarkLogic" topic later. Meantime, check out Lisa's post.

Friday, April 06, 2007

O'Reilly Again Demonstrates XQuery Flexibility

Here's a post on O'Reilly Radar, by none other than Tim, that discusses some new content preview features that will be available on their site, made possible by none other than MarkLogic. Here's an excerpt from Tim's post:
The Table of Contents link on the catalog page for each book now lets you preview top level sections in each chapter, as shown in the screen shot here.

The previewing ability we've added is driven by an internal application we refer to as the content "deli," an XQuery database that allows us to serve up all our content in alternate forms. The RightLink preview implementation is the first of many new features we expect to be rolling out now that we have this infrastructure in place. (It's a generalization of the tools we built for SafariU last year.)

Not only has O'Reilly built a very cool app based on MarkLogic (SafariU), but they continue to demonstrate the power and flexibility that publishers get when they (1) centralize their content in an XML repository, and (2) use an XML content server, like MarkLogic, to build content applications on top of it.

These new features are one example. O'Reilly Labs is another. Check it out.

The Real Estate Roller Coaster -- On Video

I found this via the Infectious Greed blog and while it's a bit off-topic for my blog, I thought I'd include it for two reasons:
  • It's very cool
  • I like this technique, effectively riding a graph as a roller coaster, as a means of data visualization. I think it's very creative, and wow is it high impact.
By the way, I'd bet $10 that the roller coaster gets closer to the ground pretty soon, based solely upon the ride.

Enjoy it -- it takes 4 minutes. Keep an eye on the lower right-hand corner to see the year flash by every now and then.

Wednesday, April 04, 2007

Honey, I Shrunk the Company: Convera Sells Retrievalware to Fast

Two days ago, Norwegian enterprise search vendor Fast Search and Transfer announced an agreement to purchase the Retrievalware business from Convera for $23M. You can find the press release here.

Let's try to understand what this means.

First, some background on Convera. Technically, Convera is a seven-year-old company created through the combination of Excalibur Technologies and Intel's Interactive Media Services division. I'd always thought of Convera as the re-branding and reincarnation of Excalibur, a search company that has been around for over 20 years. Convera always struck me as a company that historically did well in Federal government (e.g., defense, intelligence), but that never appreciated its own strengths.

Financially, Convera has not done well. For example, in its most recent quarter, 4Q07 (FY ends on 1/31), Convera reported total revenues of $2.8M, down 24% from 4Q06, and a net loss of $9.7M. Retrievalware revenues in 4Q07 totaled $2.6M, down 27% from 4Q06. Looking over the longer term, the FY06 10-K shows on page 23 that annual revenues have monotonically decreased since 2004, descending from $29.3M to $25.M to $21.0M and, per this press release, to $16.7M in 2007, shrinking the company by nearly half over the past 4 years.
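To put a number on "nearly half," here's a minimal sketch of the cumulative decline using only the 2004 and 2007 endpoints quoted above. The variable names are mine.

```python
# Annual revenue endpoints quoted in the post, in $M.
revenue_2004_m = 29.3
revenue_2007_m = 16.7

# Cumulative decline from 2004 to 2007 as a fraction of the 2004 figure.
decline = (revenue_2004_m - revenue_2007_m) / revenue_2004_m

print(f"Revenue decline, 2004 to 2007: {decline:.0%}")
```

That works out to a bit over 40%, which is the basis for saying "nearly half."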

I'd occasionally joked that it was perhaps appropriate that the company's headquarters were on Gallows Road.

Convera has some quirkiness in its history, detailed in this Washington Post story. I'd guess that one reason Convera has not been content simply to be a Federal play is that Herb Allen is a media mogul, running an exclusive conference in Sun Valley, and arguably the premier investment house in media and entertainment. Hey, when you're on the Forbes Billionaire List already, why mess around with a Federal play when, with luck, you might convert it to the next Google, and without luck you lose what amounts to a rounding error? When billionaires play, it's rarely to make pocket change and it's usually for keeps.

This is speculation on my part, but my guess is that Allen's involvement is what accounts for Convera's schizophrenic past, as evidenced by this graphic that I took off their homepage today.


To me, Convera is one small ($10M run-rate), shrinking company with two strategies: vertical search platform and enterprise search engine. Or, I should say, was.

After this deal, it seems that Convera becomes a tiny ($800K run-rate) company with one strategy. While it's hard to believe -- and I've had to check the figures a few times to do so -- Convera seems to have sold the business that accounts for 93% of their revenues. While I might question their wisdom or sanity, I certainly can't fault them on commitment.
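Since I had to check these figures a few times myself, here's the arithmetic as a minimal sketch. It uses only the two quarterly revenue figures quoted earlier in this post; treating the run rate as 4x the most recent quarter is an assumption.

```python
# Most recent quarter (4Q07) figures quoted earlier in the post, in $M.
total_quarterly_m = 2.8          # total Convera revenue
retrievalware_quarterly_m = 2.6  # Retrievalware revenue

# Share of revenue that goes away with the Retrievalware business.
share_sold = retrievalware_quarterly_m / total_quarterly_m

# What is left, annualized as 4x the quarter (an assumption).
remaining_run_rate_m = (total_quarterly_m - retrievalware_quarterly_m) * 4

print(f"Share of revenue sold: {share_sold:.0%}")
print(f"Remaining run rate: ${remaining_run_rate_m * 1000:.0f}K")
```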

Let's flip over to the Fast side of the equation.

Since no MBA who passed quant class would pay $23M for a roughly $10M business shrinking at 27%, there needs to be more going on here. In this IWR blog post, CEO John Lervik says that the deal helps Fast in "aiming at the lucrative government market," which this InformationWeek story says accounts for about 70% of the acquired business.

That's consistent with Fast's recent comments about tactical acquisitions, and I suppose the business argument is that they can try to sell their search technology to the Retrievalware installed base. The success of that strategy will depend on a number of variables:
  • Have Retrievalware customers already and long-ago found alternative paths forward?
  • Are those that remain customers merely interested in keeping existing systems running?
  • Is enterprise search technology the appropriate replacement technology?
  • Will government customers, particularly in the sensitive defense and intelligence sectors where Convera did much of its work, be comfortable buying from foreign suppliers? [See note below.]
In our experience, particularly in Federal government, XML content servers are often a better replacement technology than contemporary search engines. That's because (1) government likes XML as a storage format since it's open and standard, (2) XQuery can express arbitrarily complex queries, (3) a series of best-of-breed extraction / enrichment tools can easily be hooked together in an open architecture, and (4) government contentbases are often massive in scale and require the ability to run very complex queries against them with high performance.

The last point requires obeying "rule 1" of database performance, which troubles search engines because, compared to XML content servers, they have a limited ability to push constraints to data.
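To make "pushing constraints to data" a bit more concrete, here's a toy sketch in Python. It is purely illustrative: the corpus, field names, and both functions are hypothetical, and it doesn't represent how MarkLogic or any particular search engine is implemented. The first function resolves only one constraint in the index and filters the fetched documents afterwards; the second resolves all constraints in the index before fetching anything.

```python
from collections import defaultdict

# Toy corpus: all field names and values here are hypothetical.
docs = {
    1: {"agency": "navy", "year": 2006, "text": "radar report"},
    2: {"agency": "navy", "year": 2007, "text": "sonar report"},
    3: {"agency": "army", "year": 2007, "text": "radar report"},
    4: {"agency": "navy", "year": 2007, "text": "radar upgrade"},
}

# Build one simple index: (field, value) -> set of document ids.
index = defaultdict(set)
for doc_id, doc in docs.items():
    index[("agency", doc["agency"])].add(doc_id)
    index[("year", doc["year"])].add(doc_id)
    for word in doc["text"].split():
        index[("word", word)].add(doc_id)

def post_filter(word, agency, year):
    """Resolve only the word in the index, then filter fetched documents."""
    fetched = [docs[i] for i in sorted(index[("word", word)])]
    hits = [d for d in fetched if d["agency"] == agency and d["year"] == year]
    return hits, len(fetched)

def pushed_down(word, agency, year):
    """Intersect all three constraints in the index before fetching."""
    ids = (index[("word", word)]
           & index[("agency", agency)]
           & index[("year", year)])
    fetched = [docs[i] for i in sorted(ids)]
    return fetched, len(fetched)

hits_a, fetched_a = post_filter("radar", "navy", 2007)
hits_b, fetched_b = pushed_down("radar", "navy", 2007)

print("Same answer:", hits_a == hits_b)
print("Documents fetched (post-filter):", fetched_a)
print("Documents fetched (pushed down):", fetched_b)
```

Both approaches return the same answer; the difference is how many documents have to be touched to get it, which is what starts to matter at very large scale.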

As for Convera's vertical search platform strategy, I'll say one thing: they have most definitely burned the ships on landing in the New World.

Time will tell whether they go on to greatness or get eaten by the natives. Either way, there's no going back now.

# # #

Note: I do not claim definitive expertise on whether the US government or sectors of it can or should buy software from US or foreign suppliers. While I do know that the Buy American Act exists, it seems to exclude software in section 25.103 (e). Despite that, I often hear that there are "issues" with foreign suppliers in the more sensitive sectors of government and I would welcome email pointing me to relevant regulations. Meantime, I have disabled comments on this post to avoid repeating a problem I had in the past with what I suspect were competitors testifying anonymously and anecdotally to the contrary. Since it's my blog, I will share my opinion based on the people I've asked this question. Please feel free to send me information (e.g., links to regulations) so I can learn more.

See the FAQ for information on my comment policy.

Tuesday, April 03, 2007

CQ Leads Government Technology Story

I wanted to highlight this story in Government Technology that features Mark Logic customer Congressional Quarterly (CQ).

One of my new memes is that just as the relational database enabled two large secondary markets (i.e., business applications, business analytics) so will XML content servers (such as MarkLogic) drive the creation of two huge secondary markets (i.e., content applications, content analytics).

As it turns out, most of our publishing customers build content applications (e.g., Elsevier's PathConsult) and most of our government customers do content analytics.

CQ lives at the intersection of our two largest markets -- they are a publisher that covers government -- and they have built a very interesting content analytic application called CQ Legislative Impact.

This story, entitled X Factor, is primarily about XQuery, the query language that MarkLogic Server natively speaks.

It leads off with an interview of CQ's senior software architect, Hank Hoffman:
It's one thing to compile hearing dates, vote counts and committee actions, but it's quite something else to make those data points relate meaningfully to one another. A year ago, Hoffman found what he was looking for in the form of XQuery [...]
"You can do some very powerful things with just a very few lines of code," Hoffman said, explaining that XQuery makes interpreting and managing masses of XML data a much simpler proposition [...]

Other snippets include:

"If all you have is relational data, and you want to create tables, SQL is a great language. The problem is that the game has changed," said Jonathan Robie, XQuery technology lead and chief scientist at Massachusetts-based DataDirect Technologies [...]

He's referring to the recent rise of XML as the predominant language driving the Internet and data storage in general -- an evolution that has pushed demand for tools to query and manage XML data. That's where XQuery comes in. [...] Several companies, including Microsoft, IBM, MarkLogic and Saxonica, have moved to commercialize XQuery with diverse tools aimed at easing its implementation.
[...]
Users say it's relatively easy to acquire a fluency in XQuery basics. Harvey turned to the language to develop an interactive dictionary, and recalls boarding a train not knowing a thing about it. "By the time I took the train to New York, had a meeting and took the train back," she said, "I had a working product that I could give to my client. If you are familiar with XML and XML technologies, it is not that hard to work with."

Netezza Files To Go Public

On March 22nd, Netezza announced that it had filed a form S-1 with the SEC regarding a proposed initial public offering (IPO) of its common stock. You can find Netezza's registration document here.

This is great news for Netezza, and I'd argue, for special-purpose DBMS companies in general. As I've often said in this blog (e.g., Pimp My Ride; What's a Column-Oriented Database; Half-Man, Half-Machine, All Cop; MarkLogic: DBMS or search engine), I'm a big believer in the future of special-purpose database management systems.

Netezza sells a special-purpose data warehouse appliance, the Netezza Performance Server. The sales pitch is all about performance and, if you're interested, you can find a nice collection of technical white papers here, as well as an interesting web page that describes their Intelligent Query Streaming, which makes queries run at "physics speed" (pretty cool technical marketing).

Simply put, unlike a general-purpose DBMS that's supposed to do everything, Netezza sells an appliance that is designed to do one thing very well: data warehouse queries at large scale with high speed.

Per figures from their S-1,
  • Netezza has grown revenues from $13.6M in their FY04 (which ends 1/31/04) to $79.6M in FY07, representing an 80% compound annual growth rate over that period.
  • FY07 revenue grew 48% over FY06, which was $53.9M
  • FY07 gross profit was $47.5M, giving them 60% gross margins.
  • If I de-appliance their numbers by dividing $47.5M by 0.8 to reflect a typical software pure-play's 80% gross margins, that implies an equivalent pure software play size of $59.4M.
  • They ended FY07 with 225 employees and 87 customers.
  • Somewhat surprisingly, they posted a net loss of $8M in their most recent fiscal year. (I'd have guessed that you needed to be around their size with ~50% growth -- and profitability -- to go public in this day and age.)
  • They have a large R&D investment of 22% of sales. However, analytically, I'd argue that appliance companies should measure R&D investment as either (1) a percent of equivalent pure-play software company size (a Kellogg metric if there ever was one), which works out to 30%, or (2) a percent of product gross margin, which works out to 47%. Wow.
  • If I'm correctly reading the prospectus, they have raised a whopping total of $97M in venture capital across 4 investment rounds.
  • Their run-rate figures (4x the most recent quarter) are $106M in revenues, a $1.8M net loss, and R&D investment of 18% of sales.
  • They like owning IP: they have 8 US patents issued and 14 pending.
  • Mark Logic investor Sequoia Capital, rarely one to be left out of an exciting company, owns about 15% of the stock on a pre-offering basis.
  • They have requested the ticker symbol: NTZA
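For anyone who wants to reproduce the derived figures in the list above, here's a minimal sketch using only the inputs quoted there. The variable names are mine, and the 80% "typical software gross margin" is the same assumption I made in the de-appliancing bullet.

```python
# Inputs quoted in the list above, in $M unless noted.
fy04_revenue_m = 13.6
fy06_revenue_m = 53.9
fy07_revenue_m = 79.6
fy07_gross_profit_m = 47.5
rd_share_of_sales = 0.22               # R&D as a fraction of sales
typical_software_gross_margin = 0.80   # assumption used in the post

# Compound annual growth rate over the three years from FY04 to FY07.
years = 3
cagr = (fy07_revenue_m / fy04_revenue_m) ** (1 / years) - 1

# Year-over-year growth and gross margin for FY07.
yoy_growth = fy07_revenue_m / fy06_revenue_m - 1
gross_margin = fy07_gross_profit_m / fy07_revenue_m

# "De-appliancing": the pure-software revenue that would yield the same
# gross profit at a typical software gross margin.
software_equivalent_m = fy07_gross_profit_m / typical_software_gross_margin

# R&D spend restated as a share of that software-equivalent size.
rd_spend_m = rd_share_of_sales * fy07_revenue_m
rd_share_of_equivalent = rd_spend_m / software_equivalent_m

print(f"CAGR FY04-FY07: {cagr:.0%}")
print(f"FY07 growth over FY06: {yoy_growth:.0%}")
print(f"FY07 gross margin: {gross_margin:.0%}")
print(f"Software-equivalent size: ${software_equivalent_m:.1f}M")
print(f"R&D vs. software-equivalent size: {rd_share_of_equivalent:.1%}")
```

The last figure comes out slightly under the 30% quoted above, presumably because the 22% input is itself a rounded number. I haven't tried to reproduce the 47% figure here because it depends on product gross margin, which isn't broken out in the list.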
Disclaimer: I'm not a financial analyst and don't pretend to be one; I don't recommend buying or selling stocks.