Monday, July 30, 2007

Comcasted: UGC Meets Poor Customer Service

User-generated content (UGC) continues to be the talk of the town in the new media business, but perhaps other industries should start paying more attention as well. Just as angry customers and ex-employees instantly put the web itself to good use about ten years ago (with expletive-company.com sites), so now are people using YouTube for the same, if less profane, purposes.

Maybe this is old news for you, but somehow I'd missed it. So it came as a surprise to me when a friend of mine who works in telecom mailed me this video featuring a sleeping Comcast technician:



What's more surprising is that there are more of them. This one features TV news clips about a sleeping technician, and ends sadly with vandalism of the customer's house and poisoning of their pets:




Now you even have derivative videos that discuss other web customer service videos, in this case getting the call center rep to admit that he's now been trained to avoid saying things that would be embarrassing for Comcast if recorded and put on the web. (Sorry, but it's only marginally funny.)

More on Fast's "Extensive" Profit Warning

See this Forbes story for more information on the situation at Fast Search & Transfer. The shortfall seems bigger than I previously posted:
  • The 2Q07 revenue guidance I cited was evidently the bottom of a consensus range of $50M to $60M (see below)
  • The company's new guidance is $34M to $38M
  • The company's 2Q06 revenues were $38.5M (though the SEB Enskilda analyst says that "historical revenue figures are now called into question," seeming to suggest a possible restatement.)
Quotes from the Forbes story:

Fast Search & Transfer fell 0.3 to 10.45 [NOK] as analysts got to grips with the implications of the internet search firm's extensive profit warning last Friday.

Last Friday shares in Fast slumped almost 30 pct after it warned that its second-quarter results were unlikely to meet market expectations, with sales coming in at 34-38 mln usd compared to what analysts say is a consensus forecast of 50-60 mln.

The group blamed 'changes in business practice' and the tightening of internal control procedures, along with lower sales, a rise in exceptional items and the need for an increased provision for bad debt.

Analysts had already expressed serious concerns about Fast's accounting procedures, and particularly the aggressive methods of recognising revenues. However today they said Fast's profit warning had been even more severe than had been expected.

[...]

The broker said that despite previous warnings over margins, strong top-line growth had 'continued to support the investment case'. SEB said last week's warning had 'called the historical revenue figures into question'.

'Its profit warning was more extensive than our concerns about its aggressive revenue recognition and receivables write-downs,' the broker said, arguing that the firm's cost base is 'clearly geared to significantly higher revenues than the company expects'.

[...]


See my FAQ for disclaimers.

17 Search Innovations Outside Google

I was digging through my Bloglines bookmarks today and found this old (for a blog) post on the Read/WriteWeb written by Nitin Karandikar, author of the Software Abstractions blog.

The post, entitled Top 17 Search Innovations Outside Google, provides a great round-up of recent search innovations. It was posted in May. I guess I'm better late than never in sharing it.

Here's the summarized list:

  • Natural language processing
  • Personal relevance
  • Canned, specialized searches
  • New content types
  • Restricted data sources
  • Domain-specific search (aka, vertical search)
  • Parametric search
  • Social input
  • Human input
  • Semantic search
  • Discovery support
  • Classification, tag clouds, and clustering
  • Results visualization
  • Results refinement and filters
  • Results platforms
  • Related services
  • Search agent

Gazhoo: A Content Marketplace

Check out this site, Gazhoo, which is something akin to Scribd meets eBay. Scribd in the sense that it's for grass-roots document sharing. eBay in the sense that commerce is built-in, via PayPal.

The trouble with such sites is that they need content to be useful; right now Gazhoo looks pretty empty, suggesting a premature launch. For example, "lymphoma" found one hit, a term paper on AIDS. "Living trust" had lots of hits, but none of them resembling what I was looking for (a generic template for defining a living trust). Surprisingly, "Google" scored but one hit. "Black Scholes" scored lots of irrelevant hits; when refined using their faceted navigation to the "business" cluster, it produced 3 hits, none of them relevant.

My prediction is the killer app for Gazhoo will be term papers. In fact, perhaps Gazhoo is a somewhat disguised site intended for that very purpose. If you look at the top-level taxonomy of documents, it goes:
  • Business forms
  • Legal forms
  • Market research
  • Reports/essays

No matter the intent, I think Gazhoo and Scribd are both worth watching. Here's what Mashable had to say about Gazhoo.

Friday, July 27, 2007

Fast Search & Transfer Warns Big Time

See this story where Norwegian enterprise search vendor, Fast Search & Transfer, makes a significant (~$15M revenue shortfall) 2Q07 earnings warning:

Fast Search & Transfer closed 4.25 nkr lower at 10.75 after the search platform specialist warned its second-quarter results were unlikely to meet market expectations, with sales coming in at between 34-38 mln usd compared to what analysts say is a consensus forecast in the region of 50 mln.

The group said 'changes in business practice' and the tightening of internal control procedures have had an adverse impact on second-quarter revenues.

'We believe that these changes had an impact of 10 mln usd when compared to the first-quarter results,' it said. 'In addition, a shortfall in expected sales revenues has had a further adverse impact of about 5 mln usd in the quarter.'

On top of this, Fast said the second quarter has seen exceptional items worth a combined 5 mln usd, which means expenses in the quarter are also going to come in 'higher than market expectations'.

Additionally, Fast said it expects to make an increased provision for bad debt 'in excess of the 6 mln usd previously communicated'. Fast is scheduled to release it second quarter results on Aug 8.

For a long time, I've felt that Fast was the MicroStrategy of enterprise search; so concerned with showing high growth that it -- how do I put this nicely -- let the basics slide. Frankly, I didn't draw that conclusion all by myself; some of it came from reading a few financial analyst reports about them, other information came from discussions with former employees who described the culture in terms that reminded me of MicroStrategy, and some came from my general sense that Fast was spread too thin, with too many initiatives for a company their size.

By my math, this means the "high growth" vendor in enterprise search is actually shrinking. Per the 2Q06 earnings release, revenues in 2Q06 were $38.5M. Their new 2Q07 guidance is $34 to $38M, so they are shrinking somewhere between 1% and 12%.
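
For anyone who wants to check that math, here's a quick sketch in Python using only the figures above ($38.5M for 2Q06, $34M to $38M guidance for 2Q07). The function and variable names are mine, purely for illustration.

```python
def pct_change(old: float, new: float) -> float:
    """Percentage change from old to new (negative means shrinkage)."""
    return (new - old) / old * 100.0


Q2_2006_REVENUE = 38.5                     # $M, per the 2Q06 earnings release
GUIDANCE_LOW, GUIDANCE_HIGH = 34.0, 38.0   # $M, new 2Q07 guidance

# Best case: revenue lands at the top of the guidance range
best_case = pct_change(Q2_2006_REVENUE, GUIDANCE_HIGH)
# Worst case: revenue lands at the bottom of the guidance range
worst_case = pct_change(Q2_2006_REVENUE, GUIDANCE_LOW)

print(f"Best case:  {best_case:.1f}%")
print(f"Worst case: {worst_case:.1f}%")
```

That works out to roughly -1.3% at the top of the range and -11.7% at the bottom, which is where the "between 1% and 12%" comes from.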

If you think financials don't matter when it comes to market perception, think again. And I'm not talking about the Norwegian stock market, I'm talking about the customer market. As a customer, or industry analyst / consultant who advises them, tell me how you would feel about a supplier who is:
  • 70% of the leader's size, profitable, and growing at 50%
Versus:
  • Less than half the leader's size, unprofitable, and shrinking
It makes quite a difference, doesn't it?

While most industry analysts would not admit it, they look a lot at reported vendor financial results in determining their opinion of a company, the effectiveness of its strategies, and ultimately things like its placement on a magic quadrant or equivalent diagram.

I'd note (and I'd guess the timing of all this is not a coincidence) that Fast announced the appointment of a new president and COO this week, Joseph Krivickas.

Velkommen, Joe. I hope you brought a mop.

Disclaimers:
  • We compete with Fast, although indirectly
  • I am a CEO analyzing the market, not a financial analyst analyzing stocks, and I do not make buy/sell recommendations
  • See my FAQ for more

The 2007 Web Trend Map, Version 2.0

Check this out. It's the top 200 sites on the web, visualized as a subway map, where the lines are trends and the stations show supplemental information, such as the momentum of the site.

It's very creative and quite interesting; it's produced by a Japanese firm called Information Architects. Go here for IA's blog post on this. Go here for my favorite version, the interactive clickable one.

Thursday, July 26, 2007

The Google - Autonomy Spat

Check out this InformationWeek story that describes a spat between Google and Autonomy over a white paper that Autonomy released a while back, which they now say is outdated.

My favorite quote from the Autonomy marketing VP:

"It's basically irrelevant because we see them in less than 1% of all deals."

Ah, the old "we never see them" line. It's often used, and often dangerous. Let's think about it a bit. A recent Outsell report estimates Google's enterprise search business at $350M in 2006, compared to Autonomy's $250M in 2006. Practically overnight, and with very little sales and marketing energy, the Google Appliance has become the #1 enterprise search solution, with a commanding 40% lead over the nearest competitor.

And yet, somehow, Autonomy only sees them in 1% of deals. How can that be? Obviously, Autonomy's trying to position themselves in the high-end market, and Google undoubtedly in the "commodity" space down below.

However brave the marketing, I doubt the statement. It's probably more like: "of the remaining deals -- excluding the ones that we no longer pursue with the Ultraseek product line that we (i.e., Verity) formerly positioned as a low-end search solution -- the ones where we correctly know it's futile to compete against Google -- excluding those deals -- we only see them 1% of the time."

Now, I'm not recommending that Autonomy launch a frontal assault on Google. They, and the other enterprise search vendors, were smart to abandon the low-end of the market when Google entered. But the question is: what next? Is there enough high-end to feed them in future? And is anyone else going after that high end, and from where?

Autonomy's recent results aren't bad. They grew 20% in 2Q07 over 2Q06, but they're losing share to Fast who grew 50% in their most recently announced quarter (1Q07). But I'd bet Google is growing at more like 200% than 20%.

This is why I always say the enterprise search market is stuck between a rock and a hard place.
  • The rock is the Google Appliance which is quickly establishing itself as the dominant enterprise search solution.
  • The hard place is database management systems (including both XML content servers like MarkLogic and the major RDBMSs which are slowly improving their content abilities) that provide platforms for enterprise application development.

From Search to Research ... and Content Applications

Here's an interesting post on the Read/WriteWeb (RWW) blog, entitled From Search to (Re)Search, Searching for the Google Killer.

It's definitely worth reading, as are the links within it, like this one where a Hakia guy explains quite articulately why Google is unstoppable, and then unsuccessfully tries to dismiss his own arguments.

I agree with RWW that the Google killer won't come from:
  • One-up feature companies. Engines like Clusty, which add one feature (e.g., dynamic clustering) on top of Google search.
  • Vertical search companies. While the long tail is real, I don't believe there will be a long tail of search engines (that's the inverse of the concept). People want relatively few tools that can reach into the long tail of content, products, and information. They don't want a long tail of tools.
  • Human search. Unless you're doing real research, the cost model is prohibitive.
The first time I heard the phrase "research, not search" was from Nerac CEO Kevin Bouley. Kevin's company provides custom research services using a database of content integrated from numerous sources combined with a network of subject-matter experts (SMEs) who use a MarkLogic-based application to assemble custom research reports for clients. When Kevin says "research, not search" he means it.

Nerac uses MarkLogic as its content repository and has built an XQuery application that enables SMEs to quickly locate information (using our XML search capabilities) and then combine and package that information into a custom research report. It's a very cool service, and while I think of it as "research, not search" I certainly don't think of it as human-powered search a la Cha Cha.

While I believe that from search to research is a good direction, I think there is another equally important direction that the RWW omits: from search to application, or as we say at Mark Logic "content application."

To me, search is inherently open-ended and context-free. Applications are not. If I know you're a professor and you want to build a custom textbook, then I can build an application that helps you do that. And yes, that application will probably include search across a corpus of content. But search is a feature in the application, not the application itself.

Or, if you're a pathologist, I can build you an application that leverages how you work with terabytes of medical content to help you identify cancers more readily. Search might be a feature within that application, but the application itself is about helping support the process of differential diagnosis.

Content applications know who you are and what you're trying to do. (They're role and task aware.) And you can build them on MarkLogic. And, in my mind, only a content application has enough unfair competitive advantage to beat Google over time. A thin vertical search layer? A better algorithm? One sexy feature? No.

But an application that knows who you are, what you're trying to do, and leverages a rich (potentially integrated and enriched) contentbase to do so? Ah, well that's no fair. No search engine can do that.

And that's the point.

Tuesday, July 24, 2007

New Digs For Mark Logic

I'm happy to announce that, as of yesterday, Mark Logic has moved to a bigger (and nicer) office space in San Carlos, California.

We're about 7 miles south of our old location in San Mateo, just off Silicon Valley's central artery, highway 101.

Our new address is:

Mark Logic Corporation
999 Skyway Road, Suite 200
San Carlos, CA 94070

Our phone number remains the same at +1 650-655-2300.

I'd like to give great thanks to our operations team, including Josh Narva, Jeff Thomas, Lisa Ross, and Angela Schenone for executing a seamless transition between the buildings.

Monday, July 23, 2007

PR Oddities on the EMC X-Hive Deal

Perhaps it's because I ran high-tech marketing departments for years before joining Mark Logic that I'm abnormally attuned to the corporate news roll-out process, but I'll say that I'm finding EMC's PR around (what I'll now have to call the rumored) acquisition of X-Hive quite strange.

Look at the timeline:
  • The first story breaks Friday 7/20 on eWeek, here. This is not a leak-type story. It's a complete, standard news story with an EMC spokesperson, an analyst (who's presumably been briefed prior to the interview), and an X-Hive spokesperson all quoted.
  • My Google alert catches it on Saturday morning UK time. Thinking I've got a timing scoop, I'm worried that the deal will already be official by the time my flight lands in SFO. (What Google giveth, United taketh away?) My inner journalist wants to blog before I board, but I run out of time.
  • Before boarding, however, I find and read two other stories: the CMS Watch story here, and the Gilbane story, here.
  • I blog my first post on it, here, on Saturday 7/21 around 2:00 PM. By that time, I'm amazed that I can't find a press release from either company nor any trace of the deal on either company's website. I figure perhaps eWeek messed up an embargo -- but then did CMS Watch and Gilbane do the same thing? One seems possible; three seems improbable.
  • While the size of the transaction (which I'd guesstimated here as between $25M and $50M) means it's probably financially immaterial in a $3B/quarter company, I figured they'd launch Monday before the markets opened. (It's good PR hygiene to launch corporate news when the markets are closed, imho.)
  • It's 7:55 PM California time on 7/23 -- three days after the eWeek story -- and there's still no news release nor trace of the deal on either website.
I have one observation and two theories.

The observation is for marketers: this is a case study in how you don't want to roll out news. Fortunately, for EMC it's a tiny transaction in a corner of the company, so there's no real egg on the corporate face. But if something like this happened on a major launch, I'd bet that some marketing heads would roll.

Now, the theories. Either (1) they're planning on announcing this on the EMC quarterly conference call at 8:30 AM Eastern tomorrow (register here) and somehow the PR and IR teams got de-synchronized. Or (2) somehow marketing got out ahead of the deal finalizing and perhaps something's gone wrong on that front. I'm betting on (1) but you never know.

This post is really about the launch of the deal, on the assumption it's happening. If you want my thoughts on the deal itself, see the post I did Saturday.

Update: EMC's 2Q07 earnings release is out and there's no trace of the deal in it, nor any separate news releases on the topic. The plot thickens.

Saturday, July 21, 2007

EMC Goes Dutch

In what appears to be a bungled press launch, EMC (owner of Documentum) has announced the acquisition of Rotterdam-based XML vendor, X-Hive.

Why do I say bungled? Because one of my Google alerts caught this eWeek story on Saturday before I could find an official press release on either company’s website or the wire services. And I found stories on Gilbane and CMS Watch announcing the deal as well. (And it's now end of day Monday and I've still not seen any official indication of this on either site.)

First, let’s look at the numbers. The terms of the deal were not disclosed, so we don’t have much to work with, but they did say that X-Hive currently employs 25 people. With that, plus some standard ratios and basic math, we can work up a valuation estimate.

Assuming sales/employee/year in the broad range of $200K to $300K yields annual revenues of $5.0M to $7.5M. Since X-Hive is 11 years old and employs 25 people, it’s safe to assume the average growth rate has been quite low over the company’s history. But, let’s be charitable and assume that they were getting some traction with their recent s1000d initiative, so let’s guess they were growing at 25% to 50% over the past few years.

That, plus a look at EMC’s historical deals, suggests a valuation of 2 to 5 times annual revenues, implying a valuation range of $10M to $37.5M. Eyeball correcting that, and knowing the company is venture-backed, suggests to me a range of $25M to $50M. If I had to guess one number in the range, I’d say $35M.
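
The back-of-the-envelope valuation above can be reproduced in a few lines of Python. This is only a sketch of my own rough math: the sales-per-employee range and the revenue multiples are the assumptions stated above, not disclosed deal terms, and the variable names are just for illustration. Amounts are kept in thousands of dollars so the arithmetic stays in whole numbers.

```python
EMPLOYEES = 25                       # headcount cited in the coverage
SALES_PER_EMPLOYEE_K = (200, 300)    # $K per employee per year (assumed range)
REVENUE_MULTIPLE = (2, 5)            # valuation as a multiple of annual revenue (assumed)

# Annual revenue range, in $K
revenue_low_k = EMPLOYEES * SALES_PER_EMPLOYEE_K[0]
revenue_high_k = EMPLOYEES * SALES_PER_EMPLOYEE_K[1]

# Raw valuation range, in $K: low revenue at the low multiple,
# high revenue at the high multiple
valuation_low_k = revenue_low_k * REVENUE_MULTIPLE[0]
valuation_high_k = revenue_high_k * REVENUE_MULTIPLE[1]

print(f"Revenue:   ${revenue_low_k / 1000:.1f}M to ${revenue_high_k / 1000:.1f}M")
print(f"Valuation: ${valuation_low_k / 1000:.1f}M to ${valuation_high_k / 1000:.1f}M")
```

That yields revenues of $5.0M to $7.5M and a raw valuation range of $10.0M to $37.5M, before any eyeball correction.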

Next, let’s analyze strategy. X-Hive has three primary product lines.

X-Hive/DB, an XML content server with built-in search capabilities, in the same category as MarkLogic

X-Hive/Docato, an XML content management system, in the same category as Vasont, Astoria, and XyEnterprise.

X-Hive/AMDS, an aviation document management system that I believe was built for Northwest Airlines, in the same rough category as offerings from Jouve/InfoTrust.

My strategic concern with X-Hive has always been focus. While the offerings are layered on each other, the reality is you have a 25-person company in the Netherlands conducting war on three fronts. All three categories in which the company competes are highly competitive, and X-Hive has approximately 8 people working per category. That strikes me as way below critical mass.

Perhaps controversially, I believe that X-Hive’s strategy in moving towards aviation was divergent from EMC’s interest. It’s well known that Documentum has poor XML handling. While X-Hive was heading off to aviation, I think that EMC was looking to improve its XML capabilities. But I think EMC has taken a more tactical than strategic approach.

Why? If you think about it, EMC has an interesting problem. While they have strong positions in the storage and ECM layers of the stack, they have no presence at the database layer, which is controlled by the MOI (Microsoft, Oracle, and IBM) oligopoly. What’s worse, the MOI are rising up from the database level and attacking Documentum on its home turf -- e.g., Microsoft’s increasing investment in SharePoint, Oracle’s purchase of Stellent, and IBM’s FileNet acquisition.

A creative strategy for EMC would be to play defense at the ECM layer by playing offense at the database layer (which in this context includes relational database, enterprise search, and content server technology), by integrating best-of-breed technologies at that level and then attacking with a strong unified data/content database story.

But I think EMC views XML the same way that Oracle viewed BI – as a tactical, tick-box item, and not as a strategic opportunity.

Let’s talk about that some more. As you may know, I was part of the executive team that took Business Objects from $30M in 1994 to nearly $1B in 2004 when I left to join Mark Logic. Over an approximately 15-year period, a $1B company was built directly underneath the nose of Oracle, one of the most viciously competitive companies in high-technology.

What’s more, Oracle had competing products (Reports, Discoverer) from day one. Business Objects was founded by a marketing director and a sales manager from Oracle France, after they unsuccessfully ran the idea up the corporate flagpole at Oracle, so the company was on Oracle radar from day one. Thus, I can derive the non-existence of Business Objects from first-principles quite easily. But – and here’s the catch – it does exist, and today it’s about a $1.5B company.

So how in the world did that happen? My take:

• Oracle never saw BI as strategic. For them and many other companies, “tool” was a four-letter word, and BI a tick-box category to be avoided. Consequently, Oracle’s best people never worked on BI.

• Oracle was distracted. Its repeated failings in the much larger applications (e.g., ERP, CRM) market were a constant source of distraction. There were always bigger fish to fry.

• The market structure lent itself to independents. Most customers had multiple DBMSs and ERP/CRM systems and wanted BI as a unifying layer across that underlying chaos – this lent credence to players who could credibly claim agnosticism across the lower layers.

The result? Oracle made disposable BI products that were good enough to throw-in free on a purchase order as a discounting alternative, but not good enough to be seriously considered by someone who viewed BI as strategic. In effect, Oracle skimmed the sludge from the bottom of the market, leaving the cream for vendors like Business Objects and Cognos.

That’s my belief for how things will work out with EMC, Documentum, and X-Hive. By taking this approach to both EMC’s database-layer problem and Documentum’s XML problem, they are (in my humble opinion) screaming “tick box” and not strategic.

Finally, in the event that I’ve gotten it wrong and EMC really does believe that they are going to attack Oracle, Microsoft, and IBM at the database layer with (1/3rd of) 25 folks in Holland, then I’d say that I think they’re tilting at windmills, if you'll pardon the pun.

Pronouncing Web 2.0

I’ve just returned from England, where the weather was wonderful, the fish and chips crisp, the Hendrick's gin cucumber-y, the food surprisingly delightful, the hand-drawn ales appropriately lukewarm, and the meeting I attended engaging.

Among other things on the trip, I’ve noticed that the Brits have a tendency to say “web two” as opposed to the American (dare I say original) “web two dot oh” (or alternatively, "web two point oh"). I believe, last I was in France, that they do the same thing en disant “web deux” et pas “web deux point zero.”

But with an American accent, saying “I’m hip; I ‘get’ web two” is antithetical. (Dude, you can’t even pronounce it!) Whereas, with a British accent, it’s fine. (One can only pity the ex-pat American who learns web-speak across the pond.)

Other than precluding a minor release of the new web (e.g., Web 2.1), it’s simply a local pronunciation difference. But it nevertheless reminded me of the numerous verbal hints that tip off either (1) one’s degree of hip-ness in the tech community or (2) simply where you’re from.

For example,

• The Brits said “kicks” and not C-I-C-S when referring to the grand-daddy of transaction monitors. Americans were split down the middle on this one.

• Saying S-Q-L instead of “sequel” used to be popular everywhere, but now people just say sequel. In addition, much to Microsoft’s presumed delight, they also increasingly just say “sequel” when they mean “sequel server,” referring to the generically named SQL Server DBMS.

• I’ve never heard “my S-Q-L” when speaking of the open source MySQL (my-sequel).

• It’s definitely B-I for business intelligence and not “bi”.

• Never, when speaking of Business Objects, should one use the dreaded abbreviation B-O, as they call the company in France (which regrettably stands for “body odor” in America).

• It’s most definitely B-E-A and not “be-ah” for the large middleware firm, BEA.

• However, most Americans pronounce SAP as S-A-P, while Europeans usually say “sap”.

• But always E-R-P and never “erp”.

• In the “web two dot oh” vowel omission department, Flickr and Scribd are pronounced flicker and scribed, not “flick-R” and “scribbed.”

What fun. If you’ve got examples of either fun antithetical tech statements or simply fun local pronunciation differences and/or asymmetries like those above, I’d love to hear them.

Friday, July 20, 2007

Medieval Technical Support Video

This has been around for a while, but it's so good (and has such a pointed lesson on change) that I figured I had to do a post on it.


Thursday, July 19, 2007

Is Blogging Dead?

I found an interesting post on the Read/WriteWeb blog, entitled "Is Blogging Dead?"

While blogs are certainly cooling off from a hype perspective, and I'm sure there is no shortage of two-reader blogs (the blogger and his mom) among the estimated 70M blogs online today, are blogs dead, or even dying? Frankly, the idea hadn't even occurred to me, much less the argument that social networking sites would be their demise.

Sure, I've set up accounts on numerous social networking sites (e.g., MySpace, Facebook, LinkedIn, Yelp, Twitter, and in the past, Classmates). While I know many people use social networking for posting blog entries (e.g., on MySpace it's done frequently), to me that's usually in a more personal context (e.g., "I'm going out to this club tonight") and not a professional one. Of all the 80+ RSS/Atom feeds that I read regularly for work purposes, not a single one comes off a social networking site.

So while I see a theoretical conflict, in practice, I don't see an actual one.

Here's an excerpt:

There are many other reasons, apart from being social, that people may want to blog. One is to focus on a niche and essentially treat it as a media website, which is what we do here on Read/WriteWeb.

This is exactly what I do on this blog as well -- treat it as a media site; Dave's op-ed "column" on the Internet, if you will.

More:

Another reason is to join a distributed conversation about shared interests - usually a half social, half work activity. Newbie blogger Marc Andreessen's blog is probably of that type, as he wrote about today in his Eleven lessons learned about blogging, so far post.
I agree on the conversational aspect of blogs, but organically this blog has not generated much commentary. Perhaps it's my lecture-oriented style. (Could it be that I'm perceived as opinionated and wouldn't listen?) So while I think active commentary is a great plus on any blog, I don't think it's required for a blog's success (e.g., I've had over 10K visits/feed hits this month -- so people are reading, they're just not commenting). If anyone dares to say why, I'd love to hear about it.

I'd recommend reading the Andreessen "eleven lessons" post, especially if you're an aspiring blogger. And here's an interesting post that replies to Andreessen's thoughts on comments.

Finally, I discovered this post on blogging (Is Blogging Passe?) as well and wanted to comment on it -- but the site's gone down so I can't read it. Ergo for now, I'll just link to it and if I have interesting thoughts later once it's back up, then I'll return and revise this paragraph -- a capability, I'd add, which is just one of the many nice things about blogging!

Wednesday, July 18, 2007

Stonebraker's "One Size Fits All" Papers

As frequent readers know, one of my memes is the rise of special-purpose databases, whether they be data warehouse appliances like Netezza, stream databases like Streambase, or OLAP (aka multi-dimensional) databases like Essbase, recently purchased by Oracle through the Hyperion acquisition.

I believe that MarkLogic is one of a class of special-purpose DBMSs that will be necessary to handle new requirements that were never envisioned when the RDBMS was born. The relational database is now pushing 40 years old since its invention (and pushing 30 since the first implementations in commercial products).

An easy way of seeing the problem is to think about the computers you used even 20 years ago, their disk and memory configuration, their network connection speed, the types of data they managed, and the applications they ran. For me, that would be a 1 MIPS MicroVAX II with 8MB of memory, 256 MB of disk space, 40 users (among other things I was the sysadmin), and we used it to run a technical support call tracking system at Ingres, then known as Relational Technology, Inc.

While RDBMSs have proven remarkably extensible, for certain classes of applications (e.g., ultra-low latency trading) and databases (e.g., managing tens to hundreds of terabytes of XML documents), they are simply not appropriate.

As it turns out, I'm not the only person who sees this problem. Michael Stonebraker, noted computer science professor (formerly of UC Berkeley and now of MIT), serial entrepreneur (a founder of Ingres, Illustra, Cohera, Streambase, and Vertica), and general database visionary, thinks the same thing.

Towards that end, he co-authored two papers:
  • One Size Fits All: An Idea Whose Time Has Come and Gone. This paper makes the argument that the relational database cannot be extended ad infinitum, demonstrates how RDBMSs are inappropriate for several new applications, and argues that the DBMS market will fragment into a series of special-purpose engines, perhaps unified by a common front-end parser.
  • One Size Fits All: Part 2, Benchmarking Results. This paper buttresses the first with benchmark results for relational vs. special-purpose databases in several applications. Interestingly and pragmatically, Stonebraker argues that most people won't even consider a special-purpose database (largely due to inertia) unless it is at least 10x faster than relational for a given application. He then demonstrates several applications where you can see 10 - 100x gains in performance. (Large text and XML contentbases are one of the cases he discusses, citing Google's creation of their own file system and software stack to deal with Internet-scale documentbases.)
I have always found Stonebraker's work very clear; he's one of the few authors of academic computer science literature whose work I can always read and understand. Take a look at the articles.

If you're not up for the papers, then here's an interview in Red Hat Magazine that hits many of the key points. (But bear in mind he's doing PR for Vertica here, so the examples are a bit biased towards column-orientation, and I'm sure the webinar mentioned at the bottom is a Vertica one.)

Monday, July 16, 2007

Mark Logic Closes $15M Third-Round Financing

I'm pleased to announce that Mark Logic has closed a $15M third-round financing, led by Sequoia Capital with participation from Lehman Brothers.

The official press release is here.

As mentioned in the press release, this means that Mark Logic now has plenty of fuel in the tank to fund its future growth, specifically:
  • To expand our presence in our existing core segments of media (aka publishing) and government
  • To expand both domestic and international distribution channels (did you know we have created a UK subsidiary and already have 4 staff on board?)
  • To develop new vertical markets including financial services and life sciences
I'm coming up on 3 years at Mark Logic and when you compare our expected sales this quarter (3Q07) to the quarter I joined (3Q04), we will have multiplied the company 8-fold -- that's a 100% compound annual growth rate (CAGR) during that time period.
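For anyone who wants to check the growth arithmetic, here is a minimal sketch of the calculation. The function name is mine, and the only inputs are the 8-fold multiple and the 3-year span mentioned above.

```python
def cagr(multiple: float, years: float) -> float:
    """Compound annual growth rate implied by growing `multiple`-fold over `years` years."""
    return multiple ** (1.0 / years) - 1.0


# 8-fold growth over 3 years means doubling each year (2 * 2 * 2 = 8).
growth = cagr(8.0, 3.0)
print(f"Implied CAGR: {growth:.0%}")
```

Doubling every year is what a 100% CAGR means, which is why three years of it compounds to 8x.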

I'd like to thank Mark Logic customers for their business, belief, and support and to thank all Mark Logic employees for their contribution to this success.

I'd like to thank our investors for participating in this financing and say that it has been a pleasure to be associated both with Sequoia Capital (which has been behind so many of Silicon Valley's great successes, and whose companies now account for something like 10% of the value of the NASDAQ) and with Lehman Brothers, which has provided support in many ways, both financial and operational.

Finally, I'd like to thank Bob Clarkson and the team at Jones Day who supported us in closing this transaction.

Another Excellent Cagle Article

Kurt Cagle has written a second excellent article on XQuery for XML.com. This article, entitled XQuery and Data Abstraction, begins with an excellent alternative history of databases where Cagle asks the question: what would have happened if distributed programming and XML had become established prior to the development of relational databases? What would the mainstream DBMS world look like then?

(At Mark Logic, we think it would look a lot like Mark Logic with a join optimizer.)

He then goes on to discuss the benefits of database abstraction, brought to you in part by SQL but also by the relational model itself, specifically:

  • Portability across vendors, and freedom from vendor lock-in
  • Knowledge universality -- i.e., employee (e.g., DBA) portability across systems from different vendors. While a SQL Server DBA is not interchangeable with an Oracle one, there is certainly more commonality than, say, with an IMS DBA.
  • Heterogeneity, the ability to create an abstraction layer across semi-similar underlying DBMSs.
  • Application decoupling. This was an explicit goal of relational databases from early on: that data could be modeled independently of applications that require it, and that such application-neutrality should exist.
Says Cagle:

For its time, SQL was a profound success because, while it didn't completely tame the database world (it is, of course, in the interest of the vendor to lock in developers to its particular product, at least in the short run). It managed to create a near universal standard for working with relational databases. This helped to launch relational databases as the preferred form for data storage for a couple of decades.

I'd argue that three factors were key to the rise of relational databases: (1) application independence, (2) the SQL standard, and (3) the ability to answer any query without having to know up-front which queries would be asked (which basically enabled the BI industry).

Cagle continues:
However, even as SQL-based databases became the norm, the realization arose that there are, in fact, other ways of describing data, each of which are more suited for particular domains of data.

[...]

Finally, of course, there's the XML explosion and the rise of XML databases. An XML database differs from a relational one in a few key respects:

  • Semi-structured data [...]
  • Externally defined schemas [...]
  • Multiple namespaces [...]
  • Weakness of keys [...]
  • Element multiplicity [...]
We agree fully that XML data brings a huge number of new data modeling challenges for which relational databases were never designed, such as semi- and unstructured data, or handling self-describing data with external schemas. I'd add that Kurt perhaps misses one of the more obvious problems when it comes to structure: XML is hierarchical and modeling hierarchy has always been the Achilles' Heel of the RDBMS. That alone is not a super-compelling argument for XML DBMSs (I think Cagle's points are stronger), but I think it should be mentioned in the discussion.

Cagle continues:
This has some interesting implications. First of all, it's not really possible to query XML with SQL, because SQL relies on the presence of explicitly defined primary and foreign keys in order to retrieve a set of records in all but the simplest of cases, and for the most part these don't exist in XML.

I'd agree but note that queries can run as long as the keys exist -- they will go very slowly, however, if the keys aren't suitably indexed. But if the DBMS does not know that "boss" in one table is the same as "manager" in another, then these relationships will never be found and the DBMS will not recognize any primary/foreign key match. XML content servers, because of their text orientation, are much better in these situations where a thesaurus is effectively needed to identify relationships in data.

He then continues into a fairly deep discussion of XPath 2.0, some of which is over my head, and concludes:
XPath 2.0 and XQuery, on the other hand, have the potential to abstract not only beyond a single SQL database implementation but beyond SQL itself [...]

However, a surprisingly large number of cases exist where what is important is that the data be accessible in a way that is most transparent to the overall development process, regardless of the container of that data. In those cases, I see XQuery gaining a huge mindshare.
Kurt's first XQuery article, XQuery: The Server Language, is here.

Monday, July 09, 2007

Thoughts on the NetSuite S-1

I took a half hour today to skim through the NetSuite S-1, and read a few related blog posts. Here are some quick takeaways and impressions:
  • An impressive recent revenue trajectory from $17.7M in 2004 to $36.4M in 2005 to $67.2M in 2006.
  • An even more impressive recent loss trajectory of ($28.6M), ($38.2M), and ($23.4M) in the same three years.
  • An improving, yet still negative, return on sales from -163% in 2004 to -35% in 2006 to -16% in 1Q07.
  • Decelerating revenue growth from 107% in 2005 to 85% in 2006 to 72% in 1Q07.
  • A poor incremental cost leverage in 2005 of 0.68 (i.e., for each dollar of incremental cost in 2005, they got $0.68 in incremental revenue). This happily improved to 1.91 in 2006, yet was 1.41 in 1Q07 compared to 1Q06. In theory, a company should get more than $1 of incremental revenue for each $1 of incremental cost. And if an unprofitable company wants to become profitable, it must do so.
  • 1B+ fully diluted shares outstanding. This implies (and the prospectus clearly assumes) an N-for-1 reverse stock split. Such reverse splits generally are bad for employee morale as pre-IPO employees tend to multiply their shares by $10-$20 to get an expected (optimistic) future value. If they're surprised by a reverse split, they can see their anticipated windfall decimated.
  • 74% of the company is beneficially owned by Larry Ellison (61%) and two trusts for Ellison's children (13%). This likely presents some interesting conflicts of interest.
  • The staff is well paid with target CEO 2007 compensation of $600K for running what will likely be an unprofitable or marginally profitable $120M company (size and profitability are my guesses based on simple extrapolation). Three of the five named executive officers made over $400K (and one made over $600K) in 2006.
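The ratios in the bullets above can be roughly reproduced from the revenue and loss figures alone. This is only a sketch: it assumes total cost equals revenue plus the reported loss (my simplification, not necessarily the line items used in the S-1), and the function names are mine.

```python
# Figures in $M, taken from the bullets above.
revenue = {2004: 17.7, 2005: 36.4, 2006: 67.2}
loss = {2004: 28.6, 2005: 38.2, 2006: 23.4}

# Assumption: total cost = revenue + loss (the post does not say how cost was derived).
cost = {year: revenue[year] + loss[year] for year in revenue}


def return_on_sales(year: int) -> float:
    """Loss as a (negative) fraction of revenue."""
    return -loss[year] / revenue[year]


def revenue_growth(year: int) -> float:
    """Year-over-year revenue growth."""
    return revenue[year] / revenue[year - 1] - 1.0


def incremental_leverage(year: int) -> float:
    """Incremental revenue earned per dollar of incremental cost."""
    return (revenue[year] - revenue[year - 1]) / (cost[year] - cost[year - 1])


for year in (2005, 2006):
    print(year,
          f"RoS {return_on_sales(year):.0%}",
          f"growth {revenue_growth(year):.0%}",
          f"leverage {incremental_leverage(year):.2f}")
```

Under this simplification the leverage figures come out at roughly 0.66 and 1.93, close to but not identical to the 0.68 and 1.91 quoted above, presumably because different cost or loss line items were used.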


Recall I am not a financial analyst or adviser, and I don't make recommendations on stocks. I am simply a Silicon Valley businessperson sharing my impressions on reading the document.

For more analysis of the S-1, see this post.

Postini Snatched Off IPO Track By Google

This just hit the wires this morning at 8:00 AM Eastern: Google to buy Postini for $625M in cash. Here's the official press release.

Google will add Postini's on-demand "security" (née anti-spam) offering to its stable of Google Apps. Judging by the fact that Dave Girouard, VP and general manager of Google Enterprise, is the spokesperson in this Register story, and given the nature of the following quote, it seems clear that Google is attempting to build a base in small and medium business (SMB) on-demand productivity apps and then work their way up into the enterprise in much the same way that Salesforce.com did with on-demand sales automation apps.

Girouard's quote:
"The response to Google Apps has been tremendous, with more than 1,000 small businesses signing up for the service every day. At the same time, large businesses have been reluctant to move to hosted applications due to issues of security and corporate compliance. By adding Postini products to Google's technology, businesses no longer have to choose," said Dave Girouard, VP and general manager of Google Enterprise.
My guess is that Postini was tracking towards an IPO within the next 6-12 months, which suggests that they were doing somewhere between $80M and $100M in revenue this year (my take on the new IPO entry bar). This suggests a valuation somewhere between 6x and 8x sales. Salesforce.com, the king of the on-demand sector, trades for 9x TTM revenues (see here), and the industry and sector trade for 5-6x revenues.

Some simple math (assuming 40% growth and 40%/60% 1H/2H revenue linearity, both standard assumptions, though I have no specific reason to believe either holds for Postini) says that current-year revenue is 1.2x TTM revenue, suggesting a 7.5x to 9.5x TTM valuation for Postini, in line with the premium you'd expect to pay for snatching the IPO dream from a company that could see it within reach.
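Here is the simple math spelled out as a small sketch. It assumes the trailing twelve months are measured at mid-year (second half of last year plus first half of this year), which fits a July deal but is my assumption; the revenue figures are my guesses from above, and the function name is made up for illustration.

```python
def current_year_over_ttm(growth: float, first_half_share: float) -> float:
    """Ratio of current-year revenue to trailing-twelve-month revenue at mid-year."""
    this_year = 1.0                          # normalize current-year revenue to 1
    last_year = this_year / (1.0 + growth)   # last year's revenue, given annual growth
    # Assumption: TTM at mid-year = 2H of last year + 1H of this year.
    ttm = (1.0 - first_half_share) * last_year + first_half_share * this_year
    return this_year / ttm


price = 625.0                                # $M, purchase price
ratio = current_year_over_ttm(growth=0.40, first_half_share=0.40)

ttm_multiples = []
for revenue_guess in (100.0, 80.0):          # $M, guessed current-year revenue
    ttm_multiples.append(price / revenue_guess * ratio)

print(f"current-year / TTM revenue: {ratio:.2f}")
print("TTM multiples:", [f"{m:.1f}x" for m in ttm_multiples])
```

The ratio lands at about 1.2, and applying it to the 6.25x and 7.8x current-year multiples gives the roughly 7.5x to 9.5x TTM range.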

Here's the Google Blog commentary on the deal. Here's their FAQ. Finally, here's the Google Enterprise Blog commentary as well.

Thursday, July 05, 2007

Autonomy + Zantaz = Zantac?

I couldn't help but chuckle at the name of the firm Autonomy bought this week: Zantaz, a provider of e-discovery and archiving solutions, based in Pleasanton, California. Who knows, when they're done digesting their $500M Verity and now $375M Zantaz acquisitions, they might need some Zantac after all.

First, I'd guess that Autonomy has to be pretty frustrated when they've spent nearly $1B on acquisitions to try and build critical mass as an independent software supplier and a Forrester analyst comes up with the following in this InfoWorld story about the deal:
The combined Autonomy-Zantaz could be an attractive acquisition target for a larger infrastructure software vendor such as Oracle, IBM, or EMC, said Barry Murphy, principal analyst at Forrester Research.
Ouch. Bad day in the PR department, I'm sure.

Second, in the credit-where-due department, I think Autonomy has done a better job with the Verity acquisition and integration than I'd predicted. And, relative to certain other vendors in enterprise search (based on my reading of vendor financials and associated analyst reports), Autonomy appears to run a pretty tight financial ship.

So what is this acquisition about and why do they need it? I think it's about three things:
  • Size. Autonomy is buying a $100M revenue stream that I'd guess (given the 3.75x sales multiple) is growing at least 50%. Even with the Verity acquisition, Autonomy, the #2 enterprise search vendor, did $65.5M in revenues in their most recent quarter while the #3 vendor, Fast Search, did $49.2M. By my math, when you combine their current sizes with their instantaneous growth rates (17% for Autonomy and 50% for Fast), Fast catches Autonomy in about a year. Simply put, Fast is probably seen as too close for Autonomy's comfort. This deal puts another $100M in revenue between them.
(I've read that Google did $350M in enterprise search in the past year which should make them the #1 overall vendor. See my repeated posts on what I call the enterprise search "rock and a hard place" dilemma.)

  • E-discovery. Clearly, e-discovery and the new, revised Federal Rules of Civil Procedure (FRCP) will provide a large market opportunity for vendors of search-related technologies. However, just about every ECM, search, e-discovery, and storage vendor is trying to create an e-discovery story, so Autonomy won't be alone in the space.
  • Platform. Every search vendor wants to be more than a search engine with ornaments. They all want to be positioned as an application platform. But the question is: what is the right application platform for e-discovery and compliance applications? Is it (1) a gussied-up last-generation search engine, (2) an email intelligence platform where e-discovery is but one application, as Aaref Hilaly argues in his excellent e-discovery 2.0 blog, or (3) a general-purpose content applications platform (like MarkLogic) where email intelligence is but one of a suite of applications?
Obviously, at Mark Logic we're believers in case 3 and are thus actively seeking VARs and application providers who feel similarly.
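Returning to the size argument above, the "Fast catches Autonomy in about a year" estimate can be checked with a quick sketch. It assumes both growth rates are annual and stay constant, which is my simplification; the function name is hypothetical.

```python
import math


def years_to_catch_up(leader_rev: float, leader_growth: float,
                      chaser_rev: float, chaser_growth: float) -> float:
    """Years until the chaser's revenue run rate equals the leader's.

    Solves leader_rev * (1 + g_l)**t == chaser_rev * (1 + g_c)**t for t,
    assuming constant annual growth rates (an assumption, not a forecast).
    """
    if chaser_growth <= leader_growth:
        raise ValueError("the chaser never catches up at these growth rates")
    return (math.log(leader_rev / chaser_rev)
            / math.log((1.0 + chaser_growth) / (1.0 + leader_growth)))


# Quarterly revenues in $M and growth rates as quoted in the Size bullet above.
t = years_to_catch_up(leader_rev=65.5, leader_growth=0.17,
                      chaser_rev=49.2, chaser_growth=0.50)
print(f"Fast catches Autonomy in about {t:.1f} years")
```

Adding Zantaz's revenue to Autonomy's side of the equation pushes that crossover point out, which is the point of the bullet.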

Celebrating XML Independence

Today, I'd like to highlight a (4th of July holiday) post on Matt Turner's Discovering XQuery blog. Matt's post refers to this article, entitled XQuery: The Server Language, on XML.com, written by Kurt Cagle.

I'd read Kurt's article when it was posted on June 6 and had meant to blog on it, but didn't get around to it (or frankly, much blogging at all) during the busy month of June. Nevertheless, here are a few chunky morsels from Kurt's article:
As an XML developer, one of the problems that I come across almost invariably within these [server-side scripting] languages is the fact that they are shaped by people who view XML as something of an afterthought, a small subset of the overall language that's intended to satisfy those strange people who think in angle brackets.
He then shows an example (that warmed Matt Turner's heart) of how often people have to create HTML by composing strings in-line. More morsels:

The original intent of the developers of XQuery was to use it, not surprisingly, as an XML-oriented query language. XQuery is not itself XML based (nor for that matter is XPath), but all of its operations are designed to work with XML documents or XML databases to provide a way of filtering or manipulating that XML to produce some form of output, most typically as XML or HTML.

Intriguingly, as a filter on XML, XQuery has seen only limited success. Part of this has to do with the fact that a significant number of the databases currently in use are SQL based, not XML based, so the benefits to gained by using an XML query filter are offset by the need to convert relational data into XML in the first place.

While I'd agree with Kurt thus far on the market adoption of XQuery and the hassle introduced by having to map XML to an RDBMS (see this post on Top-to-Bottom XML Apps), we at Mark Logic like to think of ourselves as the exception to the slow XQuery adoption rule. While XQuery is not a huge wind at our back, we have been able to grow the company eight-fold since I joined in 3Q04 and that growth is most definitely helped by the de-risking that comes with XQuery by virtue of it being both an industry standard and an eventual, inexorable replacement for SQL.

(If green is the new black, then XQuery is the new SQL, and SQL the new COBOL.)

Kurt concludes his article with:
This article serves as a very basic introduction to XQuery as a server language. I will be addressing this topic in more detail in subsequent articles in this series, examining some of the more sophisticated capabilities and the gotchas inherent in working with XQuery and eXist, and showing what explosive power you can release when you combine eXist or other rest based XQuery engines with XForms and Ajax.

My prediction is that REST based XML databases like eXist will seriously challenge the existing raft of server languages, from ASP to Ruby, within the next couple of years. Right now, it's something of a closed secret among a few developers, but the power, sophistication and ease of use inherent in working with the XML as if it were a natural part of the server landscape can only be understood by trying it.


I couldn't agree more with the bolded statement and we all look forward to seeing the subsequent articles in the series.