Tuesday, August 26, 2008

Put Your Money Where Your Metadata Is

Just a quick post to highlight a -- dare I say "catchy" -- webinar that we're running with Folio magazine and Really Strategies on September 10, 2008 at 2:00 PM EDT.

Speakers include:
To register, go here.

Monday, August 25, 2008

Forrester Trends Report for CIOs

I found this new Forrester report by analyst Bobby Cameron entitled The Emerging Technology Trends that CIOs Should Care About via this post on the ReadWriteWeb.

The six trends are:
  • Technology populism
  • The information workplace
  • Dynamic business applications
  • Digital business architecture
  • IT ecosystems
  • Enterprise master data management
While several are relevant to Mark Logic, I think the most relevant is the information workplace. Here's what the ReadWriteWeb had to say about it:
The information workplace is a term describing a next-gen platform that consists of numerous parts such as unified communications, portals, enterprise content management apps, office productivity apps, collaborative technologies, business intelligence, data warehousing, and more. However, the information workplace isn't about each of these technologies individually, but how they all seamlessly come together as a whole. Today's information workplace is role-based, individualized, and thanks to the Web 2.0 invasion, it's also often "social" and "quick," as Web 2.0 tools tend to be.
We think we're a great platform to underlie these sorts of applications. Keep an eye on our integration with Office 2007, Adobe, and Microsoft SharePoint going forward.

MarkLogic Named Trend-Setting Product by KMWorld

MarkLogic has been named a trend-setting product for 2008 by KMWorld. Says Hugh McKellar, editor in chief, of KMWorld:
This year's KMWorld list of Trend-Setting Products has been compiled through briefings with vendors themselves, along with conversations with analysts, users and system integrators about products that represent the best research and development of solutions for nearly every organization and industry.

Our mission in choosing this year's products has been deceptively simple: select those that deliver robust customer value.
We're honored to have been named to the list.

Andy Feit Connects at Search Engine Strategies

Mark Logic's marketing chief, Andy Feit, took the stage this week at the Search Engine Strategies conference in San Jose. A self-described "information retrieval guy from way back," Andy participated on a panel on enterprise search with Bill French from MyST, Avi Rappoport from SearchTools.com, and Rebecca Thompson from Vivisimo.

Stephen Arnold, of the Beyond Search blog, covered the session in a post entitled Search Points of View: One Home Run, Three Singles. Hint: Andy wasn't a single.

Excerpt:
For me, the bulk of the information in this summary was of interest because it makes clear the difficulty of discussing search, content processing, and text analytics without a clear definition and scope to bound the remarks. My thought is, “Give MarkLogic more time on the next panel.”
The session itself is blogged about on the official conference site here. While Andy's presentation was only ten minutes (followed by group discussion), here are the slides he used as a brief introduction to the panel.

Andy Feit at SES

Friday, August 22, 2008

The Specialized Database Argument: Performance

People sometimes ask: what's the argument for special-purpose databases like MarkLogic, as opposed to general-purpose databases like DB2, Oracle, or SQL Server? While I have written much on this topic, in the end I think it boils down to one word: performance.

The big 3 database oligopoly has proven that the general-purpose database management system (DBMS) can indeed be bloated into a wide scope of functionality (today's RDBMSs are so bloated that most analysts now drop the R, because they've long since stopped being relational).

So while the big 3 can bloat the DBMS, what they can't do is optimize it for each special case. By definition, the general-purpose DBMS needs to be optimized for general purposes. When trade-offs are encountered, you must design for the general case.

That's what creates the opening for specialized DBMSs. For example, MarkLogic is not optimized for the general case -- a bit of transaction processing, a bit of data warehousing, a bit of analytics, a bit of text, a bit of XML, a bit of spatial indexing, a bit of data mining, a bit of huge deployments, a bit of tiny ones, a bit of OLAP, a bit of memory-residency, and so on.

MarkLogic is optimized for the specific case of large amounts of semi-structured XML data, typically containing lots of text. The result: performance numbers that simply crush the competition when they're playing in our house.

For example, while I can't go into specifics, one of our technical staff sent an email out this morning that went like:
Another 100x Win Against XXXXX

Today, I indexed XML in 137 seconds which took XXXXX 4 hours, even though they were running on beefier hardware. Due to other pressing deadlines [and the already clear victory], I didn't have time to optimize the MarkLogic side. Had I been able to do threading and cache tuning, I'm quite sure I could have sped up the MarkLogic side by 4x.
Is this magic? No.

While I think the world of our engineering team and I do believe they have built a tremendous product, there's no magic. It's simply the combination of a great implementation focused on a specific XML-based use case. No general-purpose player can beat that.

Thursday, August 21, 2008

Wordle is Too Much Fun: Tagcloud for the Mark Logic CEO Blog

After seeing Really Strategies make a user conference tagcloud (more precisely, a wordcloud, as it's built from words, not tags) in response to my post where I showed the Seedcamp wordclouds, some of the guys inside Mark Logic decided to make a wordcloud for this blog.

After seeing the cloud they built for my blog, I decided to try Wordle myself and, boy, is it fun!

Here's my wordcloud for this blog:


To which one of my salesfolks responded, "software, revenue, growth" -- what a shock.

Really Strategies Zeitgeist

Inspired by my most recent post, Really Strategies made a tagcloud for their RSuite CMS user conference. Here's their blog about it. And here's the tagcloud.

Wednesday, August 20, 2008

Startup Zeitgeist

Seedcamp, a London-based, week-long camp for European entrepreneurs, recently did an interesting exercise. They took the several hundred applications they received for their event and made tagclouds. Here's what they found.

What are you creating?


How will you make money?


What tools will you use?



(I'd love to see XQuery in the toolset, but happy to see that database, server, and XML are already there.)

And who says you can't do interesting analytics on content? I thought this was fascinating. Check out Seedcamp's blog post about the exercise, here.

Friday, August 15, 2008

Nstein 2Q08: Organic Growth Slows; The Moldy Sandwich Argument

Nstein, a Canadian text mining vendor that -- through a combination of acquisition (e.g., Eurocortex, Picdar) and licensing (TextML) -- has re-invented itself as a provider of solutions for newspapers and magazines, announced its 2Q08 results yesterday.

(All figures CAD$)
  • Revenue of $6.0M
  • Slight decrease in sequential revenue from $6.03M in 1Q08. This is unusual from 1Q to 2Q for a software company.
  • Year over year revenue growth of 50% compared to $4.1M in 2Q07. This looks good, but is half inorganic (see below).
  • EBITDA of -$1.1M, up from -$0.5M in 1Q08
  • Net loss of $1.6M, up from $0.9M in 1Q08
  • Cash burn of $1.8M
  • Ending cash of $7.7M
License revenues were lower than expected, resulting in increasing EBITDA and net losses:
"This setback is explained by a lower level of licensing revenues than the Company had anticipated, since some customers delayed their decision to purchase."
Note that the "slipped deals" card is one that should be played carefully because, technically speaking, if deals really slipped and everything else is on track, then analysts should add the value of the slipped 2Q08 deals to their existing 3Q08 estimates. But I wouldn't count on it.

On the surface, revenue growth looks good, until you see that half was inorganic:
"Equal credit for this growth can be attributed to the acquisition of Picdar in February 2008 and the organic increase in the Company's revenues."
That means 2Q08 organic growth was 25%, down from 59% in 1Q08 and 61% in 4Q07, assuming those figures are purely organic which, as far as I can tell in ten minutes, they are. (Once an acquisition has been on the books for 5 quarters it no longer provides any "purchased growth" effect because both the year-ago period and the current period include the acquiree's revenue.)
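As a rough check on that organic figure, here is a minimal sketch of the arithmetic. It assumes the "equal credit" statement means exactly half of the year-over-year growth came from Picdar; the function name and the 50/50 split parameter are illustrative, not anything Nstein reports.

```python
def organic_growth(total_growth: float, organic_share: float) -> float:
    """Portion of year-over-year growth not attributable to acquisitions."""
    return total_growth * organic_share

# Stated figures: 50% year-over-year growth, credited equally to
# the Picdar acquisition and to organic growth (assumed to mean a 50/50 split).
q2_organic = organic_growth(total_growth=0.50, organic_share=0.5)
print(f"2Q08 organic growth: {q2_organic:.0%}")

# Cross-check from the rounded revenue figures in the list above
# ($6.0M in 2Q08 vs. 4.1 in 2Q07); rounding makes this come out a bit lower.
implied_total = 6.0 / 4.1 - 1
print(f"Implied total growth from rounded revenue: {implied_total:.0%}")
print(f"Implied organic growth: {organic_growth(implied_total, 0.5):.0%}")
```

Either way the organic number lands in the low-to-mid 20s, well below the 59% and 61% of the two prior quarters.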

My primary beef with Nstein is they appear to sell a lot of services for a company whose value proposition is all about pre-integration. After all, if you wanted an integrated XML server, CMS, and DAM system, you could integrate MarkLogic, Documentum, and Artesia and have a lovely best-of-breed strategy.

Nstein's argument is that that's too expensive to buy and integrate. So, instead, you should use their pre-integration of three products from companies that they have, in some cases, acquired for less than a single large Documentum license deal (e.g., Eurocortex cost $1.4M).

Overall, I call this the "moldy sandwich" strategy. If customers want sandwiches:
  • You can get crisp bread, fresh ham, and home-made mayonnaise, and ask your customers to do the assembly
  • Or, you can get moldy bread, old ham, and sour mayonnaise -- but pre-assemble the sandwiches
Putting component quality aside, vendors following the second strategy should require relatively few services to deploy their products because the sandwich is, after all, already built.

Now Nstein doesn't break out services revenues, but by knowing typical software vendor gross margins by line of business (i.e., software, maintenance, consulting) and doing some reverse engineering, you can build a model to guesstimate what things look like. The big clue is the 52% overall gross margin, quite low for a software company (norm 80% to 85%), which suggests there is a large amount of relatively low-margin consulting business. See below:


Note that this is just one way to make the figures work, but the simple problem is that software gross margins are so high that there's simply not room for much software revenue in the overall mix, and it's hard to drop maintenance (also typically high margin) below 20%. Even if I model consulting as losing money, it's hard to get license above 40%.
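One way to see the squeeze is to solve for the license share directly. The sketch below is a hypothetical reconstruction of that kind of model, not the spreadsheet behind this post: the 95% license margin, the 80% maintenance margin, and the consulting margins tried are assumed typical values, and only the 52% blended margin and the 20% maintenance floor come from the text.

```python
def license_share(blended, maint_share, m_license, m_maint, m_consulting):
    """Solve for the license share L of revenue in:

    blended = L*m_license + maint_share*m_maint + (1 - L - maint_share)*m_consulting
    """
    non_maint = 1.0 - maint_share  # license plus consulting share of revenue
    numerator = blended - maint_share * m_maint - non_maint * m_consulting
    return numerator / (m_license - m_consulting)

# Assumed margins: license 95%, maintenance 80%; consulting margin varied,
# including a slightly money-losing case.
for m_consulting in (0.20, 0.0, -0.05):
    share = license_share(
        blended=0.52, maint_share=0.20,
        m_license=0.95, m_maint=0.80, m_consulting=m_consulting,
    )
    print(f"consulting margin {m_consulting:+.0%}: license share {share:.0%}")
```

Under these assumptions the license share stays at or below roughly 40% of revenue across the cases tried, which leaves consulting as the largest line of business.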

What does this mean? If you're selling pre-assembled sandwiches, then why are you selling $3.3M in services per quarter? Could it be they aren't as pre-assembled as they're pitching?

And if they aren't, then what's the value proposition again?

Freebase Parallax: Yummy UI ... But One Query

The unfortunately named and well funded Freebase, about whom I've previously blogged here, recently released a nice UI called Parallax. It's the closest thing I've seen to a "BI-style" navigational query interface against "unstructured" data, and worth a look.

Note:
  • I quoted "BI-style" to remind everyone that search vendors didn't invent multi-dimensional navigation as they seem to believe. The whole notion of facets (i.e., dimensions) goes back to OLAP vendors and to the EIS vendors who preceded them. Faceted navigation is a wonderful thing to do against content (and MarkLogic features it), but let's remember who is imitating whom. Put differently, Endeca is PowerPlay for paragraphs.
  • I quoted "unstructured" because, despite appearing to have document entries, Freebase is anything but unstructured. It's a dictionary-driven, metadata-driven Wikipedia. Concrete example: when you're dealing with the San Carlos airport entry, Freebase "knows" about airports, for example, that they have runways, which in turn have lengths, widths, and orientations.
The video does a great job of pointing out the difficulty in answering a basic question ("which schools did the children of Republican Presidents attend") using either Google or Wikipedia. It also shows that the question can be answered navigationally rather easily using Parallax. That's nice.

But consider this: in a reasonably well structured XML database, you can answer that question in a single XQuery. (See below the video.) Now, I'm not trashing sexy UIs or the need for end-user tools to query data. (Given my 9 years at Business Objects, I'd be the last person to do that.)

But I am saying that delivering end-user analytics requires two things: a nice front-end with a powerful DBMS behind it. And if the data/content source for those analytics is, or should be, XML, then you know which DBMS I'd recommend.



Freebase Parallax: A new way to browse and explore data from David Huynh on Vimeo.

And here's the single XQuery, courtesy of Kelly Stirman:

(: Names of all US Presidents; assumes one <person> document per person :)
let $presidents := /person[@type="US President"]/name
(: Keep every person who has a parent relationship naming a President :)
for $child in /person
where $child/relationship/@type = "parent"
  and $child/relationship/name = $presidents
order by $child/date-of-birth ascending
return
  <child>
    <name>{data($child/name)}</name>
    <dob>{data($child/date-of-birth)}</dob>
    <schools>{$child//schools-attended/school-name}</schools>
  </child>
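
To make the document shape that query assumes more concrete, here is a rough Python sketch of the same lookup using the standard library's ElementTree. Everything in the sample data is an invented placeholder, the `<people>` wrapper and the `children_of_presidents` helper are hypothetical, and the code only mimics the query's logic; it is not how MarkLogic evaluates XQuery.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: names, dates, and schools are invented placeholders.
# The <people> wrapper is an assumption; the XQuery treats each <person> as a root.
SAMPLE = """
<people>
  <person type="US President"><name>President A</name></person>
  <person>
    <name>Child Two</name>
    <date-of-birth>1960-07-04</date-of-birth>
    <relationship type="parent"><name>President A</name></relationship>
    <schools-attended><school-name>Example College</school-name></schools-attended>
  </person>
  <person>
    <name>Child One</name>
    <date-of-birth>1955-01-15</date-of-birth>
    <relationship type="parent"><name>President A</name></relationship>
    <schools-attended>
      <school-name>Example High School</school-name>
      <school-name>Example University</school-name>
    </schools-attended>
  </person>
  <person>
    <name>Unrelated Person</name>
    <date-of-birth>1958-05-05</date-of-birth>
    <relationship type="parent"><name>Someone Else</name></relationship>
  </person>
</people>
"""

def children_of_presidents(xml_text):
    """Return children of Presidents with their schools, oldest first."""
    root = ET.fromstring(xml_text)
    presidents = {p.findtext("name")
                  for p in root.findall("person[@type='US President']")}
    matches = []
    for person in root.findall("person"):
        parents = {r.findtext("name")
                   for r in person.findall("relationship[@type='parent']")}
        if parents & presidents:  # at least one parent is a President
            matches.append({
                "name": person.findtext("name"),
                "dob": person.findtext("date-of-birth"),
                "schools": [s.text for s in
                            person.findall(".//schools-attended/school-name")],
            })
    # ISO-8601 dates sort chronologically as plain strings.
    return sorted(matches, key=lambda c: c["dob"])

for child in children_of_presidents(SAMPLE):
    print(child["name"], child["dob"], child["schools"])
```

The point of the comparison is the amount of plumbing: the XQuery states the join, the ordering, and the output shape in one expression, while the procedural version has to spell out each step.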

Category 5 Hurricane Hits PR

PR, as we know it, is dying. Most of the change is second order, driven by turmoil in the underlying media markets that PR influences: digitization and its effects, the slow death of newspapers, and total upheaval in trade media, also known as B2B.

Here's an excerpt from a prior post (on a different topic) where I wandered into a discussion of the IT trade press:
Beware the fate of the B2B computer trade press.

Computerworld, InfoWorld, InformationWeek, Transform, Intelligent Enterprise, PC Week, Network World, DBMS, Red Herring, the list goes on and on. I read them every day for years. I had piles of them on my desk. We laughed when we got great customer stories and we cried when the lab panned our new product. But the magazines were everywhere. They were an integral part of IT life.

Now, seemingly in an instant, they're all gone. ... Why? Because they didn't add enough value.

I'm not sure how it evolved over time ... , but by the time the Internet was posing a huge threat ... , most of the IT trade press had degenerated to the following formula:
  • Hire 20-something English majors as IT trade journalists
  • Have them filter vendor press releases ...
  • Write stories based on the press releases, one live analyst interview, and two customer interviews
  • Make money by selling advertising to the vendors
  • Don't rock the journalistic boat too much ...
[...] Some say the Internet wiped out the IT trade press. I think the IT trade press wiped out the IT trade press. They catered too much to vendors. They cut costs and value commensurately.

And they found themselves pretty much out of business, ironically replaced by vendor press releases (which at least you know are vendor-biased), bloggers (who weren't afraid to call it like they saw it), industry analysts, and a few hybrids ...
Today's post was inspired by this post in TechCrunch, The PR Roadblock on the Road to Blissful Blogging. Excerpt:
The issue Rubel brings up is whether PR really serves any purpose today given that more and more journalists, particularly tech journalists, are finding the interesting stuff on their own and ignoring the canned pitches that hit their inbox daily.

I can’t speak for big media journalists who’ve been in the game for years and years, but from my experience with blogging for a few years, I agree that PR as a profession is broken.

Rubel refers to Edelman's Steve Rubel whose post, Does the Thrill of the Chase Make PR Obsolete, presumably provoked the TechCrunch piece. Rubel's post, in turn, seems provoked by this one on Scobleizer, PR-less Launch Kicks Off Stack of Overflow Praise.

The basic theme here is that bloggers in particular hate to get pitched by PR firms and that their favorite way to find something new is to hear about it from a disinterested techie and then poll Twitter or the blogosphere to see if people like it.

Says Rubel:

We [the PR community] have to stop spamming people and make sure that companies and products are easy and a joy to discover. That's no easy feat. Further, it means giving up control. However, in a Google age where self-discovery rules, it's becoming a must.

But, it's about more than control. It's about knowledge. PR firms historically made money by renting freshly minted English majors for $150/hour and that's a hard habit to give up. But, in the future it's going to be about value-add, not just legwork. It's going to be about understanding the story and adding value to it. And it's going to be about relationships. Says TechCrunch:
Suddenly you are no longer just a spectator with an agenda. You are now part of a community. You are a person that gives and takes. Someone who makes the overall network stronger. And I guarantee that after a few weeks of actually participating in the community, you’ll have far better press connections than most of the PR people we deal with daily.

Thursday, August 14, 2008

Krugman on the Grateful Dead as a Business Model

Back in June, Paul Krugman wrote a nice op-ed piece in the New York Times entitled Bits, Bands, and Books which looks at the changes in the information and media business (e.g., publishing, music) and compares them to what I call the Grateful Dead business model.

Having been to, shall we say "more than one," Grateful Dead concert, I've always believed the Dead were the role model for Web 2.0. Consider the business model:
  • Give away the (digital) product. Encourage live taping (bootlegging) and tape sharing. I've been at shows where they stopped and waited until someone moved their microphones so they could get a better recording.
  • Make money by selling concert tickets. To my knowledge they made more money touring than any band in history.
  • Make money by selling paraphernalia (in the sense of t-shirts and such)
  • Build a strong community. Need I say more?
So, all the while the music industry was freaking out over the copy-ability of digital media, I kept asking myself -- why doesn't anyone study the Dead? (And, yes, part of the answer is that all those concerts were hard work compared to replicating albums or CDs.)

As a business-oriented Dead fan, I'd always thought this. I was just happy to find someone, er, respectable, who thought the same thing. You can read Krugman's piece, here.

Norm Walsh to Keynote RSuite User Conference

Last week, Really Strategies announced that Mark Logic principal technologist Norm Walsh will be the keynote speaker at the annual RSuite CMS user conference to be held October 14-15, 2008 in Philadelphia.

RSuite CMS is a MarkLogic-based content management system built for professional publishers, used by customers such as Sage, Blood-Horse, CQ Press, and Audible.com.

Gary Cosimini from Adobe will also be a featured speaker at the event.

Wednesday, August 13, 2008

Hey, I Made Alltop!

I'm pleased to announce that the Mark Logic CEO Blog is now featured in the content topic of Alltop, a topical web aggregation site created by Guy Kawasaki (an epic Silicon Valley marketer, author, and founder of Garage.com), Will Mayall, and Kathryn Henkens.

Here's what they say about how they select sites and blogs:
We use a patent-pending, semantic computational algorithm derived from the post-doctoral work of Guy at Stanford. Just kidding. We rely on several sources: results of Google searches, review of the sites’ and blogs’ content, researchers, and our “gut” plus the recommendations of the Twitter community, owners of the sites and blogs, and people who care enough to write to us. Let us declare something: The Twitter community has been the single biggest factor in the quality of Alltop. Without this group of mavens and connectors, Alltop would not be what it is today.
Thanks for picking me. In return, I've added an Alltop button to my sidebar. If you're interested in learning more about Alltop, go here.

TigerLogic 1Q09 Results

Yesterday TigerLogic Corporation (previously Raining Data Corp.) announced its first-quarter fiscal year 2009 (1Q09) results, for the period ended 6/30/08.

Highlights (figures rounded to one decimal place):
  • Revenue of $4.6M, down 8% from $5.0M in 1Q08
  • License revenue of $1.6M, down 18% from $2.0M in 1Q08
  • Operating loss of $1.5M, up from a loss of $0.1M in 1Q08
  • Net loss per share of $0.06, up from a loss of $0.01 in 1Q08
  • Cash burn of $2.0M
  • Ending cash of $12.0M
Per Google Finance, the company's market cap is $132M, more than 7x run-rate revenues. This strikes me as pretty hefty, given that software norms run in the 2-4x range, and the "typical" software company is growing, not shrinking, and has a modestly positive return on sales. The stock has been hovering around $5 for the past year.
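
For reference, the multiple works out as follows. This is a minimal sketch that assumes "run-rate" means the latest quarter's revenue times four; the function and parameter names are just for illustration.

```python
def revenue_multiple(market_cap_m: float, quarterly_revenue_m: float) -> float:
    """Market cap divided by annualized (run-rate) revenue, both in $M."""
    run_rate = quarterly_revenue_m * 4  # assumption: annualize the latest quarter
    return market_cap_m / run_rate

# $132M market cap against $4.6M of quarterly revenue.
multiple = revenue_multiple(market_cap_m=132.0, quarterly_revenue_m=4.6)
print(f"{multiple:.1f}x run-rate revenue")
```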

Since the company also sells (and as you'll see in a minute, I use the term loosely) an XML database system, I thought I'd take a dig around the 10-Q.

Here's what I found on page 11, in the section on products.
TigerLogic® XDMS is a high performance, scalable, enterprise native XML database management server with both data- and document-centric capabilities. The TigerLogic XDMS difference comes from its core technology, a highly flexible data model that is optimal for managing and storing any kind of XML or non-XML data and its high performance, extensible XQuery Engine.
A lot of adjectives, but sounds good. I'm skeptical of the "dessert topping and floor wax" nature of the product claims -- i.e., both data and documents and XML and non-XML data. But let's continue:
TigerLogic XDMS provides a level of efficient persistence that XML applications and transactions require, offering the benefits of roles-based security, XA-compliant transactions, replication and high-availability for enhanced reliability. TigerLogic XDMS provides the benefits of an enterprise-scalable system that allows on the fly changes to content, recursion, and automatically optimizes storage.
They're spewing features like a Bronx fire hydrant on a 100 degree day. I particularly enjoy the attempt to switch to benefits in the second sentence, only to immediately degenerate back to features. (Note to marketers: recursion isn't a benefit.) Let's continue nevertheless:
TigerLogic XDMS supports an extensible and flexible development and deployment environment. Unlike other XML data management alternatives, TigerLogic XDMS does not need to know the schema or structure of data before processing and storing it.
Regarding up-front schema knowledge, we don't need it either.
We believe the ability to make XML schemas optional is a vital innovation because the structures of operational systems frequently change, and mapping schemas for the purpose of linking to a new data source is both difficult and time-consuming. The system also enables support for schema versioning, which is critical when addressing evolving standards and XML schemas.
We agree that it's a major innovation for a DBMS to not require advance schema knowledge and/or schema adherence for data. It's actually quite au contraire from normal database systems. The norm is: (1) tell the DBMS what the data looks like, (2) feed in 10M instances, (3) build indexes to match your anticipated queries, and (4) then run queries.

In MarkLogic, the process is (1) load 10M instances even if they vary in schema and you've not told the system their structure in advance, and (2) then run queries. To me, it's the "right" XML way of doing things, because XML is, after all, supposed to be self-describing.

But I digress. Let's continue reading:
The General Availability Release of TigerLogic XDMS version 2.6, which included support for enhanced XQuery features, including XQuery stored procedures and full-text search and support for high availability clustering, was released in July 2006. Version 3.0, which is the third generation release of the product and includes compliance with the XML Query 1.0 specification, released in January 2007, cache management of data sources, in-memory cache, support for geospatial data, enhanced application programming interfaces (“APIs”) and data replication was released for beta testing in June 2007.
All very wordy and impressive.
To date, our revenue from TigerLogic XDMS has been less than $300,000
Whoa. Hang on. Did they really say inception-to-date revenues are less than $300K? Well, other than the, uh, revenues the new product line's doing just fine. Right.

Other than that, Mrs. Lincoln, how did you like the play?

Tuesday, August 12, 2008

Microsoft Kickoff Videos: The Good, The Bad, and The Ugly

My guess is you'd need to be under 25 to have missed the Dean Scream of Microsoft videos, this gem of a few years ago featuring a bouncing Steve Ballmer at what I'm guessing is an annual kickoff.



Now, say what you will, but I'll take genuine enthusiasm on the part of a slightly crazed, middle-aged, balding executive (it takes one to know one) over faux corporate fun.

For an extreme dose of such faux fun, check out this video, entitled Rockin' our Sales by Bruce ServicePack and the Vista Street Band. Presumably the video was made by Microsoft's marketing department -- at I'd guess a cost of at least $100K -- to "fire up the troops" at some sales kickoff for Vista.





It's so weak that it was blogged in a Wall Street Journal post entitled Microsoft's Cheesy Video to Sell Vista.

But, in a surprising turn, Microsoft then claimed that it was never intended to be funny. It was, they said, a spoof, an attempt to make fun of themselves. Frankly, I have trouble believing it, but Charles Cooper of CNet makes the case here. Even if it is true -- even if they deliberately made a horrific video in order to draw lameness cries only to say "fooled you, it was supposed to be bad" -- then all I can say is what the heck were they trying to accomplish?

Either way, it's lame and a huge waste of resources. The only question is whether they aggravated the first assault by deliberately setting out to waste my time with the second.

Now, having done a few song re-writes and participated in a few skits of my own, some might cry foul. Which, in turn, raises the question of what makes some internal corporate videos so fun and what makes others so lame. To me, to make it work, the sketch must:

  • Play off one or more real internal issues
  • Include a degree of irreverence
  • Generally be performed by company executives and not paid performers
  • Not be a transparent attempt at motivation
  • At least make an earnest attempt at humor
And finally, take it easy on the budget. Managers squeezed to their last $50K on the final budget pass in December don't like to see $100K vaporized by marketing before their very eyes a few weeks later.

Mark Logic Featured in Folio Article

Mark Logic was recently featured in a Folio Magazine article, Making the Case for XML Repositories by Jason Fell.

The story discusses MarkLogic customers Reed Business Netherlands and Blood-Horse Publications (in the sense of bloodline, i.e., lineage).

Excerpts:
“We chose [the MarkLogic-based] RSuite CMS because we had a very tight timeframe to convert our data feed architecture over to XML,” says Luther Andal, Blood-Horse’s director of technology. “Automated processes that consume, transform and distribute XML have allowed us to reduce staff over the last year while producing nearly the same amount of print products and many new online products and new features for our Web site.”

“The ability of the business to rapidly repurpose content into new products for industry events and trends has given us additional revenue streams,” Andal adds. “IT resources have been able to devote their time to developing new products and features instead of having to support systems that have been automated.”
Mark Logic's own John Kreisa is also quoted extensively in the article.

Monday, August 11, 2008

The Making of MarkMail Makes Blogs of Note

Congratulations to the MarkMail team! Not only are they doing wonderful things with MarkMail (now with over 20M messages loaded), but their companion blog, The Making of MarkMail, has been named by the Blogger team to the Blogs of Note list. Great stuff!

See their write-up on it, here.

Fun Google Parody Video: Complexity is Good

I stumbled into this video while reading Stephen Arnold's recent post, Google Search Appliance: Showing Some Fangs. In the post, Stephen offers a pretty comprehensive look at the Google search appliance (GSA) prompted, I believe, by a new release that includes features such as personalized search results, alerts, and broader language support.

If you're interested in the new features, see this video here.

If you want to have some fun, check out this video which portrays Google's view of a typical enterprise search software sale, complete with the cheesy salesperson.




As I've repeatedly maintained (e.g., 1, 2, 3, and 4), I think the GSA is going to consume the "crawl and index the intranet" segment of the search market, pushing classical enterprise search vendors up-market, and eventually into an un-winnable conflict with DBMS vendors.

Is Google a Media Company? In a Word, Yes.

The New York Times ran a story today entitled Is Google a Media Company? In my estimation the answer is simple: Yes, Virginia, Google is a media company and in reality always has been.

The Times story is about knols, Google's "unit of knowledge," about which I've previously written in (one of my more cleverly titled posts, if I do say so myself): Google as Publisher, The Grassy Knol.

It all comes down to buttermilk pancakes. When you phrase search "buttermilk pancakes" in Google, hit #3 is a knol by Scott Jenson.

Hey -- wait a minute -- who the hell is Scott Jenson and why is his pancake recipe #3 on Google? The answer: Scott Jenson is "a user interface designer for Google ... avid cook and traveler."

Oh. That explains it.

Now you might like your pancake recipes to come from Google user interface designers (and I'm sure they look pretty), but I have trouble believing that mainstream society wants recipes that way. If they did, I suppose we'd be watching UI designers all day on the Food Network, instead of Bobby Flay, the ever-profane Gordon Ramsay, or Iron Chef Morimoto.

But Scott wrote a knol. And Google can, and probably will, decide to prioritize knols over other content sources, and certainly will, ceteris paribus. Hence the conflict of interest between Google the indexer and Google the content owner. (Do you think it's an accident I run this blog on blogger? One hopes for whatever advantage one can get.)

Where, for example, do we find the venerable Martha Stewart's buttermilk pancake recipe?

She comes up, in total obscurity, at #18. And here's what the co-CEO of her company, Wenda Harris Millard, has to say about it:
Although Martha Stewart’s buttermilk pancake recipe appears lower than the Knol recipe in Google’s rankings, Ms. Millard does not believe that Google unfairly favors pages from Knol. But she said that Google’s dual role as search engine and content site raises an issue of perception. “The question in people’s minds is how unbiased can Google be as it grows and grows and grows,” Ms. Millard said.
I suspect Ms. Millard is so polite because she doesn't want Martha's recipe at #180.

Google, predictably, weighs in with claims of neutrality:
“When you see Knol pages rank high, they are there because they have earned their position,” said Gabriel Stricker, a spokesman for Google.
Yes, I'm sure. By the way, John Edwards is faithful, the Chinese gymnasts are all 16, and there really is a Santa Claus.
“Google can say they are not in the content business, but if they are paying people and distributing and archiving their work, it is getting harder to make that case,” said Jason Calacanis, the chief executive of Mahalo, a search engine that relies on editors to create pages on a variety of subjects. “They are competing for talent, for advertisers and for users.”
The sooner publishers realize that Google already is a media company and becoming more of one every day, the better. Call knols the smoking gun, or the smoking pancake. But realize the inherent conflict between index neutrality and content ownership. Then consider how businesses over history have managed such conflicts.
“If I am a content provider and I depend upon Google as a mechanism to drive traffic to me, should I fear that they may compete with me in the future?” Professor Yoffie asked. “The answer is absolutely, positively yes.”

Thursday, August 07, 2008

Army's BCKS System Profiled in Government Computer News

GCN today featured a story, Battlefield Knowledge Management, that profiled the US Army's use of MarkLogic in the Battle Command Knowledge System. Excerpt:
Now picture the frustration of executing such a search not over a broadband link in your home or office, but instead over a slow speed link as a soldier deployed in a hostile forward area, under pressure and time constraints to gather critical information in preparation for battle.

The Army may have found a solution by implementing a Battle Command Knowledge System (BCKS) to improve soldiers’ abilities to search the Army’s Warrior Knowledge Base (WKB). [...] The system enables soldiers to find the most up-to-date and cutting edge information that may assist them in the field.
The story continues:
The main feature of WKB is its ability to perform fast, specific searches. Rather than returning search results as a laundry list of links to large documents that would have to be downloaded and perused, BCKS returns very granular answers to queries generated by soldiers. The system is populated by Army content managers, who mine Army resources for applicable knowledge to add to the WKB repository. The content managers assign specific attributes (metadata) that characterizes the content and serves as keywords in the searches.
Read the full story here.

Tuesday, August 05, 2008

Norm Learns Rule 1

One of the fun things about Mark Logic is that we unite people from different computing backgrounds: database people, search engine people, content management people, the odd computational linguistics person, and -- of course -- document/XML people.

Aside: one of my big theses of computing life is that individuals tend to stovepipe into a single computing camp early on, fail to cross-breed / cross-read, and thus the camps end up quite in-bred and incommunicado over time. That's one reason why I deliberately "jumped camps" in leaving Business Objects four years ago, hopping from BI into unstructured data / content / documents / XML.

But I digress.

We recently hired Norm Walsh, a pretty big guy in the document camp, which elicited comments such as the following from his fellow camp members:
I’m wondering how in the hell some obscure “XQuery Content” company stole Norm Walsh away from Sun. [...] Anyone care to provide some insight? Is Mark Logic really *that* good?
That was fun.

But what's been even more fun is helping someone who is clearly a distinguished individual in one camp and introducing him to another. Towards that end, I'm happy to report that Norm is now officially certified in what I call rule 1 of database performance: push constraints to data, don't move data to constraints.

Believe it or not, rule 1 appears quite counter-intuitive to document people who seem to innately want to materialize DOM trees and then process them in a middle tier.
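To make rule 1 concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table, column names, and sample rows are hypothetical and invented purely for illustration; this is not MarkLogic code, just the general database idea. The first function moves all the data to the constraint (fetch everything, filter in the application tier); the second pushes the constraint to the data (the database applies the filter and returns only the matches).

```python
import sqlite3

# Hypothetical table and rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles (id INTEGER PRIMARY KEY, author TEXT, title TEXT)"
)
conn.executemany(
    "INSERT INTO articles (author, title) VALUES (?, ?)",
    [("alice", "Post 1"), ("bob", "Post 2"), ("alice", "Post 3")],
)
# An index gives the database a way to resolve the constraint
# without scanning every row.
conn.execute("CREATE INDEX idx_articles_author ON articles (author)")


def titles_moving_data(conn, author):
    """Anti-pattern: pull every row into the middle tier, then filter."""
    rows = conn.execute(
        "SELECT author, title FROM articles ORDER BY id"
    ).fetchall()
    return [title for row_author, title in rows if row_author == author]


def titles_pushing_constraint(conn, author):
    """Rule 1: send the constraint to the database; only matches come back."""
    rows = conn.execute(
        "SELECT title FROM articles WHERE author = ? ORDER BY id", (author,)
    )
    return [title for (title,) in rows]
```

Both functions return the same answer. The difference is how much data crosses the wire and who does the work: with three rows it doesn't matter, but with millions of rows (or documents) the first approach ships the whole table to the application just to throw most of it away.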

Because I'm so wed to the database viewpoint, I have trouble expressing it in a document-person way. That's why I'm happy that Norm has recounted his journey here, in a post entitled Thinking Differently about XML.

Monday, August 04, 2008

The Never-Ending Fast Search Story

I've already spent a lot of space covering the financial issues at Fast Search & Transfer. In part that was because, prior to the Microsoft acquisition, we competed fairly often with Fast, particularly in our publishing practice. Part was because the company reminded me of MicroStrategy, against whom we had to compete at Business Objects. Part was driven by my personal interest in international software companies and the issues that un-level the reporting playing field (e.g., GAAP vs. IFRS reporting).

Anyway, I took a crack at a post earlier today based on a story in a Norwegian business weekly, Dagens Næringsliv, that in turn has prompted posts from CMS Watch to TechCrunch to Stephen Arnold to Curt Monash.

I burned several hours, posted something, got in the car, drove home, and deleted the post just after I arrived. Somehow, despite considerable effort, I couldn't find what I thought was a satisfactory and appropriate way to editorialize.

Ergo, I decided simply to present the story. You can see it by pressing this link or looking at the Scribd iPaper below.

Disclaimers: I don't speak Norwegian and can't attest to the quality of the translation. I know neither Norwegian culture nor Norwegian business publications, so I can't vouch for either the legitimacy of the source publication itself or for any cultural slant present in the story.

Beneath the translated text are images of the original story with Norwegian body copy.


Read this document on Scribd: Fast's Stock Market Bluff

Saturday, August 02, 2008

Thoughts on Category Creation and Information Access Platforms [Revised]

[Revised 8/2/08; still working on cleaning up this consciousness stream.]

Back in the old days, it seemed easy to create a category in software. Look at the database market, for example:
  • IBM invents the relational DBMS (RDBMS) category
  • Oracle, Ingres, and Informix enter in a largely undifferentiated way, though Informix eventually drifts towards the low-end/cheap segment
  • Sybase creates the derivative category of high-performance OLTP RDBMS.
  • Arbor re-christens the failed multi-dimensional DBMS as the OLAP Server
  • Tandem creates the non-stop RDBMS with its superb fault tolerance
  • Illustra launches the universal DBMS and is quickly acquired by Informix
  • Sybase launches the bitmap-indexed DBMS with SybaseIQ
  • Teradata launches the data-warehouse DBMS category
And you can find just as many examples outside database-land.
  • ASK defines the manufacturing resource planning (MRP) category
  • SAP hijacks MRP, redefines it as ERP, and goes on to become the world's largest applications software company
  • PeopleSoft invents the HRMS category
  • Gartner Group's Howard Dresner invents the business intelligence (BI) category, re-christening and re-framing what was formerly known as DSS or EIS.
  • Siebel pioneers the sales force automation (SFA) category
  • Scopus pioneers call center automation (CCA)
  • Companies like Rubric pioneer enterprise marketing automation (EMA)
  • Siebel, through acquisition, coalesces SFA, CCA, and EMA into a single category called customer relationship management (CRM)
  • Oracle and SAP work to coalesce CRM back into ERP. Such is the ebb and flow of categories.
(And I could go on and on -- BPM, KM, CMS, WCM, ECM, LMS, DRM, SCM, PLM, ETL, DI, EII -- but I think I'll stop here with the initials list.)

People are still creating categories today, and sometimes it looks easy. Uber-categories have been quite popular in the past decade as people have focused on different ways of developing and delivering software:
  • SaaS as an uber-category has worked well, with a variety of offerings in various SaaS sub-categories (e.g., Salesforce, NetSuite)
  • Appliances have done pretty much the same thing -- i.e., offering an appliance alternative for a wide variety of existing categories (e.g., a data warehouse appliance a la Netezza)
  • Open source has also done the same thing -- again serving as a different flavor/dimension for a wide variety of largely existing software categories.
Only a few genuinely new categories have emerged, virtualization being the most obvious example. (Though you could argue that virtualization is itself an uber-category covering storage virtualization, server virtualization, et cetera.)

Companies are still working to carve new categories, particularly in the database market:
Sometimes vendors and/or the analysts who cover them try to impose either a straight name change (e.g., from MD-DBMS to OLAP) or a strategic shift (e.g., from BI to analytic applications) in category. Sometimes they're just bored. Sometimes a vendor's trying to redefine the market in line with its strengths. Sometimes an analyst is trying to make his/her mark on the industry and earn the coveted title of "father/mother of [category name]," much as Howard Dresner successfully did with BI.

BI got bored with its name several times during my tenure at Business Objects. At one point both the analysts and Informatica were trying to re-dub the category "analytic applications" in an attempt to get a fresh name and raise the abstraction level from tools to applications. Informatica nearly died on that hill.

Later, analysts tried to redefine the category, dubbing it corporate performance management (CPM), and arguing that business intelligence needed to link with financial planning systems. While knowing actuals is good, knowing actuals compared to the plan is better, and using actuals to drive the future plan better still. Cognos nearly tripped over itself repositioning around CPM, ultimately acquiring Adaytum, which in turn led to SRC's eventual acquisition by Business Objects.

In an art-imitates-life sort of way, one wonders if the analysts predicted a move in the market or provoked it? My chips are on the latter.

This stream-of-consciousness is a long way of winding up to a single question: are enterprise search vendors successfully repositioning themselves as "information access platforms" or not?

Background: the enterprise-search-related vendors (e.g., Fast/Microsoft, Endeca) and search/content analysts who cover them are in the midst of an attempted category repositioning:
  • The term "enterprise search" is now seemingly dead, having been contaminated by the Google Appliance. When a shark gets in the water, all the fish jump out.
  • The word "information" is increasingly being used as a unifying term to describe both data and content (aka, unstructured data)
  • Enterprise search vendors are increasingly calling themselves "information access platforms" (though not generally abbreviated as IAP, I will do so here for brevity).
For example, consider Endeca's corporate boilerplate:
Endeca's innovative information access software helps people explore, analyze, and understand complex information, guiding them to unexpected insights and better decisions. The Endeca Information Access Platform, built around a new class of access-optimized database, powers applications that combine the ease of searching and browsing with the analytical power of business intelligence.
I have a number of concerns about this attempted shift:
  • The important thing about categories is that they exist in the mind of the customer. Analysts and vendors can try to put them there -- but they have to stick. In my mind, IAP is not sticking. I have never heard a customer say: "I need to go out and get an IAP."
  • I do, however, believe that "information" might well stick as an overall term, meaning both data and content (aka, structured and unstructured data).
  • It is not clear to me why someone who desires a unified platform for "information" would turn to a search vendor. Search engines were designed as read-only indexes to help people find documents containing tokens; hardly ideal as an application development platform.
  • In my estimation, someone managing "special" data should turn to a database vendor. While databases have classically not handled "special" data well, databases were designed as application platforms, and there is a whole new class of specialized databases emerging for handling various "special" types of data.
  • While I think a unified platform is a dandy vision, I think no one is close to delivering a unified platform that handles all types of data equally well. Bolting Lucene and MySQL together isn't a platform. Relational databases still do a poor job with both content and many types of data (e.g., sparse, hierarchical, or semi-structured). XML servers (like MarkLogic) handle XML brilliantly, but need work before they can match RDBMSs at classical relational data.
  • I believe that someone who needs a crawl-and-index-the-intranet value proposition should use the Google Appliance. So while I think the search vendors are correct in their desire to flee, I don't think that "information access platform" is a good refuge.
Overall, my chips remain on the don't come line for the attempted category repositioning from "enterprise search" to "information access platform." You can find my stack on the come line for the emerging "special-purpose database" category and "XML servers" as an instance of them.