Thursday, January 31, 2008

Google and Autonomy Spat, Round II

Autonomy and Google are at it again. Per this InformationWeek story:
For the second time in six months, Google has publicly challenged a white paper from enterprise search rival Autonomy, claiming the latest document contains "significant inaccuracies."

"For customers with demanding needs, the Google appliance lacks the necessary security and connectivity models," Mike Lynch, chief executive of Autonomy, said in an emailed statement. "It is not possible to make successful high-end enterprise search solutions without mapped security and productized connectors to repositories."
I've not yet had time to dig into the detail of this, so I'm sharing it more as a news item for now and will -- if it proves interesting -- come back with analysis later.

Google's rebuttal is here on the Google Enterprise blog.

My free PR advice for Google is to avoid a spat and simply create a low-key white paper that responds to any claims they believe are incorrect. In my experience, in PR wars the big guy never wins. Sometimes the little guy wins. Sometimes both companies lose. So when you're the leader the best strategy is not to fight. Much as you want to.

The Publishing [R]evolution

I'm posting the slides that Darin McBeath from Elsevier presented at the XML Holland conference a few months back. I'm sorry about the delay, but I wanted to be sure it was OK with Darin and the process got stuck on my back burner.

In addition to an all-around great speech, Darin introduced two concepts that I liked a lot.
  • Fewer moving parts
  • Find the ringtones
"Fewer moving parts" was Darin's metaphor on simplicity in building pure XML-based systems (with XML content and XQuery as the query / programming language). It's always hard to argue the business benefits of simplicity without doing detailed costing analysis. I thought it was creative of Darin to use this metaphor to drive the point home. We know jet engines are safer than piston engines because they are simpler and have fewer parts. The same could be said of Nokia vs. Motorola phones. Fewer parts works. When you build content applications on XML content with XQuery and an XML content server, you have fewer moving parts. No Java layer. No relational mappings. See this post, The Virtues of Top-to-Bottom XML, for more.

"Find the ringtones" was another cool Darin idea. As you probably know, ringtones are a multi-billion dollar business. The amazing thing about ringtones is that you can charge $3.00 for fifteen seconds of a song which in its three-minute entirety would sell for $0.99. Less really can be more. Darin's challenge to publishers was to "find the ringtones" in their content. Where, in different sections of the publishing business, can you deliver higher value and increased revenue -- by offering less? That's a cool question. And in an increasingly information-overloaded world, a smart one.

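To put rough numbers on "less really can be more," here is the per-second arithmetic using only the prices quoted above. It is a back-of-the-envelope sketch that ignores bundling, royalties, and carrier cuts:

```latex
\[
\frac{\$3.00}{15\,\text{s}} = \$0.20/\text{s},
\qquad
\frac{\$0.99}{180\,\text{s}} = \$0.0055/\text{s},
\qquad
\frac{0.20}{0.0055} \approx 36
\]
```

On these figures, the ringtone earns roughly 36 times more per second of audio than the full track.
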
In the better late than never department, here are Darin's slides.

Should Publishers Buy Platforms or Solutions?

This seems to be one of the age-old questions in publishing and, on returning from the SIIA Summit this week in New York City, I thought I'd weigh in with a quick answer.

My answer is rooted in Geoffrey Moore's idea of core/context analysis introduced in his book, Living on the Fault Line, though also referenced in subsequent works (e.g., Dealing with Darwin). Moore defines core and context as follows:
  • Core: Any activity which creates sustainable differentiation in the target market resulting in premium prices or increased volume. Core management seeks to dramatically outperform all competitors within the domain of core. (Note this use of the term is unrelated to either core competence, which describes differentiated capability, or core business, which describes categories accounting for a high percentage of overall revenues.)
  • Context: Any activity which does not differentiate the company from the customers' viewpoint in the target market. Context management seeks to meet (but not exceed) appropriate accepted standards in as productive a manner as possible.
Simply put, I define context as "everything else." No one disputes that context (e.g., payroll, sales pipeline management, accounting) is a lot of work. Nor does anyone dispute that context is important (e.g., get accounting wrong and go to jail). In logical terms, I'd argue that successful execution of context activities is necessary but not sufficient for business success. What's sufficient is successful execution of core.

The thing I like about Moore's model is its clarity and simplicity. Core is where we get competitive advantage; context is everything else. For context activities, we should use the same solutions as our competitors because we are not trying to get competitive advantage from context. So we can all use ADP for payroll and salesforce.com for SFA. In fact, I'd argue that we benefit from a group economy of scale when we generally use the same suppliers for context activities (mitigated only by those suppliers becoming monopolists and then overpriced and under-responsive).

But for core, we should be different. Core is where we provide uniqueness. Ergo, the right answer to me is clear.
  • For context activities, publishers should buy solutions
  • For core activities, they should buy platforms
There is no question that "product development" is a core activity at publishers. Ergo, publishers should buy platforms (to accelerate product development and avoid re-invention of infrastructure technology) and not buy "solutions" that effectively give them the same offering as the competition and -- worse yet -- allow the solution supplier to productize any customization or innovation they do back into the solution offering and sell it to their competition.

I'd again argue that there is a second-order argument where all publishers are better off when many of them use the same platform (e.g., hiring, training, vendor responsiveness to industry feature requests), but we won't get into that here.

Hoping to derive competitive advantage from a prepackaged publishing solution is, in my opinion, oxymoronic. There are two things that money can't buy: love and competitive advantage.

Monday, January 28, 2008

NotchUp: Get Paid to Interview

I was doing a quick surf through the list of companies slated to launch at DEMO-08 this week and stumbled onto NotchUp, which caught my eye both because a few friends had signed up (and I was getting some Spock-like emails) and because of NotchUp's cool, creative-yet-practical idea: they pay you to go on job interviews.

Better yet, if you're still mulling the concept, NotchUp has a quick calculator that estimates what you should charge. In my case, it spit out $1,830. (And to think that, prior to this discovery, I could be had for a grande non-fat latte.)

Running a high-growth company that depends on both contingency and retained recruiters, I believe there is a lot of room to make the recruiting process in corporate America more cost-efficient. So, similar to LinkedIn, I like the basic business problem they're shooting at.

Give it a try!

(Warning: NotchUp's site is appallingly slow, presumably due to the activity spike generated by the launch PR.)

Friday, January 25, 2008

John Blossom on Kick It

John Blossom, president of Shore Communications, which runs the award-winning ContentBlogger blog, recently had a look at our new Facebook application, Kick It, and wrote an interesting post about it.

Here's an excerpt:
Kick It is a simple and yet powerful example of how analysis of social media content can yield a treasure trove of potentially useful associations that can fuel both personal and professional contacts in unexpected ways. Take these associations and layer on additional content from other content sources and you can begin to get a sense of why embedding your own publication's content in social media portals such as Facebook can be so valuable.

Check out John's post here. Check out Kick It here.

Thursday, January 24, 2008

Mucho Dinero: Freebase Raises $42M.

Metaweb, makers of the unfortunately named Freebase, recently announced that they have raised a whopping $42.5M venture round led by Goldman Sachs and Benchmark Capital.

Freebase is trying to create a database of knowledge with a collaborative, Wiki-like approach. The differences from Wikipedia are (1) there's much less in it today, (2) it leverages Wikipedia as a source, (3) it's far easier to edit entries in Freebase than in Wikipedia, and (4) it doesn't follow a strict, encyclopedia-like entry-per-topic format.

As a simple example of point 3, despite having wanted to make Wikipedia edits, I've never bellied up to the bar to do so. In 10 minutes of messing around on Freebase, I updated the entries for both myself and Mark Logic. It's very easy. They do a good job both of knowing what metadata is appropriate for which types of entities and of providing pick-lists for entering data into metadata fields (e.g., dates of employment, job titles, companies). And they make it easy to add a new entry as well. For example, I had to add Versant Corporation to their database of companies before listing it as one of my past employers. (My only enhancement request is to enable "present" as a valid value for date types when listing dates of employment.)

When using Freebase I find myself doing primarily two things: (1) extracting data from text and loading it into metadata fields (e.g., the attributes about the San Carlos airport runway were added by me) and (2) creating web links / relationships (e.g., I added the link to the San Carlos airport website.) Basically, I'm structuring knowledge.

Freebase was originally launched to strong reviews in March, 2007 (with Michael Arrington saying: "this is cool unless it achieves consciousness and kills us all") though Metaweb itself was founded in July, 2005. (And I just updated the founded-date field in Freebase for Metaweb which previously just said 2005.)

I haven't spent that much time on GoogleBase, but I'm not at all impressed with what I see when I go there. My take is that Freebase is what GoogleBase wanted to be.

Wednesday, January 23, 2008

Endeca Raises $15M: Good News or Bad?

In a somewhat surprising move, Endeca recently raised a $15M strategic investment from Intel Capital and SAP Ventures. See this story in the Boston Business Journal, entitled Endeca Receives Investment from Intel, SAP for more.

In the "always a bridesmaid, never a bride" department, Endeca has been widely rumored to be going public "soon" for quite some time. For example, see this BusinessWeek story, dated 3/31/06, where Endeca CEO Steve Papa says he thinks "Endeca could be ready to go public by 2007." Hence I wasn't expecting another engagement announcement, but rather a wedding invitation (i.e., an S-1).

Put differently, is this venture round "good news" that the company has raised additional capital or "bad news" that it has not yet been able to do so via an IPO? (And, for inquiring minds, why not?)

Per this TechCrunch story, this brings the total funds invested in Endeca up to $65M (a lot for an enterprise software company, in my opinion). The company, which was founded in 1999, has sales of ~$30M/quarter and ~500 employees.

Fast Search to Restate 2006 Results (As Predicted)

As predicted in this post, Fast Search & Transfer yesterday announced that it intends to restate 2006 results. See this Reuters story, entitled Norway's Fast Says to Restate 2006 Results, for more. Excerpt:
"The effects of such restatements have not yet been established in detail, and Fast is taking appropriate steps to ensure a quick and proper process," Fast Search & Transfer said in a statement.

"Restatements of the 2006 accounts may have an effect on the 2007 accounts," the company said.

It's hard to imagine that this will have any effect on the Microsoft offer: a restatement was easily anticipated given all the accounts receivable write-offs, and one must assume that Microsoft learned plenty about Fast's financials during its due diligence prior to the bid.

Forbes.com has a similar story on the restatement, here.

Tuesday, January 22, 2008

New Look for the Mark Logic CEO Blog

Today, I've moved to a new template for this blog.
  • New colors and design.
  • Placement of ads only in the right column, where they belong. This was what motivated the design change. Despite having decided that I can earn more from panhandling than Google AdSense, I want to continue advertising on the blog as a way of staying in touch with how AdSense works. Thank you for bearing with me during the phase where I could only figure out how to put ads above the posts.
  • Normalization to a single subscribe chicklet in the right hand column.
  • Addition of a "links to this post" button in the post footer.

Getting MarkMail Indexed

Tim O'Reilly recently posted on an issue that our MarkMail team is having, related to getting search engines to index the entire MarkMail contentbase. Tim's post is entitled Stuffing Six Million Pages Down Google's Throat and was written in response to an email dialog with Jason Hunter.

To me, the high-level takeaways are:
  • Wow, is Google's spidering better than anyone else's. Google has currently indexed ~1M MarkMail messages, while Yahoo has got only 19K and MSN 4K. That is, Google seems to find stuff much faster and index it much better than other search engines, explaining their hegemony in Internet search.
  • While we are using the standard SEO techniques (e.g., sitemaps), Google does seem to decide how important you are and ergo how much of your site to index. For example, the MarkMail sitemap includes a link to each of MarkMail's approximately 6M pages. Google can find all those links. It's just choosing not to index them. My guess is it's deciding how important you are on some basis and then capping the amount of your site it will index.
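For readers curious about the mechanics, here is a minimal Python sketch of how a site with millions of pages can be split into sitemaps. It assumes the sitemaps.org protocol limit of 50,000 URLs per sitemap file, with one sitemap index pointing at every file. The base URL, file-naming scheme, and helper names are hypothetical, not MarkMail's actual implementation. And, as the bullet above notes, a sitemap only tells a crawler what exists; it does not oblige the engine to index any of it.

```python
import math
from xml.sax.saxutils import escape

# sitemaps.org protocol: a single sitemap file may list at most 50,000 URLs.
MAX_URLS_PER_SITEMAP = 50_000
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
HEADER = '<?xml version="1.0" encoding="UTF-8"?>'


def sitemaps_needed(page_count: int) -> int:
    """Number of sitemap files required to list page_count URLs."""
    return math.ceil(page_count / MAX_URLS_PER_SITEMAP)


def build_sitemap(urls):
    """Render one <urlset> document for a chunk of page URLs."""
    entries = "".join(f"<url><loc>{escape(u)}</loc></url>" for u in urls)
    return f'{HEADER}<urlset xmlns="{NS}">{entries}</urlset>'


def build_index(sitemap_urls):
    """Render a <sitemapindex> that points crawlers at every sitemap file."""
    entries = "".join(f"<sitemap><loc>{escape(u)}</loc></sitemap>" for u in sitemap_urls)
    return f'{HEADER}<sitemapindex xmlns="{NS}">{entries}</sitemapindex>'


def plan_sitemaps(page_urls, base="https://example.org/sitemaps"):
    """Split page_urls into sitemap files; return (index_xml, {file_url: xml}).

    The base URL is a hypothetical placeholder.
    """
    files = {}
    for start in range(0, len(page_urls), MAX_URLS_PER_SITEMAP):
        n = start // MAX_URLS_PER_SITEMAP + 1
        chunk = page_urls[start:start + MAX_URLS_PER_SITEMAP]
        files[f"{base}/sitemap-{n}.xml"] = build_sitemap(chunk)
    return build_index(list(files)), files
```

At roughly 6M pages, that works out to 120 sitemap files listed in a single index, all of them easy for a crawler to find, which is exactly why the bottleneck is the engine's choice of how much to index rather than discovery.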
Here are some excerpts from Tim O'Reilly's post, which to me centers largely on the issue of Web 2.0 math and invisible constraints:

There are only so many seconds in a day, and the larger the number of pages on a site, the harder a crawler would have to hit the site to index them all while keeping reasonably up to date copies. Small sites with lots of pages thus provide an impedance mismatch for crawling. Obviously, with Google showing 58.4 million pages in response to "site:myspace.com", and 73.3 million in response to "site:flickr.com", a high performance site justifies a high performance crawl, so the question is how Google makes the decision how deep to go. (Interestingly, "site:facebook.com" shows only 906K pages, suggesting that a huge proportion of Facebook's pages are still private.)

[...]

Still, I'm reminded of a comment by Ben Bernanke reported in today's New York Times profile, The Education of Ben Bernanke, that he is "a believer in the laws of mathematics." I've become increasingly fascinated by the underlying math of Web 2.0 since reading Jeremy Liew's post about the economics of online advertising last year. Limits are, of course, made to be broken. But it's worth thinking about absolute (and temporary) limits to the growth of Web 2.0. What constraints do we take for granted? What constraints are invisible to us?
Here is The Making of MarkMail post on this topic for those interested in the next level of detail.
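Tim's "only so many seconds in a day" point is easy to quantify. The sketch below is plain arithmetic, assuming the ~6M-page figure and the per-engine index counts quoted above; the function names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def required_fetch_rate(pages: int, refresh_days: float) -> float:
    """Pages per second needed to re-crawl every page once per refresh window."""
    return pages / (refresh_days * SECONDS_PER_DAY)


def coverage(indexed: int, total: int) -> float:
    """Fraction of a site's pages that appear in a search engine's index."""
    return indexed / total


if __name__ == "__main__":
    TOTAL_PAGES = 6_000_000  # approximate MarkMail size cited in the post
    for days in (1, 7, 30):
        rate = required_fetch_rate(TOTAL_PAGES, days)
        print(f"refresh every {days:>2} day(s): {rate:5.1f} pages/s")
    for engine, indexed in (("Google", 1_000_000), ("Yahoo", 19_000), ("MSN", 4_000)):
        print(f"{engine:<6} {coverage(indexed, TOTAL_PAGES):.2%} of pages indexed")
```

On these numbers, re-crawling every MarkMail page daily would take about 69 fetches per second, sustained around the clock. Google's ~1M indexed messages amount to roughly 17% of the site, Yahoo's about 0.3%, and MSN's under 0.1%.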

Save the Date: 2008 Mark Logic User Conference


Just a quick post to ask everyone to mark their calendars and save the date for the 2008 Mark Logic User Conference on June 10 - 13, 2008 in San Francisco.

This is our best event of the year. Last year, we had over 200 people in attendance, lots of great customer presentations, a Tim O'Reilly keynote, a special track for US Government customers, and enough technical content to satiate even the gnarliest of developers.

As the company grows, we're upgrading the venue. This year we'll hold the event at the newly renovated, beautiful Intercontinental Hotel in San Francisco.

If you're interested in speaking, let us know. Drop us an email at pr-at-marklogic.com (I didn't use a real "at" sign to avoid email address spiders) or call our director of product marketing, John Kreisa, at 650 655 2300. See you there!

Friday, January 18, 2008

Mark Logic (Quietly) Launches Facebook App

Mark Logic has quietly launched, on what I'd call an organic basis, a Facebook application called Kick It.

What?! MarkLogic. Facebook App. Kick It?

Heck, you didn't even think "kick it" was in our corporate vocabulary, did you? (OK, until the day of the naming meeting where we interviewed our marketing VP's 16-year-old son, it wasn't in mine either. But have no fear. "Kick it" is a synonym for "chill" or "hang out with." Usage: Dad, I'm going downtown to kick it with Bob.)

So what is it? Kick It is a Facebook application that lets you find out what your friends have in common, so you can either find people with whom to "kick it" (e.g., go downtown, go to a concert) or simply experience the sheer joy of analyzing your friends and creating groups of them based on queries.

Sound dumb? I'd say the first application is quite practical. I'd say the second is very Facebook. Most popular Facebook apps let you do what software companies would consider fairly basic things with your friends and, somewhat amazingly, no Facebook app today actually solves this particular problem very well.

Until Kick It, that is.

Why did we do it? Simple. One of our new engineers, David Amusin, was looking for a new-hire orientation project to learn XQuery. As it turns out, the Facebook API outputs high-quality, interesting XML and MarkLogic, as you know, lets you quickly and easily do amazing things with XML content. So David was thinking about building a Facebook app. Then one day, he wanted to find a friend with whom to see a Dave Matthews concert and had trouble doing so with existing Facebook tools. So -- bang -- he built Kick It as his new-hire project.
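Kick It itself was written in XQuery on MarkLogic. For readers who want the gist of the "what do my friends have in common" query, here is a minimal Python sketch of the same idea: an inverted index from interest to friends. The XML shape below is a hypothetical simplification, not the actual Facebook API schema, and the helper names are illustrative.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, simplified friend-list XML; NOT the real Facebook API schema.
SAMPLE = """
<friends>
  <friend name="Alice"><interest>Dave Matthews Band</interest><interest>Hiking</interest></friend>
  <friend name="Bob"><interest>Dave Matthews Band</interest><interest>Chess</interest></friend>
  <friend name="Carol"><interest>hiking</interest></friend>
</friends>
"""


def index_interests(xml_text):
    """Build an inverted index: normalized interest -> set of friend names."""
    index = defaultdict(set)
    for friend in ET.fromstring(xml_text).iter("friend"):
        for interest in friend.iter("interest"):
            key = (interest.text or "").strip().lower()
            if key:  # skip empty <interest/> elements
                index[key].add(friend.get("name"))
    return index


def shared_interests(index, min_friends=2):
    """Interests held by at least min_friends friends, i.e. candidate groups."""
    return {k: sorted(v) for k, v in index.items() if len(v) >= min_friends}


def who_likes(index, interest):
    """Answer "who can I kick it with?" for one interest (case-insensitive)."""
    return sorted(index.get(interest.strip().lower(), set()))


if __name__ == "__main__":
    idx = index_interests(SAMPLE)
    print(who_likes(idx, "Dave Matthews Band"))
    print(shared_interests(idx))
```

The design choice worth noting is the inverted index: once interests map to friends, both uses described above, finding someone to see a concert with and grouping friends by query, become simple lookups.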

I saw it, thought it was cool, and said run with it and let's see where it goes. If nothing else, it's a good demo of both the Facebook API and what MarkLogic can do with Facebook XML.

Check it out. I invited my 14-year-old to try it yesterday. While I still find the UI a little rough and not self-explanatory, he loved it. This Facebook blogger found it as well, and he gave it a thumbs-up, too.

Give it a try!

Thursday, January 17, 2008

Sun Buys MySQL for $1B

I feel like this is becoming an M&A blog, but there has been a lot of relevant M&A activity of late that I felt I needed to cover (e.g., Microsoft/FAST, EMC/Document Sciences).

Yesterday's big news (aside from Oracle/BEA, for what I think is $8.5B) was Sun announcing that it would buy MySQL for $1B.

Frankly, this deal caught me by surprise. I've been critical of Sun at times, but I think this is a smart move for them. It continues their trend of trying to offer open source and/or cheap software tools (e.g., StarOffice) that undermine incumbents in large markets. And it will help them transition from a wounded workstation and server company to something else. What that "something else" is, I'm not sure. I am sure, however, that they can't stand still, so in a sense any motion represents potential progress.

Sun CEO Jonathan Schwartz is an active blogger (gotta love that), so he has written his own extensive post on the deal, here. Excerpts:

But the biggest news of the day is... we're putting a billion dollars behind the M in LAMP. If you're an industry insider, you'll know what that means - we're acquiring MySQL AB, the company behind MySQL, the world's most popular open source database ...

But as I pointed out, we heard some paradoxical things, too. CTO's at startups and web companies disallow the usage of products that aren't free and open source. They need and want access to source code to enable optimization and rapid problem resolution (although they're happy to pay for support if they see value). Alternatively, more traditional CIO's disallow the usage of products that aren't backed by commercial support relationships ...

So why is this important for the internet? Until now, no platform vendor has assembled all the core elements of a completely open source operating system for the internet. No company has been able to deliver a comprehensive alternative to the leading proprietary OS. With this acquisition, we will have done just that - positioned Sun at the center of the web, as the definitive provider of high performance platforms for the web economy. ...

Information Week covers the deal here.

Tuesday, January 15, 2008

Don't Miss Upcoming O'Reilly Tools of Change Conference

O'Reilly will host the second Tools of Change (TOC) for Publishing conference in New York City on February 11 - 13, 2008. I'll be attending the show and based on the feedback from the first TOC conference, I'm quite looking forward to it.

Java guru, author, JDOM creator, and MarkMail father, Jason Hunter, will be speaking at the show, with a talk entitled Next-Generation Web Publishing at 1:40 PM on 2/11/08. Here's an excerpt from his abstract:
If we’re moving toward Web 2.0, what does that mean for web publishing?

In this talk, Jason Hunter answers that question. Based on his experience as principal technologist at Mark Logic, working with dozens of the largest online publishers, he’ll present a vision for how the Web 2.0 concepts like personalization, collective intelligence, the long tail, and the importance of “owning the data” can and should reshape the face of online publishing—and how XML, XQuery, and XML-aware text search act as the key enablers. He’ll also introduce new Web Publishing 2.0 concepts like “Sweat the content” and “Give answers not links.”

Jason's a super speaker, so I highly recommend attending his talk.

Let me share some other things I view as interesting highlights from the program.

If you're still on the fence about attending, I suggest you read this document, Reviews and Highlights from the 2007 Tools of Change Conference. It's both great marketing and very informative (which makes it great marketing).

See you there!

Tuesday, January 08, 2008

MarkMail: Going Swimmingly

Mark Logic today issued a press release highlighting our progress with our new MarkMail service, entitled Open Source Communities Turn To Markmail For Email Archiving And Search.

Slightly edited excerpts:
Launched in November 2007, MarkMail is a community-focused searchable message archive service, accessible at http://markmail.org. MarkMail allows people to leverage the immense amount of collective knowledge accumulated over time through email discussions. Users can find technical information, research historical decision making, analyze and understand trends, and locate experts for a wide range of technical topics.

Since launch, MarkMail has added more than 1.5 million new messages, sourced from 200 public email mailing lists including PHP, MySQL, Mozilla, and XML communities. MarkMail now contains more than 700 mailing lists and nearly 6 million messages, and in the last two months has added:

  • 550,000 messages about PHP
  • 360,000 messages about MySQL
  • 275,000 messages about Mozilla
  • 200,000 messages about XML
  • 85,000 messages about CSS
I must say I love the one-word quote from Tim O'Reilly: "Yumm."

EMC Acquires Document Sciences for $85M

On 12/27/07, EMC announced the acquisition of Document Sciences (DOCX) for $85M in cash. DOCX's stock has been flat in the $6 range for most of the past two years, recently moved to the $8 range, and is up to about $14 on the announcement of the deal.

The New York Times reports on the story here. Excerpt:
Storage company EMC has agreed to buy Document Sciences, a developer of software for personalizing mailshots and other communications. The acquisition will allow EMC to extend its offering in the field of transactional content management, which it sees as the fastest-growing part of the enterprise content management market. It plans to incorporate Document Sciences into its content management and archiving division.
Alan Pelz-Sharpe of CMSWatch has a nice write-up here. Excerpt:
The storage centric, archiving / transactional document management story now being built by EMC positions them to play more strongly in the ever-changing ECM market. For a long time Documentum was a leader in ECM, but over the last 2 years they have lost their shine and momentum. Of course it will take time for the acquisitions to be absorbed, for the new "D6" version to be truly tested and worked out by the market, and for EMC to build a cohesive and comprehensive technical architecture across its product line. And so for buyers, EMC remains a turbulent vendor to deal with. But the moves they are making seem solid, and the prognosis looks good. It's a developing story that I will continue to cover in detail in the ECM Suites Report throughout 2008.

Microsoft Bids $1.2B for Fast Search and Transfer

It's a busy week so no time for a deep analytical post (yet), but I wanted to get this news out fast. Microsoft has bid $1.2B for Fast Search & Transfer. See this New York Times story for more.

My initial take:
  • It's quite a healthy valuation of ~8x the revenue run-rate, partially justified by an above-average growth rate. (Disclaimer: numbers approximate and from memory.)
  • Some of it, I bet, is psychological, because $1.2B gets back to the "recent" peak valuation (during the past year), prior to the accounting scandals which rocked the company and whacked the stock. In my experience, company sellers tend to hang on emotionally to "recent" highs in deciding their price. Sometimes they get the old valuation back. Sometimes they don't. In Fast's case, the 52-week range was ~8.5 to 18.5 kroner. The deal is, I believe, at 19 kroner. (Note that the chart seems to miss the last day's trading, which took the stock up to about 18.5 kroner.)
  • It seems a logical ending for Fast. As I've pointed out a few times on this blog, Fast was letting the same guys who got the company in trouble continue to run it (with one or two changes). I thought this was a mistake. I thought it didn't hold the executives accountable. I thought they wouldn't be able to fix the problems. So selling to Microsoft seems a practical solution to these problems.
  • This post on the Microsoft Enterprise Search Blog quotes Kirk Koenigsbauer, General Manager of the SharePoint Business Group, which suggests that the SharePoint team drove the acquisition.
  • A friend at Microsoft had this to say: "[the] deal was all about enterprise search competitiveness (at the high end) vs. Google and to an extent IBM. Both search engine capabilities and connectivity to line of business [systems], content management, and other data sources."
More coverage:

Monday, January 07, 2008

Tim O'Reilly Blogs on MarkMail: "Amazing"

Happy New Year to all my readers!

I'm back from Mexico and ready to start posting again. I'll kick off 2008 with a post to highlight a blog entry that Tim O'Reilly did today on MarkMail, our Internet service for searching email archives.

Tim's post is available here, and entitled MarkMail Provides Amazing Search Capabilities. Here's a highlight:
While there may be a new generation that thinks that email is for old fogies, for many of us, email is a primary online tool, at least as important to us as the web. Many of us no longer file documents or attachments -- we just search for them again in our email. Perhaps most importantly, email is a primary collaboration tool--and as many of us have figured out, collaboration is one of the internet's killer apps. Searching our shared memory in a collaborative space is REALLY useful -- with open source mailing lists being a great example.

Despite its importance, very little has been done to improve on email. The clients we use today are not radically different from what we used ten years ago (except perhaps in being web-based). This is why there was so much excitement when xobni showed how useful it is to expose the social network hidden in email. MarkMail does something equally powerful.