Monday, November 05, 2007

Mark Logic Redefines E-Mail Search: Introducing MarkMail

This weekend Mark Logic launched a new Internet service, called MarkMail (tm), that lets users search 4,000,000+ emails from over 500 Apache mailing lists to analyze trends, locate experts, and get fast, precise answers to technical questions.

Put one way, MarkMail redefines what search means in the context of e-mail (think: what Technorati did for blogs). Put another, MarkMail is a demonstration of the power of MarkLogic Server when aimed at e-mail content. (Think -- and I know the analogy is risky -- AltaVista, the Internet search engine launched to demonstrate the power of DEC's Alpha chip.)

MarkLogic Server is an XML content server, a special-purpose database management system (DBMS) designed and optimized for managing XML documents. In plain English, MarkLogic Server is the world's best place to put XML documents.

But who has XML documents that they need to put somewhere? Today, that's largely limited to the information industry (i.e., publishers) and the Federal government, particularly the three-letter agencies. (You can also find XML content in certain enterprise functions like technical publications.)

One reason we're so excited at Mark Logic is that we know the world is moving in our direction. While relatively few people use XML as their document markup format today, virtually everyone is moving to XML, whether they know it or not, as Microsoft takes the standard Microsoft Office document format to XML in Office 2007. But adopting Office 2007 will take a while. So what content can we leverage in the meantime to show off the power of our server?

E-mail. Why?
  • It's semi-structured, and we love working against semi- and un-structured information. E-mail has some clear metadata (e.g., author, subject, send-date) and plenty of free text, both in the body copy and in the metadata fields (e.g., thread topic) themselves.

  • It's easily converted to XML.

  • It's ubiquitous. Everybody uses it.

  • There are lots of free, public mailing lists that contain lots of valuable information -- on topics from wine to Tomcat and everything in between.

  • Most important, e-mail is -- as Mike Moritz of Sequoia Capital once said -- the new corporate knowledgebase.

To expand on the last point: if I told you that you could go to one place -- and only one place -- to learn about a company, where would you go? To their corporate data warehouse? To their knowledgebase? To their financial systems? To their sales and CRM systems?

Personally, I'd go to their e-mail. Despite years of attempts to systematize it, knowledge has eluded capture and evaded knowledge management systems. Knowledge, it seems, instead resides in e-mail and collaboration systems. Through e-mail I can find lots of important quantitative information (mailed around as spreadsheet attachments) but, more importantly, the color and commentary that go along with it. As Mark Logic's Jason Hunter once put it: "I can see the movie (the data), and the subtitles that go along with it."

E-mail is the one-stop shop for information inside most organizations. So why not demonstrate our power on e-mail, we thought? So we did.
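To make the "easily converted to XML" point above a little more concrete, here is a minimal sketch in Python using only the standard library. It is an illustration, not how MarkMail or MarkLogic Server actually does it: the element names (message, author, subject, send-date, body), the email_to_xml helper, and the sample message are all made up for the example.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime
import xml.etree.ElementTree as ET


def email_to_xml(raw: str) -> ET.Element:
    """Turn one plain-text (non-multipart) message into a small XML tree.

    The element names are hypothetical, chosen to mirror the metadata
    mentioned in the post (author, subject, send-date).
    """
    msg = message_from_string(raw)
    root = ET.Element("message")
    # Clear metadata fields become child elements.
    ET.SubElement(root, "author").text = msg["From"]
    ET.SubElement(root, "subject").text = msg["Subject"]
    sent = parsedate_to_datetime(msg["Date"])
    ET.SubElement(root, "send-date").text = sent.isoformat()
    # The free-text body is kept as-is.
    ET.SubElement(root, "body").text = msg.get_payload()
    return root


# A made-up sample message, for illustration only.
raw = (
    "From: alice@example.org\n"
    "Subject: Indexing XML\n"
    "Date: Mon, 05 Nov 2007 09:30:00 -0800\n"
    "\n"
    "Has anyone tried indexing XML documents?\n"
)

doc = email_to_xml(raw)
print(ET.tostring(doc, encoding="unicode"))
```

A real mailing-list archive would also need to deal with multipart messages, attachments, character encodings, and quoted replies, which this sketch deliberately ignores.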

The other nice thing about e-mail is that it has additional idiosyncrasies that let us show off more of our power.

  • Included text and conversation threads. MarkMail does a great job of eliminating duplicate inclusions and rebuilding a conversation from a series of emails.

  • Attachments. We love documents and people email them all the time. MarkMail has some very nice -- and sexy -- ways of handling e-mail attachments.

Go try MarkMail, now! If you're not an "open source person" and don't know what to search for, you can start with one of my favorite searches: XML indexing in the Lucene project. (Hint: if you're looking to index XML with Lucene, it's a good indicator that you should perhaps be looking at MarkLogic, which indexes XML natively.)

Once you've tried MarkMail, please do two things:

  • Tell a friend. Particularly any open source types (the target users of the current incarnation of the service) you know.
