Thursday, September 27, 2007

Mark Logic Partners with RWD in Pharma

I'm pleased to report that Mark Logic has announced a partnership with RWD Technologies, whereby RWD will leverage MarkLogic in its InfoMaestro applications designed specifically for the life sciences, public sector, and manufacturing industries.

We've known RWD for over a year and have already been working with them at several clients, so this is not a Barney partnership. This is about leveraging MarkLogic with InfoMaestro to help customers in life sciences / pharma, public sector, and manufacturing get fast access to unstructured information and to improve search and discovery within their content stores.

For Mark Logic, this is about continuing to build our partner program and continuing to expand into new vertical markets by working with leaders who have deep expertise in their chosen markets. Just as we have strong relationships with publishing leaders (e.g., Really Strategies, Flatirons, iFactory) and with major government integrators (e.g., Boeing, L-3/Titan, and Booz Allen), so we are now building strong relationships with those in other verticals, such as Jouve in aviation and RWD in life sciences.

We're pleased to be working with RWD and look forward to a long and fruitful relationship. The press release announcing the partnership is here.

Wednesday, September 26, 2007

Vote for MarkLogic in the Information Today People's Choice Awards

If you're a fan of MarkLogic, please do us a favor and make your voice heard by voting for MarkLogic in the category of "top enterprise application" in the 2007 Information Today People's Choice Awards.

The link to the voting page is here. We're about three-quarters of the way down the page. You don't need to vote in every category if you don't want to.

Vote early and often (just kidding -- they take your name)!

Tuesday, September 25, 2007

Andy Feit Joins Mark Logic

I'm happy to announce that Andy Feit has joined Mark Logic as vice president of marketing. Andy has a rich background in enterprise software and search and we're pleased as punch to have him on board.

Andy's previous experience includes running marketing at Verity (enterprise search, now part of Autonomy), Knova (CRM / intelligent customer experience applications), and Adomo (IP telephony), as well as general management and marketing positions at InfoSeek and Inktomi. He's even done a brief stint as an industry analyst at Gartner/Dataquest.

Bringing Andy on board continues Mark Logic's pattern of uniting DBMS and search people and technology to build our unique product and company. (See this post for more, entitled Half Man, Half Machine, All Cop.)

The press release on Andy's appointment is here.

Mark Logic / Flatirons Webinar Archive: Dynamic Content Delivery Using DITA

Just a quick post to provide a link to the archive of the popular webinar we did recently with Flatirons Solutions entitled Have It Your Way: Dynamic Content Delivery Using DITA.

DITA refers to the Darwin Information Typing Architecture, an XML-based architecture for authoring, producing, and delivering technical information. This webinar drew a big response (over 400 folks) from a wide variety of industries. DITA is clearly gaining strong momentum in the technical publications community, and with Mark Logic you can use DITA not only to automate the production of the "usual suspect" deliverables (PDF documentation, help files) but, more importantly, to deliver dynamic, personalized content to individuals.

Ajay Singh from Mark Logic and Eric Severson from Flatirons are the speakers. It includes a demo of O'Reilly Media's MarkLogic-based SafariU application at minute 26 and a demo of a loan processing application at The World Bank at minute 35.

Thursday, September 20, 2007

Slides from my ASIDIC Panel Presentation

Last Monday, I had the pleasure of speaking at the ASIDIC Fall 2007 Meeting in Washington, DC at the Westin Arlington Gateway hotel.

I spoke on a panel entitled Competitive Strategies for the Future, hosted by Matt Dunie, President of ProQuest, who provided an introduction to the panel. I was on the panel with:

I enjoyed the conference and speaking on the panel. Slides for all presentations may be found here (in-lined with the agenda).

My slides follow, courtesy of slideshare.net:


Tuesday, September 18, 2007

The Wrangler Interviews Severson on DITA

The Content Wrangler recently posted a great interview with Eric Severson, CTO of (Mark Logic partner) Flatirons Solutions.

Here's my favorite excerpt:

TCW: In other words, most folks using DITA today are only taking baby steps toward improvement. They don’t seem to get the “bigger picture”. Can you help our readers understand the value proposition of DITA? Isn’t the real promise of XML, and by extension DITA, to help us deliver targeted, personalized content on demand?

Eric: Supporting targeted, personalized content has always been a key goal of XML, and depends on four capabilities that XML has always had:

* support for multiple delivery channels from a single source;
* reusability of content building blocks across publication types;
* filtering content for a specific purpose or audience;
* and supporting powerful, context-dependent search.

What DITA does is to significantly sharpen these capabilities by structuring information building blocks as standalone, reusable topics. When content is structured this way, it becomes possible to freely mix-and-match it into personalized publications—without worrying that the result will be disconnected and choppy, or as some have said, read like a ransom note.

But DITA alone is also not enough. To deliver targeted content on demand, it’s essential that content be dynamically selected, assembled and personalized in real-time to meet a specific individuals’ needs. Trying to create all possible combinations in advance just won’t cut it. Thus the other two essential ingredients are a powerful XML-based query language like XQuery, and a high-performance XQuery-based delivery engine like Mark Logic.

The Powerset Launch: Beware "The Next" Positioning

I've never liked what I call "the next" positioning. Examples:
  • The object DBMS vendors in the early 1990s positioning as the next Oracle. (Gartner analyst and database guru Donald Feinberg once told me that if he had a quarter for every startup that announced that it was the next Oracle, he'd have a lot of quarters.)
  • Numerous singers trying to be the next Elvis. There won't be another Elvis. The original one still lives today in Vegas and works as a pit boss. (There is, however, an alternative Elvis, Johnny Hallyday, very much the French Elvis who lived.)
  • Lots of SaaS vendors trying to position as the next Salesforce. No one has really succeeded, somewhat amazingly. SaaS today strikes me as a one-success category. Go read the NetSuite S-1 if you're feeling differently.
In fact, in a Murphy's Law sort of way, positioning as "the next something" seems an almost certain guarantee that you won't be. Thus, it should come as no surprise that no one consulted me before deciding the positioning of Powerset, a much-hyped natural language search engine, in their recent launch.

If you don't believe that they're positioning on this angle, then see these stories:
In fact, if you run the Google search "Powerset Google" you come up with 1.4M results.

Don't get me wrong. There are many things about Powerset that I like. I like the argument that discarding stopwords can cause a major loss of meaning. I like their characterization of search "keywordese." I loved the grunting pigeons. I like their blog and loved their attempt to parse Miss Teen South Carolina.

But I have two problems with Powerset. First, they shouldn't have done a "next Google" positioning, which is bound to disappoint. Second, they have failed to learn a key lesson from the business intelligence (BI) market: it's not about natural language search -- it is about database query.

All search vendors seem obsessed with a quest to "figure out what you mean" based on either a few grunted keywords or a short phrase. In the early days of business intelligence, people went on that quest, too. I recall the DataTalker from Berkeley-based Natural Language, Inc. The idea was that you could ask seemingly innocent database queries like "who sold the most new products in New York on Tuesdays?"

The problem was:
  • What do you mean by sold? Market value or net to us? Before or after allowance for doubtful accounts?
  • What do you mean by Tuesday? Do you literally mean Tuesday or do you mean the second day of the work week?
  • What do you mean by New York? Do you mean the city or the state? If you mean the city, do you mean Manhattan or the five Boroughs?
  • What do you mean by new products? Launched within the last 6 weeks or 6 months?
There are reasons why Business Objects went on to become a $1.5B company while NLI was sold to Microsoft for a pittance:
  • Natural language is notoriously imprecise
  • Devising a simple-to-use interface for specifying precisely what you want seems infinitely superior to all sorts of advanced technology that guesses
  • Similarly, creating a semantic layer that defines precise answers to all the "what do you mean" questions seems infinitely superior to more guessing
As technologists, we are drawn to interesting questions and whizzy technology that imputes meaning from language. (Heck, I like it, too.) And if you like such technology you can use it in conjunction with Mark Logic (just use it to enrich XML content and add tags).

But the lesson to me is that database-style query beats either grunted keyword or short-phrase natural language search when building enterprise systems.

I won't predict whether natural language search will beat keyword search on the broad Internet. But I do believe that for enterprise content systems in publishing, government / defense / intelligence, pharma / life sciences, and financial services, what's needed is database-style queries on content, not the ContentTalker.

Monday, September 17, 2007

Fooled By Randomness

I read a great book a few flights ago, entitled Fooled by Randomness: The Hidden Role of Chance in Life and in the Markets by Nassim Nicholas Taleb. Taleb is a trader / academic / philosopher, adept in mathematics and statistics, who explores fun and interesting questions about probability, causation, psychology, and life.

Simply put: if someone is successful in the financial markets, is it because they are skilled or lucky? Do they have a better strategy than someone else, or is their strategy simply, accidentally, more suited to a given time period?

Taleb provides (anecdotal) example after example of traders who kept doubling down on given strategies only to "blow up" spectacularly in the end -- often wiping out (thanks to leverage) in a single day the sum total of their profits over a period of years.

My favorite example from the book is a simple one, which he calls The Mysterious Letter (page 157). Suppose you open an anonymous letter on January 2 that says the market will go up during January, and it proves to be true. Then you receive another letter on February 1 saying the market will decline, which also proves to be true. This continues each month. By July, you are amazed by the prescience of the sender, who then proposes an investment offer in an offshore fund. You quickly wire your life's savings to the advisor only to find that weeks later your money is all gone and that you've been conned. What happened?

In January, the con man pulls 10,000 names from a phone book. He mails a bullish letter to half the list and a bearish letter to the other half. The next month, he mails only those who received the correct prediction, sending half a bullish prediction and half a bearish one. He continues this process month after month. Assuming the market goes up and down each month with a 50% probability, by the end of June there are 156 people who are simply amazed by the accuracy of the predictions in the anonymous letters. If you can get half of them to invest $50K, then you've just conned your way into nearly $4M.
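
For the arithmetically inclined, here is a minimal Python sketch of the con man's funnel. The function name and the choice to round odd halves down are my own assumptions for illustration, not anything from Taleb's book; the inputs (10,000 names, six monthly letters, $50K per investor) come from the example above.

```python
def surviving_recipients(initial_names, months):
    """Return the number of people who have received only correct
    predictions after each monthly mailing.

    Each month the mailing list is split in half (bullish vs. bearish),
    so whichever way the market moves, half the recipients got the
    'right' letter. Odd counts are rounded down (an assumption).
    """
    survivors = []
    remaining = initial_names
    for _ in range(months):
        remaining //= 2  # only the half with the correct letter stays on the list
        survivors.append(remaining)
    return survivors


counts = surviving_recipients(10_000, 6)  # January through June
amazed = counts[-1]                       # people with six straight correct letters
investors = amazed // 2                   # suppose half of them wire money
take = investors * 50_000                 # at $50K apiece

print(counts)
print(amazed, investors, take)
```

Run as written, the list shrinks 5,000, 2,500, 1,250, 625, 312, 156, which matches the 156 amazed recipients. Half of them (78) at $50K each comes to $3.9M; it would take all 156 investing to reach $7.8M.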

Much to my surprise and pleasure, he covers the very real increasing returns that can result from pure luck. While traders will, in my opinion, rarely see these benefits, some types of financial firms might. For example, assume 10 equally skilled venture capital partnerships all start at the same time. After 10 years, assume a distribution of results that varies from 15% to 30% annual returns. Further yet, assume that distribution is pure chance -- i.e., that each firm really was equally skilled. What happens next?

The top firm then gains a reputation for being the top firm. It therefore sees more business plans than the other firms, thus providing it with the pick of the business plan litter. What's more, due to its perceived higher skill, it's able to offer lower valuations than competitors, competing literally on the greenness of its money (and not the quantity of it) and the reputation of the firm. Simply put, a chance advantage has become very real.

I'm not saying this is true in venture capital. I know plenty of venture capitalists and they certainly appear to me to be of fairly varied skills. And I know whose money I'd rather take and at what valuation. But am I fooled by randomness along with everybody else? Who knows. But it's a fun and interesting question.

For those interested in a summary, the central argument of the book is available here in this Forbes article by the author.

Thursday, September 06, 2007

New Blog to Check Out: The Database Column

On September 4th, a new blog launched, called The Database Column, a multi-author blog on database management systems (DBMSs) written by several luminaries in the field, including Michael Stonebraker, Dave DeWitt, Jerry Held, and Don Haderle. The opening post discusses Stonebraker's One Size Fits All papers, which I have already blogged about, here.

The catch? It appears to be sponsored by Vertica (at least that's what the copyright notice says at the bottom), Stonebraker's latest venture, a column-oriented database company. Let's hope they keep the blog topical and don't turn it into a Vertica commercial. That might be hard since the blog is basically authored by Vertica advisors (Cherniack, DeWitt, Madden, Zdonik) and directors (Held, Stonebraker), with the exception of Don Haderle, who appears to be independent.

Nevertheless, given the brainpower of the authors, I've subscribed to it (feed here) and, until proven otherwise, I'll assume that they can author an interesting, relatively non-commercial blog on database management topics. I hope they do!

Wednesday, September 05, 2007

Mark Logic Partners with Temis

Mark Logic has announced its partnership with Temis, a leading provider of text analytics solutions, whereby the companies will pursue joint go-to-market activities in the publishing, government, life sciences, and other industries in order to help customers build richer, more powerful, XML-based applications.

The first activity under this partnership is a joint hosted reception and dinner at this week's SIIA Global Summit in Berlin.

I am a big believer in text mining, and specifically the use of text mining technologies to output enriched XML documents. Why? Because I see a natural synergy between text mining technologies that use advanced technology to determine meaning (and then tag it) and MarkLogic, which provides lightning-fast XQuery evaluation against large contentbases full of richly tagged XML. The two technologies go together like peanut butter and chocolate. Or, for a French example, like rosemary and rack of lamb.

I like Temis a lot; we've worked with them already in several accounts. They have strong core technology and some great tools. Temis has a strong presence in life sciences and a growing one in publishing, utilities, and automotive.

Finally, Temis is run by an old friend, Eric Brégand, who ran product design when I joined Business Objects in 1995 and who later went on to run the entire products team. I'm glad to be working with him again.

Revision:
The original copy of this post said that Temis was based on a later version of the same core technology as Inxight. While, to my knowledge, both companies have roots in Xerox technology, a friend of mine who's more informed than I am shared the following. I'm not sure this is 100% correct either (there seems to be some market confusion about this issue), but here's what he said:
Temis uses software written by the Xerox XRCE team in Grenoble called Xelda. Xelda was basically development workbench tools, created in Grenoble [...] The Inxight runtime code was authored by Xerox PARC [...] So, they are not a "later version" of "the same code," but rather different codebases altogether.

Sunday, September 02, 2007

Intelligence 2.0

Today's New York Times had an article entitled, Logged In and Sharing Gossip, Er, Intelligence, that I think is well worth reading. The article describes Web 2.0 style initiatives in the Intelligence Community (IC) designed to improve the quality of intelligence and promote information sharing.

Excerpts:

In December, officials say, the agencies will introduce A-Space, a top-secret variant of the social networking Web sites MySpace and Facebook. The “A” stands for “analyst,” and where Facebook users swap snapshots, homework tips and gossip, intelligence analysts will be able to compare notes on satellite photos of North Korean nuclear sites, Iraqi insurgents and Chinese missiles.

A-Space will join Intellipedia, the spooks’ Wikipedia, where intelligence officers from all 16 American spy agencies pool their knowledge. Sixteen months after its creation, officials say, the top-secret version of Intellipedia has 29,255 articles, with an average of 114 new articles and more than 4,800 edits to articles added each workday.

[...]

“We see the Internet passing us in the fast lane,” said Mike Wertheimer, of the office of the Director of National Intelligence, who is overseeing the introduction of A-Space. “We’re playing a little catch-up.”

Personally, I'm glad to see the government using Web 2.0 style initiatives to try to improve the intelligence process, and I think the article (e.g., the headline) is overly negative in tone. I do understand that for virtually all technology changes, it's not about the technology alone -- it's about people (culture), process, and technology together. I wouldn't expect things to be any different in the Intelligence Community than in finance or pharma, in that regard. People are people; organizational behavior is organizational behavior.

While I've never worked in the IC, it interests me for both personal and professional reasons. See this post, entitled Open Secrets, on what's called open source intelligence (OSINT), which is based on a delightful article by Malcolm Gladwell, or read this book, The Puzzle Palace, a classic that describes the history of the National Security Agency (NSA).

Saturday, September 01, 2007

Parsing The Unparseable: Miss Teen South Carolina's Answer

The folks over at Powerset, a much-hyped natural language search engine, have a fun post on their blog where they point their parser at Miss Teen South Carolina's now-infamous answer to the question: "Recent polls have shown that a fifth of Americans can't locate the US on a map. Why do you think this is?"

First, her answer for those who may have missed it.

I personally believe that U.S. Americans are unable to do so because uh some uh people out there in our nation don't have maps and uh I believe that our ed- education like such as in South Africa and uh the- the Iraq everywhere like such as and I believe that they should uh our education over here in the U.S. should help the U.S. or- or- should help South Africa and should help the Iraq and the Asian countries so we will be able to build up our future.


Now, the parse tree (full-size image here):