Tuesday, May 15, 2007

User Conference Day One Highlights

We had about 300 people in attendance for my opening keynote address at the 2007 Mark Logic User Conference in San Francisco today. My speech was about Mark Logic's vision and strategy. I talked about:
  • Our overall mission to Unlock Content™
  • How relational databases have repeatedly failed to handle content
  • How a fresh look is needed -- a system that is natively designed to handle content
  • What makes MarkLogic such a special XQuery implementation
  • The parallel between the query inflexibility that customers face today when using content with relational databases and search engines ... and the inflexibility of data query in pre-relational databases
  • How the future will bring two markets on top of XML content servers: a content applications market and a content analytics market
  • The impact of Office 2007 XML and of Microsoft's fumble with WinFS (which is now dead)
  • The rise of special-purpose DBMSs
Here's my presentation:



While we have dual tracks (technology, applications) at this year's conference, I only attended the applications track. Here is a paragraph on each of today's applications track sessions.

Nerac gave an interesting presentation on the application they built, which lets their professional researchers run very powerful searches across their very broad contentbases without needing to know XQuery, and then assemble the results into dynamically published custom reports. (They have many advanced search screens as well as an intermediate search language that presumably all gets translated into XQuery.) This enables them to deliver, quite compellingly, on their value proposition of “research, not search.” It has also enabled them to increase the value of their products, in some cases by over 10x.

McGraw-Hill Education gave an interesting talk on their digital asset library (DAL) project and their vision of custom educational publishing to meet different state educational standards and to refer and/or produce content specific to individual students based on their individual needs – given, for example, how a student scored on a standardized test (known as “assessment” in the educational publishing world). The DAL helps in many use-cases: create a new product, develop a new edition, build a media product, assemble a custom project, provide global access and distribution.

University of Toronto Libraries gave an interesting presentation in which the speaker first provided an overview of a 2005 survey by OCLC on perceptions of libraries in the digital age. The speaker then went on to demo a great app, evidently built quite quickly, that contained just about every web 2.0 feature I could think of that a student would want in a library search portal (e.g., facets, dynamic tag clouds, guided discovery with cited and citing documents, content enrichment via web services, reviews, comments, ratings) – one that will ultimately propel them into the era of Library 2.0. As the speaker said, the era of teaching students to write ever-more-complex queries in order to restrict results is over. From here on out, it’s about simple search and subsequent refinement.

CQ presented on a new alert processing framework that they’ve built. Providing alerts is a big and important part of their business. As they put it, most people want to run a search that pulls from a set of content; CQ does the inverse: they save queries, run them against all new content, and deliver the results as alerts. They also talked more broadly about their general mission to put "information in actionable context" and discussed the idea that many publishers make money not only with their own proprietary content, but also by adding value through integrating and contextualizing public domain content.
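To make the "inverse search" idea concrete, here is a minimal Python sketch of saved queries being run against a newly arrived document. This is not CQ's implementation (their framework is built on MarkLogic and the talk did not go into code); the `SavedQuery` class, the `tokenize` and `alerts_for` helpers, and the all-terms-must-match rule are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SavedQuery:
    """Hypothetical stored query: who to alert, and the terms that must all appear."""
    subscriber: str
    terms: frozenset


def tokenize(text):
    """Lowercase the text and split it into a set of words, stripping punctuation."""
    return {word.strip('.,;:!?"()').lower() for word in text.split()} - {""}


def alerts_for(document, saved_queries):
    """Return the subscribers whose saved query matches the new document.

    Assumption: a query matches when every one of its terms occurs in the document.
    """
    words = tokenize(document)
    return [q.subscriber for q in saved_queries if q.terms <= words]


# The queries are stored ahead of time; the content arrives later.
saved_queries = [
    SavedQuery("alice", frozenset({"energy", "bill"})),
    SavedQuery("bob", frozenset({"farm", "subsidy"})),
]

new_document = "Senate committee advances energy bill."
print(alerts_for(new_document, saved_queries))  # prints ['alice']
```

The point of the sketch is the reversal of roles: the queries are the standing data and each new document is the thing being "searched with." A production system would presumably index the saved queries rather than loop over all of them for every document.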

Really Strategies talked about how the MarkLogic classifier and its [unique] XML-based classification (which understands both words and XML structure) can be leveraged to improve the productivity of CMS users, for example, by helping them find source content of interest and/or repurpose it. Towards that end, they have incorporated such classification into their MarkLogic-based RSuite CMS (their CMS product targeted at publishers). They also discussed the results of an experiment they did classifying crime-related content with RSuite, which showed both how well the classifier works and how much time it can save by automating the process of creating metadata for XML documents. (And then you can start mining / querying the contentbase and enriched metadata.) Cool.

IEEE talked about their new vision for Research 2.0 and how they are trying to change paradigms in how content is used by delivering answers to questions and following an agile publishing process. Interestingly, IEEE launched a research program some time ago to understand how content will be used in the future, including a study on how young engineers in California and India use research. They discussed IEEE Articles in Context, a beta project that focuses on XML, deconstructing, mashing up, and reconstructing content, with the ultimate goal of figuring out how to integrate content directly into the workflow of engineers (another example of content in context). The speaker also discussed some of the unique problems involved in searching mathematical equations, which are understandably quite popular in their content and often pages in length. They're in the midst of testing the prototype with young technologists, gathering their feedback, and using that data to drive future publishing offerings.

Mark Logic founder Christopher Lindblad closed out the day with a great interview conducted by Denise Miura, Mark Logic's director of technical services (who also, by the way, put together the entire conference and did a superb job in so doing).

Nuggets from Christopher:
  • I'm a search guy. Dave, from his speech, is pretty clearly a database guy. [We like mixing the two at Mark Logic.]
  • When, at my previous job, I realized that someone would pay $1M for a search engine that could run database-like queries, I said "note to self."
  • Don't be too abstract in your XML -- in some standards, every element is called "element" and uses an attribute called "name."
  • I'm excited that Microsoft has adopted XML as the format for Microsoft Office. That will cause the number of people creating XML in the world to dramatically increase.
  • With [XML and] wireless technology, I could imagine a lot more location-aware activities. I could see geo-location as an exciting area.
  • I'm a nerd about hardware; this is the one question that will get me excited (in response to "what's the best hardware for MarkLogic?").
  • You always hear there are no 64-bit apps; there *is* one: MarkLogic.
  • You get your impression about what a computer should be when you graduate from college. When I graduated, a megabyte was a lot of memory and 30 megabytes was a lot of disk. You need to remember to look at cost and not react just to the absolute numbers.
  • Computers have three commodity resources: CPU, memory, and disk. The way we tune the product tends to assume you evenly divide your hardware dollars across the three areas. Stop thinking about gigahertz, megabytes ... and think about money.
  • Make a database that works like a search engine ... or make a search engine that works like a database; it's kind of a Reese's Peanut Butter Cup thing.
  • You can't make a relational database with 10,000 columns ... well, you could, but a bear can ride a bicycle too, just not well. Thanks to XML, you can make record-oriented applications.
  • How do I say this without being arrogant ... we're better -- in response to what's the difference between Mark Logic and competitors. We make a premium product for a premium price. If you succeed, we succeed.
Finally, I'd like to say a big "thank you" to our conference sponsors.
