Friday, May 25, 2007

The Web 2.0 Opportunity for Publishers


Just a quick post to highlight a popular on-demand webinar that we've been running, entitled "Web Publishing 2.0: Ten Trends Publishers Need to Know."

The webinar features "rock star" Jason Hunter, principal technologist at Mark Logic, discussing several important patterns in how publishers are applying web 2.0 principles to transform their business.

And if there's someone who should know web 2.0, it's Jason. Among other projects, he's worked on O'Reilly Media's SafariU, where he lived for a time at the epicenter of the web 2.0 phenomenon.

A full description of the webinar program is here. To watch it live right now, press here. You won't be disappointed.

Thursday, May 24, 2007

Finally! A DITA White Paper Worth Reading

A white paper that we co-sponsored, authored by Flatirons Solutions CTO Eric Severson, won high praise this week from the inimitable Content Wrangler. See this post, entitled "Finally! A DITA White Paper Worth Reading: Dynamic Content Delivery Using DITA."

I'll pull one excerpt from the Wrangler's post:
We really like this whitepaper because it provides a clear picture of how adopting the topic-based content standard can help your organization see the possibilities a standardized, topic-based approach to content creation can provide.
I should note that this dynamic DITA delivery approach has won praise from other DITA gurus as well. IBM's Michael Priestley (whom I think of as the "high priestly" of DITA) had this to say about Eric's presentation of the approach at a recent conference:
Flatirons Solutions and Mark Logic had a dynamite presentation on dynamic publishing - the full whitepaper is available on their websites, but the gist was using an XML server with partially preprocessed DITA content (flatten conrefs and dependencies, maintain conditional processing and metadata) to allow on-the-fly personalization of deliverables for individual customers.

I thought it was a very elegant solution to the traditional clash between author-oriented systems that privilege efficiency of reuse vs. delivery-oriented systems that privilege redundancy of content for the sake of performance: use one format in both places so you don't lose the semantics, perform as much preprocessing as you can without generating multiple versions of the content, and let a dedicated XML server handle the final stage in real time.
For more of Michael's thoughts, see here.

Zig When The Other Guy Zags: A Crappy Search Engine.com

I'm a big believer in, as my old friend Ed Horst so eloquently puts it, zigging when the other guy zags.

When consumers first became sensitive to vehicle size and gas consumption, Cadillac created the oxymoronic Cimarron, the small Cadillac. (Who wants a small Cadillac? Answer: no one.) Cadillac zigged, and Lincoln zagged: at the same time, Lincoln made the Town Car bigger, doubling down on the segment that wanted big, black, shiny luxury cars.

When consumers were worried about sugar and caffeine in their soft drinks, Coca-Cola was busy decaffeinating and de-sugarizing. Then along came Jolt, with the tagline "all the sugar and twice the caffeine," clearly positioning itself as something different, and laying the groundwork for today's massive energy drink category.

So why am I writing about ACrappySearchEngine.com?

  • First, I always love contrarian positionings. Instead of the endless "my PhD's smarter than yours" (and therefore my algorithm's better than yours) message on which Ask.com is spending $100M in a new ad campaign, the crappy guys are arguing that some junk mixed in search results (e.g., finding Curious George when looking for George Bush) is actually the right answer.

  • Second, I think there's more than a grain of truth to their argument. How do we add the serendipity of a newspaper in a relevance-driven, alerts-driven web world? (See this post for more on that theme.)

So while I like the idea of an Internet search engine that mixes things up a bit in the results, I won't be making ACrappySearchEngine my default. Why? Because while the concept is fun, the implementation is amateurish. Perhaps a "crappy" algorithm with a nice UI and some iterative refinement would be nice. But as-is, the site's not there yet in my opinion.

Here is Valleywag's take. Check out their about page for more.

Wednesday, May 23, 2007

Enterprise Search Crisis

Check out this blog post on ZDNet, entitled "Enterprise Search: Why it's a Crisis and Googzilla will Strike," which is a series of takeaways from Stephen Arnold's recent presentation at Enterprise Search Summit in New York.

The parts I agree with are:
  • It's a crisis. I continue to believe there is generally low customer satisfaction with enterprise search and it seems to come from a combination of expectations management and post-sale delivery. As one enterprise search alumnus I know says: "At [enterprise search vendor X], we sold a Ferrari. However, we just dumped the pieces on your driveway and you had to assemble it."
  • Over-positioning enterprise search as a silver bullet. I see a lot of this. Enterprise search vendors claim today that they can do everything from finding documents (the original purpose) to detecting money laundering to BI reporting to merchandising so you can sell more polo shirts to data warehousing to legal compliance and beyond. One vendor pitches 30 different solutions, each as a silver bullet, I'd suppose.
  • Complexity and cost. Enterprise search vendors charge a lot of money for their wares and they are certainly complex to configure, use -- and in some cases -- understand. (Think Bayesian inferencing.)
  • That all this creates a great opportunity for Google to step in and sweep up some serious market share.
However, I think I have a different take at the macro level. Simply put, I think enterprise search is stuck between a rock and a hard place.
  • The rock is database management systems. Many search solutions are integrations of relational databases with search engines along with templates for specific applications. While many search vendors are trying to reposition their products as application platforms, they're not. Tying together MySQL, search engine X, and some pre-processing logic so you can properly feed the search engine indexer is not a great "platform" on which to build applications. Databases are much better application platforms and the real problem has been that databases, until recently, didn't do content. But as new generations of database management systems -- like MarkLogic -- emerge, it will become increasingly clear that the platform for content applications should not be an enterprise search engine (bolted to other things), but instead a database management system built to natively handle content.
  • The hard place is the Google Appliance. There will always be a need for "Google inside your company" type search. I call this the "crawl and index" value proposition. Given cost and complexity, I can't see why Google won't sweep up most of the market here. (I just wish they could do better with PDFs and email.)
You can find the full text of Stephen's speech here.

Tuesday, May 22, 2007

Upcoming: X-Pubs and Tools of Change

Just a quick post to highlight two exciting conferences of interest to publishers and those involved with publishing, XML, and/or content delivery.

X-Pubs 2007, June 4th - 5th at the Royal Berkshire Conference Center (outside London), the home ground of the Reading Football Club (i.e., soccer team). Mark Logic's own Stephen Buxton will give a presentation entitled: Content Applications in the Wild. I plan to swing by this conference as well. The agenda is loaded with presentations on topics including DITA, XML publishing, S1000D, XML content management, single source publishing, and technical content delivery. I attended this show last year and enjoyed it.

The O'Reilly Tools of Change for Publishers conference, June 18th - 20th in San Jose, California. It has a star-studded agenda including Tim O'Reilly, Chris Anderson (Wired and The Long Tail), Jimmy Wales (Wikipedia), Bruce Chizen (Adobe), and John Ingram (Ingram Book Group).

And that's not to mention Mark Logic's Jason Hunter who'll be speaking on Next-Generation Web Publishing.

Mark Logic is a gold sponsor of Tools of Change, and I'll be attending on Tuesday. They have tutorials on Monday, and I'd highly recommend Eric Severson's XML: What Publishers Need to Know. Eric is the CTO of Flatirons Solutions and a great speaker on XML publishing.

Quick Take: Business Objects Acquires Inxight

I could write a long post on this topic, and perhaps will one day, but since time is of the essence in the blogosphere, I thought I'd do my "quick take" on my former employer's acquisition of Inxight, press release here.
  • I'd first heard rumors of Business Objects as an Inxight suitor many months ago. So long ago, in fact, that I assumed they'd looked at the deal and walked away. Perhaps that happened. Perhaps they came back. I don't know. But from either the selling or buying perspective, the deal is not a surprise.
  • Business Objects (BOBJ) is a value shopper so I'm sure that the deal makes sense financially. (That the terms were undisclosed is, in my opinion, another clue to that effect.)
  • They say in the press release that Inxight's revenues were $25M. I'm guessing that Business Objects paid between $50M and $75M.
  • There is an interesting arbitrage in enterprise software multiples right now. According to this Software Equity Group report (see figure 6), the average enterprise value (EV) to trailing twelve month (TTM) revenue ratio is 3.5x for companies greater than $1B in revenue and 1.8x for companies less than $100M in revenue. That means that $1B+ players can buy sub-$100M players "for free" on the theory that they can pay 1.8x for a company's revenue and have it instantly worth 3.5x in their own market cap.
  • I think the deal is good for BOBJ because it allows them to score vision points in an important area -- unstructured data. It's also good for Business Objects to be doing offensive acquisitions as well as defensive ones (e.g., Cartesis).
  • I think the deal is good for Mark Logic because it continues to put unstructured data (i.e., content) on the mainstream IT roadmap. Simply put, anything that gets people to pay more attention to content and have more desire to do interesting things with it is good for Mark Logic.
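The multiple arbitrage in the list above can be worked through with the post's own figures. This is only a back-of-the-envelope sketch: it assumes the acquired revenue is simply re-rated at the acquirer's average multiple (the "theory" mentioned above) and ignores integration costs, growth rates, and everything else that drives a real valuation. The function and variable names are illustrative.

```python
# Figures taken from the post; everything else is illustrative.
inxight_revenue_m = 25.0   # $M of revenue, per the press release
small_co_multiple = 1.8    # average EV / TTM revenue, companies under $100M
large_co_multiple = 3.5    # average EV / TTM revenue, companies over $1B


def implied_value(revenue_m: float, multiple: float) -> float:
    """Enterprise value (in $M) implied by a revenue multiple."""
    return revenue_m * multiple


value_standalone = implied_value(inxight_revenue_m, small_co_multiple)
value_in_acquirer = implied_value(inxight_revenue_m, large_co_multiple)
uplift = value_in_acquirer - value_standalone

# The guessed purchase price range of $50M to $75M, expressed as multiples.
guess_low_multiple = 50.0 / inxight_revenue_m
guess_high_multiple = 75.0 / inxight_revenue_m

print(f"Valued at {small_co_multiple}x revenue: ${value_standalone:.1f}M")
print(f"Valued at {large_co_multiple}x revenue: ${value_in_acquirer:.1f}M")
print(f"Paper uplift from re-rating:  ${uplift:.1f}M")
print(f"Guessed price as a multiple:  {guess_low_multiple:.1f}x to {guess_high_multiple:.1f}x")
```

On these assumptions, even the high end of the guessed price range (3.0x) sits below the 3.5x average for $1B+ companies, which is the sense in which the deal can look "free" to the buyer.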
One thing you'll hear more from Mark Logic about is the distinction between metadata extraction and content enrichment. Right now, Inxight and other text mining vendors focus primarily on extracting metadata from content -- discovering that document 17 talks about the cities "New York" and "Paris" and the person "Pope Benedict XVI". But the reason these vendors extract that metadata is because they want to play nicely in the existing ecosystem of relational databases and BI tools. Simply put, if the ecosystem does data, then you should use your tool to turn content into data, so you can load it into Oracle and make reports on it in BusinessObjects.

That approach makes sense if you look at things from the data end of the telescope. However, flip the scope around and ask which is better: extracting the metadata from the documents and loading it into Oracle, or enriching the documents themselves by in-lining the newly discovered facts as markup? In a pre-MarkLogic world, that question didn't make sense because no tools existed that could let you do anything with that enriched markup. But clearly there is a lot of information loss (specifically structure and location) that happens when you extract metadata from documents instead of enriching documents themselves.

For example, if extracted, you can run queries like "tell me which documents talk about 'New York' and 'The Pope'" whereas, if in-lined, you can run queries like "return the abstract and bibliography of all articles that talk about 'The Pope' in a heading and mention 'New York City' within 15 words in the following paragraph." That's a huge difference in power.
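To make the contrast concrete, here is a small sketch in Python (standing in for what an XML content server would do with XQuery). The element names (article, section, heading, para, person, city) and the two sample documents are invented for illustration, and "within 15 words" is interpreted as "within the first 15 words of the paragraph that follows the heading," which is an assumption about the intended query.

```python
import xml.etree.ElementTree as ET

# Two hypothetical enriched documents: entities are marked up in-line.
DOCS = {
    "17": """<article>
      <abstract>Abstract of article 17.</abstract>
      <section>
        <heading>An audience with <person>The Pope</person></heading>
        <para>The visit began in <city>New York City</city> on a Tuesday.</para>
      </section>
      <bibliography>Bibliography of article 17.</bibliography>
    </article>""",
    "18": """<article>
      <abstract>Abstract of article 18.</abstract>
      <section>
        <heading>Travel notes</heading>
        <para>A guidebook to <city>New York City</city> that quotes <person>The Pope</person> once.</para>
      </section>
      <bibliography>Bibliography of article 18.</bibliography>
    </article>""",
}
ENTITY_TAGS = ("person", "city")
trees = {doc_id: ET.fromstring(xml) for doc_id, xml in DOCS.items()}

# Extraction: keep only a flat set of entities per document.
# Structure and location are lost at this point.
extracted = {
    doc_id: {el.text for el in root.iter() if el.tag in ENTITY_TAGS}
    for doc_id, root in trees.items()
}
flat_hits = sorted(
    doc_id for doc_id, entities in extracted.items()
    if {"New York City", "The Pope"} <= entities
)


def words_before(para, tag, text):
    """Count the words in para that precede the first direct child <tag> with this text."""
    count = len((para.text or "").split())
    for child in para:
        if child.tag == tag and child.text == text:
            return count
        count += len("".join(child.itertext()).split())
        count += len((child.tail or "").split())
    return None  # entity not present in this paragraph


def enriched_query(root):
    """Return (abstract, bibliography) if the structural condition holds, else None."""
    for section in root.iter("section"):
        heading = section.find("heading")
        para = section.find("para")  # first paragraph in the section
        if heading is None or para is None:
            continue
        pope_in_heading = any(p.text == "The Pope" for p in heading.iter("person"))
        offset = words_before(para, "city", "New York City")
        if pope_in_heading and offset is not None and offset < 15:
            return root.findtext("abstract"), root.findtext("bibliography")
    return None


inline_hits = {doc_id: enriched_query(root) for doc_id, root in trees.items()}

print("Flat metadata query matches:", flat_hits)
print("Structural query matches:", [d for d, hit in inline_hits.items() if hit])
```

Both documents mention the same two entities, so the flat query cannot tell them apart; only the enriched markup lets the second query ask where the mentions occur and return specific parts of the matching article.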

That difference is why I like this deal. Because when you think you can't do anything with content you don't thirst for anything, but once you get a drink, you thirst for more. So when Bobxight starts to give people a taste of what you can do with analytics on content, I think it will, for many, raise questions that will make them want to move to a database infrastructure designed for content, not data -- i.e., an XML content server.

I should note that Mark Logic is both an Inxight customer (we OEM their language processing technology) and partner (our technologies are quite complementary and we work together in several accounts) and we look forward to continuing those relationships.

Otherwise, all I'd say is that if Business Objects wanted to join the Mark Logic partner program, all they had to do was ask.

See this post on Timo Elliott's BI Questions blog for a nice overview of the Inxight products. See here for Curt Monash's take on the deal.

Friday, May 18, 2007

CMS Watch on EMC Departures

Just a quick post to link to this recent CMS Watch story by Alan Pelz-Sharpe discussing recent departures from EMC | Documentum. Of late, the company has lost:
  • Dave DeWalt, former CEO, to McAfee
  • Howard Shao, Documentum co-founder and CTO, to what I think is retirement
  • Rob Tarkoff, former EVP and CSO (chief strategy officer) to Adobe
Software acquisitions are always tricky, and they're made trickier when you're uniting opposing cultures. To me, EMC and Documentum always seemed pretty much dead opposites from the get-go:
  • East Coast vs. West Coast
  • Hardware vs. software
  • Blue vs. white collar
The last point is more metaphorical than literal -- and it's not easy to articulate -- but in my opinion there is a real background difference between the companies that's hard to put your finger on, but that's not fully captured in either the East vs. West or hardware vs. software distinctions.

That said, VMware was as Berkeley/Stanford as they come, and it seems to have done just fine under the EMC umbrella since its January 2004 acquisition for $625M. Ironically, from the EMC perspective, perhaps that's because they left it alone more under the impressive Diane Greene, as opposed to trying to make it core and combine it with other software divisions. As you've probably also heard, EMC is in the midst of spinning off a piece of VMware to the public via a planned $100M IPO, which has raised some eyebrows due to its voting and control provisions.

To me, it's tricky times for the Documentum business because they need to deal with:
  • The continuing ramifications of being part of EMC
  • The entry of mega-players IBM (via FileNet) and Oracle (via Stellent) into the ECM space
  • The continuing market erosion from Microsoft's SharePoint
  • The entry of new open-source alternatives, like Alfresco, created by the other Documentum co-founder, John Newton
  • The uncomfortable reality of "being attacked from below" by IBM, Oracle, and Microsoft, each of whom has a strong database business underlying their ECM. This uncomfortable reality is a bit like SAP's, which for years has sought a way to not rely on Oracle as an underlying DBMS.

Thursday, May 17, 2007

A Periodic Table of Visualization Techniques

Someone inside Mark Logic recently forwarded me a link to this Periodic Table of Visualization Methods and I have to say it's pretty cool.

Be patient, as it takes a while to load. Note that when you place your cursor on top of a cell it creates a pop-up window showing an example of the relevant technique.

Wednesday, May 16, 2007

How Big is Big? A 200TB contentbase?

Many customers have large MarkLogic contentbases, because scaling is something we do quite well. I thought I'd share a bit of what one presenter described today at the user conference.

One speaker today described a project with:
  • 2 petabytes of total storage. Get ready, because that's 15 zeros. 2,000,000,000,000,000 bytes of data, if I got the zeros right. (For more on how big a petabyte is, see here.)
  • 200 terabytes of content that will go into MarkLogic, requiring about 482 TB of total disk space, including indexes. That's nearly 25 times the text content of the Library of Congress, per this post. Were it deployed today, it would be the 3rd largest database in the world, per the same post.
  • Approximately 1,200 terabytes of associated binary data that will be stored on the file system.
  • 4B documents going to 7B documents over time.
It's a good thing that Christopher Lindblad was thinking "Internet scale" in terms of target contentbases when he designed the system. There are customers who have projects that require it.
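For anyone who wants to check the zeros, the figures above can be tallied in a few lines. This sketch assumes decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes) and simply adds the indexed content and the binary data; the variable names are illustrative.

```python
TB = 10**12  # decimal terabyte, in bytes (an assumption; the talk did not specify)
PB = 10**15  # decimal petabyte, in bytes

total_storage = 2 * PB
content = 200 * TB                # content going into MarkLogic
content_with_indexes = 482 * TB   # the same content plus indexes, on disk
binaries = 1200 * TB              # associated binary data on the file system

index_expansion = content_with_indexes / content
planned_usage = content_with_indexes + binaries

print(f"Total storage: {total_storage:,} bytes")
print(f"Disk per unit of content (with indexes): {index_expansion:.2f}x")
print(f"Content plus binaries: {planned_usage // TB:,} TB of {total_storage // TB:,} TB")
```

On these numbers the indexed content and the binaries together account for 1,682 TB of the 2,000 TB total.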

Notes from Tim O'Reilly Keynote Address

Tim O'Reilly gave a fascinating, information-loaded, 105-slide keynote address this morning at the 2007 Mark Logic User Conference. Tidbits include:
  • O'Reilly's mission is to change the world by spreading the knowledge of innovators: watch the alpha geeks.
  • Web 2.0, which could be called Publishing 2.0, is about information businesses: it's a data revolution
  • What did the survivors of the dot-com bust have in common? They all used the network as a platform
  • User-generated content (UGC) and harnessing collective intelligence aren't the same thing. UGC is one way of harnessing collective intelligence, but there are others as well. For example, every time a webmaster makes links to a site they are telling Google the site they're linking to is important.
  • Harnessing collective intelligence is about growing a database whose value grows with the number of participants.
  • Data is the next Intel Inside. (Or, as we prefer to say at Mark Logic: content is the next Intel Inside.)
  • The top-placed ad on Google isn't based solely on the highest bid: it's based on the highest bid times the expected click-through rate. That both serves the user and makes Google more money.
  • You should include network effects by default. On Flickr, the default is to share.
  • Human collaboration beats the machine/algorithm: consider Last.fm's success relative to Pandora, who created a "music genome project" to try and dig into music and determine what you'll like.
  • Everyone should read Kathy Sierra's Creating Passionate Users blog.
  • Lessons from Google Maps: if your users are not surprising you with what they're doing, then you're not open enough. If they are, then try to learn from them. Half of all mashups leverage Google Maps. (See programmableweb for a mashup directory.)
  • We see content as a database and web services as a platform
  • Remember this quote from Ray Kurzweil: "an invention needs to make sense in the world in which it's finished, not the world in which it's started."
Tim also mentioned their upcoming conference, Tools of Change for Publishing, which is in San Jose from 6/18-20. (Mark Logic user conference attendees were given a discount code worth ~20% off.) Others should know that you can save $200 by registering prior to the early bird deadline on 5/21.
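The ad-ranking point in the list above is easy to see with a toy example. This is a sketch of the simplified rule as Tim stated it (bid times expected click-through rate), not a description of Google's actual system; the ads, bids, and rates are made up.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    name: str
    bid: float           # dollars per click (hypothetical)
    expected_ctr: float  # expected click-through rate, between 0 and 1 (hypothetical)

    @property
    def score(self) -> float:
        """Expected revenue per impression: bid times expected click-through rate."""
        return self.bid * self.expected_ctr


ads = [
    Ad("high-bid", bid=2.00, expected_ctr=0.01),
    Ad("relevant", bid=1.00, expected_ctr=0.05),
    Ad("middling", bid=1.50, expected_ctr=0.02),
]

by_bid = sorted(ads, key=lambda ad: ad.bid, reverse=True)
by_score = sorted(ads, key=lambda ad: ad.score, reverse=True)

print("Ranked by bid alone:", [ad.name for ad in by_bid])
print("Ranked by bid x CTR:", [ad.name for ad in by_score])
```

The ad that users are most likely to click wins the top slot despite having the lowest bid, and it also earns the most per impression, which is the "serves the user and makes more money" point.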

Tuesday, May 15, 2007

Mark Logic Unveils Release 3.2

We introduced MarkLogic 3.2 today at the 2007 user conference in San Francisco. Here's the press release.

Key new features include:
  • Expanded content processing capabilities
  • Improved content search features
  • New content analysis capabilities
  • Increased developer support
  • An interface to the oXygen XML editor
You can find more information here.

User Conference Day One Highlights

We had about 300 people in attendance for my opening keynote address at the 2007 Mark Logic User Conference in San Francisco today. My speech was about Mark Logic's vision and strategy. I talked about:
  • Our overall mission to Unlock Content™
  • How relational databases have repeatedly failed to handle content
  • How a fresh look is needed -- a system that is natively designed to handle content
  • What makes MarkLogic such a special XQuery implementation
  • The parallel between the query inflexibility that customers face today when using content with relational databases and search engines ... and the inflexibility of data query in pre-relational databases
  • How the future will bring two markets on top of XML content servers: a content applications market and a content analytics market
  • The impact of Office 2007 XML and the WinFS (which is now dead) fumble
  • The rise of special-purpose DBMSs

While we have dual tracks (technology, applications) at this year's conference, I only attended the applications track. Here is a paragraph on each of today's applications track sessions.

Nerac gave an interesting presentation on the application they built, which lets their professional researchers run very powerful searches across their very broad contentbases without needing to know XQuery, and then assemble the results into dynamically published custom reports. (They have many advanced search screens as well as an intermediate search language that presumably all gets translated into XQuery.) This enables them to deliver, quite compellingly, on their value proposition of “research, not search.” It has also enabled them to increase the value of their products, in some cases, by over 10x.

McGraw-Hill Education gave an interesting talk on their digital asset library (DAL) project and their vision of custom educational publishing to meet different state educational standards and to refer and/or produce content specific to individual students based on their individual needs – given, for example, how a student scored on a standardized test (known as “assessment” in the educational publishing world). The DAL helps in many use-cases: create a new product, develop a new edition, build a media product, assemble a custom project, provide global access and distribution.

University of Toronto Libraries gave an interesting presentation where the speaker first provided an overview of a 2005 survey by OCLC on perceptions of libraries in the digital age. The speaker then went on to demo a great app, evidently built quite quickly, that contained just about every web 2.0 feature I could think of that a student would want in a library search portal (e.g., facets, dynamic tagclouds, guided discovery with cited and citing documents, content enrichment via web services, reviews, comments, ratings) – that will ultimately propel them into the era of Library 2.0. As the speaker said, the era of teaching students to write ever-more-complex queries in order to restrict results is over. From here on out, it’s about simple search and subsequent refinement.

CQ presented on a new alert processing framework that they’ve built. Providing alerts is a big and important part of their business. As they say, most people want to do a search that pulls from a set of content; at CQ, we do the inverse – we save queries and then run them against all new content and deliver alerts as the results. They also talked more broadly about their general mission to put "information in actionable context" and discussed the idea that many publishers make money not only with their own proprietary content, but also by adding value through integrating and contextualizing public domain content.
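The "inverse search" idea can be sketched in a few lines. This is not CQ's framework, just a minimal illustration under simple assumptions: a saved query is a set of terms that must all appear, and matching is plain keyword matching. The subscriber names and sample text are invented.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SavedQuery:
    subscriber: str   # hypothetical subscriber id
    terms: frozenset  # every term must appear in a document to trigger an alert


def tokenize(text: str) -> set:
    """Lower-case the text and split it into a set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def alerts_for(document: str, saved_queries: list) -> list:
    """Run every saved query against one new document; return who should be alerted."""
    words = tokenize(document)
    return [q.subscriber for q in saved_queries if q.terms <= words]


saved = [
    SavedQuery("alice", frozenset({"energy", "bill"})),
    SavedQuery("bob", frozenset({"farm", "subsidy"})),
    SavedQuery("carol", frozenset({"senate"})),
]

new_document = "Senate committee advances energy bill after markup"
print(alerts_for(new_document, saved))
```

In ordinary search the content is stored and the query arrives later; here the queries are stored and each new piece of content arrives later, so the loop runs over queries rather than over documents.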

Really Strategies talked about how the MarkLogic classifier and its [unique] XML-based classification (that understands both words and XML structure) can be leveraged to improve the productivity of CMS users, for example, in finding source content of interest and/or repurposing it. Towards that end, they have incorporated such classification into their MarkLogic-based RSuite CMS (their CMS product targeted at publishers). They also discussed the results of an experiment they did classifying crime-related content with RSuite, which showed both how well the classifier works and how much time it can save by automating the process of creating metadata for XML documents. (And then you can start mining / querying the contentbase and enriched metadata.) Cool.

IEEE talked about their new vision for Research 2.0 and how they are trying to change paradigms in how content is used by delivering answers to questions and following an agile publishing process. Interestingly, IEEE launched a research program some time ago to understand how content will be used in the future, including a study on how young engineers in California and India use research. They discussed IEEE Articles in Context, a beta project that focuses on XML, deconstructing, mashing up, and reconstructing content, with the ultimate goal of figuring out how to integrate content directly into the workflow of engineers (another example of content in context). The speaker also discussed some of the unique problems involved in searching mathematical equations, which are understandably quite popular in their content and often pages in length. They're in the midst of testing the prototype with young technologists, gathering their feedback, and using that data to drive future publishing offerings.

Mark Logic founder Christopher Lindblad closed out the day with a great interview conducted by Denise Miura, Mark Logic's director of technical services (who also, by the way, put together the entire conference and did a superb job in so doing).

Nuggets from Christopher:
  • I'm a search guy. Dave, from his speech, is pretty clearly a database guy. [We like mixing the two at Mark Logic.]
  • When, at my previous job, I realized that someone would pay $1M for a search engine that could run database-like queries, I said "note to self."
  • Don't be too abstract in your XML -- in some standards, every element is called "element" and uses an attribute called "name."
  • I'm excited that Microsoft has adopted XML as the format for Microsoft Office. That will cause the number of people creating XML in the world to dramatically increase.
  • With [XML and] wireless technology, I could imagine a lot more location-aware activities. I could see geo-location as an exciting area.
  • I'm a nerd about hardware; this is the one question that will get me excited (in response to what's the best hardware for MarkLogic.)
  • You always hear there are no 64-bit apps; there *is* one: MarkLogic.
  • You get your impression about what a computer should be when you graduate from college. When I graduated, a megabyte was a lot of memory and 30 megabytes was a lot of disk. You need to remember to look at cost and not react just to the absolute numbers.
  • Computers have three commodity resources: CPU, memory, and disk. The way we tune the product tends to assume you evenly divide your hardware dollars across the three areas. Stop thinking about gigahertz, megabytes ... and think about money.
  • Make a database that works like a search engine ... or make a search engine that works like a database; it's kind of a Reese's Peanut Butter Cup thing.
  • You can't make a relational database with 10,000 columns ... well, you could but a bear can ride a bicycle and not work. Thanks to XML, you can make record-oriented applications
  • How do I say this without being arrogant ... we're better -- in response to what's the difference between Mark Logic and competitors. We make a premium product for a premium price. If you succeed, we succeed.
Finally, I'd like to say a big "thank you" to our conference sponsors.

Monday, May 14, 2007

Change is Good: You Go First

This post’s title is one of my favorite sayings because it perfectly captures our conflicting attitudes toward change. Intellectually, people know that change is necessary for advancement, but emotionally, most of us still don’t like it.

Happily, for companies like Mark Logic, there are always some brave souls willing to try changing the way they do things. Sometimes these people are driven to change by external forces (e.g., publishers who know that objects like Google in the rear-view mirror are indeed closer than they appear). Sometimes, they’re just adventurous spirits working in groups dedicated to technology exploration. Sometimes, they’re open to change simply because the mission is too important not to be (e.g., preventing terrorism).

The idea for this post came to me during a recent sales call. We were visiting a publisher which was looking to replace its search engine because it was expensive, hard to configure, and under-performing expectations. Moreover, the supplier was discontinuing support of the product, forcing a potential upgrade.

The good news was that these folks had found Mark Logic and were willing to hear what we had to say. But I was worried they were “wedged” in a search paradigm. As I said on the call:

If you’re just looking to replace your search engine the way you might change the oil filter in a car, then you should just go do that; there are plenty of them out there. If, however, you’re looking to change the way you build information products, to add enormous agility to that process, and to save the expense of buying and integrating a search engine and a DBMS to boot, then you should consider Mark Logic.

Look. The paradigm defines the outcome. If you spec a vehicle as requiring wooden wheels, a spring-loaded bench, leather reins, a hand brake, and a low hay/mile consumption rate, then you are never, ever going to come up with a car.

The fact is that disruptive technologies almost never have every feature of those they replace, especially at first. (Recall that it took about a decade for the relational DBMS to become production OLTP worthy.)

So if you want to stare at Mark Logic through an enterprise search engine lens, happily you will find that it has a lot of things that search engines don’t (e.g., read/write, transactions, database-style query language). But you’ll also find it’s missing a few things that search engines do have (e.g., a recent, now-neutralized example of this is proximity search – see aside below).

But that’s not the point. If you remove the search engine lens and frame the question not as “do you have reins and a hand brake” but instead as “what’s the best vehicle to get from A to B” -- i.e., “what’s the best platform on which to build new information products” -- then you’ll find the answer is most certainly Mark Logic and I can find about 30 happy publishers who’ll confirm that.

Time will tell where this customer ends up. They were great people, and we had a great meeting, so I hope they’ll choose to work with us. Either way, I feel for them since it’s never easy facing these sorts of challenges.

Like the headline says: change is good, but you go first.

Aside on Proximity Search

Proximity search is the name of a feature that lets you find all documents where word-A is within N words of word-B. It’s a popular search engine feature and until version 3, something that MarkLogic lacked. I like to talk about proximity because it provides a fascinating example related to disruptive change.

From a purist XML content server perspective, proximity search is a hack, a workaround to a problem that enterprise search engines face.

For example, if you want to find all contracts governed by Texas law, you could use your enterprise search engine to do a simple keyword search on “Texas” and “governing.” But say your company’s in Texas, so every contract has Texas in numerous address blocks. And every contract presumably also has a governing law section. So your query will return literally every contract in your database. Not so useful.

Proximity search addresses this problem by letting you say: find all documents where “governing” is within 10 words of “Texas”. It’s not a bad fix, if you’re an enterprise search vendor.

But an XML person sees this problem differently: XML has structure, so use it. The search becomes: find all documents with a section-heading element that contains “governing” and that contain “Texas” in the first paragraph of the subsequent section. You don’t need proximity to answer this question in an XML content server.
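The three approaches can be compared side by side in a small sketch. It is written in Python rather than XQuery, and the contract markup (contract, address, section, heading, para) and sample text are invented for illustration. The structural query is modeled as "a section whose heading contains the first term and whose first paragraph contains the second," which is one reading of the query described above.

```python
import re
import xml.etree.ElementTree as ET


def contract(state: str) -> str:
    """Build a hypothetical contract for a Texas-based company under the given state's law."""
    return f"""<contract>
  <address>1 Main Street, Austin, Texas</address>
  <section>
    <heading>Payment Terms</heading>
    <para>Invoices are due within thirty days of receipt and are payable in US dollars.</para>
  </section>
  <section>
    <heading>Governing Law</heading>
    <para>{state} law governs this agreement.</para>
  </section>
</contract>"""


CONTRACTS = {
    "texas-law": ET.fromstring(contract("Texas")),
    "delaware-law": ET.fromstring(contract("Delaware")),
}


def tokens(root) -> list:
    """Flatten the document to a list of lower-cased words, discarding structure."""
    return re.findall(r"[a-z0-9]+", " ".join(root.itertext()).lower())


def keyword_match(root, *terms) -> bool:
    words = set(tokens(root))
    return all(term in words for term in terms)


def proximity_match(root, a: str, b: str, n: int) -> bool:
    """True if some occurrence of a is within n words of some occurrence of b."""
    words = tokens(root)
    pos_a = [i for i, w in enumerate(words) if w == a]
    pos_b = [i for i, w in enumerate(words) if w == b]
    return any(abs(i - j) <= n for i in pos_a for j in pos_b)


def structural_match(root, heading_term: str, para_term: str) -> bool:
    """True if a section's heading has heading_term and its first paragraph has para_term."""
    for section in root.iter("section"):
        heading = section.findtext("heading", default="").lower()
        first_para = section.findtext("para", default="").lower()
        if heading_term in heading and para_term in first_para:
            return True
    return False


by_keyword = [k for k, r in CONTRACTS.items() if keyword_match(r, "texas", "governing")]
by_proximity = [k for k, r in CONTRACTS.items() if proximity_match(r, "governing", "texas", 10)]
by_structure = [k for k, r in CONTRACTS.items() if structural_match(r, "governing", "texas")]

print("Keyword:  ", by_keyword)
print("Proximity:", by_proximity)
print("Structure:", by_structure)
```

Keyword search returns both contracts because of the address block. Proximity and structure both isolate the Texas-law contract here, but the proximity result depends on how many words happen to sit between the address and the governing-law section, while the structural result does not.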

So think about this: we get asked to add a feature to our product that was added to one of the technologies we’re replacing in order to fix a limitation in that technology. Wow. It’s a bit like asking for blinders for your car’s headlights.

But we did it. Why? Because proximity’s still useful in an XML content server, because XML-aware proximity is even cooler (find these elements near those elements), and because it’s about 10x easier to tell this story when our product contains proximity than when it doesn’t. Interesting, n’est-ce pas?

Friday, May 11, 2007

Web Applications: The Virtues of Top-to-Bottom XML

I think that most people now correctly perceive our product, MarkLogic Server, as an XML content server, a special-purpose DBMS designed specifically for handling XML marked-up content. That’s the good news.

The better news is that many of these same people are figuring out what that means when it comes to developing web applications – specifically, that you can use an XML content server to build web applications using XML top-to-bottom. No Java required. No relational tables required. No application server required. (And no expense for all those supporting products.)

Don’t get me wrong. Many customers choose to use MarkLogic as the XML repository and query system in their architecture, building their applications in Java, using an application server, and making calls out to MarkLogic to process XML queries. Lots of people use the product in that way. That’s fine.

But, people soon realize, when you have a DBMS and query language (XQuery) that directly outputs XML (e.g., XHTML) which can be directly rendered by a browser, and when that “query” language is really a misnamed and underpositioned programming language easily capable of developing entire applications, you can say:
“Wait a minute. My content’s in XML. My browser speaks XML. Why not build my whole app top-to-bottom in XML and XQuery?”

Good question. And the answer is you can. And in many cases, you probably should. What are the advantages of doing so?

  • Use of a high-level, standard, powerful programming language, XQuery. High-level and powerful translate to greater development and maintenance productivity. Standard translates to risk reduction and freedom of choice. (Aside: While XQuery is not a big-hype, overnight-success type of technology like Ajax, XQuery continues to march along with a certain inevitability. In my mind, there is no question that XQuery will be the database programming language of the future – it is superior to SQL, it is more general than SQL and ergo applicable to a broader class of problems, and all major DBMS vendors are already committed to it. The question is not will XQuery become mainstream, but when?)
  • Elimination of three impedance mismatches: Java/XML, XML/relational, and Java/relational. Java is object-oriented, XML is hierarchical, and relational databases are tabular. The mapping between these three different data models generates a lot of zero-value-added work in developing an application. When you’re XML top-to-bottom, poof, that work’s all gone.
  • Elimination of tiers. I had lunch a while back with a top engineer at Oracle who told me that he believed the limiting factor on database application performance was becoming scheduling. That is, hardware and databases are becoming so fast that scheduling work across tiers is becoming the limiting factor in performance. His suggested solution? Eliminate tiers. Well, top-to-bottom XML does exactly that.
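
To give a feel for the "no mapping layers" point, here is a small sketch in which XML content goes straight to an XHTML fragment with no objects and no tables in between. Python's ElementTree stands in for XQuery purely so the example is runnable anywhere; the articles/article/title element names and the render_titles function are hypothetical, and a real top-to-bottom application would express this as an XQuery expression running in the content server.

```python
import xml.etree.ElementTree as ET


def render_titles(xml_text):
    """Turn a hypothetical <articles> document into an XHTML <ul> fragment.

    XML in, XML out: there is no object model or relational schema to
    map to and from along the way.
    """
    articles = ET.fromstring(xml_text)
    ul = ET.Element("ul")
    for article in articles.iter("article"):
        li = ET.SubElement(ul, "li")
        li.text = article.findtext("title", default="")
    return ET.tostring(ul, encoding="unicode")


# Made-up content for illustration.
source = (
    "<articles>"
    "<article><title>Change is good</title></article>"
    "<article><title>Top-to-bottom XML</title></article>"
    "</articles>"
)

print(render_titles(source))
# <ul><li>Change is good</li><li>Top-to-bottom XML</li></ul>
```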

Thursday, May 10, 2007

Mark Logic Opens UK Subsidiary

I'm pleased to announce that Mark Logic has created Mark Logic (UK) Ltd.

We continue to see strong demand for our XML server in publishing, government, and commercial markets and I'm quite pleased that we have officially put our feet down in the UK.

While many of our customers are already outside the US, thus far, we have primarily served them by working with/through their US subsidiaries. Henceforth, we will be able to directly serve the UK market from this office.

Here's the contact information:

Mark Logic UK Limited
3000 Hillswood Drive,
Hillswood Business Park,
Chertsey, Surrey, KT16 0RS
United Kingdom
+44 (0) 1932 796 400 (phone)
+44 (0) 1932 796 414 (fax)

Friday, May 04, 2007

Blog Enhancements and Request to Use Feedburner Feed

Thanks to the gracious assistance of Jason Hunter, I've added some new bells and whistles to my blog template.
  • Subscription buttons for popular newsreaders in the right-hand column
  • Digg and del.icio.us buttons in the post footer template
If you have any problems using any of these features, please let me know.

In addition, please do me a favor, and take a minute to reset your connection to this blog to use the feedburner feed.

Why? Because I have learned that while sitemeter tracks visitors to the actual blog URL accurately, it does not track people who read the blog via its ~4 feeds (e.g., RSS, Atom). Since I believe the majority of my readership comes from such subscribers, and more importantly that these subscribers are my best blog customers, I want to know more about them. (Right now, I'm in the awkward position of knowing the most about those who visit the blog the least.)

To solve this, I'd like to switch to feedburner to get statistics on subscription readers, but that requires moving everyone to a single feed, this one here: (http://feeds.feedburner.com/marklogic).