Tuesday, April 17, 2007

The CMS of Reference

One of the nice things about living in content-land but coming from data-land is that I can spot interesting parallels between the two worlds.

For example, in data-land one of the big, seemingly simple, questions was always: how do we keep a customer's contact information (e.g., name, address, phone, email) correct across multiple redundant systems, such as finance, technical support, and marketing?

Should we:
  • Allow updates in only one system and propagate those each night to the other systems?
  • Keep the information in one system and pull it into the others in real time?
  • Keep redundant synchronized copies in all systems and replicate changes in real time?
  • Keep a copy-of-reference in none of the relevant operational systems, but instead in a separate system (e.g., the data warehouse) and pull it from there to all the operational systems?

Obviously, every one of these approaches involves trade-offs. For example, the first approach means a call-center operator would need login credentials to the marketing system to make updates (if it were the system of reference), and there would be a lag in getting updates across all systems. The third approach generates a lot of network traffic and raises the possibility of conflicting updates arriving from different systems. The last creates yet another system to serve as the master, rather than simply picking one of the existing ones.
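
To make the first approach and its lag concrete, here is a minimal sketch in Python. Everything in it is hypothetical and for illustration only: the System class, the update_contact and nightly_propagate helpers, the customer ID, and the choice of marketing as the system of reference are assumptions, not a description of any real product.

```python
from dataclasses import dataclass, field


@dataclass
class System:
    """A toy operational system holding customer contact records."""
    name: str
    contacts: dict = field(default_factory=dict)  # customer_id -> email


def update_contact(reference: System, customer_id: str, email: str) -> None:
    """Approach 1: updates are accepted only in the system of reference."""
    reference.contacts[customer_id] = email


def nightly_propagate(reference: System, replicas: list) -> None:
    """Copy the reference data to every other system in one batch."""
    for replica in replicas:
        replica.contacts = dict(reference.contacts)


# Hypothetical setup: marketing is assumed to be the system of reference.
marketing = System("marketing")
finance = System("finance")
support = System("technical support")

update_contact(marketing, "c42", "pat@example.com")
stale = finance.contacts.get("c42")   # finance has not seen the update yet
nightly_propagate(marketing, [finance, support])
fresh = finance.contacts.get("c42")   # finance catches up after the batch
```

The gap between stale and fresh is the lag described above: until the nightly batch runs, finance and technical support simply do not know about the change.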

In my experience, pragmatists picked the first approach and visionaries picked either approach 3 or 4.

Carry this dilemma over to content-land and you have a very similar situation. Instead of asking "where's my copy of reference?" for data, you ask it for content.

Historically, print ruled the publishing world: production processes were designed for print, and everything else was secondary or derivative. For example, many newspapers produce their online product by scanning and converting a PDF of the print product. That is, so many changes are made downstream of the editorial process that the only way to get the final version of an article is to scan it in off the paper.

Lisa Bos at Really Strategies recently wrote an interesting post on this topic that you can find here. Lisa talks about print CMS, web CMS, and single-source CMS as the three primary options. In data-land, that maps roughly to making the SFA system the copy of reference, making the call center system the copy of reference, or making the data warehouse the copy of reference.

My belief going forward is that pragmatic publishers will be "web first" -- i.e., the system of reference will be web delivery and other delivery channels will be secondary. This is still a massive mind-shift for many publishers, particularly in layout-intensive segments like newspapers, magazines, and textbooks. MarkLogic can play an important role here because even with a "web first" mentality, I think there is a strong argument for delivering content to the website from an XML content server instead of an HTML-based web CMS.

Visionaries will make a "content warehouse" in an XML repository (managed by an XML content server like MarkLogic) and do single-source delivery from that repository.

I'll try to post more on the "why build your content-driven website directly in XQuery on MarkLogic" topic later. Meantime, check out Lisa's post.
