First, let me give credit where it's due. IBM is setting the agenda on XML support in relational databases. I think they have both Oracle and Microsoft on the back foot, responding to the aggressive position they are taking. Moreover, I think they have done excellent marketing around the feature "native XML storage," now cutesily renamed "pureXML".
Before providing my analysis and the "columns in drag" argument, here are some links to the coverage:
Here is an Information Week article, where IBM claims "it's the biggest database advance in 20 years." Here's an InternetNews article that says IBM is "banking on the new features to help it gain market share versus Oracle." It also talks about venom (ten more cute points), a new compression utility that will ship with Viper.
Finally, here's an article in the UK's Computer Business Review which says, "IBM's rationale for making XML data a first class-citizen is due to the fact that XML data volumes are exploding."
Read those words carefully: IBM is making XML data a first-class citizen. XML data.
But what about XML content? Ah, it's again the stepchild. But that's what you'd expect a database vendor to do; take care of data like they always do. And that's precisely what they've done.
First, what has IBM implemented? Simply put, you can now create columns of type XML within a table in a relational database. You can decide whether that XML must conform to a given schema. By making schema-compliance optional, IBM has solved one of the problems with loading content into a RDBMS. (In the past, you either loaded XML as a BLOB/CLOB which for all practical purposes is a file, or you needed to "shred" XML columns into tables. While XML data is typically well structured and can shred easily, XML content is more loosely structured and it's a generally accepted fact that shredding does not work when it comes to content.)
So now with Viper you can put XML into a column of type XML in a database. But how do you get it back out? Is Viper as great as it sounds or is Viper fine for XML data, but a roach motel for XML content? (Where content checks in, but it doesn't check out)
I think it's the latter. Here's why. To get information out of an XML column you create a good, old-fashioned b-tree index on an XPath-like path called an xmlpattern. Here's the syntax:
create index on po-table(po-xml-column) generate key using xmlpattern '/po/po-header/po-number' as sql varchar(10);
But how can you get the flexibility of a system like MarkLogic when you need to know in advance which XML elements are worth indexing? How can you index the elements you want when the schemas are highly varying and you are not able to create an xmlpattern to identify them? Guess what? You can't.
One way to think of this is that Viper requires top-down indexing. Tell Viper everything you want to query in advance and then provide top-down paths to locate those elements inside the XML. By contrast, you can think of MarkLogic as providing (search-engine-like) bottom-up indexing: MarkLogic indexes every element, automatically, regardless of whether there is a regular way of finding it inside different documents.
I think the top-down approach works fine for XML data. I think you need bottom-up for XML content.
Here's another way to look at it. If the good news about Viper is you don't need to map XML elements to columns to store XML in a database, then the bad news is that you need to map XML elements to indexes to get them back out. Without the b-trees on the XML elements, it is effectively a roach motel.
This is why Mark Logic technology strategist John Pilat (formerly VP of engineering for server technologies at Oracle) calls this approach "columns in drag." No, you don't need to map XML to columns to put it in. But you do need to map XML to xmlpattern indexes to get it back out. Thus, the sum of those indexes becomes a de facto schema for the XML and you've ended up with ... columns in drag.
There is another access method for XML columns, via the text search capabilities that most RDBMSs already provide, with all the pros/cons of using text search against XML content. Examples:
- "Dave Kellogg" can be hard when XML is getting in the way of the text as in: [fname] Dave [/fname] [lname] Kellogg [/lname]. (Sorry I couldn't use angle brackets but I couldn't get escape characters to properly work.)
- Finding "retract the landing gear" can be hard with an XML footnote in the way a la "Retract [footnote] Ensure hydraulics are fully primed [/footnote] the landing gear."
- Text-index updating for the text in XML columns is not transaction consistent. That means when you commit a row to a table that contains author(varchar), publication-date(date), and article(XML) that the b-tree indexes on the first two columns are updated and the btree xmlpattern indexes on the XML column are updated, but the text indexes on the XML column are not updated. This means that a subsequent query that looks for a newly-inserted article by author, publication date, or XML pattern will find the article, but one that searches on full-text will not. This is heresy from a DBMS viewpoint, and violates the ACID properties of transactions. (You must manually run a command like update text indexes on table to get the XML text indexed.)
- I believe that in Viper (and I'm sure in other upcoming relational-XML implementations) that you cannot access text-search calls from XQuery. So if you want to invoke text search on a XML column, you have to use SQL because the built-in functions are only implemented there.

1 comments:
Nice info
Regards,
Joseph
http://db2examples.googlepages.com/
Post a Comment