Monday, September 29, 2008

How Big is Big? Oracle's Largest Data Warehouses

I found this post, entitled Some of Oracle's Largest Warehouses, on the DBMS2 blog and I thought I'd re-sort them by size in descending order. So, here they are:


Quoting Curt:
10 databases total are listed with >16 TB, which is fairly consistent with Larry Ellison’s confession during the Exadata announcement that Oracle has trouble over 10 TB, which is something I’ve gotten a lot of flack from a few Oracle partisans for pointing out
While I know it's a bit unfair to compare contentbases with databases (because content is generally so much bigger and there is so much more of it), I thought I'd point out that the largest MarkLogic production application today runs at over 100 TB and that a typical publisher has single-digit terabytes of content. And we're just getting started. And we're not storing lots of stuff redundantly to optimize performance as you would in a data warehouse.

Thursday, September 25, 2008

New Mark Logic Executives: Jason Monberg and Cathy Lewis

A quick post to welcome two recent additions to the Mark Logic executive team.
In his prior lives, Jason has experience at Dashlight Software (a MarkLogic-based application), Composite Software (data virtualization), Carbon Five (J2EE development), Sparks.com (consumer), and USWeb/CKS (Internet consultancy). Jason brings a great developer perspective to Mark Logic which I hope will result in the products becoming easier to both use and integrate over time.

In her prior lives, Cathy has experience at Sun Microsystems, Saba (human capital management), Business Objects (business intelligence, where we worked together), and PrimeResponse (customer marketing). Cathy brings a strong accounting, control, and compliance background to Mark Logic, which should help us as we build the company and the sophistication of our processes over time.

We've experienced a lot of growth during my tenure at the company, and we're happy to round out and enrich the executive team as we go forward.

Tuesday, September 23, 2008

Miffed Over OOXML, IBM Threatens to Leave ISO

See this New York Times article, IBM Threatens to Leave Standards Bodies, which describes IBM's irritation at Microsoft's success in fast-tracking Open XML (also known as Office Open XML, or OOXML) as an alternative ISO standard to the existing OpenDocument Format (ODF).

Excerpt:

Microsoft submitted OOXML to the ISO under a so-called Fast Track process, which some opponents believed was too rushed and resulted in a poor-quality standard. Many countries and technical experts questioned the need for another standard document format.

A draft standard OOXML was approved by ISO/IEC Joint Technical Committee 1 (JTC 1) in a vote that closed March 29. Brazil, India, South Africa and Venezuela filed appeals over its approval, but the appeals were dismissed in July. The appeals centered in part on alleged irregularities in the ISO's voting process.

Venture Capital: Quaint by Comparison

I've had several people ask me what it's like to run a venture-backed start-up with the current mess on Wall Street. My short answer, from a financing point of view, is that "not much has changed." You can see why if you look at how the VC business works:
  • Venture firms run venture funds
  • Venture funds are typically 10-year, illiquid partnerships
  • A fund typically has one general partner (GP) and multiple limited partners (LPs)
  • The LPs are the investors who commit capital at the start of the fund
  • LPs give that capital to the fund over the first few years through a series of "capital calls"
  • LPs who miss capital calls are typically diluted out of the fund through draconian provisions in the partnership agreement
  • The GP runs the fund and earns nice fees for doing so. (Typically 2-3%/year of the committed capital plus 20-30% of the upside. The fixed fees pay for healthy base salaries and the tasteful but not extravagant offices on Sand Hill Road. The upside cut is what buys Gulfstreams and ranches in Montana.)
  • The GP follows some process to decide which early-stage companies it wants to invest in
  • The fund buys preferred stock of those companies (founders and employees typically hold common)
  • The fund holds those shares, seeking an eventual "liquidity event," such as an IPO or acquisition.
Next to credit default swaps, complex derivatives, and 30:1 leverage ratios, the venture capital business looks almost quaint: using no leverage, buy and hold the stock of a portfolio of early-stage companies.
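To put rough numbers on the GP economics in the list above, here is a minimal Python sketch. The $100M fund size and the $300M exit value are hypothetical, and the model is deliberately simplified by assumption: it charges the management fee on committed capital in every year of a 10-year fund and takes carry only on gains above committed capital, ignoring fee step-downs, hurdle rates, and timing.

```python
def gp_economics(committed: float, fee_rate: float, carry_rate: float,
                 fund_value_at_exit: float, years: int = 10) -> tuple[float, float]:
    """Return (total management fees, carried interest) for a simplified fund.

    Assumptions (illustrative only): the fee is charged on committed capital
    every year of the fund's life, and carry applies only to gains above the
    committed capital.
    """
    management_fees = committed * fee_rate * years
    gain = max(0.0, fund_value_at_exit - committed)
    carried_interest = gain * carry_rate
    return management_fees, carried_interest


# Hypothetical $100M fund at the low end of the ranges above (2% fee, 20% carry)
# that is worth $300M when its positions are finally liquidated.
fees, carry = gp_economics(100e6, 0.02, 0.20, 300e6)
print(f"Management fees over 10 years: ${fees:,.0f}")
print(f"Carried interest: ${carry:,.0f}")
```

With these made-up inputs the fixed fees come to $20M (the salaries and offices) and the carry to $40M (the Gulfstreams), and a fund that returns less than its committed capital earns no carry at all.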

Aside
Speaking of leverage, I thought I'd share this quote du jour that I found on Infectious Greed:

What is truly disgraceful is that investment banks could only manage returns on equity of 15-25% with a balance sheet that was often leveraged to the sky.

-- Niels Jensen and Jan Wilhelmsen, of Absolute Return Partners

With a 30:1 leverage ratio, a 15-25% return on equity means the underlying investments are returning less than 1% a year. While that makes sense in true arbitrage situations, the use of high leverage went well beyond classical arbitrage, as far as I can tell.
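The arithmetic behind that claim can be sketched in a few lines of Python. This is a simplification under stated assumptions: it treats leverage as assets divided by equity, assumes ROE = ROA × leverage, and treats funding costs as already netted out of the asset return.

```python
def implied_return_on_assets(return_on_equity: float, leverage: float) -> float:
    """Back out the return on the underlying assets from ROE.

    Simplifying assumption: ROE = ROA * leverage, where leverage is
    assets / equity, and funding costs are already netted out of ROA.
    """
    if leverage <= 0:
        raise ValueError("leverage must be positive")
    return return_on_equity / leverage


for roe in (0.15, 0.25):  # the 15-25% ROE range quoted above
    roa = implied_return_on_assets(roe, 30.0)
    print(f"ROE {roe:.0%} at 30:1 leverage -> return on assets {roa:.2%}")
```

At 30:1, the quoted 15% and 25% returns on equity work out to roughly 0.50% and 0.83% on the underlying assets.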

Venture Capital Cartograms

I found these cool cartograms on one of my favorite blogs, Infectious Greed, the other day. They represent absolute venture capital disbursement (on the right) and venture capital disbursement per $K of GDP.


Evidently, Kedrosky found them in Nature, but after a quick look around its site I couldn't find the source article.

Monday, September 22, 2008

Using Agile Methodologies Presentation

Below please see the slides from the presentation I gave today at the Outsell Signature Event at the lovely Ritz Carlton in Half Moon Bay, California. I'm passionate about agile development because I've simply seen too many waterfall train wrecks that either kill companies (e.g., Ingres) or nearly kill them (e.g., Business Objects).

In many cases, those software development messes actually obscure deeper underlying problems. For example, at Ingres, I'd argue the root cause was the lack of a competitive strategy for dealing with the fact that the company had been "lapped" by Oracle, which resulted in a ridiculously long requirements list. But I'd further argue that a realistic agile process would have made it evident that the list could not be accomplished and might have forced the company to deal more quickly with the ugly reality it faced.

One key point that's not on the slides is that while most publishers will say "yes" to a survey asking if they are using agile methodologies, my anecdotal data suggests that the IT leaders at those same companies don't see things the same way. For example, at the panel session on agility hosted by Marc Strohlein at last spring's Mark Logic user conference, one of the top audience questions was, in effect, "How can I do agile at a company that isn't?"

Perhaps someone (e.g., Outsell?) needs to do some gap analysis between the business and IT sides of the publishing industry on this issue.

Sunday, September 21, 2008

Bubblenomics: The Financial Mess from an Investor Perspective

After reading literally scores of articles about the recent Wall Street crises, I recommend this one from today's Week in Review section of the New York Times. Entitled simply Bubblenomics, it is, I think, the best I've read from both an overall and an investor viewpoint.

Excerpt:

Only now, for instance, are the bubbles of the past decade and a half, first in the stock market and then in real estate, starting to go away. It’s easy to think of the turmoil of the past 13 months as being unconnected to the stock bubble of the 1990s, which appeared to end with the dot-com crash of 2000 and 2001. That crash brought down the overall stock market by more than a third, its worst drop since the 1970s oil crisis. Corporate spending on new equipment then plunged and employment fell for three straight years.

But dramatic though it was, the dot-com crash did not actually come close to erasing the excesses of the 1990s. Indeed, by some of the most meaningful measures, Wall Street after the crash looked a lot more like it was in a bubble than a bust.

The good news? P/E ratios are now back to their historical average. The bad? Judging by the chart, they tend to sink right through the average before bottoming out at about half of it.

Friday, September 19, 2008

Xobni Office Tour Video

Thanks to Kelly Stirman for pointing this out to me -- it's a video tour of Xobni's office guided by its founder, evidently part of a series that Slate is doing called Cubez.

As I like to say: web 2.0 is like web 1.0 without the exits. In this case, right down to the Aeron chairs.

Tip: count how many times he says "cool" or stemmed variations thereof.

Thursday, September 18, 2008

Mirko Minnich Keynote at the Luxid 5.0 Launch

I attended the launch of Luxid 5.0 from Temis yesterday in New York City, atop the trendy Hudson Hotel, in an ivy-gilded tent as suitable for a wedding as a product launch, with a view of the Hudson River, and conveniently adjacent to a fashion event put on by Level 99 Jeans.

Mirko Minnich, SVP and CTO at Elsevier Health Sciences, gave the keynote address, which I thought I'd cover in this post. In a crisp suit with a pastel tie, Mirko, who speaks with a light German accent, talked about enrichment and publishing.

He started with a humorous anecdote about his cholesterol becoming elevated precisely at his fortieth birthday and the challenge presented by trying to use a 700-page tome on cholesterol to answer his basic questions: what is this, how bad is it, what do I do next, how does this compare, etc.

Mirko then presented a model for how business value delivered over the web has evolved over time:
  • Level 1: the website model, where people simply experienced the content. He called this the SVP model, short for search, view, and print.
  • Level 2: the audience model, where you build a cross-enterprise platform that serves people on both sides. For example, in medicine, using common infrastructure to serve the needs of patients, physicians, and payers on the one hand and researchers, students, and nurses on the other.
  • Level 3: the roles model: where you build "role-based, in-context, process-driven applications." This means, in effect, building applications that know who you are, what you're trying to do, where you are in the process of doing it, and then providing appropriate content to help you do it. (I couldn't agree more with this vision. See, for example, here, one of my many posts on task- and role-aware content applications.)
He then described what it takes to deliver such Level 3 applications:
  • Increased relevancy and usability
  • Strong core features
  • Web 2.0 expectations
  • New content
  • Increased precision, ranking, and recall
Mirko argued that publishers were "sitting on huge assets" which, given the ambient noise and his slight accent, I first misheard as something else, chuckling at the potential double entendre. (I've added a post tentatively entitled "Publishers: Sitting on Their Large Assets" to my to-do list for this blog.)

But seriously, I took Mirko's point and it wasn't the only time I heard talk of "improving the return on content assets" during my week in New York City. The fact is, most publishers have huge collections of valuable content and there are nearly innumerable ways to get value from it with a combination of imagination, re-use, re-purposing, and customer-centered -- as opposed to content-centered -- design.

Mirko said the way forward was to:
  • Build the infrastructure for change. (Here again, I agree wholeheartedly and, word from our sponsor: we at Mark Logic believe that MarkLogic Server should be the XML centerpiece of such an infrastructure.)
  • Create a solid content base (or, as I like to say, contentbase).
  • Optimize the organization.
  • Migrate to a user-centric focus.
Mirko said that people needed to move away from a "content is king" mentality and towards putting the user at the center of development. (Again, I agree. See here for one of my riffs on contextual design.)

He then closed the loop back to his original cholesterol story, painting a vision of how to distill and present just what was needed from the 700-page cholesterol tome, which gave me the idea of adding an EPM-style goal-management system to the process. After all, if you discover that you should be eating 3 servings of vegetables per day, then why not go the next step and start tracking progress towards that goal?

Overall, I agreed with Mirko's vision and I enjoyed his speech.

The only part I get stuck on is pricing (which Mirko didn't discuss). People are used to paying $25 for a hardback and $125 for a textbook. While everyone likes to complain about information overload, will people really pay more for less? Or, flawed beings that we are, will we irrationally keep paying $50 for something that buries us in information instead of, ideally, $75 for just the bits we need (paying a premium for the time savings) or, at least, perhaps $10 for just the bits we need (reasoning that we're using only a small part of the book)?

I know there's a demand elasticity question in there as well (i.e., will revenue be maximized by selling fewer $75 things or more than 7.5x as many $10 things?). But, overall, I don't know.
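The elasticity question boils down to a breakeven: since $75 / $10 = 7.5, the $10 version has to sell more than 7.5 times as many units to bring in more revenue. Here is a minimal Python sketch of that comparison; the unit volumes are hypothetical, and the model ignores costs, cannibalization, and any change in overall demand.

```python
def revenue(price: float, units: int) -> float:
    """Gross revenue, ignoring costs (a deliberate simplification)."""
    return price * units


def breakeven_volume_multiple(premium_price: float, budget_price: float) -> float:
    """How many times more units the cheaper offer must sell to match revenue."""
    return premium_price / budget_price


def better_offer(premium_units: int, budget_units: int,
                 premium_price: float = 75.0, budget_price: float = 10.0) -> str:
    """Say which pricing strategy brings in more revenue for the given volumes."""
    premium_rev = revenue(premium_price, premium_units)
    budget_rev = revenue(budget_price, budget_units)
    if premium_rev > budget_rev:
        return f"${premium_price:.0f}"
    if budget_rev > premium_rev:
        return f"${budget_price:.0f}"
    return "tie"


multiple = breakeven_volume_multiple(75.0, 10.0)
print(f"Breakeven: the $10 offer needs {multiple}x the units of the $75 offer")

# Hypothetical volumes: 1,000 buyers at $75 versus various volumes at $10.
for budget_units in (5_000, 7_500, 10_000):
    print(budget_units, "units at $10 ->", better_offer(1_000, budget_units))
```

The hard part, of course, is not the division but estimating the volumes, which is exactly the part I don't know.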

My hunch is it's a generational thing and that over time it will become natural to pay a premium for less, more-relevant information rather than more, less-relevant information. I think it will happen. The question is when and what to do about it in the meantime.

Monday, September 15, 2008

My Blog Cited in CNBC's Executive Careers Blog

I'm excited to report that the Mark Logic CEO Blog was cited in CNBC's Executive Careers blog in a recent post entitled Blogging and the CEO.

Excerpt:
Here’s a rundown of some of the best CEO blogs; take some inspiration from these visionary leaders if you’re thinking of starting your own executive-level blog.
They then go on to list the following folks/blogs as the ones to watch:
  • Jonathan Schwartz, Sun Microsystems
  • Mark Cuban, Dallas Mavericks
  • Dave Kellogg, Me!
  • Bill Marriott, Marriott
  • Mike Critelli, Pitney Bowes
Here's what they had to say about my blog:
Kellogg, CEO of startup Mark Logic (which is developing XML server) has been blogging since 2005 on everything from business plans to XML database development.

Posting frequency: At least once every business day, usually by 8:30 am Pacific time; sometimes Kellogg posts two or three posts over the course of the day.

Bonus: If you’re a tech exec interested in database development, this blog is chock-a-block with info on the latest developments search paradigms.

Thank you! This kind of recognition helps make the midnight posts worthwhile.

Sunday, September 14, 2008

Interviewed in Dr. Dobb's Journal

Check out this interview with me, published this past Thursday in Dr. Dobb's Journal and entitled XML as a Content Platform.

Excerpt on history:

In essence, because RDBMSs date back to the 1960s and pure XML databases only date back to around 2000, the XML database vendors get the coveted chance to "start over" in designing a database system. So we can quickly incorporate a lot of the features put in RDBMSs over the past few decades while at the same time optimizing for XML.

Excerpt on XQuery:
Because XQuery was the DBMS community's chance to start over and they took it. XQuery is superior to SQL for a number of reasons. It's a full programming language, not just a data manipulation language. It handles XML natively, and XML is indeed becoming more and more pervasive.
Excerpt on "SQL is COBOL" to our kids:
Our kids will think of SQL the way that we think of COBOL. ("Daddy, do you mean you used a database language that assumed all data was stored in tables and didn't natively understand XML?" "Yes, Muffin, and I used to have to sew my own clothes, too!")

Wednesday, September 10, 2008

Speaking of Misfired: Departures at VMware Following Greene Ouster

Sorry for the play on words between the last post and this one, but it looks like there is an exodus of key players at VMware in the wake of Diane Greene's unceremonious ouster by EMC.

See this story in the New York Times: At VMware A Firing is Still Reverberating.

Plus, it seems like somebody at EMC needs to loan Joe Tucci a copy of Emotional Intelligence. See this excerpt:
Mr. Tucci, who heads VMware’s parent company, EMC, pulled [Greene] aside, according to people familiar with the events, ...

Inviting Mendel Rosenblum, Ms. Greene’s husband and the co-founder of VMware, into the room, Mr. Tucci told Ms. Greene she was fired, effective immediately. And he said the board wanted Mr. Rosenblum, VMware’s chief scientist, to take her seat on the board. Mr. Rosenblum declined the offer.

[...]

In the wake of Ms. Greene’s departure, three other key executives loyal to her have left, including Mr. Rosenblum, who announced his resignation and return to Stanford as a full-time professor in a companywide message on Monday night.

I'd guess that 9 out of 10 husbands would refuse to take the post opened by their wife's sudden termination and the other one would find himself in a divorce. But maybe I'm sensitive.

Misfired Google Alert Tanks United Airlines Stock

And I thought it was just me. About a month ago, I made a fool of myself in front of my company by forwarding a "news" item to a broad distribution list that was, in fact, about 7 years old.

The cause: I received the story as a Google Alert and, without even reading the dateline, forwarded it to the company as news. I didn't think twice about it -- until today, when I heard that Google had seemingly done the same thing with a 6-year-old story announcing United Airlines' bankruptcy, dropping the stock from about $12 to about $3 before trading was halted.

Here's the Wall Street Journal's coverage of the story.

Note that the media coverage (on my admittedly quick skim) doesn't explicitly say a Google Alert drove the trouble. Instead, it says the story somehow ended up on a "most read" list, which in turn created a virtuous/vicious cycle of readership that kept it there. Given my own experience, I'm guessing a misfired alert was the root cause -- i.e., the story made a most-read list because an alert sent it out as news.

Here's Valleywag's coverage of the story.

Mark Logic CEO Blog Named "Interesting Blog"

It's not exactly an Oscar, but I'm nevertheless happy that Curt Monash did a brief write-up on this blog in the Network World community area, calling it an "interesting blog" and doing some analysis of it. Excerpt:

Dave does several different things in his blog, all of them well.

  • He outlines the general case for XML database management.
  • He highlights applications and other proof points supportive of his company's value proposition.
  • He offers broader market insight, into the adjacent areas of business intelligence, database management, and text search.
  • He offers enterprise software business insight, especially in the area of marketing.

Of course Dave is biased, but in many posts he does a job of modularizing his biases away from some fairly dispassionate analysis.

While I try hard to keep the blog from turning into a commercial because I want people to read it, I do have a de facto pro-Mark Logic viewpoint. Rather than pretend that's not there, I try to be up front about it (see the FAQ). It's also one reason why I named the blog "Mark Logic CEO Blog" instead of something like "Kellblog" (which Mike Moritz once called me) -- so you can know the role from which I'm writing.

In addition to the discussion of the blog, Curt makes an interesting comment on marketing:
Dave is also Exhibit A for my theory that it's hard to have a completely qualified VP of Marketing, because if you do, s/he will also be qualified for and eventually move on to a CEO job, something that is less true of Sales VPs, ..., and also less true of Engineering VPs, ...
When I decided, about 20 years ago, that I wanted to become a marketing VP, I did so with the ultimate goal of becoming a CEO. I figured that marketing was the best jumping-off point for becoming CEO, on the theory that it provided a more holistic view of the business than sales or engineering. (As Theodore Levitt once said, marketing is about the entire business from the point of view of the customer.)

It was a risky decision because back then most software CEOs came from sales. Happily, increasing numbers of marketers are making it to the CEO role, succeeding at it, and hopefully blazing the trail for others.

In any case, thanks for the recognition Curt. I'll return the favor and point out Curt's DBMS2 and Text Technologies blogs. In my opinion, Curt is one of the very few "database people" who also understands search and text. He's in my blogroll -- maybe he should be in yours.

Tuesday, September 09, 2008

Mark Logic Xcelsius-based Performance Dashboard

Just a quick post to highlight the warm reception for the Xcelsius-based performance dashboards built for MarkLogic Server by former Ingresite, former BOBJoid, former Mark Logician, and now SAPian Tom Turchioe.

The MyXcelsius blog just named them Dashboard of the Week.
.... an administrator can zero in on any performance related issues quickly and easily. Additionally, Xcelsius’ XML capabilities facilitated rapid integration with MarkLogic. These capabilities also enable creation of Xcelsius generated dashboards that provide insight into content stored in MarkLogic in an intuitive and easy to use fashion and enhance the existing administrative interfaces available with the product. This download (and other open source MarkLogic downloads) is also available at the MarkLogic Developer website.

Monday, September 08, 2008

The Claremont Report on Database Research

I recently stumbled into this outstanding article, The Claremont Report on Database Research, which articulates the findings of a two-day conference held in May, 2008 amongst more than 20 of the database community's most visible and influential researchers. Participants included Michael Stonebraker, Philip Bernstein, Rakesh Agrawal, Eric Brewer, Michael Carey, Roger Magoulas, and Tim O'Reilly.

The group concluded that we are at a turning point in database evolution/research, and I agree with them. They identified a number of areas for future research including:
  • Revisiting (core) database engines
  • Declarative programming for emerging platforms
  • The interplay of structured and unstructured data
  • Cloud data services
  • Mobile applications
  • Virtual worlds
This is not the first time such a group has met: the Claremont meeting is one of a series that started in 1988 and continued in 1990, 1995, 1996, 1998, and 2003. See page 10 of the report for some self-reflection on the conclusions of prior meetings. Here is a chart that shows the recurrent topics over time.

Forbes Interview: Corporate Pack Rats

I recently had a conversation with Ed Sperling from Forbes, who runs its "CIO Chat" column. Today, Forbes.com ran a story, entitled Corporate Pack Rats, based on that interview.

It's hard to talk about content these days without talking about e-discovery, email archive search (wouldn't MarkMail be wonderful at that?), and compliance. So the story starts out with a chat about that.

I then go on to one of my new rants: why does everyone want to play offense (think: business intelligence) with their data, but only play defense (think: records management, e-discovery) with their content? Yes, not going to jail is important, but don't you believe there's value in your corporate documents and content that can help you build better products, serve customers better, and improve the efficiency of your operations?

This excerpt summarizes it well:

Do CIOs get this?

Most CIOs? No. The vast majority are still in a place where they're trying to avoid getting in trouble with their documents.

We later started talking about one of my favorite topics, XML, where there's another nice excerpt:

Does all the content have to be tagged with XML, because there's a lot of content that predates XML?

The better the tags, the better the queries. If you want to find all documents that contain the words "bird strike," any text search engine can do that. If you want to find all documents that classify procedures related to approach, if all that is tagged, you can get a pinpointed result. Without tags, you may learn that somewhere in the 300-page PDF are the words "bird strike." That's not very helpful. With the tags, you can increase the precision of searches and their granularity.
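To illustrate the point about tags and precision, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The manual, its procedure elements, and the phase attribute are hypothetical markup invented for this example; a real aviation or publishing schema would differ. A plain text search can only say that the phrase occurs somewhere in the document, while a tag-aware search can return just the procedures for a given flight phase.

```python
import xml.etree.ElementTree as ET

# Hypothetical tagged content; element and attribute names are illustrative only.
MANUAL = """<manual>
  <procedure phase="takeoff">
    <title>Rejected takeoff</title>
    <para>A bird strike during the takeoff roll may require rejecting the takeoff.</para>
  </procedure>
  <procedure phase="approach">
    <title>Go-around after bird strike</title>
    <para>If a bird strike occurs on approach, consider a go-around.</para>
  </procedure>
  <procedure phase="cruise">
    <title>Loss of cabin pressure</title>
    <para>Descend to a safe altitude.</para>
  </procedure>
</manual>"""


def text_search(xml_text: str, phrase: str) -> bool:
    """Untagged search: only tells you whether the phrase is in the document."""
    plain_text = "".join(ET.fromstring(xml_text).itertext())
    return phrase in plain_text


def tagged_search(xml_text: str, phrase: str, phase: str) -> list[str]:
    """Tag-aware search: titles of procedures for one phase that mention the phrase."""
    root = ET.fromstring(xml_text)
    hits = []
    for procedure in root.iter("procedure"):
        if procedure.get("phase") != phase:
            continue  # skip procedures tagged for other flight phases
        if phrase in "".join(procedure.itertext()):
            hits.append(procedure.findtext("title"))
    return hits


print(text_search(MANUAL, "bird strike"))                # whole-document answer
print(tagged_search(MANUAL, "bird strike", "approach"))  # pinpointed answer
```

The first call just confirms that "bird strike" appears somewhere; the second returns only the approach procedure, which is the difference between a 300-page PDF hit and a pinpointed result.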

Finally, another nice excerpt related to the slow, inexorable move towards XML:
There will be transition issues, but over the next three to five years we're going to move from a ".doc" world to ".docx." Right now, rounding up it's 1%. But in five years, rounding down it will be 100%.
Indeed. And that's one big change.

Thursday, September 04, 2008

Built to IPO, Flip, or Last?

While it's taken me a while to post on this Wall St Journal article, it's still as relevant today as it was back in July. The article discusses the recent dearth of IPOs, arguing that the long-closed IPO window is changing the way startups think about themselves and the way venture capitalists think about startups, and is threatening the great Silicon Valley venture-capital-driven innovation machine.

In a blog that generally offers more critique than praise, I’d simply say: I think the author's right. Fewer startups run the gauntlet to IPO and I think that’s the result of three things:
  • The SOX “tax” – an estimated $2M-$3M annual nut – which all but wipes out the bottom line of what were previously IPO-ready companies and reduces market caps. Example: for a 50% growth company with a 1.0 PEG ratio, $3M in SOX expense wipes out $150M in market cap.
  • Lack of demand in the public markets. As mentioned here before, when you look at the Software Equity Group’s IPO pipeline, you can impute that the IPO window is what I call 50/50/0 -- i.e., $50M+ in TTM revenues, 50%+ growth, and 0% EBITDA. But, while that may be the window to make it potentially worth filing – make no mistake – the IPO market is currently closed.
  • Industry consolidation. The article surprisingly misses this point, but the software industry has sufficiently consolidated that plunking down $75M to buy a plateaued startup is nothing, and even paying $300M - $500M to buy someone on a roll is basically chump change. And, if you’re SAP, Oracle, Google, or Microsoft, even $1B isn’t much to buy your way out of a strategic headache – and heck – since goodwill is no longer amortized and they’re typically buying with stock and can cut enough costs to make the acquisition instantly accretive, it’s effectively “free” anyway.
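The SOX arithmetic in the first bullet follows from the definition of the PEG ratio (P/E divided by the growth rate in percent): a 1.0 PEG on 50% growth implies a P/E of 50, so each dollar of lost earnings costs $50 of market cap. Here is a minimal Python sketch of that calculation; it assumes, for simplicity, that the SOX cost comes straight off earnings (ignoring taxes) and that the market keeps applying the same multiple.

```python
def market_cap_lost(annual_cost: float, growth_pct: float, peg_ratio: float = 1.0) -> float:
    """Market cap erased by a recurring cost, valuing earnings at P/E = PEG * growth.

    Simplifying assumptions: the cost reduces earnings dollar for dollar
    (no tax effect) and the P/E multiple is unchanged.
    """
    implied_pe = peg_ratio * growth_pct
    return annual_cost * implied_pe


for sox_cost in (2e6, 3e6):  # the estimated $2M-$3M annual SOX "nut"
    lost = market_cap_lost(sox_cost, growth_pct=50.0)
    print(f"${sox_cost / 1e6:.0f}M SOX cost -> ${lost / 1e6:.0f}M less market cap")
```

Across the $2M-$3M range, that works out to $100M-$150M of market cap for a company growing 50% at a 1.0 PEG.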
The last point sometimes makes me wonder if software will end up like pharma or biotech, where it seems that big companies have effectively outsourced innovation to startups. The big guys are willing to pay big dollars for the few who succeed in order to avoid spending the billions in R&D it takes to find the winners. Simply put (and from quite a distance), it seems they’ve outsourced the financing of innovation to venture capitalists.

If I were at one of the big software oligopolists, I’d probably do the same thing. I’d watch ten startups, let 3 fail, let 3 fail into mediocrity and buy them for chump change, and pay 10x revenue for the one that went red hot. You win some, you lose some. And – even when you lose you win – because you are so much larger than your targets that you can let them grow to even $200M in revenues and still buy them without much pain.

That’s a new dynamic.

This prompts the question: is the next-generation of VC-backed startups built-to-flip instead of built-to-last? Frankly, I think the answer’s a mix.

Increasingly, I think web 2.0 startups that take relatively little capital are running a different formula than classical enterprise software vendors. The latter might raise $30M in VC, hoping to go public with a $500M market cap. The former might raise only $10M, hoping for a quick sale at $50M. This changes venture economics, but the system can still work.

Prior to Mark Logic, I’ve worked at only three software companies: Ingres, Versant, and Business Objects. All three were venture backed. All three went public. And all three went public – more or less – in the year in which they did $30M in revenues. My, how things have changed!

By contrast, let’s look at Endeca, a player in enterprise search who started out in e-commerce search, bringing OLAP-style dimensional navigation to the content world. Later, the company branched into more areas (seemingly too many if the recent stuff I’m reading about spend management, other apps, and a DBMS-like positioning is correct).

Per a recent 451 Group report, Endeca did about $100M in revenues in 2007, growing 70% over 2006, with 500 staff, 500 customers, an average deal size of $350K, and a 90/10 direct/indirect channel model. They’re silent on profitability, though they recently raised a $15M venture round bringing total investment to about $65M, suggesting they’re still burning cash. The numbers, with the exception of the unknown profitability and the high direct sales dependency (which are quite possibly linked), overall look pretty good.

But Endeca first talked about an IPO in 2006 and 2+ years later they’re still all dressed up with nowhere to go. Why? I’d guess it's a combination of the IPO window closure and (perhaps) some process issues related to compliance, which these days are another leading cause of IPO stall-out and an indirect form of SOX tax.

Frankly, I think it’s too bad. While I want to crush Endeca in the relatively few deals in which we compete (and complement them in the relatively few where we do that as well), I nevertheless believe that Joe Investor should be able to buy their stock. By forcing the de facto IPO bar ever higher, the US is locking out individual investors from participating in early-stage technology companies. That’s not good.

Why did we raise the bar, then? Because of the excesses of the web bubble and the early 2000s, many would say. But when I think about that era, the problems fall into two distinct classes:
  • Investors awarding $1B valuations to startups with $5M in revenues. While I think this was insane, it should nevertheless be permissible – no one forced you to buy a share of Beyond.com in 1999. No one forces an investor to participate in a speculative bubble. Some would argue bubbles are a normal market phenomenon. They shouldn't be outlawed. Caveat emptor.
  • Fraud a la Enron. This needs to be wiped out. No question. (For an interesting perspective on Enron, read Open Secrets by Malcolm Gladwell.)
Somehow, I think we mixed up the two different problems along the way by enacting laws that throw the early-stage baby out with the anti-fraud bathwater. The result is that individual investors are denied access to early-stage growth companies and, the Journal argues at least, that we are threatening the health of the Silicon Valley innovation machine.

Wednesday, September 03, 2008

Friendly Request: Please Subscribe via Feedburner

Dear Readers,

Just another friendly request to ask that you subscribe to this blog via its Feedburner feed -- http://feeds.feedburner.com/marklogic -- and not via the blogger/blogspot feeds or any other mechanism.

Why? This allows me to get the best possible stats on who's coming, what they're reading, et cetera, so I can best serve my readership.
  • Thank you for reading my blog
  • If you clicked an ad once in a while I'd feel more like a "real" blogger (last I checked, I was making about $0.16/hour as a journalist)
  • If you like my blog, please tell a friend or, better yet, link to me in yours
  • Please subscribe via Feedburner!
Thanks/Dave

Yahoo Breaks $19. What *Were* They Thinking?

Check out this article in Silicon Alley Insider by Henry Blodget:
Remember when Yahoo haughtily rejected Microsoft's $31 per share offer for the company as "massively undervalued"? Those were the days.

Yahoo closed at $18.75, its lowest level in five years. For those who don't care to recall, that's below where it was trading when Microsoft hand-delivered the gift of a lifetime to the battered Board's door.

Here's Yahoo's chart for the past 2 years. Note that it's traded above $31 for maybe a month or two during that entire period.

And if you want to remember what Yahoo was telling investors back in March, when the stock was around $30 because of the acquisition offer, here's my post that includes the full PowerPoint of their investor presentation.

Frankly, I think that too often ego / politics / religion gets in the middle of what should be relatively straightforward business decisions. Simply put, if it's March, 2008, you're Yahoo, and someone's offering you $31/share when you were trading at less than $20, you need to do a classic decision tree analysis:
  • Create multiple scenarios for the future, and the stock price implied by them assuming normal ratios
  • Assign realistic probabilities to those scenarios
  • Compute the expected value by weighting each scenario's implied price by its probability and summing the results
My hunch is that when you do that for Yahoo, in the "everything goes perfectly" scenario you might justify a stock price of $31 two years hence. But when you factor in the probability of that scenario occurring, I'm pretty sure the inexorable conclusion is sell.
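To make the decision-tree arithmetic concrete, here is a minimal sketch. The three scenarios, their probabilities, and their implied prices are hypothetical numbers chosen for illustration, not real Yahoo projections, and the sketch ignores the time value of money (a price two years out would normally be discounted back to today).

```python
from dataclasses import dataclass
import math


@dataclass
class Scenario:
    name: str
    probability: float    # honest estimate that this future happens
    implied_price: float  # share price the scenario implies at normal ratios


def expected_value(scenarios: list[Scenario]) -> float:
    """Probability-weighted average of the scenario prices."""
    total = sum(s.probability for s in scenarios)
    if not math.isclose(total, 1.0):
        raise ValueError(f"probabilities must sum to 1, got {total}")
    return sum(s.probability * s.implied_price for s in scenarios)


def decide(offer: float, scenarios: list[Scenario]) -> str:
    """Sell if the cash offer beats the expected value of holding."""
    return "sell" if offer > expected_value(scenarios) else "hold"


# Hypothetical scenarios for illustration only; not real Yahoo projections.
scenarios = [
    Scenario("everything goes perfectly", 0.15, 31.0),
    Scenario("muddle through", 0.55, 20.0),
    Scenario("things go badly", 0.30, 12.0),
]

print(f"expected value: ${expected_value(scenarios):.2f}")  # expected value: $19.25
print(decide(31.0, scenarios))                             # sell
```

Even granting the "everything goes perfectly" scenario a $31 price, weighting it by a realistic probability pulls the expected value well below the offer.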

Tuesday, September 02, 2008

XML: Good, Bad, Bloated?

GCN ran an article last month, entitled XML: The Good, The Bad, and the Bloated, about which I wanted to share a few thoughts.

The article begins (bolding mine):
Depending on whom you talk to, Extensible Markup Language is either the centralized solution for managing cross-platform and cross-agency data sharing, or it's a bloated monster that's slowly taking over data storage and forcing too much data through networks during queries.

Which view is accurate?


In general, I believe XML's flexibility and cross-platform capabilities far outshine any negatives. But if XML files are not properly planned and managed, there is a good possibility that you could experience XML bloat.
First, I'll note that the author weighs the pros and cons of XML and comes out pro: XML's benefits outweigh its stated and perceived disadvantages.

Now, let's move on to the cons:
But XML bloat occurs when files are poorly constructed or not equipped for the jobs they must perform. There is a strong temptation to cram too much information into files, which makes them larger than they need to be. When an agency needs only part of the data, [it] often has to accept the whole file, including long blocks of text.
First, I'd say that "long blocks of text" are often the data in which analysts are interested, so we must be careful not to quickly classify them as baggage (i.e., let's not be too data-centric in today's world).

Second, I'd agree that the blind marking of everything in XML can be wasteful. That's why I've long advocated a "lazy" approach where:
  • You first decide application requirements and then create XML tags in order to support them, iterating over time on both the application requirements and the sophistication of the XML to support them.
As opposed to a far-too-common "big-bang" approach whereby:
  • You design "the ultimate schema," which can answer virtually any possible application requirement, and then spend enormous time and money first designing it, and then trying to migrate your data/content to it.
The problems with the big-bang approach are many:
  • Designing the ultimate schema is a Sisyphean task.
  • You spend money investing in XML richness which has no short-term return; i.e., you over-design for the short-term
  • You lose your budget mid-term because while you're designing perfection, the business has seen no value and loses faith in the project.
As I like to say, "big-bang approaches often result in a big bang," or, similarly, with too many content-oriented systems "the first step's a doozy" beyond which you never pass.

At Mark Logic, we're trying to change all that in two ways:
  • By delivering a forgiving XML system that accepts content in a rather ragged form, enabling you to ingest XML immediately and begin delivering value against it.
  • By evangelizing a lazy XML enrichment and migration approach that delivers business value faster than big-bang approaches.
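As an illustration of lazy enrichment, the sketch below adds author markup to a ragged document only once an application actually needs author search. The document, the "By ..." byline convention, and the helper `enrich_bylines` are hypothetical, and the sketch uses Python's standard `xml.etree.ElementTree` rather than any MarkLogic API.

```python
import xml.etree.ElementTree as ET

# A "ragged" document: no author markup yet, just paragraphs.
RAGGED = "<doc><p>By Joe Smith</p><p>Long block of article text...</p></doc>"


def enrich_bylines(doc: ET.Element) -> int:
    """Wrap the name in a 'By ...' paragraph in an <author> element.

    Returns how many paragraphs were enriched. Paragraphs that already
    have child elements are skipped, so existing markup is never reordered
    and running the enrichment twice changes nothing.
    """
    enriched = 0
    for p in list(doc.iter("p")):  # snapshot before mutating the tree
        text = (p.text or "").strip()
        if len(p) == 0 and text.startswith("By "):
            p.text = "By "
            author = ET.SubElement(p, "author")
            author.text = text[len("By "):]
            enriched += 1
    return enriched


doc = ET.fromstring(RAGGED)
enrich_bylines(doc)
print(ET.tostring(doc, encoding="unicode"))
# <doc><p>By <author>Joe Smith</author></p><p>Long block of article text...</p></doc>
print([a.text for a in doc.iter("author")])  # ['Joe Smith']
```

The point of the design is that the tag is added because a requirement (find documents by author) appeared, and the rest of the document, including its long blocks of text, is left untouched until some other requirement justifies more markup.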
With Mark Logic, the question is not "how much slower than an RDBMS do I have to go to get the benefits of XML?" It's typically "how much faster than an RDBMS does it go while still delivering the benefits of XML?"

In customer benchmarks, we've seen 10:1 outperformance as common, and outperforming an RDBMS by 100:1 is certainly not unheard of. Ask our customers and partners: MarkLogic is fast.

The article continues (bolding mine):
Luckily, technologies are evolving that can help with XML bloat.

First is the evolution of platform-based XML solutions that offer a single system to author, tag, store and manage XML files. They also allow developers to set the policies for dynamic XML integration into other documents or applications. Mark Logic is one of the best-known purveyors of such solutions,
...
A lot of XML bloat perception comes from the idea that you're inserting tags into ASCII files and those files increase by the size of the tags which, at times, appear material relative to the size of the content.

As a trivial example, if you have an XML element named publication-author, with value (i.e., the author's name) "Joe," then you have added 41 characters of "overhead" (begin and end tags) to the underlying data of 3 characters. And, if Joe has authored 1,000 documents in the collection, you'd argue that you've added 41,000 characters of overhead for 3,000 characters of data. And you'd see precisely that if you looked at an ASCII serialization of the XML.
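The serialization arithmetic can be checked directly. This small sketch counts the characters in the begin and end tags around each value in a plain-text serialization; the function names are illustrative only.

```python
def tag_overhead(element_name: str) -> int:
    """Characters added by one begin tag plus one end tag."""
    return len(f"<{element_name}>") + len(f"</{element_name}>")


def serialized_sizes(element_name: str, value: str, occurrences: int) -> tuple[int, int]:
    """(tag overhead, data) in characters for a plain-text serialization."""
    return tag_overhead(element_name) * occurrences, len(value) * occurrences


print(tag_overhead("publication-author"))                  # 41
print(serialized_sizes("publication-author", "Joe", 1000))  # (41000, 3000)
```

The begin tag contributes 20 characters and the end tag 21, which is where the 41 comes from.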

But good XML systems don't store XML that way. XML is naturally tree-structured and XML documents are stored as trees. What's more, the element names (i.e., the tags) are typically hashed. So the 18-character "publication-author" element name gets hashed to 64 bits once, and every time the tag appears in the corpus only the hash value is stored. So it's not 41K of overhead to 3K of content in the preceding example; it's more like 2K to 3K.
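Here is a minimal sketch of the name-hashing idea: each distinct element name is stored once in a table, and every element node carries only a 64-bit key. The choice of FNV-1a as the hash, the `NameTable` and `ElementNode` classes, and the error-on-collision policy are all assumptions for illustration; this is not MarkLogic's actual storage format, and it does not model compression, so it makes no attempt to reproduce the size figures above.

```python
from dataclasses import dataclass

FNV_OFFSET_64 = 0xCBF29CE484222325
FNV_PRIME_64 = 0x100000001B3
MASK_64 = (1 << 64) - 1


def fnv1a_64(name: str) -> int:
    """64-bit FNV-1a hash of the UTF-8 bytes of an element name."""
    h = FNV_OFFSET_64
    for byte in name.encode("utf-8"):
        h ^= byte
        h = (h * FNV_PRIME_64) & MASK_64
    return h


class NameTable:
    """Keeps each distinct element name exactly once, keyed by its hash."""

    def __init__(self) -> None:
        self.names: dict[int, str] = {}

    def intern(self, name: str) -> int:
        key = fnv1a_64(name)
        stored = self.names.setdefault(key, name)
        if stored != name:  # a real engine would need a collision strategy
            raise ValueError(f"hash collision: {stored!r} vs {name!r}")
        return key


@dataclass
class ElementNode:
    name_key: int  # 64-bit reference instead of the 18-character name
    text: str


table = NameTable()
nodes = [ElementNode(table.intern("publication-author"), "Joe") for _ in range(1000)]

print(len(table.names))                  # 1: the name string is stored once
print(len({n.name_key for n in nodes}))  # 1: every node shares the same key
print(table.names[nodes[0].name_key])    # publication-author
```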

In fact, by Mark Logic rules of thumb, the picture often looks like:
  • 1MB of text source content, which becomes
  • 3MB of XML, which becomes
  • 300K of compressed XML in MarkLogic, which becomes
  • 1MB of compressed XML + indexes in MarkLogic
Simply put, it's often the case that the content blows up a bit in XML only to be compressed to 1/10th its size, only to be re-inflated through indexing back to its original size.
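To see why repeated tags shrink so dramatically under compression, this sketch compresses a synthetic document with Python's `zlib` (DEFLATE). That is a general-purpose compressor swapped in for illustration, not the compression MarkLogic uses, and the input (the same tagged value 1,000 times) is far more repetitive than real content, so it exaggerates the ratio.

```python
import zlib


def compression_ratio(text: str, level: int = 9) -> float:
    """Raw size divided by DEFLATE-compressed size."""
    raw = text.encode("utf-8")
    packed = zlib.compress(raw, level)
    if zlib.decompress(packed) != raw:  # sanity check: lossless round trip
        raise RuntimeError("round trip failed")
    return len(raw) / len(packed)


# Synthetic, highly repetitive XML: the same tagged value 1,000 times.
doc = "<authors>" + "<publication-author>Joe</publication-author>" * 1000 + "</authors>"

print(len(doc.encode("utf-8")))  # 44019
print(f"{compression_ratio(doc):.0f}:1")
```

Real text compresses far less than this, which is why the rule of thumb above lands at roughly 10:1 rather than the much larger ratio this toy input produces.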

Now this certainly isn't true every time. Sometimes content + indexes ends up 2-5x the original size. But critics should remember: (1) you then have rich XML tags that enable you to do something with the content and (2) you then have indexes so you can do it fast. (Often the counter-arguments make it sound like nothing is gained for the size increase.)

Finally, I'd add two points:
  • With magnetic disk storage well under $1/gigabyte (e.g., this drive) for consumer applications and maybe $10/gigabyte in a mid-range SAN ... to put it bluntly ... should you care? Despite our (potentially advancing) age and attitudes about storage costs, we should not conserve storage for conservation's sake, but instead optimize our computing investment so as to maximize overall return, paying heed to the relative costs of subsystems and to the value of the functionality they enable.
  • Your XML can be as big or rich as you want it to be. And with MarkLogic, you can change that richness over time. Our presumption is that you are adding elements because you want to use them to deliver business value so, technically speaking, there should be no "wasted elements" -- i.e., elements that merely inflate size and deliver no value. That is, if you're paying attention and following a lazy XML approach, then your XML should be no richer than the functionality required by your applications, and ergo -- by definition -- there is no waste or bloat.
Basically, if your content gets bigger, it's simply because you wanted to do more things with it.
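For a rough sense of what the storage question costs in dollars, here is a back-of-the-envelope sketch. The 1,000 GB corpus is a hypothetical figure; the $1 and $10 per gigabyte prices and the 1x and 2-5x expansion factors come from the text above (and $1/GB is an upper bound for consumer disk).

```python
def storage_cost(source_gb: int, expansion: int, dollars_per_gb: int) -> int:
    """Dollar cost to store a corpus after XML tagging, compression, and indexing."""
    return source_gb * expansion * dollars_per_gb


SOURCE_GB = 1_000  # hypothetical 1 TB of source content

for label, price in [("consumer disk", 1), ("mid-range SAN", 10)]:
    for expansion in (1, 2, 5):  # 1x rule of thumb, 2-5x worst cases
        cost = storage_cost(SOURCE_GB, expansion, price)
        print(f"{label:14} {expansion}x: ${cost:,}")
```

Even the worst case in this sketch (5x expansion on a mid-range SAN) comes to $50,000 for a terabyte of source content, which is the kind of number to weigh against the value of the functionality the tags and indexes enable.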