Tuesday, March 31, 2009

Dan Clancy of Google Book Search To Appear at Mark Logic User Conference

I'm pleased to announce that I'll be conducting a 30-minute live interview with Dan Clancy, engineering director for Google Book Search, at the Mark Logic User Conference in San Francisco on May 12-14, 2009.

Dan has been working on Google Book Search for approximately four years, worked directly on the Google Book Search class action settlement, has been a key spokesperson for Google in communications about the settlement (e.g., this ALA panel, this New York Times article), and is able to hold quite a broad range of conversations about Google Book Search. For example, in a recent chat with Dan, we covered topics including e-book formats, the internal XML-based representation of books that Google uses, details of the settlement, implications of the settlement, cases not covered by the settlement, and possible future scenarios for the book publishing industry.

Suffice it to say that Dan's a fascinating guy with an extraordinarily broad knowledge of a topic critical to the future of publishing. I'm thrilled that Dan has agreed to join us and certain it will be an awesome session.

About Dan Clancy (biography lifted from here)
Daniel J. Clancy, PhD, is the Engineering Director for the Google Book Search Project. This project is working to bring off-line books content on-line and make it searchable to allow discovery of books. Google is working with both publishers and libraries as part of this project.

Prior to coming to Google in January 2005, Dr. Clancy was the Director of the Exploration Technologies Directorate at NASA Ames Research Center. The Directorate supports over 700 people performing both basic and applied research in a diverse range of technology areas intended to enable both robotic and human exploration missions. Technology areas include Intelligent Systems, High-end Computing, Human-Centered Systems, Bio/Nanotechnology, Entry Systems and others. In this role, Dr. Clancy played numerous roles at the agency level including participating in the team that developed the agency’s plan to return men to the Moon and eventually Mars.

Dr. Clancy received his PhD from the University of Texas at Austin in artificial intelligence. While in school, Dr. Clancy also worked at Trilogy Corporation, the NASA Jet Propulsion Laboratory and Xerox Webster Research center. Dr. Clancy received a Bachelor of Arts from Duke University in 1985 in computer science and theatre.

Thursday, March 26, 2009

Pithy Monash Post on Future of Media / Journalism

Just a quick post to highlight this note by Curt Monash, The Grand Discussion on the Future of Journalism, which I found both pithy and chock full of great links.

Risk-Taking in Single- and Mixed-Gender Environments

I found this post, The Case for Single-Sex Trading Floors, on one of my favorite blogs, Infectious Greed by Paul Kedrosky. It discusses a recent article in the Journal of Economic Psychology (there's a specialized field, for you) entitled Are People More Risk-Taking in the Presence of the Opposite Sex.

The basic answer appears to be "yes," in both males and females. Interestingly, "attractiveness" apparently had nothing to do with it.

Excerpt:
Both males and females viewing opposite sex photos displayed a significant increase in risk tolerance, whereas the control subjects exhibited no significant change. Surprisingly, the attractiveness of the photo had no effect; subjects viewing photographs of attractive opposite sex persons displayed similar results as those viewing photographs of unattractive people.
I'd love to know by what standard they decided attractiveness -- I once read that facial symmetry has a huge impact -- but I digress.

Offering an anecdote supporting the increased risk-taking theory, I have a (male) friend who insists on working with female personal trainers, but who keeps injuring himself during his training sessions. Perhaps, this study suggests, he'd be better off using a male trainer instead, and presumably taking less risk (i.e., pushing himself less) on his exercises.

I think the study also raises interesting questions about same-sex schools, adding yet another variable to the already complex pro/con mix.

As a final aside, now that greed isn't as infectious as it once was, perhaps Paul Kedrosky should consider a new name for his blog. I recently heard a great quote from Chris Pilling of Complinet: "trust is the new greed." Paul Kedrosky's Infectious Trust? Perhaps not.

We will now resume our regular programming in XML, enterprise search, relational database, and semi- and unstructured information management.

Wednesday, March 25, 2009

If At First You Don't Succeed, Should You Try, Try Again?

Check out this article in the New York Times that overturns Silicon Valley conventional wisdom about failure. Per a Harvard Business School working paper, which looked at several thousand venture-capital-backed companies from 1986 to 2003:
  • First-time entrepreneurs had a 22% chance of success
  • Already-successful entrepreneurs had a 34% chance of success (a 55% relative increase)
  • Previously-failed entrepreneurs had a 23% chance of success
That is, the lessons from having tried and failed added up to an increase of just one percentage point in the success rate (23% versus 22%). Surprising news for a valley in which failure is often seen as a red badge of courage.
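
For anyone who wants to check the arithmetic, here is a minimal Python sketch that separates percentage-point gains from relative gains for the three figures above. The function and variable names are my own, purely for illustration, and the rates are simply the rounded numbers quoted here, not the underlying study data.

```python
# Success rates quoted in the post (percent), keyed by founder history.
# These are the rounded figures from the article, not the raw study data.
rates = {"first_time": 22, "already_successful": 34, "previously_failed": 23}


def point_gain(rate, baseline):
    """Absolute difference between two rates, in percentage points."""
    return rate - baseline


def relative_gain(rate, baseline):
    """Increase over the baseline, expressed as a percent of the baseline."""
    return 100 * (rate - baseline) / baseline


baseline = rates["first_time"]
for who in ("already_successful", "previously_failed"):
    pts = point_gain(rates[who], baseline)
    rel = relative_gain(rates[who], baseline)
    print(f"{who}: +{pts} points, +{rel:.1f}% relative")
```

Run as written, this reports a gain of 12 points (about 55% relative) for already-successful entrepreneurs and a gain of 1 point (about 4.5% relative) for previously-failed ones, which is why the two kinds of "increase" should not be mixed up.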

Excerpt:
“The data are absolutely clear,” says Paul A. Gompers, a professor of business administration at the school and one of the study’s authors. “Does failure breed new knowledge or experience that can be leveraged into performance the second time around?” he asks. In some cases, yes, but overall, he says, “We found there is no benefit in terms of performance.”
The New York Times article is here. The complete working paper is here (PDF, 35 pages).

Tuesday, March 24, 2009

Oracle's Become Computer Associates (CA)

It's all happened so quickly.
  • I've become my parents,
  • SQL has become Cobol,
  • and Oracle has become CA
Things were different back in 1985, when I, fresh from Berkeley, VAX/VMS documentation in hand, rode the 51 bus to Alameda where my first real employer, Ingres, was based.

You see, back then, Oracle wasn't the establishment. Oracle was the rebel, a $50M-ish hyper-aggressive competitor trying to steal the relational database market out from underneath its lethargic inventor, IBM.

Back then, there was one software company that I didn't understand. It didn't really invent anything. It just bought up all the sick and dying software companies, often those who'd missed the mainframe to mini-computer transition. It acted as a software-industry garbage collector.

Being a bit of a geek, I'd always thought of it as the planet-eating doomsday machine from Star Trek.


Who was this company? Well, CA, of course.

CA made money with the following strategy. They'd:
  • Pay a pittance for a broken software company (often less than 1x revenues)
  • Fire all the staff, leaving only a skeleton crew
  • Perform only basic maintenance on the acquired software
  • Crank up the maintenance fees on the largely helpless installed base
In fact, you could argue that CA was the first software company to truly value the maintenance annuity, at a time when most software companies were focused on the higher-margin license fees. And CA fully exploited the switching costs built into the enterprise software market.

Well, who's pursuing that strategy today? Oracle, of course. Since my kids have been doing test prep, I'll phrase this as an analogy: Oracle is to minicomputer as CA is to mainframe.

See this article, Oracle Fees for Maintenance and Support under Fire, which prompted me to write this post, an idea that I'd been mulling for some time.

Monday, March 23, 2009

BusinessExchange Wins Min's Best of the Web Award

Congratulations to the folks over at BusinessExchange, which recently won a Min's Best of the Web Award in the category of Best New Site (B2B). The site was built by McGraw-Hill, and I perceive it as the work-in-progress web reincarnation of BusinessWeek.

Excerpt:
In creating a personal information manager with social media chops, BusinessWeek mashed up two of the greatest strengths of digital technology into one of the most innovative online products of the year ...

BusinessExchange not only aggregates some of the best material from around the Internet but shows you who else is most interested in the topic. Every topic page becomes a niche publication and social network filled with blogs, articles and like-minded researchers whose research you can explore as well. The site pulls in bloggers from Federated Media’s star stable (Henry Blodgett, John Battelle, et. al.) who become collector/contributors of topic pages ...

Business Exchange carves a new path for branded media as a host for research and information sharing.
I've blogged previously on BusinessExchange, which is based on MarkLogic Server, here. I'm also a fairly frequent user of the service.

Finally, I should note that they recently launched integration with both LinkedIn (for profile information) and Twitter (for update communication). Twitter users can follow BusinessExchange here.

Friday, March 20, 2009

jetBlue: The CEO's Guide to Jetting

I first noticed this jetBlue campaign a few weeks back, in print, with a full-page newspaper ad that went something like:
To all industry captains, moguls, investment bankers, derivative traders, CEOs, chairmen, ... who used to fly on the corporate jet, we have two words for you:

WELCOME ABOARD.
Pretty funny, I thought. (See full ad image at bottom of the post.)

Then today, a colleague pointed me to these related viral videos that I think are quite good. And bear in mind, I think it's easy to get viral videos badly wrong -- consider IBM's The Pitch Meeting as a painful example of a would-be viral DB2 video gone astray.

Here's the jetBlue series, in three parts:

Part I



Part II



Part III



Enjoy.

Friday, March 13, 2009

Tweetstream from Columbia Law Conference on Google Books

Columbia Law School is hosting a conference today, The Google Books Settlement: What Will it Mean for the Long Term, which is generating a healthy Tweetstream (also known as a Twitterstream) that you may find of interest.

The conference hashtag is #gbslaw, and you can find the Tweetstream here.

Mark Logic's own Chris Welch, a senior consultant in our Information & Media division, is at the event and tweeting under the (dubious) name "guppywon." You can see his tweets here.

I should mention that we are holding a webinar next week, Publishing Industry vs. Google: New Opportunities and Strategies from the Landmark Settlement, featuring Bill Rosenblatt of Giant Steps Media.

My other posts on this topic are:

Thursday, March 12, 2009

EMC Mulling Search Acquisition?

Check out this article, Is EMC Searching for Search, on Bnet, which speculates that EMC may be pondering an acquisition in the search space. Candidates cited include Autonomy, Endeca, Recommind, and France-based Exalead.

In my opinion, it's a reasonable guess. EMC/Documentum has embedded third-party search engines for some time and, as far as I understand things, it's not been an easy road. I believe they first embedded Verity, had problems, tore it out and moved to Fast, and then found themselves in a frying-pan/fire situation. Again, I don't have the details, and it's more perception / hearsay than anything else, but I think it's safe to say Documentum's had a checkered history with third-party, OEMed search.

Lest you think EMC's 2007 x-Hive acquisition will help alleviate search problems, it doesn't. To my knowledge the x-Hive (now renamed xDB) XML server bolts in Lucene for search. I suppose they could always move to Lucene if (1) they were just looking for search and (2) they were looking to solve a technical problem as opposed to buying a business to fill a strategic hole in their product line.

Excerpt:

The enterprise search market grew by approximately 22% in 2008, according to IDC analyst Sue Feldman, not even pausing for breath in the fourth quarter.

Feldman agreed it’s “a good guess” that EMC would want in on that.

There’s more than simply growth that should attract EMC’s attention. Feldman told me that vendors are starting to gain traction with a concept called “unified access” — the ability to pull content from a variety of sources into a single platform.

...

EMC is known as a nuts and bolts storage infrastructure company, but it sees itself as an information management company, building in this direction since its 2003 acquisition of document management vendor Documentum and, more recently, data deduplication vendor Avamar. Without an enterprise search or unified access component, it risks being left with a big hole in its information management strategy.

Wednesday, March 11, 2009

Sixth Sense Wearable, "Gestural" Interface. Cool.

Check out this TED talk, which demos Sixth Sense, a wearable gestural computing interface that augments the physical world with digital information, from Pattie Maes at the MIT Media Lab.

The device has both a camera and a projector. It uses the camera to figure out what you are doing (e.g., reading a newspaper, holding a bottle of wine) and then uses the projector to display additional, contextually relevant information (e.g., providing additional real-time updates on a news story, displaying the Parker points for a bottle of wine).

The coolest part is that your hands interact with the projected image to drive the interface.

The funnest example in the demo is the device projecting a tagcloud onto a person (e.g., from his or her blog) after the camera recognizes who it is (at time 6:47).

Tuesday, March 10, 2009

Top Resources for Understanding The Google Book Settlement

We've had major interest in our upcoming webinar on the Google Book Settlement and unprecedented downloads of the related white paper, Google's Settlement with the Publishing Industry: Opportunities and Strategies for Publishers, written by Bill Rosenblatt of Giant Steps Media and available for download without giving contact details here.

Given all the interest, I thought I'd share a list of what I consider the top resources for helping publishers and other information industry stakeholders understand the Google Book Settlement, its implications, and the opportunities and threats associated with it.
  • The Google Book Settlement microsite, which includes the full settlement in HTML or PDF format. Note that the full settlement consists of 16 documents with about 320 pages of text, hence explaining the need for summarization and analysis.
I should also note that Columbia Law School is holding a high-firepower, one-day conference on March 13, 2009 entitled The Google Books Settlement: What Will It Mean for the Long Term?

Finally, for those more inclined to click through a presentation than surf through the above links, below please find this excellent 69-slide summary by librarian Lauren Pressley.


If you know of other excellent resources (not just yet-another-summary articles) please share them with me by mail or blog comment, and I will attempt to update this post to add them.

(Thanks to Jill O'Neill at NFAIS for pointing me to some of the links I added in the second revision of this post.)

Friday, March 06, 2009

Top 10 Reasons to Attend the Upcoming Mark Logic User Conference

The 2009 Mark Logic User Conference is coming soon: May 12-14, 2009 in San Francisco at the wonderful Intercontinental Hotel.

Since we know this isn't going to be the easiest year -- to put it nicely -- to either hold or attend conferences, we're doing everything we can to make it as attractive and easy as possible for you to attend.
  • Discounted airfares through our partners jetBlue and United. Seats are currently available for less than $300 round-trip from either New York City or Washington, DC and for less than $1000 from Europe.
Our marketing team recently put together a top ten list of reasons why you should attend. Here it is:

10. Mingle with hundreds of XML and XQuery developers who are building powerful content applications for companies like JetBlue, O'Reilly Media, McGraw-Hill and Boeing.

9. Network 1-on-1 with the Mark Logic product development team and ask those burning questions that only they can answer.

8. Hear Mark Logic CEO Dave Kellogg's thoughts on the coming XML tornado and how it has the potential to fundamentally transform how businesses think about, create, and use information.

7. Get a sneak preview of new application services in MarkLogic Server 4.1 in a special pre-release training session, taught by XForms master Micah Dubinko.

6. Learn how leading analysts Whit Andrews of Gartner and Stephen Arnold of Beyond Search see the future of information access unfolding.

5. Get practical, technical advice, tips and best practices learned from live deployments from our own developers and professional services team.

4. Hear best-selling author James Surowiecki expand on the simple, yet revolutionary, business idea from his book "The Wisdom of Crowds." (First 50 people to register receive a signed copy of his book!)

3. Find out what's on the mind of our illustrious founder, Christopher Lindblad, in our annual fireside chat.

2. Meet our partners and learn how they can help you get the most from your deployment of MarkLogic Server.

1. C'mon - do you really need another reason? Register now
More detailed session information is available here.

Wednesday, March 04, 2009

Introducing Tenaya Capital

See this Wall Street Journal story, entitled Lehman to Spin Off Venture-Capital Arm, which announces the formation of Tenaya Capital, a new independent venture capital firm spun off from Lehman Brothers, and formerly known as Lehman Brothers Venture Capital.

Excerpt:
Tenaya, which derives its name from a lake in Yosemite National Park, will be owned by its five existing partners led by Thomas Banahan, Lehman's former global head of venture capital.
The new entity has $750M under management and approximately 45 portfolio companies.

While Sequoia Capital is the lead investor in Mark Logic, our second (and currently only other) investor was Lehman Brothers and the former head of Lehman's venture arm, Tom Banahan, is on our board of directors.

Hopefully the successful spin-out allays any concerns that I know a few people had related to Lehman's previous holding in Mark Logic. That equity position is now held by Tenaya (I recently signed the share certificates myself) and we continue to work with the exact same people with whom we have always worked. From our viewpoint the only thing that's changed is the URL of the website and the sign on the door.

The press release announcing Tenaya is here. Congratulations to Tom Banahan and his partners on the spin-out.

The Empire Strikes Back: Oracle on DB2 pureXML

Oracle recently published this analysis of IBM DB2 pureXML, marking its first direct marketing counter-attack on IBM, which seems to have had a pretty easy time building differentiation around its XML market focus and product capabilities.

Among the big-three RDBMS oligopolists, IBM has clearly done the most marketing around XML, including the native XML databases blog by Conor O'Mahoney, program director for DB2 product marketing, as well as numerous other tools including the pureXML Redbook. So, in my humble opinion, IBM seems the most committed to shoe-horning XML into a relational database.

But at Mark Logic we dare to ask why. That is, given a large corpus of XML, the question shouldn't be can you stuff it into your relational database, but indeed should you?

(As I like to say: you can put your toaster in the refrigerator, but should you?)

While there are admittedly pros and cons to XML storage in an RDBMS (and I myself would use RDBMS XML datatypes for certain applications), at Mark Logic we believe there are many applications where an XML server, dare I say a "pure" XML server (in the descriptive sense), is the best solution.

And perhaps the best way to understand the weaknesses of storing XML in an RDBMS is to let the competing RDBMS vendors explain the problems with each other's solutions.

Here's what Oracle has to say about DB2. I've embedded the document via Scribd, below.

Oracle on DB2 pureXML

Tuesday, March 03, 2009

Yippee! I've Been Elected to the SIIA Content Division Board

I just saw a mail from the SIIA with the results of the board elections for the SIIA Content Division. I'm happy to report that I have been elected. I am looking forward to serving and would like to thank those who supported my candidacy.

Here's an excerpt from the mail reporting the results:
The newly elected – and re-elected – board members serving two-year terms (2009 and 2010) are:
  • John Blossom, Shore Communications Inc.
  • David Kellogg, Mark Logic Corp.
  • Scott Livingston, LexisNexis Group
  • Edward Colleran, Copyright Clearance Center
  • Jeffrey Massa, YellowBrix, Inc.
  • Robin Neidorf, Free Pint Limited
  • Larry Schwartz, Newstex, LLC
  • Webb Shaw, J.J. Keller & Associates, Inc.
  • Amiad Solomon, Peer39
  • Keith White, Congressional Quarterly

These members will join eleven existing board members who will be serving the second year of their two-year terms. These board members are:
  • Christopher Brown, Pearson Curriculum Group
  • Daniel Duncan, The McGraw-Hill Companies
  • Andrew Elston, iCopyright
  • Hal Espo, Contextual Connections, LLC
  • Barry Graubart, Alacra
  • Kathleen Greenler Sexton, Business and Legal Reports Publishing
  • Darrell Gunter, Collexis Holdings, Inc.
  • Peter Jackson, Thomson Corporation
  • Michael Marchesano, The Jordan, Edmiston Group, Inc.
  • Sara Ryan, LexisNexis
  • Kate Worlock, Outsell Inc.

Mark Logic CEO Blog Named CODiE Award Finalist

I'm thrilled to report that this blog has been named a finalist in the 2009 SIIA CODiE awards in the category of best corporate blog. The other finalists are:
Lest I fail to mention it, I'm also happy to report that MarkLogic Server was named a finalist in the category of best database management system.

Semantic Technologies at Dow Jones

Matt Turner, a principal consultant in our Information and Media practice, attended the recent New York Semantic Web Meetup and told me about this interesting presentation from Christine Connors, Global Director of Semantic Technology Solutions at Dow Jones. (First off, it's kinda cool that Dow Jones even has a director of semantic technology.)

Her presentation, entitled An Overview of Semantic Technologies at Dow Jones, follows:


MarkMail and ccBetty: Two Peas in a Pod

I just ran across this new service cc:Betty, which pretty much implements the idea we've had of creating a corporate version of MarkMail.

cc:Betty is a service that captures and organizes your email through adding one cc'ed recipient to every email you want organized. Betty then captures these emails and presumably provides you with a nice user interface for searching them, managing discussion threads, handling attachments, etc., all over the web and accessible from any browser.

Though I've not used the service, the concept strikes me as pretty similar to MarkMail. Lest that comparison not be obvious, let me remind you about MarkMail. MarkMail is an Internet service we've built atop MarkLogic Server that enables you to search 37M emails archived from open source development project mailing lists.

While cc:Betty pitches the concept, personal use, and an empty database, with MarkMail we demonstrate the technology with a rich database of email content, and focus on the content / knowledge contained in email group discussion lists.

(To be fair to Betty, you can presumably just add Betty to distribution lists and she'll capture the mails sent to them, which -- unless she has list intelligence -- could create some problems because you theoretically only want people to be able to search/find emails sent to lists to which they subscribe. And to be fair to MarkMail, we could easily create a personal service focused more on helping individuals manage their email than on helping companies extract knowledge from distribution lists.)

While we currently have no commercial (individual or enterprise) MarkMail offering, we do use MarkMail internally at Mark Logic for accessing information in our internal mailing lists, making it our most successful internal knowledge management system. Since we are quite active and happy users ourselves, we are currently investigating how to commercialize MarkMail.

So, if you are thinking of using cc:Betty in an enterprise context with large numbers of emails across a broad range of internal mailing lists, then you should give us a call as well. Specifically, please call Bill Veiga at 650-655-2335 or mail him at bill-dot-veiga-at-marklogic-dot-com.

Sunday, March 01, 2009

Gallows Humor: IsThisTheBottom.com

I was busy and traveling in London last week, so please excuse the hiatus in posting.

I'll re-start with a short post I found via Paul Kedrosky's Infectious Greed blog. For a quick, dry laugh check out IsThisTheBottom.com.