Semantics – A Conversation Starter

I spent some time over the last few weeks engaged in all manner of speaker-to-audience events, on both sides of the podium. I started out at St. Joseph’s University in Philadelphia (speaker), moved on to the MarkLogic World 2014 event (mostly as listener/obsessive tweeter), then headed to Salt Lake City for two customer visits (conversations – no podiums there) and finally to London for still more customer visits (with a funny accent – mine, according to my UK colleagues). Given MarkLogic’s release of MarkLogic Server 7 last year, it is not surprising that most of those conversations have centered on Semantics.

During this time I can say wholeheartedly that each and every question, answer, thought-exercise and conversation has been compelling. For technologists, there’s nothing quite like the potential of an emerging technology trend, especially one that is not quite new conceptually but emerges as “new” simply because complementary technology and the world at large have become ready to embrace it.

However, as interesting as those conversations have been, it’s not so much the conversations between people that have me excited as what they point to: the mainstreaming of real conversations between people and machines.

OK, so before we get all philosophical about what counts as a “conversation” and whether or not we have already been having conversations with computers (in some cases yes), I’ll anchor this in a simple anecdote about a personal epiphany. It actually occurred almost two months ago, before this recent speaking/listening frenzy, when I heard one of my colleagues, Stephen Buxton, use a single phrase as he anthropomorphized a search application.

“Here’s what I know.”

It was during a demonstration that showed how semantics enhances the search process. The money item was the display of an info-box that accompanied a search request. For those of you who may not be familiar with the term “info-box”, you have likely seen the capability on display while doing a Google search. If you happen to search for someone famous or a well-known topic, you will certainly get search results back, containing all of the URLs that match the search criteria. However, in the case of the well-known person or topic, you might also find an assembled collection of facts about the search term at the top or side of the page. That collection of facts is an example of the aforementioned info-box, and what is unique about it is that, unlike the other search results, it doesn’t map to a specific found URL, per se. What it does represent, however, is information about the search terms as known by the query engine in question (in this case Google). In other words, it’s the machine saying “here’s what I know.” Or if we’re being complete, “Here are the search results you asked for, but here’s also what I know on the topic. Does any of it interest you?”

And perhaps that is the real dividing line between search and semantic search – the conversation. Imagine walking into a library (yes, they still exist) and asking the librarian where you can find a book about genetics. Given that it’s an expansive subject, the search might go in many different directions. Now imagine in our hypothetical situation that our librarian happens to have read three books on the topic: “Automated DNA Sequencing and Analysis” by Adams, Fields and Venter (1994), Matt Ridley’s “Genome” (1999), and “The Immortal Life of Henrietta Lacks” by Rebecca Skloot (2010). Now it’s true that all of these have to do with genetics in some way, but the topic space of each, not to mention the publication date, varies widely. So the librarian then volunteers a bit of information and says:

“OK, so I have a number of things, depending on where you want to go, topic-wise. J. Craig Venter is one of the most widely known figures in modern-day genetics, having led the private effort that announced a draft sequence of the human genome in 2000, well ahead of the original schedule for that work. He’s mentioned in nearly every modern book on the topic and wrote a few of his own. Then there’s Matt Ridley, who wrote a NY Times best-seller that does a great job of telling the story of the human genome in layman’s terms. And then there’s Henrietta Lacks, who was born in 1920 and died of cancer at a very young age (31). Her cancer cells were found to be immortal, the first human cells observed to keep dividing indefinitely in culture, so her genes are of particular interest. I recently read an interesting biography about her.”

So that’s an example of a response from a very knowledgeable person, suggesting some relevant topics associated with your query. It’s also chock full of relevant facts, not unlike the info-box. And it’s pretty clear that such an initial response from an authoritative source goes a long way toward establishing a more relevant context for discovery. For instance, you might not have known about the Henrietta Lacks story, and you may now have discovered a whole sub-topic of interest that you hadn’t considered when you initially performed the search.

And that’s what a conversation does. It creates a back-and-forth which quickly enriches topics of interest with context and scope.  When doing any type of discovery, this is an incredibly valuable enabler.  And while it’s true that the conversations with humans are much more fluid (and perhaps more entertaining – at least for now), it’s the ability of machines to scale in terms of capacity and processing speed that makes those potential conversations compelling for search and discovery.

So what is the state of things in technology, outside of the info-box example? It appears that the topic of semantics either has taken off or is about to take off into the mainstream. If you’re building applications, the technology options are proliferating, from triple stores to graph databases to semantic enrichment tools, to name just a few. And then there’s linked open data, an ever-growing collection of machine-readable knowledge, published using W3C standards and ready to be leveraged by any creative developer looking to build the next great semantic application.
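For the developers in the audience, here is a minimal sketch of the “here’s what I know” idea, in plain Python rather than any particular product’s API. Everything in it is an assumption made up for illustration: the TRIPLES and DOCUMENTS data, and the search, info_box and semantic_search function names. A real triple store would hold far more facts and would be queried with something like SPARQL, but the shape of the response is the same: ordinary search results, plus facts about the search term that don’t map to any one document.

```python
# Hypothetical in-memory "triple store": each fact is (subject, predicate, object).
# The facts are taken from the librarian anecdote; the structure is illustrative only.
TRIPLES = [
    ("Henrietta Lacks", "born", "1920"),
    ("Henrietta Lacks", "subject of", "The Immortal Life of Henrietta Lacks"),
    ("The Immortal Life of Henrietta Lacks", "author", "Rebecca Skloot"),
    ("Matt Ridley", "author of", "Genome"),
    ("Genome", "published", "1999"),
]

# Hypothetical document collection that a plain text search would run against.
DOCUMENTS = {
    "doc-1": "A biography of Henrietta Lacks and her cells.",
    "doc-2": "Matt Ridley tells the story of the human genome.",
}


def search(term):
    """Plain search: ids of the documents whose text mentions the term."""
    needle = term.lower()
    return sorted(
        doc_id for doc_id, text in DOCUMENTS.items() if needle in text.lower()
    )


def info_box(term):
    """'Here's what I know': (predicate, object) pairs whose subject is the term."""
    needle = term.lower()
    return [(p, o) for s, p, o in TRIPLES if s.lower() == needle]


def semantic_search(term):
    """Return the usual results plus the facts known about the term."""
    return {"results": search(term), "info_box": info_box(term)}


if __name__ == "__main__":
    answer = semantic_search("Henrietta Lacks")
    print("Results:", answer["results"])
    for predicate, obj in answer["info_box"]:
        print("Here's what I know:", predicate, "->", obj)
```

Note that the facts come from a different place than the documents do, which is exactly why the info-box doesn’t map to a specific found URL.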

So if you’re looking to explore some new technologies and want some advice as to what might be most relevant in the very near future, I suggest that you consider thinking through the various options, and then perhaps having a conversation … with a machine.

