
It’s data, Jim, but not as we know it – Part 2: A picture worth a thousand hours

Whatever we call them, the non-traditional types of data that I discussed in my last post can greatly extend the utility of "traditional", record-based data warehouses.

 

Consider the following trivial example. Suppose a convenience grocery retailer loads sales data to its data warehouse in near-real time. Examination of the data as it streams into the warehouse and is integrated with the other data captured there reveals that sales for the Nowheresville store are well down – against both forecast sales and prior-year sales.
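The check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the store figures, the second store name and the 80% threshold are all hypothetical, and a real warehouse would run the equivalent comparison as a query against the integrated sales data.

```python
# Hypothetical sales figures per store (illustrative values only).
STORE_SALES = {
    "Nowheresville": {"actual": 6000.0, "forecast": 10000.0, "prior_year": 9500.0},
    "Elsewhereton": {"actual": 9800.0, "forecast": 10000.0, "prior_year": 9700.0},
}


def underperforming_stores(sales, threshold=0.8):
    """Return stores whose actual sales fall below `threshold` times
    BOTH the forecast and the prior-year figure.

    The threshold of 0.8 is an assumption made for this sketch.
    """
    flagged = []
    for store, figures in sales.items():
        vs_forecast = figures["actual"] / figures["forecast"]
        vs_prior_year = figures["actual"] / figures["prior_year"]
        if vs_forecast < threshold and vs_prior_year < threshold:
            flagged.append(store)
    return sorted(flagged)


if __name__ == "__main__":
    print(underperforming_stores(STORE_SALES))
```

Note that a check like this only tells the analyst that something is wrong at a store, not why.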

 

By itself, this information can tell an analyst at the retailer that there is an issue, but not what the root cause of the issue is: maybe a new competitor has opened for business nearby; possibly the store is in need of refurbishment and has seen a long, gradual decline in sales; perhaps there is an availability problem with a key line (milk or bread) and shoppers are abandoning their shopping baskets at the point that they discover this fact; maybe the store has the wrong assortment for the demographic that it serves, etc., etc., etc. A well-designed and comprehensive "traditional" data warehouse should enable our hero – after further analysis – to establish the root cause of the decline in sales in these scenarios. But there are other scenarios in which a data warehouse that holds only the record-based data captured from the transactional systems that run the business on a day-to-day basis may be less useful.

 

Suppose that the problem is a temporary issue with car parking. There are two convenience stores close to where we live and as we are a moderately disorganized family we frequent both of them on a disappointingly routine basis. Neither is quite close enough to walk to and so for us, buying milk, bread or any of the other daily essentials that we have inexplicably managed to overlook again on our big weekly shop involves getting in the car – and parking it on arrival. If we can't park at shop A (a Teradata customer, incidentally) then we generally head to shop B (also a Teradata customer). I see lots of other potential customers exhibiting very similar behaviour; either that, or they just like driving around.

 

So now suppose the analyst at our convenience retailer also has a feed of still images – one every hour, say – from the CCTV cameras at the store available to him alongside the record-based data recording the facts of the sale of a carton of milk, a tube of toothpaste and so on. Examination of these images may reveal that the car park at the Nowheresville branch is currently being re-surfaced by contractors and that there is – literally – nowhere to park. All by itself, this simple fact probably explains the dip in the store's sales; and if the decline in sales was sharp and coincided with the start of the work, then we can probably safely conclude our analysis without further investigation and without going to the (considerable) trouble of reviewing the store assortment, availability levels, etc., etc.

 

Of course, this is a trivial example; our intrepid analyst could have achieved a similar result by calling the store's duty manager to ask whether any exceptional circumstances were currently affecting the store – and without going to the time and expense of loading the image data to the data warehouse. The more interesting examples of the exploitation of these non-traditional data types typically involve "fact extraction" from large volumes of multimedia data (digitized patient medical records, the free text fields that garage mechanics use to report the issues that they encounter when servicing cars, etc., etc.) to create "square" meta-data that can be easily integrated with traditional, "square" record-based data in complex queries that run against the resulting, large, integrated data-sets. In very many cases we will still need to retain the original multimedia data alongside the extracted fact data, both to support investigations that require access to the "raw" data and because enhanced fact extraction techniques may become available to us as we learn more about exploiting the multimedia data. But in most cases, these data will be relatively "cool" (infrequently accessed) compared with the extracted facts. And they are not – repeat not – unstructured.
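To make "fact extraction" a little more concrete, here is a minimal sketch in Python using the garage mechanic example. Everything in it is an assumption made for illustration: the service notes, the job records and the keyword patterns are hypothetical, and simple keyword matching stands in for whatever text-analytics technique an organization would really use. The point is only the shape of the process: free text goes in, "square" rows come out, and those rows join to the record-based data on a shared key.

```python
import re
from collections import Counter

# Hypothetical free-text notes written by mechanics, keyed by job id.
SERVICE_NOTES = {
    101: "Customer reports brake squeal; replaced front brake pads.",
    102: "Battery flat on arrival. Fitted new battery.",
    103: "Oil leak from sump gasket, brake fluid topped up.",
}

# Hypothetical "square" records from the transactional system.
SERVICE_RECORDS = [
    {"job_id": 101, "model": "Hatchback A", "cost": 120.0},
    {"job_id": 102, "model": "Saloon B", "cost": 95.0},
    {"job_id": 103, "model": "Hatchback A", "cost": 60.0},
]

# Assumed vocabulary of components; keyword matching is a stand-in
# for a real fact-extraction technique.
COMPONENT_PATTERNS = {
    "brakes": re.compile(r"\bbrake", re.IGNORECASE),
    "battery": re.compile(r"\bbattery\b", re.IGNORECASE),
    "oil": re.compile(r"\boil\b", re.IGNORECASE),
}


def extract_facts(notes):
    """Turn free text into 'square' (job_id, component) rows."""
    rows = []
    for job_id, text in sorted(notes.items()):
        for component, pattern in COMPONENT_PATTERNS.items():
            if pattern.search(text):
                rows.append((job_id, component))
    return rows


def issues_by_model(records, facts):
    """Join the extracted facts to the job records on job_id and
    count how often each component is mentioned per car model."""
    model_of = {record["job_id"]: record["model"] for record in records}
    return Counter(
        (model_of[job_id], component)
        for job_id, component in facts
        if job_id in model_of
    )


if __name__ == "__main__":
    facts = extract_facts(SERVICE_NOTES)
    for (model, component), count in sorted(issues_by_model(SERVICE_RECORDS, facts).items()):
        print(model, component, count)
```

The original notes are kept untouched in this sketch, which mirrors the point made above: if a better extraction technique comes along later, it can be re-run against the raw text.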

 

The web is the database?
On our recent tour of the EMEA region, Teradata's CTO Stephen Brobst provided a fascinating example of the use of these fact-extraction techniques when he discussed how leading organizations are starting to use data harvested from the web to better understand how we consumers feel about the products and services that we are consuming – and whether we like them well enough to recommend them to our friends or not.

 

As Stephen explains in his presentation, in the US it is already the case that the majority of adults in all age ranges up to age 55 now conduct their primary research on which products and services to buy on-line, principally through reviews and recommendations from specialist and social networking web-sites and from trawling the blogosphere. In many cases this information is either already in the public domain, or users can be induced to share it with organizations that appropriately value the relationship that is created or reinforced with the consumer as a result.

 

From a marketing perspective, this is 24-carat gold information: unvarnished and evolving feedback in the form of narrative text posted by users on web-sites and describing their experiences; harvested for free, or at a cost that is greatly reduced compared with the cost of traditional market research; and increasingly free of the selection and sampling effects that can blight traditional research. The extracted data – cold, hard facts and statistics – are analyzed by leading organizations alongside sales data and all of the other traditional data that originate from inside the organization. Apple, for example, knows precisely which iPhone features users value the most and has a pretty good idea about the functionality that consumers would like to see enhanced – and how much they would be willing to pay for those improvements. It may be no coincidence that Apple shifted over a million of the new iPhone 3GS models – in the teeth of a fierce recession, remember – within only three days of the product's launch.

 

(Incidentally, if you're reading or analysing this, Mr. Jobs, I really like the Spotlight search functionality introduced in release 3.0 of the iPhone software, but I would like a free upgrade to the new 3GS model with the built-in compass even better. And I'll happily tell you what I think about my MacBook if you could see your way to upgrading my keyboard to the oh-so-cool-glow-in-the-dark version.)

 

Alas, I sense that I stand about as much chance of banishing "unstructured data" from the vocabulary of technical marketing as I do of relegating "paradigm" to the relative obscurity that it enjoyed until only recently, or of snagging a free iPhone upgrade. But it beats working for a living and it's good for my blood pressure. And it has kept me away from the Apple Store – and so out of trouble with Mrs W – for the last several hours.

 

Martin Willcox

The post It’s data, Jim, but not as we know it – Part 2: A picture worth a thousand hours appeared first on International Blog.

