Big Content, Dark Data and Information Management

AIIM is a non-profit association covering image and information management topics. These are notes from the AIIM Roadshow held in London on June 20th, 2013.

The first keynote was by John Mancini (@jmancini77) who reminded us that AIIM started in 1943 because of the challenges inherent in managing microfilm, and has evolved since with the dominant technologies of the day. He talked about the rise of 'extreme volatility' in industry: for example, out of the Fortune 100 company list of 1977, there were 79 companies still on the list in 1984. But if you took the Fortune 100 list of 2005, only 25 were still there in 2012.

He didn't spell it out, but if attrition keeps accelerating at that rate, the list churn rate in 2020 could be 90%. Not every industry will necessarily be equally affected, but if they were, few of the current technology leaders would still be on the list. I wonder which one or two of Microsoft, Oracle, Apple, HP, Dell, IBM, Amazon, Google, or Intel will still be there?
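For anyone who wants to check the arithmetic behind the Fortune 100 figures quoted above, here is a minimal sketch. The function names are my own, and the annualised figure assumes a constant survival rate within each seven-year window, which is a simplification rather than anything claimed in the keynote.

```python
def churn_rate(survivors: int, list_size: int = 100) -> float:
    """Fraction of companies that dropped off the list over the period."""
    return 1 - survivors / list_size


def annual_survival(survivors: int, years: int, list_size: int = 100) -> float:
    """Implied yearly survival rate, assuming it is constant over the period."""
    return (survivors / list_size) ** (1 / years)


# (survivors, years) for the two periods quoted in the keynote
periods = {
    "1977-1984": (79, 7),
    "2005-2012": (25, 7),
}

for label, (survivors, years) in periods.items():
    churn = churn_rate(survivors)
    yearly = annual_survival(survivors, years)
    print(f"{label}: churn {churn:.0%}, implied annual survival {yearly:.1%}")
```

Churn over the seven-year window goes from 21% to 75% between the two periods. The 90% figure for 2020 is my own extrapolation and is not something this calculation derives.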

A choice we (vendors and users) make about information management is whether we focus on the opportunities or on the risks/cost of non-compliance. One route takes you down the path of engagement and mining nuggets from the 2.5 quintillion bytes of data produced each day. The other takes you down the road of record-keeping, governance and fear-based decision making.

One of the current AIIM themes is that "Big Data" is not the same as "Big Content". If I follow correctly, big data is about using vast amounts of statistical information to help with e.g. fraud detection, sentiment analysis or social monitoring. It's what a supermarket is doing when it sends you coupons based on your purchasing over the previous two or three months. It's the data from all those sensors in an Internet of Things. Big content, on the other hand, deals with the even more unstructured world of social, video, images, audio and text data. The enablers for big data have been the cloud, Hadoop and NoSQL, whereas the enablers for big content have been semantic web and search technologies. It strikes me that data visualisation technologies are probably a common enabler for both. Of course it's still "data" in both cases.

The challenge for those of us making content management software is to go beyond document-centric content to find emergent value in huge volumes of unstructured information. That's accepted, but I wouldn't overlook the fact that today's enterprise is still typically struggling to extract value from its document analytics. First things first. But there's also a lot of exciting semantic web technology emerging, and our challenge is to harness the best of it (preferably open source) in an integrated manner.

On my way there I bumped into a friend who's working on a contract in London. How to move content between SharePoint and Salesforce.com is his problem of the moment. It was partly for his benefit that I attended a round table discussion entitled "Exploring Microsoft SharePoint". The audience were mainly SP users, many of them from the public sector, and there was a general sharing of wisdom. A show of hands indicated most were using SP2007; a few were still on SP2003 and those who'd made it to SP2010 were moving to SP2013. Most were using SP as a tool for building team, department or company-wide intranets and as a way of replacing the network/shared drive. Success was mixed. Few did any information architecture (IA) work before implementation. Many said the project was led by Marketing (as in the communications team) with IT assistance, and delivered by external consultants. "Only customise as a last resort" was one pearl of wisdom, but the 'better' alternative of using third-party plug-ins didn't seem all that successful either. One person said user authentication (i.e. working with Active Directory) was like an extra 0.5 project on top of the main SP project. I picked up many useful tips on tools for e.g. content migration, but they are too detailed to go into here. The overall sentiment was resignation to a bad experience - I'd be devastated if a CogniDox user group spoke about us like that.

The day finished with another keynote - this time by Alan Pelz-Sharpe (@socialbizAlan) from the 451 research group. His topic was Big Data and in particular the notion of "Dark Data" which is the antithesis of big data. My take on what he means by that: if the IBM statistic is true that 90% of the world's data has been created in the past 2 years, that doesn't mean it is of value. There are servers full of useless and duplicated files (think about the storage of email attachments) and it is highly unlikely that this dark data will ever yield emergent value or insights. Don't be seduced by the mere volume of data. It reminded me of a blog I once read that said big data is really mashup data - it's the variety of the data sets that gives the insight. If that's true, then open data becomes even more important because there's a high probability you won't own all of the smartest data sets in the room.

The recent NSA/PRISM news story rippled through the day's talks. Whether we are in the business of "information logistics" or "information governance", there was much quiet satisfaction in the idea that "metadata" is a word now used on newspaper front pages, and The Guardian even has a handy guide for readers. In the unglamorous world of information management, maybe that's as good as it gets for us.

Tags: content, Document Control, SharePoint