Structured Versus Unstructured Data – Using SharePoint To Get Value From Your Business’ Information
Guest Author: James Love
Chronicles of a Chronic E-Junkie
There have been one or two instances in my implementation of SharePoint for my current employer where I have seen it as a failure. The end users love what I’ve created: they have separate areas in which to store the documents relevant to specific areas of their work. However, I feel that something special has been lost in their migration from storing all their work locally and on file shares to using SharePoint.
As I saw it, what I had created was pretty much the same as their file shares, just with document libraries and the odd custom field here and there. The information they stored was largely locked up in Word and Excel documents. It was used to organise events attended or hosted by the company (each event organised with a customised Workspace). Thus, some strategically valuable data was locked in these files: the most frequently used, cheapest or most expensive venues, frequently hired equipment and its cost, the different suppliers used, delegate lists, slide decks and much more. All of it sat in what was now a database of Word and Excel documents.
This is where I had to explain the difference between structured and unstructured data to my users, to demonstrate how SharePoint, properly used, can extract this knowledge and support more informed decisions.
I decided to step away from their scenario and take them back to University (appropriate for my organisation, as the vast majority are academics). Back then we all kept hundreds of pages of notes from lectures and tutorials, typically in one or more notebooks and most likely not filed in any particular way. Getting all the information for a particular topic, or even comparing the volume of notes made for one topic against another, required very labour-intensive sorting through our bundle of notes. What we then ended up doing was creating note-cards or something similar, using a 6×4 inch card to keep a quick note on a specific aspect of a particular topic. We probably then bundled these into groups according to topic, then grouped those groups by module.
Make any sense? Possibly. Some of us might have been as organised as the latter, but I know I was as disorganised as the former in my younger days. This is where I make the distinction between structured and unstructured data. The unstructured data is the pile of notes all in one (or maybe a few!) notebooks. Extracting value from it is a time-consuming process. The structured variation of our data is where everything is spread out onto the little note-cards and then grouped. Only the information of true value is stored on these cards, and the information about what is contained on each card (metadata, anyone?) is determined by how we group our little snippets of information. We can get straight to the information we need by narrowing down through our grouping system, and the last manual task is simply flipping through the individual cards. Wouldn’t it be great if a Search Engine could pull out exactly what we needed from these cards? :)
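To make the note-card analogy concrete, here is a minimal Python sketch. Everything in it (the `NoteCard` type, the `group_cards` helper, the module and topic names) is hypothetical and invented for illustration; it is not SharePoint code, just a picture of what "grouping by metadata" buys you.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class NoteCard:
    """One note-card: the note itself plus the metadata we group by."""
    module: str  # hypothetical metadata field
    topic: str   # hypothetical metadata field
    note: str


def group_cards(cards):
    """Bundle cards by topic, then bundle those groups by module."""
    grouped = defaultdict(lambda: defaultdict(list))
    for card in cards:
        grouped[card.module][card.topic].append(card.note)
    return grouped


# Illustrative cards only; the contents are placeholders.
cards = [
    NoteCard("Module A", "Topic 1", "first note"),
    NoteCard("Module A", "Topic 2", "second note"),
    NoteCard("Module B", "Topic 3", "third note"),
    NoteCard("Module A", "Topic 1", "fourth note"),
]

grouped = group_cards(cards)

# Comparing the volume of notes per topic is now a one-liner,
# rather than a manual trawl through a notebook.
notes_per_topic = {
    (module, topic): len(notes)
    for module, topics in grouped.items()
    for topic, notes in topics.items()
}
```

With the metadata in place, "all notes for Topic 1 in Module A" is just `grouped["Module A"]["Topic 1"]`; the same question asked of an unsorted notebook means reading every page.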
Taking this analogy back into our SharePoint world, our organisation (and most likely, many others) has mountains of tiny snippets of valuable information buried deep inside possibly large Word and Excel documents. Some of these probably have cryptic naming conventions, or Document Versioning has been misused within document libraries, creating many copies of the same document, each a different edition. This makes queries such as "the most cost effective external catering company last year", or "the most cost effective venue for 120 delegates in the north east" very laborious. When this mess appears in a SharePoint environment, to me it’s a failure.
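As a rough illustration of why structure matters for a question like "the most cost effective venue for 120 delegates in the north east", consider the sketch below. The `Venue` record, its fields, the `cheapest_venue` helper and all of the figures are assumptions made up for this example; in SharePoint the equivalent would be columns on a custom list rather than a Python class.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Venue:
    """One row of a hypothetical 'Venues' list; every field is an assumption."""
    name: str
    region: str
    capacity: int
    hire_cost: float


def cheapest_venue(venues, region, delegates) -> Optional[Venue]:
    """Return the lowest-cost venue in a region that can seat the delegates."""
    candidates = [
        v for v in venues
        if v.region == region and v.capacity >= delegates
    ]
    # default=None covers the case where no venue qualifies.
    return min(candidates, key=lambda v: v.hire_cost, default=None)


# Made-up sample data, purely for illustration.
venues = [
    Venue("Venue A", "North East", 150, 1200.0),
    Venue("Venue B", "North East", 100, 700.0),   # too small for 120
    Venue("Venue C", "North East", 200, 950.0),
    Venue("Venue D", "South West", 300, 600.0),   # wrong region
]

best = cheapest_venue(venues, "North East", 120)
print(best.name)  # prints "Venue C"
```

Once region, capacity and cost live in their own fields, the question is a filter and a sort. While the same facts sit in paragraphs and tables inside Word documents, someone has to open and read each file to answer it.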
To rectify this failure, I need to describe the differences between Structured Data and Unstructured Data, how Structured Data can be stored in SharePoint, and ultimately, how important the use of Structured Data is within an organisation. The hard part for you, the implementer/designer/architect/consultant/[insert current "SharePoint dude" business term here], is finding out what your organisation requires in terms of reporting back on valuable information, looking in detail at what information assets the organisation deals with, and explaining to the users the benefits of moving from repeating tables in Word documents to custom lists.
James Love works as an Information Officer for a small non-profit organisation in York, UK. Whilst developing solutions for the company’s intranet environment, he also spends time looking after IT operations and strategy. As well as web development & design, James has a keen interest in Information Architecture best practices for the corporate environment. He is a regular attendee of Sharepoint User Group UK events in Northern England.
Hi James
We have just completed a migration for a federation of schools, and one had the most unstructured SharePoint I have ever seen. It worked for them, but not for us during the migration, so from now on I am all in favour of structure, structure and more structure. Good post.
Dave
No, that’s going the wrong way… embrace the long-tail and semi-structured data and the value of the lesser used leaves!
Seriously, I’ve watched large organisations try to force a one-schema-fits-all thing for years, and all it does is homogenise the schema into one that really doesn’t provide deep value anywhere. It becomes a dilution of what you really need and lacks the richness that both the top reports and bottom reports need.
Embracing semi-structured data is what you need to do, to allow for reporting against those lesser used leaves arbitrarily. I’m not suggesting that’s easy, but I propose that we take on that complexity and actually encourage our users to use the extensibility of SharePoint fully.
Hell, I’ve even set up a company to provide solutions to address exactly this issue. The value of semi-structured data is immense… don’t be afraid of it.
I’m glad to see you call out the difference between unstructured and structured data. So much more business value can come from getting your unstructured data out and into a position where you can repurpose and re-use it. If anyone is interested, there is a product called SmartDocs (at http://www.thirtysix.net/) which integrates content management capabilities into Microsoft Word and SharePoint. It allows you to easily identify and re-use MS Word content from SharePoint and keep track of all the re-use relationships for change notification. Just FYI.
I find that the biggest challenge is teaching the Super-User which web part is the best fit for the data and task that they have at hand, depending upon the structure of their notes and thought process. Within a team we find both structured and unstructured thinkers, both with extensive knowledge to add to the mix. We have had great success with using wikis and pulling all the information together that way. Linking structured lists to the wiki notes (sometimes with multiple links or titles) makes it easy for everyone, no matter how they think, to find their information. I find that Post-It notes are a big pain point and that when we start addressing what people are putting on the Post-Its and attaching to everything in sight, we start to see where we need to begin with SharePoint.
Hi Kerry,
Your last point about Post-It notes is very relevant. You need to find out what information is being recorded/created and why, to be able to design a system that exploits or surfaces the value of that information in the appropriate manner.
The users may not necessarily need to change what they do or what content they create, but the system on which they create that content needs to provide value in the most efficient way for the consumers of that information.
James
Additionally, the system designed must of course meet the initial requirements and not be over-engineered, but from a pre-sales point of view (for those of us who are contractors), demonstrating the value that can be had with a proof-of-concept system will go a long way to securing your work.
Good example of an unstructured vs structured data story using the University model.
The important thing here isn’t technically how we accomplish structuring the data (metadata, content types, site columns vs list columns, custom lists, etc.) but rather that the users understand the benefit of adding structure to their content to make it more valuable.
Thank you for sharing your thoughts. I know I enjoyed reading them.
On a side note, I wonder if it would be worthwhile coming up with several more examples of Structured vs Unstructured data that users could identify with.
Some other quick examples:
Songs in Music Albums
(They care about the music, but knowing the artist, track number, and album provides additional value.)
Cans in a Grocery Store
(Nutritional information, location of the cans, light vs normal, related items)
Books in a Library
Baseball Cards
The key with any of the examples we come up with is that they don’t deal with the users’ subject matter directly. This way we avoid digressions, conflicts of opinion, or incorrectly assuming/assessing their subject matter/content.