For nearly four decades, data analytics has been used by leading organizations to gain new insights and track emerging market trends. Now, in the era of Big Data, increasingly sophisticated analytics capabilities are being used to help guide and monitor Information Governance (IG) programs. There are four distinct types of analytics to explore. In order of increasing complexity, and value added, they are: descriptive, diagnostic, predictive, and prescriptive. These analytics will become more important—and more difficult—as we continue to produce unprecedented amounts of data every year.
Descriptive analytics tells you about information as it enters the system. Diagnostic analytics investigates larger data sets to understand past performance and tell you what has happened. Predictive analytics uses year-over-year or month-to-month data to forecast what is likely to happen in the future. Prescriptive analytics helps companies determine what actions to take in response to these predictions across a variety of potential futures.
Structured v. Unstructured Data
Data analytics has traditionally relied on structured data, which is typically stored in relational databases. Structured data fits a defined model: every value sits in a named field, so a column titled “Name” holds the name of the person described by the rest of the row. Because it carries rich, easily processed metadata, structured data is more easily managed, analyzed, and monetized. Unstructured information, on the other hand, is basically everything else: email messages, word processing and spreadsheet files, scanned images, PDFs, and so on. It lacks detailed and organized metadata.
It may be unsurprising, but unstructured information is stored rather haphazardly. Every day, knowledge workers create documents and send emails to communicate with other knowledge workers. Our individual, inconsistent preferences for what we name our everyday office e-documents and where we save them make for a labyrinth of data. Even the information within them is rather chaotic. Computers cannot parse free-flowing sentences as readily as they process the fixed fields of a database. As a result, analysis is more difficult, at least until the proper metadata fields are created and leveraged. Once they are, the benefits are substantial.
Structured data is very useful for determining what is happening in the market or within your organization. However, relying on it alone will leave you missing the most important piece of the puzzle: why. Clearly, it is advantageous to know what is happening, but without knowing why, it is difficult to act on that knowledge.
Historically, data analytics has been an imperfect science of observation. Systems produce massive amounts of data, and data scientists correlate data points to determine trends. Unfortunately, correlation does not imply causation. Think about all the information that isn’t included. Behind every one of these data points are emails and instant messages formulating ideas, as well as documents describing the thoughts and processes involved. There is a treasure trove of information to be found in the crucial unstructured data.
Take, for example, a typical enterprise of 10,000 employees. On average, this organization will generate over 1 million email messages and nearly 250,000 IM chats every single day. During that same time, it will also create or update 100,000 e-documents. That works out to roughly 100 emails, 25 chats, and 10 documents per employee per day. The problem is simply being able to corral and cull massive amounts of information into a usable data set. This is not an easy task, especially at scale, and the challenge only increases as more data is produced. Established technologies were not designed to cope with this type of information, nor can they function at such high volumes.
IG is key
Without Information Governance, all this relevant information remains hidden in desktop computers, file shares, content management systems, and mailboxes. This underutilized or unknown information is referred to as dark data. Understanding and controlling it not only adds value to your analytics program but also reduces risk across the enterprise.
To control your information, you must own your information; when you own your information, you can utilize it. This is much harder with unstructured information because most organizations have environments full of standalone, siloed solutions that each address a single designated problem. This is fine from a business perspective, but a nightmare to manage for records and information management (RIM) and IG professionals.
A single document sent as an email attachment could be saved in a different location by every person included in the chain. Multiple copies of the same document make it difficult, if not impossible, to apply universal classification and retention policies. The same file may be stored and classified separately by legal, compliance, HR, and records, and no one would know! When this happens, organizations lose control and expose themselves to undue risks.
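One way to surface those scattered copies is to compare files by their content rather than by their names or locations. The following is a minimal sketch, assuming the copies are byte-for-byte identical and live under a single directory tree. The function names are illustrative, not taken from any particular IG product, and real tools usually also handle near-duplicates, which this does not.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def content_hash(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file's bytes, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: Path) -> dict[str, list[Path]]:
    """Group files under `root` by content hash.

    Only groups with two or more files are returned, so each entry is a
    set of exact copies regardless of file name or folder.
    """
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file():
            groups[content_hash(path)].append(path)
    return {h: sorted(paths) for h, paths in groups.items() if len(paths) > 1}
```

Calling find_duplicates(Path("some/share")) on a hypothetical file share returns one entry per duplicated document, listing every location where a copy was found. That list is the starting point for applying a single classification and retention policy to all of them.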
Organizations have petabytes of dark data haphazardly stored throughout their file shares and collaboration software. Much of this information is ROT (redundant, obsolete, or trivial). ROT data hogs storage and can slow down systems and search capabilities, hindering business functions. ROT may also be stored in retired legacy systems. These legacy systems can be a thorn in the side of IG professionals because of the amount of dark data and ROT intermingled with important business records. Mass deletion is not an option, but neither is maintaining the status quo. Implementing modern, proactive IG strategies can be a daunting task that requires input from a number of sources.
So where do we begin?
Information Governance is not something to jump into all at once, but rather to ease into step by step. The best place to start is with file analysis. In short, file analysis is a series of system scans that index dark data and bring it to light. A deep content inspection is conducted and metadata tags are inserted. File analysis can be performed on the metadata or the content of files, depending on the intricacy and accuracy needed. Metadata is information about who created the file, as well as where and when. Think of metadata as the information on the outside of an envelope, and of the content as the letter enclosed. Performing file analysis helps determine what information is ROT and can be deleted, what can be utilized, and what needs to be preserved.
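To make the metadata side of file analysis concrete, here is a minimal sketch that indexes a directory tree using only what the file system reports (the outside of the envelope) and flags files that have not been modified for a long time as candidates for review. The names FileRecord, scan_metadata, and stale_candidates are hypothetical, and the default cutoff of 2,555 days (about seven years) is an assumed placeholder rather than a recommended retention period. The sketch also does not capture who created each file, which depends on the platform.

```python
import time
from dataclasses import dataclass
from pathlib import Path


@dataclass
class FileRecord:
    path: Path             # where the file lives
    size_bytes: int        # how large it is
    modified_epoch: float  # when it was last modified (seconds since epoch)


def scan_metadata(root: Path) -> list[FileRecord]:
    """Index every file under `root` without opening its content."""
    records = []
    for path in root.rglob("*"):
        if path.is_file():
            info = path.stat()
            records.append(FileRecord(path, info.st_size, info.st_mtime))
    return records


def stale_candidates(records, max_age_days=2555, now=None):
    """Return records not modified within `max_age_days`.

    The cutoff is an assumption for illustration; a real program would
    take it from the organization's retention schedule.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    return [r for r in records if r.modified_epoch < cutoff]
```

A stale file is only a candidate for the ROT pile. Deciding whether it can be deleted, utilized, or must be preserved still requires the content inspection and policy review described above.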
Leveraging newfound knowledge
Analytics can help improve compliance functions by tracking and mapping communications. A communication heat map allows an administrator to view “who is communicating with whom about what” at a high level, while also having the granularity to drill down into any of the conversations that may set off compliance triggers. Beyond monitoring communications, tools can determine whether sensitive or potentially illegal content is being shared or stored in documents and files. Doing so proactively gives the organization an additional safeguard.
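The data behind such a heat map can be sketched in a few lines: count messages per sender and recipient pair for the high-level view, then list topics for a single pair to drill down. This is an illustrative sketch only. The Message type and its fields are assumptions, and the topic label is presumed to come from some upstream step (a subject line or a classifier) that is not shown here.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Message(NamedTuple):
    sender: str
    recipients: tuple[str, ...]
    topic: str  # assumed to be supplied by an upstream classification step


def pair_counts(messages: Iterable[Message]) -> Counter:
    """High-level view: number of messages per (sender, recipient) pair."""
    counts: Counter = Counter()
    for msg in messages:
        for recipient in msg.recipients:
            counts[(msg.sender, recipient)] += 1
    return counts


def topics_for_pair(messages: Iterable[Message], sender: str, recipient: str) -> Counter:
    """Drill-down view: what one sender wrote to one recipient about."""
    return Counter(
        msg.topic
        for msg in messages
        if msg.sender == sender and recipient in msg.recipients
    )
```

Each count from pair_counts becomes the intensity of one cell in the heat map, and topics_for_pair supplies the detail an administrator sees when drilling into that cell.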
These communication maps are also valuable to Human Resources. Knowing who communicates with whom, and about what, helps determine who the big players are within an organization. Understanding who knows and owns important information and tracking communication trends can help assess leadership potential and award promotions based on merit. It can also alert management to negative sentiment and potential insider threats.
For legal teams, the data insight can drastically improve Early Case Assessment (ECA) abilities. Since the legal team knows what information the organization has and where it is stored, there’s no mad scramble to find information when litigation is initiated. Being able to analyze what information the organization holds saves time and effort in collection, while also providing a more accurate data set. It is not necessary to send massive amounts of information to outside counsel to be analyzed. When litigation does arise, the legal team can quickly and accurately determine what, if any, liability the organization faces and can make informed decisions on how to proceed. A process that used to take weeks or months can now be completed in hours or days.
The benefits for records management teams are substantial as well. The insights gained from analysis provide important information about which documents are business records and which are unnecessary to retain. This goes beyond typical records, too. Items that historically were not considered records, such as private information discussed in an email, may now be discoverable in litigation. This means records managers need to be able to identify this information and apply retention to it.
File analysis also makes compliance with newer privacy regulations, such as the European Union's General Data Protection Regulation (GDPR) and ePrivacy rules, much easier. Many vendors have promised one-stop GDPR solutions, but the truth is there really is no such thing. GDPR is not something you can solve with a single-point solution, but rather something that requires the implementation of proper IG tools, techniques, and policies. Having in-depth knowledge of the information within an organization makes responding to GDPR data subject access requests (DSARs) far easier.
If you’ve made it this far, then you care about analytics as much as I do, and you see their utility in dealing with data. Information Governance is still a rather new concept in the business world and can be intimidating, but it is a game-changer. It is crucial to focus on the cross-functional benefits of IG in order to spur executives into action. The knowledge gained from analytics within IG helps create revenue and minimize risk. It is a competitive advantage that will shape the next few decades in the corporate world, and I wouldn’t recommend being the laggard.