Brian Tuemmler

Kick Start your IG Program with Content Cleanup

Last Updated: August 18, 2019


[glossary_exclude]Corporate and government entities continue to maintain the vast majority of their information as unstructured content. New privacy regulations are shining a light on personally identifiable information (PII) in structured data, but unstructured office content is still where we document decisions, explore brilliant ideas, establish specifications, solidify agreements, and communicate with our most valuable assets: our customers.

The shared drives that hold this content are also where we audit call center audio, investigate fraud, manage events, make backups, and test new websites. Files like these are different in nature from what one could or should put into an enterprise content management (ECM) system. Shared drives are also where Christmas party photos are shared, drafts are abandoned, mistakes are made, and temporary holding places are kept forever.

Want to eliminate shared drives? Then you need to think long and hard about all the activities we use network (shared) drives for; they are not just where we create documents and presentations. We can, however, focus on cleaning them in a way that reduces risk, increases productivity, and improves legal and compliance response times.

Organizations often think about how to clean up their content, but few know how to get started. Not all content is valuable (or “sparks joy” as Marie Kondo says) in the same way for everyone in the organization. Organizations rarely clean up content for the sole purpose of cleaning up. The effort is often a first step in a more extensive, more strategic Information Governance program, such as:

  • Records management
  • Access requests for GDPR, CCPA
  • ECM site development or migrating to the cloud
  • Minimizing eDiscovery costs
  • Preparing for mergers or divestitures
  • Consolidating data centers


Numerous organizations have tried various levels of content cleanup, some manual and some automated. Some succeeded, while others could never muster the approvals to delete anything. Here are my five suggestions for optimizing your cleanup process:

  1. Build an indexed data lake and figure out priorities. The programs listed above require a comprehensive view of your unstructured data. How can you manage your information if you don’t know what you have? Think of an indexed data lake like your phone’s map app: it gives you granularity and selectable details, and it is highly interactive. Data lakes exist for similar reasons, but they are limited in the content you can store in them. Indexing all content into a lake may not be realistic, but the more content you can get under control, the more of the above programs you can undertake, and the more data, and risk, you can minimize.
  2. Before you delete anything, preserve legal hold content. Check whether your data lake indexing software can reliably preserve content that is subject to litigation.
  3. Act upon content in bulk. With a policy in place that states what you can and cannot store on network drives, you can delete all temporary files, drafts, duplicates, Christmas photos, old versions, obsolete software, and expired records without needing approvals before you press the delete button. Further, consider working with business units to apply access controls to files that every user on the network can open, especially files that contain PII.
  4. Put things away. If you find databases, applications, utilities, or web content, consider that they have different access, performance, retention, and, most importantly, security requirements than most of your other content. A cleanup process should include putting content where it can be protected well and perform efficiently. Also, lock down files that should be kept secure based on your existing security classification strategy.
  5. Evaluate the larger governance opportunities. If you are going to have a full view into all data, you should consider classifying records, personal or regulatory responsive data, security-centric data, and so on, not just garbage “ROT” (redundant, obsolete, trivial) files.
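
To make the first three suggestions concrete, here is a minimal Python sketch. It is an illustration only, not the method of any particular indexing product. It assumes the "index" is a simple in-memory list of file records, that legal holds are expressed as a list of directories, and that temporary files can be recognized by suffix. The names `TEMP_SUFFIXES`, `build_index`, `is_on_hold`, and `cleanup_candidates` are hypothetical. The sketch only reports candidates; it never deletes anything.

```python
import hashlib
import os
from collections import defaultdict
from pathlib import Path

# Hypothetical policy value; a real list would come from your storage policy.
TEMP_SUFFIXES = {".tmp", ".bak", ".temp"}


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_index(root):
    """Walk root and record path, size, and content hash for every file."""
    index = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            index.append({
                "path": path,
                "size": path.stat().st_size,
                "sha256": file_digest(path),
            })
    return index


def is_on_hold(path, hold_dirs):
    """True if path is, or sits under, a directory on the legal hold list."""
    return any(hold == path or hold in path.parents for hold in hold_dirs)


def cleanup_candidates(index, hold_dirs):
    """Return temp files and duplicate copies, skipping legal hold content."""
    by_hash = defaultdict(list)
    for entry in index:
        by_hash[entry["sha256"]].append(entry["path"])

    candidates = set()
    for paths in by_hash.values():
        # Keep the first copy in sorted order; flag the rest as duplicates.
        candidates.update(sorted(paths)[1:])
    for entry in index:
        if entry["path"].suffix.lower() in TEMP_SUFFIXES:
            candidates.add(entry["path"])

    # Legal hold content is filtered out last, so it can never be flagged.
    return sorted(p for p in candidates if not is_on_hold(p, hold_dirs))
```

Calling `cleanup_candidates(build_index(root), hold_dirs)` with `Path` objects for `root` and each hold directory returns a sorted list of paths to review. Because duplicates are matched on content hash alone, all empty files count as copies of one another, so a real policy would likely add size or age thresholds before anything is removed.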

From an Information Governance perspective, information privacy, security, and compliance are big drivers for cleaning content, and an excellent way to garner support. Approach cleanup like the strategic catalyst it is.[/glossary_exclude]

About the Author: Brian Starck