I’ve worked in data governance most of my career, even before we had the term “data governance” (DG). Originally, I suppose we called it common sense. It’s not second nature to everyone, so the DG discipline was built and evolved. It combines business aspects and technical aspects, with a heavy emphasis on the business. If the business rules are bad, no technology will save you.
Deciding that you need DG can come about in multiple ways. It might be through a decision for the company to be more data-driven and drive more value from your data using principles of infonomics. You’ve collected a lot of it over the years, so why not make it more valuable to you by driving insights from it or monetizing it?
It also might come about through a technical implementation, such as rolling out a master data management (MDM) tool, when the tool vendor asks what your DG strategy is for keeping the data “mastered” after implementation.
Sometimes, those working on analytics find the data is inaccurate, so someone proposes DG. Many times, it comes about through a government directive (e.g., GDPR, CAN-SPAM), a compliance violation, or a regulatory ruling. Keep in mind businesses need key business drivers to launch a DG program.
A major aspect often emphasized in DG programs is data quality. It’s important to put data quality under your DG initiative. If you don’t, that’s when I often see people thinking of data quality as a one-time technical project. You might be implementing a data quality tool. That is a project. However, what goes into that tool are the rules under which you define data quality. Some of the rules could be technical, such as IT wanting some values to be stored a certain way to improve processing, but a lot of the rules need to come from the business. That collaboration of the business side and IT is critical. Those data quality rules can change over time, so DG programs must be evergreen to keep these rules current.
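To make the distinction between the tool and the rules concrete, here is a minimal sketch of data quality rules kept as data, separate from the code that applies them. Everything in it is an assumption for illustration: the `QualityRule` structure, the `owner` labels, the field names, and the two sample rules do not come from any particular data quality tool.

```python
import re
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class QualityRule:
    """One data quality rule (hypothetical structure, for illustration)."""
    name: str
    owner: str                      # who defines the rule: "business" or "IT"
    field: str                      # record field the rule applies to
    check: Callable[[Any], bool]    # returns True when the value passes


# Hypothetical rules. In a governed program this list is reviewed and
# updated over time, which is why it lives apart from the checking logic.
RULES: List[QualityRule] = [
    QualityRule(
        name="country_is_two_uppercase_letters",
        owner="IT",  # a storage-format rule
        field="country",
        check=lambda v: isinstance(v, str)
        and re.fullmatch(r"[A-Z]{2}", v) is not None,
    ),
    QualityRule(
        name="credit_limit_non_negative",
        owner="business",  # a business rule
        field="credit_limit",
        check=lambda v: isinstance(v, (int, float)) and v >= 0,
    ),
]


def violations(record: Dict[str, Any], rules: List[QualityRule] = RULES) -> List[str]:
    """Return the names of the rules this record fails, in rule order."""
    return [r.name for r in rules if not r.check(record.get(r.field))]
```

Calling `violations({"country": "usa", "credit_limit": -5})` would flag both rules, while a record with `"US"` and `100` would pass. The point of the design is that changing a rule means editing the list, not starting a new project.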
A lot of companies these days are investing in data lakes. Did you know that if all you do is create a data lake without using any DG, all you’ve really done is create a data swamp? Putting more data in a data lake is a great idea, but only if knowledge workers can find the data they need and they know what it means. Taking a more thoughtful and governed approach to it will give you something usable that you can drive value from. This is not a game where “whoever dies with the most data wins.”
Big Data or “little” data, you still have data that needs to be governed. Any data that you don’t govern is data that doesn’t reach the potential value it could have. A data field might make sense to someone in the moment they create it, but it doesn’t always make sense when you look at it further down the line. We need to know what data means, where it came from, under what rules it is governed, what we’re allowed to do with it, and what good quality data looks like, etc.
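One way to picture that list of things we need to know is as a catalog entry for each data field. The sketch below is only an illustration under stated assumptions: the `CatalogEntry` structure, its attribute names, and the sample field `cust_stat_cd` are hypothetical and not taken from any specific catalog product.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class CatalogEntry:
    """Governance metadata for one data field (hypothetical structure)."""
    name: str
    definition: str = ""                                         # what the data means
    source_system: str = ""                                      # where it came from
    governing_rules: List[str] = field(default_factory=list)     # rules it is governed under
    permitted_uses: List[str] = field(default_factory=list)      # what we may do with it
    quality_criteria: List[str] = field(default_factory=list)    # what good quality looks like


def missing_metadata(entry: CatalogEntry) -> List[str]:
    """Return the attribute names that are still empty for this entry."""
    return [f.name for f in fields(entry) if not getattr(entry, f.name)]


# A field that made sense to its creator but was never fully documented.
entry = CatalogEntry(name="cust_stat_cd", definition="Customer lifecycle status")
gaps = missing_metadata(entry)
```

Here `gaps` lists the four attributes nobody filled in, which is the kind of gap that makes a field hard to interpret further down the line.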
Do not think of cloud computing implementations as just a technical project, something that your IT department will do for you. One company had an IT department that acted in a vacuum in moving all the organization’s data to the cloud. There was a cost savings in it, but the problem is they wound up moving the data to the cloud where it was stored on a physical server in a country it was not allowed to be in. Major oops! That resulted in a slew of legal discussions and reinforced the need for collaboration early on and through the entire migration process. It would have saved a lot of heartache.
Many companies are investing in analytics. Analytics can be very powerful, but if your analytics are based on bad data, you have bad analytics. Often what happens is that the Data Scientists either think it’s their job to “fix” the data before they can use it, or they just start using poor quality data. Either way, they’re jumping to conclusions about what the data means. A good data person is all about getting the right data into the right hands, and that means getting usable data to Data Scientists. Data Scientists shouldn’t have to “fix” data before they use it. I’ve watched some actually change data values because they think the values are incorrect, when they have no basis to prove that.
I’ve seen some companies combine the data and analytics groups. Perhaps they create a single Chief Data Officer (CDO) to cover both, or more often, a single Chief Analytics Officer without a corresponding CDO. Data and analytics are two separate skill sets. To be most effective, two separate people are needed, working together at the same level in the company.
Another part of DG is awareness training. In a company, there are typically people highly involved with DG who go through training. However, there are many more people in the business, and you could argue that it’s everyone in the company, who need general DG awareness training. This isn’t extensive training, but you need people to understand what data is and what they’re allowed to do with it.
Everyone wants the right data, not just data. We need to recognize that we’re all in this together. We shouldn’t be hoarding data to ourselves or hoarding the rules to ourselves. No one person knows everything there is to know about all the data. Proper DG programs require that we work together through sharing and collaboration for a better overall result that benefits the business.