Having access to more data is a good thing. Companies that now deal with what is being called big data have mountains of data to pore over and study in order to learn more about themselves, their customers, and their competitors. Ideally, they will be able to sift through that data and glean a few gold nuggets that lead to real-world solutions. One obstacle to that ideal might be data governance.
Data governance is a necessary part of information management, especially as the amount of data organizations manage expands. Imagine that your vast store of big data contains sensitive customer records, government documents requiring clearance, or corporate secrets that only a few should know. In all of these situations, data governance can ensure that the important information reaches the right individuals or groups while access is restricted for everyone else.
What Data Governance Is
According to Robert S. Seiner, data governance is “the execution and enforcement of authority over the management of data and data-related resources.” Simply put, an organization must have some type of policy in place to determine who has access to what information and how that information should be managed.
In terms of data security, some companies operate on an honor system in which employees are taught which data they may access and how to manage it. On the other side of the spectrum are companies that rely on non-disclosure agreements and all sorts of legal and technical safeguards. While these policies may help with security, they do not make the large volumes of big data that companies deal with today more approachable. Governing this data requires a more sophisticated technique.
The Big Data Problem
Because big data deals with massive amounts of information, the old methods of data governance may not necessarily be applicable. In the old way of thinking, as Forrester analyst Michele Goetz explains, data governance is “freedom from risk.” By controlling the ingress and egress of data, organizations also control the amount of risk they are willing to endure. While sometimes necessary, that control takes away some of the freedom to grow and experiment that an open data system would allow.
The new thinking for data governance is that businesses need agility and rapid access to all of the big data they have at their disposal. Old permission systems and bureaucracy slow down innovation, which businesses cannot afford if they wish to stay competitive. In the big data age, they must find ways to develop policies that enable creativity and exploration of data while still maintaining a reasonable level of security.
One technical solution is to employ some type of information discovery tool that is specifically designed for big data. Companies may use Hadoop in conjunction with any number of tools that make it easier to analyze big data and retrieve just the data they need. When employees are trained to use these tools correctly, they will deal only with the data that relates to their jobs and can safely ignore the rest.
More than ever before, it may be necessary for organizations to rethink their policies on data access. They will have to navigate the narrow path between enforcing data security and encouraging the free flow of knowledge that can ultimately stimulate growth and accelerate the path to success.
Data and Information Governance Solutions
Any true solution to governing big data must address the aforementioned problems without compromising security or the free discovery and analysis of data. The following are some commercial solutions to the issue of big data governance.
- IBM’s Information Lifecycle Governance – Combining technology it acquired from PSS Systems in 2010 with its own take on data governance, IBM provides systems for enterprise discovery management, retention management, and defensible disposal of data.
- EMC’s Master Data Management – EMC’s MDM services help companies develop roadmaps and strategies for data governance. With them, companies can develop roles, policies, responsibilities, processes, procedures, and organizational structure for data management.
- Autonomy’s Information Management – Autonomy, a subsidiary of HP, provides a wide range of products dealing with everything from eDiscovery to regulatory compliance. Among its products is an information governance toolset in the Autonomy Legal Performance Suite.
Companies like Sears have adopted Hadoop in the hope of catching up to online competitors like Amazon that have been chipping away at retail profits. Rather than saving storage space and compartmentalizing data, the objective now seems to be consolidating all of the data into one big heap and then analyzing it.
Managing this type of data is considerably different from managing neatly divided files and structured data. A policy cannot be file-based, for example, when the file may not have a recognizable designation. Instead, the users who are actually performing the analysis must have some type of filtering tailored to their needs and to their level of clearance or permissions.
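To make the idea of per-user filtering concrete, here is a minimal Python sketch. It is not taken from any of the products mentioned above; the `Clearance` levels, the `Record` and `Analyst` types, and the `visible_records` function are all hypothetical names invented for illustration. It assumes each record carries a sensitivity label and a topic, so the policy applies to individual records rather than to files.

```python
from dataclasses import dataclass
from enum import IntEnum


class Clearance(IntEnum):
    """Hypothetical sensitivity levels; a higher value means more restricted."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


@dataclass(frozen=True)
class Record:
    record_id: str
    topic: str              # business area the record belongs to
    sensitivity: Clearance  # label attached to the record itself, not to a file


@dataclass(frozen=True)
class Analyst:
    name: str
    clearance: Clearance
    topics: frozenset       # business areas relevant to this analyst's job


def visible_records(analyst, records):
    """Keep only records the analyst is cleared for and that relate to their job."""
    return [
        r for r in records
        if r.sensitivity <= analyst.clearance and r.topic in analyst.topics
    ]


records = [
    Record("r1", "sales", Clearance.INTERNAL),
    Record("r2", "sales", Clearance.RESTRICTED),
    Record("r3", "legal", Clearance.INTERNAL),
    Record("r4", "marketing", Clearance.PUBLIC),
]

analyst = Analyst("pat", Clearance.CONFIDENTIAL, frozenset({"sales", "marketing"}))
visible_ids = [r.record_id for r in visible_records(analyst, records)]
print(visible_ids)  # ['r1', 'r4']
```

In this sketch the analyst never sees `r2` (above their clearance) or `r3` (outside their job), even though all four records sit in the same consolidated heap.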
This data governance system should also help its users determine what information is valuable to the organization and what information can be discarded or archived. Finally, it must let users recognize information that would be risky to keep or to discard for legal or regulatory reasons.
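The keep, archive, and discard decisions described above could be sketched as a simple rule-based triage. This is only an illustration under stated assumptions: the `Item` fields, the `triage` function, and the one-year and seven-year thresholds are hypothetical and do not come from any regulation or from the products listed earlier. Real retention rules depend on the organization's legal obligations.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Item:
    item_id: str
    last_used: date
    legal_hold: bool              # assumed flag: needed for litigation or an audit
    contains_personal_data: bool  # assumed flag: subject to privacy rules


def triage(item, today, archive_after_days=365, discard_after_days=365 * 7):
    """Return a suggested action; the thresholds are illustrative assumptions."""
    age_days = (today - item.last_used).days
    if item.legal_hold:
        return "hold"      # risky to discard, so it is kept regardless of age
    if item.contains_personal_data and age_days >= archive_after_days:
        return "review"    # risky to keep, so it is flagged for a human decision
    if age_days >= discard_after_days:
        return "discard"
    if age_days >= archive_after_days:
        return "archive"
    return "keep"


today = date(2024, 1, 1)
items = [
    Item("a", date(2023, 11, 1), False, False),
    Item("b", date(2022, 6, 1), False, False),
    Item("c", date(2015, 1, 1), False, False),
    Item("d", date(2015, 1, 1), True, False),
    Item("e", date(2022, 6, 1), False, True),
]

actions = {item.item_id: triage(item, today) for item in items}
print(actions)
```

The ordering of the checks is the design choice worth noting: legal and regulatory risk is evaluated before age, so an old record under legal hold is never suggested for disposal.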
Keeping It Simple
The days of 90s companies under federal investigation dumping mounds of shredded documents out the window may be over, but those mounds of information still exist, albeit in digital form. And the risk some of those documents may present is still real. Nevertheless, much of the data has tremendous value, and we now have the tools to unlock that value and benefit significantly from its discovery. Data governance need not be so complex that it fails to live up to its intended purpose. As big data analysis solutions become even better at sorting through data and delivering natural answers relevant to the questions companies actually have, governance of that data will become simpler and more manageable.