What is dark data? It sounds nefarious, but it’s really much more benign—data your organization owns but isn’t using. Virtually every company collects data that it ignores, but storing this dark data presents costs and risks that you don’t need.
How much dark data does your company own, and what should you do with it? You can delete it or find a use for it—you just shouldn’t retain dark data, unused, forever.
- Dark data is unused data collected, stored, and then ignored
- An estimated 80% of all data is dark data
- Dark data is costly to store and increases cybersecurity risks
- If uncovered and cleansed, dark data can be of value to an organization
What Is Dark Data?
Dark data is data that an organization collects, processes, and stores but doesn’t actively use. It’s an unused and untapped resource generated in the course of doing business.
Like all data a business collects, dark data is stored on a company’s servers or in the cloud. It is maintained like the rest of your data but sits unused, neglected, and forgotten, and it still takes up costly storage space.
Some companies knowingly retain dark data for compliance reasons. Even though the company doesn’t use the data in daily operations, retaining the data may be required to comply with governmental or industry regulations.
Most dark data, however, is not actively retained. It is typically collected alongside more valuable data but pushed to the side without any plan for its active use.
To most companies, dark data has little or no perceived value. In many instances, the company doesn’t even know it exists.
Where Does Dark Data Come From?
Businesses typically collect dark data alongside data of more current value to a company. Sometimes the company collects specific data thinking it will use it in the future but doesn’t. Sometimes data is collected just because it can be collected, even though there’s no real use for it.
In many instances, dark data is swept up via automated systems. The data can come from a company’s own data collection efforts or be part of data shared or purchased from third parties. You might want a particular bit of data but have to buy a whole collection of data to get what you want. That unwanted data becomes dark data.
Much dark data comes from the automatic sensors that are part and parcel of the Internet of Things. IoT sensors are always on, collecting data about their environment, even if most of this data is unnecessary and unused.
How Much Dark Data Is There?
How much of the data collected today is dark data? It’s difficult to measure, but an IBM study estimated that 80% of all current data is dark data.
Splunk’s State of Dark Data report, with research conducted by TRUE Global Intelligence, points in the same direction. Researchers interviewed more than 1,300 IT leaders and business managers, and 60% said that more than half of their data is dark data. Not surprisingly, 77% of these executives said that finding and capturing dark data should be a top priority for their organizations.
What Are the Costs and Risks of Storing Dark Data?
A Veritas study reveals that 52% of the average company’s data storage budget is spent on dark data. In other words, you’re probably devoting about half of your storage budget to data you don’t use. That’s a huge waste of money.
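To make that figure concrete, here is a minimal Python sketch of the arithmetic. The 52% share comes from the Veritas figure above; the function name and the $200,000 budget are hypothetical, chosen only for illustration.

```python
def dark_data_storage_cost(annual_storage_budget: float,
                           dark_share: float = 0.52) -> float:
    """Estimate how much of a storage budget goes to dark data.

    dark_share defaults to the 52% average cited above; substitute
    your own measured share if you have one.
    """
    if not 0.0 <= dark_share <= 1.0:
        raise ValueError("dark_share must be between 0 and 1")
    return annual_storage_budget * dark_share


# Hypothetical annual storage budget, for illustration only.
budget = 200_000.0
cost = dark_data_storage_cost(budget)
print(f"Estimated annual spend on dark data: ${cost:,.2f}")
```

Your own share may be higher or lower than the average, so treat the result as a rough starting point rather than a measurement.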
Dark data also poses a security risk to your organization. Like all data, dark data is at constant risk of breach or cyberattack, but the situation is actually more acute with dark data. That’s because dark data typically goes unnoticed, which means it’s unlikely to have the same safeguards as the data you use daily. If you don’t know it’s there, you can’t adequately protect it.
Some dark data can be valuable to criminals. For example, it’s not uncommon for companies to collect sensitive information, such as user passwords, that they don’t actively use and that therefore becomes dark data. While this data might not be of immediate value to the company, it’s of significant value to cybercriminals. And since it’s typically less well protected than active data, it’s an attractive target for these malicious actors.
How Can You Uncover Your Organization’s Dark Data?
Most companies know that they store some amount of data that they don’t use, but they don’t know how much there is or, in many cases, where it exists.
To uncover your organization’s dark data, you need to conduct a dark data assessment. This is really an extension of a traditional data survey, focusing on how (or, in the case of dark data, whether) each piece of data is used. The goal is to identify the data that is not actively being used.
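One simple way to approach such an assessment is to flag every dataset that has not been accessed within some idle window. The sketch below assumes you already have a catalog that records a size and a last-access date for each dataset. The `DatasetRecord` type, the `find_dark_data` function, the one-year threshold, and the sample entries are all hypothetical, not part of any particular tool.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class DatasetRecord:
    name: str
    size_gb: float
    last_accessed: Optional[date]  # None means no access was ever recorded


def find_dark_data(catalog: List[DatasetRecord], today: date,
                   max_idle_days: int = 365) -> List[DatasetRecord]:
    """Return datasets with no recorded access inside the idle window."""
    cutoff = today - timedelta(days=max_idle_days)
    return [
        d for d in catalog
        if d.last_accessed is None or d.last_accessed < cutoff
    ]


# Hypothetical catalog entries, for illustration only.
catalog = [
    DatasetRecord("orders_2024", 120.0, date(2024, 5, 30)),
    DatasetRecord("sensor_logs_2019", 900.0, date(2020, 1, 15)),
    DatasetRecord("legacy_export", 45.5, None),
]

dark = find_dark_data(catalog, today=date(2024, 6, 1))
for d in dark:
    print(f"{d.name}: {d.size_gb} GB, last accessed {d.last_accessed}")
print(f"Total candidate dark data: {sum(d.size_gb for d in dark)} GB")
```

Last-access time is only a proxy for use. Data retained for compliance reasons will look idle too, so review the flagged list before acting on it.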
Once you’ve uncovered your dark data, you have two options. You can delete it, saving your company the cost of storing it, or you can find uses for that data. For many companies, the latter option is the more productive.
How Can You Better Use Dark Data?
The dark data held by your organization may contain information of significant value—if not to you, then to other parties willing to pay for it. Instead of simply rooting out and deleting unused data, it may be more beneficial to learn more about that data and how you can use it.
You may discover that the dark data you possess contains information that you’d otherwise spend considerable money trying to collect. This data, often deep data, may provide insight into your business or your customers that could prove valuable.
Realizing the true value of dark data takes work. Artificial intelligence can provide much of the necessary analysis. AI-powered data analysis solutions can often find nuggets in data that might otherwise go unnoticed via traditional reporting methods.
Of course, for dark data to be truly useful, it must be of proven quality. That means cleaning the data to remove or replace inaccurate, incomplete, duplicative, or improperly formatted records. That requires using a data quality solution, such as FirstEigen’s DataBuck, to process and validate the data itself. The cleansed data can then be analyzed via AI and machine learning to extract the full value contained within.
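As a rough illustration of what cleansing involves, the Python sketch below drops incomplete, improperly formatted, and duplicate records from a small sample. It is a generic, hand-written example and does not show how DataBuck works. The field names, the simplified email pattern, and the sample records are assumptions made for illustration.

```python
import re

# Simplified pattern for illustration; real email validation is stricter.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
REQUIRED_FIELDS = ("customer_id", "email")  # hypothetical schema


def cleanse(records):
    """Drop incomplete, badly formatted, and duplicate records."""
    seen = set()
    clean = []
    for record in records:
        # Incomplete: a required field is missing or blank.
        values = [record.get(field) for field in REQUIRED_FIELDS]
        if any(v is None or not str(v).strip() for v in values):
            continue
        # Improperly formatted: normalize the email, then check its shape.
        email = str(record["email"]).strip().lower()
        if not EMAIL_PATTERN.match(email):
            continue
        # Duplicate: same customer and email after normalization.
        key = (str(record["customer_id"]).strip(), email)
        if key in seen:
            continue
        seen.add(key)
        clean.append({**record, "email": email})
    return clean


# Hypothetical raw records, for illustration only.
raw = [
    {"customer_id": "101", "email": "Ana@Example.com"},
    {"customer_id": "101", "email": "ana@example.com "},  # duplicate
    {"customer_id": "102", "email": ""},                  # incomplete
    {"customer_id": "103", "email": "not-an-email"},      # bad format
    {"customer_id": "104", "email": "li@example.org"},
]

cleaned = cleanse(raw)
print(cleaned)
```

Rules like these catch structural problems only. Detecting inaccurate values usually requires a trusted reference to compare against, which is where dedicated data quality tooling comes in.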
Let DataBuck Help Make Your Dark Data More Useful
When you want to get the full value from your firm’s dark data, turn to the experts at FirstEigen. Our DataBuck software is an autonomous data quality management solution that automates more than 70% of the data monitoring process. Using DataBuck ensures that your dark data is complete, accurate, and fully usable.
Contact FirstEigen today to learn about using DataBuck to validate dark data.
Check out these articles on Data Trustability, Observability, and Data Quality.
- 6 Key Data Quality Metrics You Should Be Tracking (https://firsteigen.com/blog/6-key-data-quality-metrics-you-should-be-tracking/)
- How to Scale Your Data Quality Operations with AI and ML (https://firsteigen.com/blog/how-to-scale-your-data-quality-operations-with-ai-and-ml/)
- 12 Things You Can Do to Improve Data Quality (https://firsteigen.com/blog/12-things-you-can-do-to-improve-data-quality/)
- How to Ensure Data Integrity During Cloud Migrations (https://firsteigen.com/blog/how-to-ensure-data-integrity-during-cloud-migrations/)