Definition of Data Integrity

Data Integrity is a process to ensure data is accurate and consistent over its lifecycle. Good data is invaluable to companies for planning – but only if the data is accurate.

Data Integrity typically refers to computer data, but it can be applied more broadly to any data collection. Even a field technician who makes onsite repairs collects data, and the same protocols can help keep that data intact.

Threats to the Integrity of Data

There are a few ways that data can be damaged:

  • Damage in transit – Data can become damaged during transfer either to a storage device or over a network.
  • Hardware failure – Failure in a storage device or other computer hardware can cause corruption.
  • Configuration problems – A misconfiguration in a computing system, such as a software or security application, can damage data.
  • Human error – People make mistakes, and can accidentally damage data.
  • Deliberate breach – A person or malicious software infiltrates a computer and changes data. For example, some malware (ransomware) encrypts data and holds it hostage for payment, and a hacker who breaches a system might alter records.

The Importance of Data Integrity

Critical business decisions depend on accurate data. As companies collect more data, they increasingly use it to measure the effectiveness of their work.

If data is damaged, any decisions based on that data are suspect. For example, a business uses a tracking cookie on its web page to count page views and visitor sign-ups. If the tracking is misconfigured, it might report an artificially high sign-up rate. Believing its marketing is already working well, the business might decide to spend less on it, leading to less traffic and fewer sign-ups.

Data integrity is crucial because an organization's data is a window into how it operates. If that data is damaged, the details are hard to see. Worse, manipulated data can lead to bad business decisions.

Aspects of Data Integrity

Who, what, when

Data should include the time and date it was recorded and the identity of whoever recorded it. The record might be a brief summary noted by a tech support agent, or a timestamp logged when a visitor accesses a website.
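A minimal Python sketch of a record that captures who, what, and when. The `Record` class and its field names are hypothetical; the timestamp is filled in automatically, in UTC, at the moment the record is created, so it reflects when the data was logged rather than when someone remembered to add it.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)  # frozen: a record cannot be edited after it is created
class Record:
    """One logged observation: who recorded it, what was observed, and when."""
    recorded_by: str   # identity of the person or system that recorded the data
    description: str   # what was observed, e.g. a brief ticket summary
    # Captured automatically in UTC when the record is created.
    recorded_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


# Hypothetical example: a tech support agent logging a ticket summary.
ticket = Record(recorded_by="agent-042", description="Customer reports login failure")
print(ticket.recorded_by, ticket.recorded_at.isoformat())
```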

Readability and Formatting

The data should be consistently formatted and easy to read. In the case of a tech support agent, use a standard format to document each ticket. For a website, logging should be automatic and meaningful. A field technician should write legibly on forms and consider transcribing them digitally.

Timely

Log data as it happens. Any delay in recording creates an opportunity for loss. Data should be recorded as it is observed, without interpretation.

Original

Good data is kept in its original format, secured, and backed up. Create reports and interpretations using copies of the original data.  This helps reduce the chances of damaging the original.
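As an illustration of reporting from a copy, the sketch below cleans and summarizes a deep copy of some raw readings, so the original values stay exactly as recorded. The sample data and the `build_report` helper are hypothetical.

```python
import copy

# Raw data exactly as originally recorded (hypothetical sample values).
raw_readings = [
    {"sensor": "A", "value": "21.5"},
    {"sensor": "B", "value": " 19.0 "},
]


def build_report(original):
    """Clean and summarize a deep copy, leaving the original records untouched."""
    working = copy.deepcopy(original)
    for row in working:
        # Interpretation (trimming, converting to numbers) happens only on the copy.
        row["value"] = float(row["value"].strip())
    average = sum(row["value"] for row in working) / len(working)
    return {"count": len(working), "average": average}


report = build_report(raw_readings)
print(report)                      # {'count': 2, 'average': 20.25}
print(repr(raw_readings[1]["value"]))  # still the original string ' 19.0 '
```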

Accurate

Make sure data follows protocols and is free from errors. A tech support agent might follow a script when logging a call. A website logger might record data in a standard file type like XML. A field technician should complete all fields on a paper form.
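Here is a sketch of a website logger that writes each entry in a fixed XML structure and, like a paper form with every field required, refuses incomplete entries. It uses Python's standard `xml.etree.ElementTree`; the field names `page`, `visitor`, and `timestamp` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Every entry must have all of these fields (hypothetical names).
REQUIRED_FIELDS = ("page", "visitor", "timestamp")


def record_to_xml(entry_data):
    """Serialize one log entry to XML, rejecting entries with missing or empty fields."""
    missing = [name for name in REQUIRED_FIELDS if not entry_data.get(name)]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    entry = ET.Element("entry")
    for name in REQUIRED_FIELDS:
        ET.SubElement(entry, name).text = entry_data[name]
    return ET.tostring(entry, encoding="unicode")


xml_line = record_to_xml(
    {"page": "/signup", "visitor": "v-1001", "timestamp": "2024-05-01T12:00:00Z"}
)
print(xml_line)

# An incomplete entry is rejected instead of being logged with gaps.
try:
    record_to_xml({"page": "/signup", "visitor": ""})
except ValueError as exc:
    print("rejected:", exc)  # rejected: missing fields: ['visitor', 'timestamp']
```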

Steps to Ensure the Integrity of Data

Validate input

Check input at the time it's recorded. For example, a contact form on a website might screen for a valid email address. Digital input checks can be automated, for example with electronic forms that accept only specific information. Review paper forms and logs and correct any errors.
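A contact form's email check might look like the sketch below. The pattern is a deliberately simple assumption (something@something.tld with no spaces); it catches obvious typos but does not implement the full email address grammar, so treat it as a first screen rather than proof the address exists.

```python
import re

# Simple screen: one "@", no spaces, and a dot somewhere after the "@".
# This is an illustrative assumption, not the full RFC 5322 address grammar.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def is_valid_email(value: str) -> bool:
    """Return True if the trimmed value matches the simple email pattern."""
    return EMAIL_PATTERN.fullmatch(value.strip()) is not None


for candidate in ["user@example.com", "user@example", "@example.com"]:
    print(candidate, is_valid_email(candidate))
```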

Input validation can also help block cyber attacks, such as SQL injection attempts. This is one way Data Integrity works together with data security.
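Input validation narrows what reaches a database; the standard companion defense against SQL injection is a parameterized query, which keeps user input from ever being interpreted as SQL. The sketch below swaps in that technique using Python's built-in `sqlite3` module; the `users` table and `find_user` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))


def find_user(conn, name):
    # The ? placeholder passes `name` as data, never as SQL text, so input such
    # as "x' OR '1'='1" is compared literally instead of rewriting the query.
    return conn.execute("SELECT name FROM users WHERE name = ?", (name,)).fetchall()


print(find_user(conn, "alice"))         # [('alice',)]
print(find_user(conn, "x' OR '1'='1"))  # []
```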

Validate data

Once collected, the data is in raw form. Validation checks that the data is correct, meaningful, and secure. Automate digital validation by using scripts to filter and organize data. For paper data, transcribe notes into digital format. Alternatively, physical notes can be reviewed for errors.
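A validation script can be as simple as the sketch below, which splits raw support-ticket rows into valid and rejected sets and notes why each rejected row failed. The field names `ticket_id` and `minutes` and the 0 to 1440 minute range are hypothetical rules chosen for illustration.

```python
def validate_rows(rows):
    """Split raw rows into valid and rejected lists, recording each rejection reason."""
    valid, rejected = [], []
    for row in rows:
        problems = []
        if not row.get("ticket_id"):
            problems.append("missing ticket_id")
        minutes = row.get("minutes")
        # Call length must be a number between 0 and one day (24 * 60 minutes).
        if not isinstance(minutes, (int, float)) or not 0 <= minutes <= 24 * 60:
            problems.append("minutes out of range")
        if problems:
            rejected.append((row, problems))
        else:
            valid.append(row)
    return valid, rejected


raw = [
    {"ticket_id": "T-1", "minutes": 15},
    {"ticket_id": "", "minutes": 30},
    {"ticket_id": "T-3", "minutes": -5},
]
valid, rejected = validate_rows(raw)
print(len(valid), "valid;", [reasons for _, reasons in rejected])
```

Keeping the rejected rows, rather than silently dropping them, leaves a record that someone can review and correct.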

Data validation can also happen during transfer, for example when copying to a USB drive or downloading from the internet. This check confirms that the copy is identical to the original. Network protocols use error-checking, but it's not foolproof. Validation is an extra step to ensure integrity.
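One common way to confirm a copy is identical is to compare checksums. The sketch below hashes a file with SHA-256 (Python's standard `hashlib`) before and after copying, then simulates damage in transit by changing one byte. The file names are hypothetical; for a download, you would instead compare against a checksum published by the source.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "original.csv"
    original.write_bytes(b"id,value\n1,42\n")
    duplicate = Path(tmp) / "copy.csv"
    shutil.copyfile(original, duplicate)
    intact = sha256_of(original) == sha256_of(duplicate)

    # Simulate damage in transit by flipping one bit of the copy.
    data = bytearray(duplicate.read_bytes())
    data[-2] ^= 0x01
    duplicate.write_bytes(bytes(data))
    damage_detected = sha256_of(original) != sha256_of(duplicate)

print(intact, damage_detected)  # True True
```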

Make backups

A good backup creates a duplicate in a different location. Copying a folder onto a USB drive is one way to create a backup.  Storing files in the cloud is another.  Even data centers can create backups by mirroring content with a second data center.

Backups should include the original raw data. Reports can always be recreated from the original data.  Once lost, raw data is irreplaceable.
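A minimal sketch of a backup that keeps the raw data, assuming the raw files live in one folder. The `back_up` helper and folder names are hypothetical, and a temporary folder stands in for the different location; a real backup would target another drive, a cloud service, or a second site. After copying, the sketch compares the files byte by byte with the standard `filecmp.cmpfiles`.

```python
import filecmp
import shutil
import tempfile
from datetime import date
from pathlib import Path


def back_up(raw_dir: Path, backup_root: Path) -> Path:
    """Copy the raw-data folder into a dated folder under backup_root, then verify it."""
    target = backup_root / f"raw-{date.today().isoformat()}"
    # copytree raises FileExistsError if target exists, so an older backup
    # with the same name is never silently overwritten.
    shutil.copytree(raw_dir, target)
    # This sketch verifies only top-level files, not subfolders.
    names = [p.name for p in raw_dir.iterdir() if p.is_file()]
    match, mismatch, errors = filecmp.cmpfiles(raw_dir, target, names, shallow=False)
    if mismatch or errors:
        raise RuntimeError(f"backup verification failed: {mismatch + errors}")
    return target


with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "raw"
    raw.mkdir()
    (raw / "tickets.csv").write_text("id,minutes\nT-1,15\n")
    backup = back_up(raw, Path(tmp) / "backups")
    copied = sorted(p.name for p in backup.iterdir())

print(copied)  # ['tickets.csv']
```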

Implement access controls

Access to data should be based on business need. Restrict unauthorized users from accessing data. For example, a tech support agent does not need access to client payment card data.
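A role-based check is one simple way to express "access based on business need." In this sketch the roles and data categories are hypothetical, and access is denied by default: a role gets only the categories explicitly listed for it.

```python
# Hypothetical roles mapped to the data categories each one needs for its job.
ROLE_PERMISSIONS = {
    "support_agent": {"tickets", "contact_info"},
    "billing": {"payment_cards", "contact_info"},
}


def can_access(role: str, category: str) -> bool:
    """Deny by default: unknown roles and unlisted categories get no access."""
    return category in ROLE_PERMISSIONS.get(role, set())


print(can_access("support_agent", "tickets"))        # True
print(can_access("support_agent", "payment_cards"))  # False
```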

Even with physical paper data, access controls and management are essential. Sensitive physical records should be kept locked and secure.  Limiting access reduces the chances of corruption and loss.

Maintain an audit trail

An audit trail records access and usage of data.  For example, a database server might record the username, time, and date for each action in a database. Likewise, a library might keep a ledger of the names and dates of guests.

Audit trails are data themselves and should follow the guidelines in this article. They usually go unused until there's a problem. Then the audit trail can help identify the source of data loss: a username and timestamp for each access show who touched the data and when, which helps identify and stop the problem.
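The sketch below records an audit entry alongside each database change, using Python's built-in `sqlite3`. SQLite has no user accounts of its own, so the username is passed in by the application; the `tickets` and `audit_log` tables and the `audited` helper are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("CREATE TABLE audit_log (username TEXT, at TEXT, action TEXT)")


def audited(conn, username, sql, params=()):
    """Run a statement and log who ran it, when (UTC), and what it was."""
    with conn:  # commits both statements together, or rolls both back on error
        conn.execute(sql, params)
        conn.execute(
            "INSERT INTO audit_log (username, at, action) VALUES (?, ?, ?)",
            (username, datetime.now(timezone.utc).isoformat(), sql),
        )


audited(conn, "alice", "INSERT INTO tickets (id, status) VALUES (?, ?)", (1, "open"))
audited(conn, "bob", "UPDATE tickets SET status = ? WHERE id = ?", ("deleted", 1))

# When something looks wrong, the trail shows who changed the data and when.
for username, at, action in conn.execute(
    "SELECT username, at, action FROM audit_log ORDER BY rowid"
):
    print(username, at, action)
```

Writing the change and its audit entry in one transaction means a change cannot be saved without its matching log line.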

Database Integrity

In database theory, data integrity includes three main points:

  • Entity Integrity – Each row in a table needs a unique primary key that distinguishes it from every other row.
  • Referential Integrity – Tables can refer to other tables using a foreign key, which must point to a row that actually exists.
  • Domain Integrity – The database restricts each column to pre-set categories and values. This is similar to validating input.
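All three kinds of database integrity can be declared as constraints in a table schema. This sketch uses Python's built-in `sqlite3`; the `customers` and `tickets` tables and the allowed priority values are hypothetical. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign keys unenforced by default
conn.executescript("""
CREATE TABLE customers (
    id INTEGER PRIMARY KEY                                     -- entity integrity
);
CREATE TABLE tickets (
    id INTEGER PRIMARY KEY,                                    -- entity integrity
    customer_id INTEGER NOT NULL REFERENCES customers(id),     -- referential integrity
    priority TEXT NOT NULL CHECK (priority IN ('low', 'medium', 'high'))  -- domain integrity
);
""")
conn.execute("INSERT INTO customers (id) VALUES (1)")


def try_insert(row):
    """Attempt an insert and report whether the database accepted it."""
    try:
        conn.execute("INSERT INTO tickets (id, customer_id, priority) VALUES (?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


attempts = [
    (1, 1, "high"),    # valid ticket
    (1, 1, "low"),     # duplicate primary key: breaks entity integrity
    (2, 99, "low"),    # customer 99 does not exist: breaks referential integrity
    (3, 1, "urgent"),  # priority not in the allowed set: breaks domain integrity
]
results = [try_insert(row) for row in attempts]
for row, outcome in zip(attempts, results):
    print(row, outcome)
```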

Within a database, data integrity works differently: these rules are enforced by the database software and govern its inner workings. Even so, the database is still part of an organization, and the advice in this article will help your organization create policies to keep the database intact.

Data Security versus Data Integrity

Data Security is related to Data Integrity, but they are not the same thing.  Data Security refers to keeping data safe from unauthorized users.  It includes hardware solutions like firewalls and software solutions like authentication.  Data Security often goes hand-in-hand with preventing cyber attacks.

Data Integrity is a broader application of policies and solutions to keep data pure and unmodified. It can include Data Security to prevent unauthorized users from modifying data, but it also includes measures to record, maintain, and preserve data in its original condition.

Conclusion

Data Integrity keeps electronic data accurate and intact. After all, reports are only as good as the data they are based on. Data integrity can also apply to information outside the computer world. Whether it's digital or printed, ensuring data integrity forms the base for good business decisions.