Data integrity

Data integrity is the accuracy, completeness, consistency, and trustworthiness of data throughout its lifecycle.

Data can be compromised every time it is replicated, transferred, or manipulated.

  • Data replication is the process of storing data in multiple locations.

    If the copies are not kept in sync, different people might not be using the same data for their findings, which can cause inconsistencies.

  • Data transfer is the process of copying data from a storage device to memory, or from one computer to another.

    If a transfer is interrupted, you might end up with an incomplete data set.

  • Data manipulation is the process of changing data to make it more organized and easier to read.

    Manipulation is meant to make analysis more efficient, but an error during the process can compromise that efficiency.

Data can also be compromised through human error, viruses, malware, hacking, and system failures.
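
One common way to detect a compromised copy or an incomplete transfer is to compare checksums of the source and the received data. The sketch below is illustrative only: the function names and the sample bytes are assumptions made for this example, not part of any particular tool, and a matching checksum only shows that two copies are identical, not that the data was correct to begin with.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 hex digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def transfer_is_intact(source: bytes, received: bytes) -> bool:
    """Return True when both copies have the same checksum (hypothetical helper)."""
    return sha256_of(source) == sha256_of(received)


# Made-up sample data: a small CSV payload and a copy cut off mid-transfer.
original = b"id,amount\n1,10.50\n2,7.25\n"
truncated = original[:-6]

print(transfer_is_intact(original, original))   # True
print(transfer_is_intact(original, truncated))  # False
```

The same comparison can be run between two replicas of a data set to check that they have not drifted apart.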

Alignment to business objective

  1. Clean data + alignment to business objective = accurate conclusions

  2. Alignment to business objective + additional data cleaning = accurate conclusions

  3. Alignment to business objective + newly discovered variables + constraints = accurate conclusions

Process

  1. Determine data integrity by assessing the overall accuracy, consistency, and completeness of the data.

  2. Connect objectives to data by understanding how your business objectives can be served by an investigation into the data.

  3. Know when to stop collecting data.

Data analysts complete these steps through pre-cleaning activities, which help determine and maintain data integrity.
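
As a rough illustration of step 1, a pre-cleaning check might count rows with missing values (completeness) and repeated identifiers (consistency). This is a minimal sketch under stated assumptions: the function name, the field names, and the sample rows are all hypothetical, and a real assessment would also include accuracy checks specific to the data.

```python
from collections import Counter


def assess_integrity(rows, required_fields, key_field):
    """Count basic completeness and consistency problems in a list of dict rows."""
    # Completeness: a row counts as incomplete if any required field is None or empty.
    rows_with_missing = sum(
        1
        for row in rows
        if any(row.get(field) in (None, "") for field in required_fields)
    )
    # Consistency: a key that appears more than once suggests duplicated records.
    key_counts = Counter(row.get(key_field) for row in rows)
    duplicate_keys = sorted(
        key for key, count in key_counts.items() if key is not None and count > 1
    )
    return {
        "rows": len(rows),
        "rows_with_missing_values": rows_with_missing,
        "duplicate_keys": duplicate_keys,
    }


# Made-up sample data for illustration only.
rows = [
    {"id": 1, "region": "north", "sales": 120},
    {"id": 2, "region": "", "sales": 95},
    {"id": 2, "region": "south", "sales": 95},
    {"id": 3, "region": "east", "sales": None},
]

report = assess_integrity(rows, required_fields=["region", "sales"], key_field="id")
print(report)
```

A report like this does not fix anything by itself; it only indicates how much cleaning the data may need before it can support the business objective.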