De-identifying data

What de-identification is, and how to de-identify datasets.

What is de-identification?

De-identification removes or generalises identifying information so that data can be used with a much lower risk of individuals being identified.

Data de-identification may be used to protect the privacy of individuals and organisations or, for example, to ensure that the spatial locations of minerals, archaeological findings, or endangered species are not publicly available.

Degrees of identification in data

The definitions and examples below, from data.govt.nz, explain the difference between identifiable, de-identified and confidentialised information.

Identifiable

Data that directly or indirectly identifies an individual or business.

Examples

Individual:

Name: Hēni
Gender: Female
Date of birth: 31/01/1985
Address: 28 My Road, Postcode 6012, Wellington

Business:

Name: Puzzles
Type: Paper stationery manufacturing
Employees: 34
Expenditure: $398,000

De-identified

Data that has had information removed from it to reduce the risk of spontaneous recognition.

Examples

Individual:

Name: Unknown
Gender: Female
Date of birth: 1985
Address: Postcode 6012, Wellington

Business:

Name: Unknown
Type: Manufacturing
Employees: 30-40
Expenditure: $398,000
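
The step from the identifiable examples to the de-identified ones can be sketched in code. The Python below is a minimal illustration only, not a recommended de-identification procedure: the function names, the dictionary keys, the assumed "street, postcode, city" address layout, the BROAD_TYPES lookup and the band width of 10 are all hypothetical choices made to reproduce the example values above.

```python
from datetime import datetime

# Hypothetical lookup from a detailed business type to a broad category.
BROAD_TYPES = {"Paper stationery manufacturing": "Manufacturing"}


def deidentify_individual(record):
    """Return a copy of the record with the name removed and other fields generalised."""
    birth = datetime.strptime(record["date_of_birth"], "%d/%m/%Y")
    # Assumes the address is written as "street, postcode, city"; drop the street part.
    parts = [part.strip() for part in record["address"].split(",")]
    return {
        "name": "Unknown",
        "gender": record["gender"],
        "date_of_birth": str(birth.year),  # keep the year only
        "address": ", ".join(parts[1:]),
    }


def deidentify_business(record, band_width=10):
    """Return a copy of the record with the name removed and type and head count generalised."""
    lower = (record["employees"] // band_width) * band_width
    return {
        "name": "Unknown",
        "type": BROAD_TYPES.get(record["type"], "Other"),
        "employees": f"{lower}-{lower + band_width}",
        "expenditure": record["expenditure"],  # left unchanged, as in the example above
    }


heni = {
    "name": "Hēni",
    "gender": "Female",
    "date_of_birth": "31/01/1985",
    "address": "28 My Road, Postcode 6012, Wellington",
}
puzzles = {
    "name": "Puzzles",
    "type": "Paper stationery manufacturing",
    "employees": 34,
    "expenditure": 398000,
}

print(deidentify_individual(heni))
print(deidentify_business(puzzles))
```

Each function returns a new dictionary and leaves the original record untouched, so the identifiable data can be stored and managed separately from the de-identified copy.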

Confidentialised

Data that has had statistical methods applied to it to protect against the unauthorised disclosure of information.

Examples

Individual:

Name: Unknown
Gender: Female
Age: 30-40 years
Address: Wellington

Business:

Name: Unknown
Type: Manufacturing
Employees: 10-100
Expenditure: Under $500,000
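
The confidentialised values above are consistent with grouping each value into a broad band. The Python below is a rough sketch of that banding only. It does not represent the full range of statistical methods used for confidentialisation, and the function names, band edges, reference year and expenditure threshold are hypothetical values chosen to match the examples.

```python
def band(value, edges):
    """Return the 'low-high' band that contains value (low inclusive, high exclusive)."""
    for low, high in zip(edges, edges[1:]):
        if low <= value < high:
            return f"{low}-{high}"
    raise ValueError(f"{value} falls outside the bands {edges}")


def confidentialise_individual(birth_year, gender, city, reference_year):
    """Replace the birth year with a ten-year age band and keep only the city."""
    age = reference_year - birth_year  # approximate age; ignores month and day
    return {
        "name": "Unknown",
        "gender": gender,
        "age": band(age, list(range(0, 130, 10))) + " years",
        "address": city,
    }


def confidentialise_business(broad_type, employees, expenditure):
    """Replace head count and expenditure with broad bands."""
    return {
        "name": "Unknown",
        "type": broad_type,
        "employees": band(employees, [0, 10, 100, 1000]),  # hypothetical band edges
        "expenditure": "Under $500,000" if expenditure < 500_000 else "$500,000 or more",
    }


# reference_year=2020 is an assumption; the source does not state when the age was calculated.
print(confidentialise_individual(1985, "Female", "Wellington", reference_year=2020))
print(confidentialise_business("Manufacturing", 34, 398000))
```

Wider bands lower the risk of recognition but also reduce the detail available for analysis, so the band edges need to be chosen for each dataset.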

Resources

For practical guidance on de-identification, on dealing with different types of data (e.g. qualitative, audio-visual), and on managing identifiable data, use the external resources below:

Contact

Research Data Support Services
Email: researchdata@auckland.ac.nz

eResearch Engagement Lead
Email: Laura Armstrong