The strategic value of personal data for companies and organisations is obvious. However, the risk that the massive processing of personal data poses to the rights and freedoms of individuals, and to our model of society, is equally undeniable. For this reason, appropriate safeguards must be adopted so that the processing carried out by the different data controllers does not interfere with the privacy of individuals. In the search for a balance between the legitimate exploitation of information and respect for individual rights, strategies aimed at preserving the usefulness of data while protecting the privacy of the individuals concerned are emerging. One of these strategies is differential privacy.
The US Census Bureau, for example, is applying differential privacy in order to ensure the accuracy of its statistics, prevent personal information from being revealed through them, and thus increase citizens’ trust in the security of the data they provide.
Differential privacy can be categorised as one of the privacy enhancing technologies (PETs) aimed at establishing data protection guarantees by design through the practical implementation of information abstraction strategies. As described by one of its creators, Cynthia Dwork, differential privacy guarantees that, despite the incorporation of random noise into the original information, the results of analysing the data to which this technique has been applied do not lose their usefulness. It is based on the Law of Large Numbers, the statistical principle that, as the sample size grows, the average values derived from the sample approach the true mean of the underlying information. The random noise added to the individual data therefore tends to cancel out in the aggregate, producing a result that is “essentially equivalent” to the one obtained from the original data.
The concept of “essentially equivalent” does not mean that the result obtained is identical; rather, it means that the result of the analysis derived from the original dataset and the result derived from the dataset to which differential privacy has been applied are functionally equivalent. This makes it possible for any particular subject to “plausibly deny” that their data are in the dataset under analysis. To achieve this, the noise pattern embedded in the data has to be adapted to the processing in question and to the accuracy margins that need to be obtained.
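As an illustration, the following sketch applies one common way of implementing this idea, the Laplace mechanism, to a simple count query; the mechanism itself, the privacy parameter `epsilon`, the sensitivity value and the synthetic dataset are assumptions made for the example and are not taken from the text. On a large sample, the noisy count remains close to the true count while hiding whether any single record is present.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical dataset: 1 means a person has the attribute being counted, 0 means not.
data = rng.integers(0, 2, size=10_000)

def private_count(values, epsilon, sensitivity=1.0):
    """Count with Laplace noise calibrated to the privacy parameter epsilon
    and the query's sensitivity (one record changes the count by at most 1)."""
    true_count = int(values.sum())
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

true_result = int(data.sum())
noisy_result = private_count(data, epsilon=0.5)

# The two results are "essentially equivalent": the noise is small relative
# to the aggregate but large enough to mask any individual contribution.
print(true_result, round(noisy_result))
```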
At first sight, the behaviour described above allows two important conclusions to be drawn:
- This strategy seeks to protect the results of the analysis of the information, which is what is to be disseminated. Therefore, it does not alter the original data, but acts on the transformation process or algorithm for querying and publishing the analysed data.
- As a consequence, and unlike other privacy assurance techniques, it does not require a detailed analysis of other data sources that could be linked to the input data, nor of the possible attack models. With this technique, the focus of the privacy enhancement strategy is on the data analysis process employed rather than on the characteristics of the data themselves.
In an interactive query scenario, the process works as follows (a code sketch of this flow is given further below):

1. The data analyst sends the query to the software that implements differential privacy (SPD).
2. The SPD assesses the privacy impact of this query on the data.
3. The SPD forwards the query to the database and obtains the actual and complete response.
4. The SPD adds the amount of noise necessary to distort the query result, according to the privacy budget, and sends the modified response back to the data analyst.
Image taken from Microsoft’s note “Differential Privacy for Everyone”
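To make the four steps above concrete, the Python sketch below mimics the intermediary with a toy class; the class name, the budget-accounting rule (subtracting each query’s epsilon from a fixed total) and the use of Laplace noise are illustrative assumptions, not details taken from Microsoft’s note.

```python
import numpy as np

class DifferentialPrivacyLayer:
    """Toy sketch of the intermediary ("SPD") described above: it sits between
    the analyst and the database, spends privacy budget per query and perturbs
    each true answer before returning it."""

    def __init__(self, database, total_budget):
        self.database = database              # raw data, never exposed directly
        self.remaining_budget = total_budget  # overall privacy budget (epsilon)
        self.rng = np.random.default_rng()

    def query(self, query_fn, epsilon, sensitivity=1.0):
        # Step 2: assess the privacy impact of the query against the budget.
        if epsilon > self.remaining_budget:
            raise RuntimeError("Privacy budget exhausted: query refused")
        self.remaining_budget -= epsilon

        # Step 3: obtain the actual, complete answer from the database.
        true_answer = query_fn(self.database)

        # Step 4: add noise scaled to the query's sensitivity and the budget
        # spent, and return the modified answer to the analyst.
        noise = self.rng.laplace(0.0, sensitivity / epsilon)
        return true_answer + noise

# Step 1: the analyst submits a query through the layer instead of the database.
spd = DifferentialPrivacyLayer(database=np.array([1, 0, 1, 1, 0, 1]), total_budget=1.0)
print(spd.query(lambda db: db.sum(), epsilon=0.5))  # noisy count
```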
More specifically, differential privacy relies on…