
Relative privacy risks and learning from anonymized data
Michele Boreale, Fabio Corradi


Abstract


We consider group-based anonymized tables, a popular
approach to data publishing. This approach aims at protecting
the privacy of the individuals involved by releasing an
\emph{obfuscated} version of the original data, in which the exact
correspondence between individuals and attribute values is hidden.
When publishing data about individuals, one must typically balance
the \emph{learner}'s utility against the risk posed by an
\emph{attacker} potentially targeting individuals in the dataset.
Accordingly, we propose an \textsc{mcmc}-based methodology by which a
data curator can simultaneously: (a) learn the population
parameters from a given anonymized table, thus assessing its
utility; (b) analyze the risk that any individual in the dataset is
linked to a specific sensitive value, beyond what can be
inferred from the population parameters learned in (a), when the
attacker knows the individual's nonsensitive attributes.
We call this \emph{relative risk} analysis. We propose a unified
probabilistic model that encompasses both \emph{horizontal} group-based
anonymization schemes, such as $k$-anonymity, and
\emph{vertical} ones, such as Anatomy. We detail the learning
procedure for both the honest learner and the attacker. Based on the
learned distributions, we put forward relative risk \emph{measures}.
Finally, we illustrate some experiments conducted with the proposed
methodology on a real-world dataset.
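To make the notions above concrete, the following minimal Python sketch (not the paper's actual \textsc{mcmc} procedure) builds a toy Anatomy-style vertical release, estimates by plain Monte Carlo the attacker's posterior probability that a hypothetical target is linked to a given sensitive value, and compares it with the population frequency an honest learner would estimate; their ratio is one possible relative risk measure. All names, attribute values, and the uniform within-group matching assumption are illustrative, not taken from the paper.

import random
from collections import Counter

# Vertical (Anatomy-style) release: a quasi-identifier table and a sensitive
# table, linked only through a group id; the within-group correspondence
# between individuals and sensitive values is hidden.
qi_table = [  # (individual, nonsensitive attribute, group)
    ("Alice", "age 30-39", 1), ("Bob", "age 30-39", 1), ("Carol", "age 30-39", 1),
    ("Dave", "age 40-49", 2), ("Erin", "age 40-49", 2), ("Frank", "age 40-49", 2),
]
sens_table = [  # (group, sensitive value)
    (1, "flu"), (1, "flu"), (1, "HIV"),
    (2, "flu"), (2, "diabetes"), (2, "flu"),
]

def sample_completion():
    """Draw one full table consistent with the release: within each group,
    match individuals to sensitive values by a uniformly random permutation
    (a simplifying assumption; the paper learns these probabilities)."""
    completion = {}
    for g in {g for (_, _, g) in qi_table}:
        members = [name for (name, _, gg) in qi_table if gg == g]
        values = [v for (gg, v) in sens_table if gg == g]
        random.shuffle(values)
        completion.update(dict(zip(members, values)))
    return completion

target, value, n_draws = "Alice", "HIV", 20000
hits = sum(sample_completion()[target] == value for _ in range(n_draws))
posterior = hits / n_draws                      # attacker's linkage probability
population = Counter(v for (_, v) in sens_table)[value] / len(sens_table)
print(f"P({target} -> {value} | release) ~ {posterior:.3f}")
print(f"population frequency of {value}  = {population:.3f}")
print(f"relative risk ~ {posterior / population:.2f}")

In this toy release the target's group contains the sensitive value once among three members, so the attacker's posterior is about 1/3 against a population frequency of 1/6: the release roughly doubles the attacker's belief in the linkage, which is exactly the kind of excess inference, beyond the honestly learned population parameters, that a relative risk measure is meant to capture.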