PrivEval: A Tool for Interactive Evaluation of Privacy Metrics in Synthetic Data Generation

Lissandrini, Matteo;
2025-01-01

Abstract

Synthetic data generation (SDG) is the process of generating a new synthetic dataset based on the statistical properties of an existing confidential dataset. Differential privacy is a property of an SDG mechanism that establishes how well individuals whose sensitive data is part of the confidential dataset are protected when such data is shared. To ensure that an SDG mechanism is differentially private, noise is injected into the statistics learned from the dataset. The amount of noise injected determines a trade-off between privacy and utility. Privacy is then measured via a set of privacy metrics, which usually establish a lower bound on only a few aspects of the privacy-utility trade-off. Therefore, it is not possible to assess privacy based on a single metric. To close this gap, we demonstrate PrivEval, a tool that assists users in evaluating the privacy properties of a synthetic dataset. PrivEval implements several privacy metrics and validates them both for a single user and for the overall dataset. In addition, PrivEval checks the assumptions behind each metric. Hence, PrivEval is a first step toward bridging the gap between privacy experts and the general public and making privacy estimation more transparent.
2025
privacy evaluation

Use this identifier to cite or link to this document: https://hdl.handle.net/11562/1181053