Commit 5281b67

added figures and descriptions to syndiffix.md
1 parent e203574 commit 5281b67

4 files changed: +20 -2 lines changed

_pages/syndiffix.md

Lines changed: 20 additions & 2 deletions
@@ -35,6 +35,24 @@ While **SynDiffix** serves both use cases, it is especially good at descriptive
 
 Obtaining this accuracy improvement, however, requires a different usage style compared to other products. The intended usage style of other products is "*one size fits all*": a single synthetic dataset serves all use cases. By contrast, with **SynDiffix**, a different *tailored* synthetic dataset should be produced for each use case.
 
-For instance, suppose the analyst is interested in the correlation between columns A and B. With other synthetic data products, one would synthesize the complete table, and then measure the correlation between columns A and B. With **SynDiffix**, one would create a synthetic table consisting of only those two columns and obtain much better results.
+<img src="/assets/img/usage.png" width="400" height="400">
 
-**SynDiffix** has anonymization mechanisms that allow for literally thousands of column combinations without compromising anonymity. This is not the case with other products, where anonymity is weakened with each new data synthesis.
+For instance, suppose the analyst is interested in a heatmap with columns C and E. With other synthetic data products, one would synthesize the complete table, and then make the heatmap with only columns C and E. With **SynDiffix**, one would create a synthetic table consisting of only those two columns and obtain much better results.
+
+**SynDiffix** has anonymization mechanisms that allow for literally thousands of column combinations without compromising anonymity. This is not the case with other products, where anonymity is weakened with each new data synthesis.
+
+## Accuracy
+
+Here are three scatter plots showing the synthetic and real points for a 2-column dataset for **SynDiffix**, the commercial product MostlyAI, and the open-source implementation of CTGAN by SDV.
+
+<img src="/assets/img/scatter.png" width="550" height="200">
+
+The black dots are the original data and the blue dots are the synthetic overlaid on the original data. SynDiffix is far more accurate. More examples can be found in the [arXiv paper](https://arxiv.org/abs/2311.09628).
+
+## Anonymity
+
+We measured the anonymity of **SynDiffix** and a number of other products using the [Anonymeter](https://github.com/statice/anonymeter) tool developed by Statice. The following figure shows the results of measuring the effectiveness of 100s of attacks over multiple different tables (again, see the [arXiv paper](https://arxiv.org/abs/2311.09628) for details). Any score below 0.5 can be regarded as having strong anonymity, and scores below 0.2 are very strong. The "noAnon" plot measures attacks against the original data, and is there for calibration.
+
+<img src="/assets/img/privacy.png" width="400" height="200">
+
+As these results show, **SynDiffix** and most of the other products have very strong anonymization.
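
The score thresholds quoted in the Anonymity section (below 0.5 is strong, below 0.2 is very strong) can be read as a simple lookup. The sketch below is only an illustration of that reading and is not part of the commit or of Anonymeter. The function name `anonymity_level`, the label for scores of 0.5 and above, and the sample scores are all assumptions made for the example.

```python
def anonymity_level(score: float) -> str:
    """Map an attack-effectiveness score to the labels used in the text.

    Thresholds come from the added text: below 0.2 is "very strong",
    below 0.5 is "strong". The text does not name scores of 0.5 or more,
    so the last label is an assumption made for this sketch.
    """
    if score < 0.2:
        return "very strong"
    if score < 0.5:
        return "strong"
    return "not rated strong"  # assumed label, not from the source


# Made-up scores, used only to exercise the function.
example_scores = {"method_a": 0.05, "method_b": 0.35, "method_c": 0.80}

for name, score in example_scores.items():
    print(f"{name}: {score:.2f} -> {anonymity_level(score)}")
```

The boundaries are treated as exclusive because the text says "below", so a score of exactly 0.2 is labeled strong rather than very strong.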

assets/img/privacy.png (179 KB)

assets/img/scatter.png (672 KB)

assets/img/usage.png (193 KB)