Obtaining this accuracy improvement, however, requires a different usage style from that of other products. Other products are intended to be "*one size fits all*": a single synthetic dataset serves all use cases. By contrast, with **SynDiffix**, a separate *tailored* synthetic dataset should be produced for each use case.
{% include image.html src="/assets/img/usage.png" alt="SynDiffix usage style" max_width="450px" %}
For instance, suppose the analyst wants a heatmap of columns C and E. With other synthetic data products, one would synthesize the complete table and then build the heatmap from columns C and E only. With **SynDiffix**, one would instead synthesize a table containing only those two columns, and obtain much better results.
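
The tailored usage style can be sketched as a small driver loop: for each use case, keep only the columns it needs and synthesize that projection on its own. This is a minimal Python sketch under stated assumptions: `synthesize` is a hypothetical placeholder for whichever synthesizer is called (it is not the **SynDiffix** API), tables are plain lists of row dictionaries, and the use-case name `heatmap_C_E` is illustrative.

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, object]
Table = List[Row]


def project(table: Table, columns: Sequence[str]) -> Table:
    """Return a copy of the table that keeps only the requested columns."""
    return [{col: row[col] for col in columns} for row in table]


def tailored_synthesis(
    table: Table,
    use_cases: Dict[str, Sequence[str]],
    synthesize: Callable[[Table], Table],  # hypothetical synthesizer hook
) -> Dict[str, Table]:
    """Build one synthetic table per use case from only the columns it needs.

    The "one size fits all" style would instead call synthesize(table) once
    on the full table and slice columns out of that single result.
    """
    return {
        name: synthesize(project(table, columns))
        for name, columns in use_cases.items()
    }


if __name__ == "__main__":
    original = [
        {"A": 1, "B": "x", "C": 2.5, "D": True, "E": 10},
        {"A": 2, "B": "y", "C": 3.1, "D": False, "E": 12},
    ]

    def identity(t: Table) -> Table:
        # Stand-in so the example runs; it does NOT anonymize anything.
        return [dict(row) for row in t]

    tables = tailored_synthesis(original, {"heatmap_C_E": ["C", "E"]}, identity)
    print(tables["heatmap_C_E"])
```

The trade-off of this style is that each analysis triggers its own synthesis run, so several smaller synthetic tables are produced instead of one large one.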
Here are three scatter plots comparing the real and synthetic points of a 2-column dataset for **SynDiffix**, the commercial product MostlyAI, and SDV's open-source implementation of CTGAN.
{% include image.html src="/assets/img/scatter.png" alt="SynDiffix accuracy" max_width="600px" %}
The black dots are the original data, and the blue dots are the synthetic data overlaid on the original. **SynDiffix** is far more accurate. More examples can be found in the [arXiv paper](https://arxiv.org/abs/2311.09628).
## Anonymity
We measured the anonymity of **SynDiffix** and a number of other products using the [Anonymeter](https://github.com/statice/anonymeter) tool developed by Statice. The following figure shows the effectiveness of hundreds of attacks over multiple tables (again, see the [arXiv paper](https://arxiv.org/abs/2311.09628) for details). Any score below 0.5 can be regarded as strong anonymity, and any score below 0.2 as very strong anonymity. The "noAnon" plot measures attacks against the original, unanonymized data and serves as a calibration baseline.
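
The two cut-offs above amount to a simple banding of attack scores, where a lower score means the attacks were less effective. The sketch below encodes only those two thresholds (0.2 and 0.5). The function name `anonymity_band` is hypothetical and is not part of Anonymeter or **SynDiffix**. How individual attack scores are combined per table is described in the arXiv paper and is not modeled here.

```python
def anonymity_band(score: float) -> str:
    """Classify an attack-effectiveness score using the stated cut-offs.

    Lower scores mean the attacks were less effective. Scores of 0.5 or
    more are simply not classified as strong; the text gives no finer band.
    """
    if score < 0.2:
        return "very strong"
    if score < 0.5:
        return "strong"
    return "not classified as strong"


if __name__ == "__main__":
    for s in (0.05, 0.3, 0.7):
        print(s, anonymity_band(s))
```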