Commit 5a08e0b

Deploying to gh-pages from @ a28da8a 🚀
1 parent 841dafe commit 5a08e0b

2 files changed

+2
-2
lines changed

feed.xml

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-
<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en"><generator uri="https://jekyllrb.com/" version="4.3.2">Jekyll</generator><link href="https://diffix.github.io/feed.xml" rel="self" type="application/atom+xml"/><link href="https://diffix.github.io/" rel="alternate" type="text/html" hreflang="en"/><updated>2023-11-22T13:22:55+00:00</updated><id>https://diffix.github.io/feed.xml</id><title type="html">blank</title><subtitle>Strong Anonymization for Structured Data. Open. Free. </subtitle></feed>
+
<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en"><generator uri="https://jekyllrb.com/" version="4.3.2">Jekyll</generator><link href="https://diffix.github.io/feed.xml" rel="self" type="application/atom+xml"/><link href="https://diffix.github.io/" rel="alternate" type="text/html" hreflang="en"/><updated>2023-11-22T13:31:10+00:00</updated><id>https://diffix.github.io/feed.xml</id><title type="html">blank</title><subtitle>Strong Anonymization for Structured Data. Open. Free. </subtitle></feed>

syndiffix.html

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
<!DOCTYPE html> <html lang="en"> <head> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> <meta charset="utf-8"> <meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no"> <meta http-equiv="X-UA-Compatible" content="IE=edge"> <title>SynDiffix | Open Diffix</title> <meta name="author" content="Open Diffix"> <meta name="description" content="Strong Anonymization for Structured Data. Open. Free. "> <meta name="keywords" content="Diffix, Anonymization, Privacy, Synthetic Data"> <link rel="stylesheet" href="/assets/css/bootstrap.min.css?a4b3f509e79c54a512b890d73235ef04"> <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/mdbootstrap@4.20.0/css/mdb.min.css" integrity="sha256-jpjYvU3G3N6nrrBwXJoVEYI/0zw8htfFnhT9ljN3JJw=" crossorigin="anonymous"> <link defer rel="stylesheet" href="https://unpkg.com/bootstrap-table@1.22.1/dist/bootstrap-table.min.css"> <link rel="stylesheet" href="/assets/css/academicons.min.css?f0b7046b84e425c55f3463ac249818f5"> <link rel="stylesheet" type="text/css" href="https://fonts.googleapis.com/css?family=Roboto:300,400,500,700|Roboto+Slab:100,300,400,500,700|Material+Icons"> <link rel="stylesheet" href="/assets/css/jekyll-pygments-themes-github.css?19f3075a2d19613090fe9e16b564e1fe" media="" id="highlight_theme_light"> <link rel="shortcut icon" href="data:image/svg+xml,&lt;svg%20xmlns=%22http://www.w3.org/2000/svg%22%20viewBox=%220%200%20100%20100%22&gt;&lt;text%20y=%22.9em%22%20font-size=%2290%22&gt;%E2%9A%9B%EF%B8%8F&lt;/text&gt;&lt;/svg&gt;"> <link rel="stylesheet" href="/assets/css/main.css?d41d8cd98f00b204e9800998ecf8427e"> <link rel="canonical" href="https://diffix.github.io/syndiffix"> </head> <body class="fixed-top-nav sticky-bottom-footer"> <header> <nav id="navbar" class="navbar navbar-light navbar-expand-sm fixed-top"> <div class="container"> <a class="navbar-brand title font-weight-lighter" href="/"><span class="font-weight-bold">Open Diffix </span></a> <button class="navbar-toggler 
collapsed ml-auto" type="button" data-toggle="collapse" data-target="#navbarNav" aria-controls="navbarNav" aria-expanded="false" aria-label="Toggle navigation"> <span class="sr-only">Toggle navigation</span> <span class="icon-bar top-bar"></span> <span class="icon-bar middle-bar"></span> <span class="icon-bar bottom-bar"></span> </button> <div class="collapse navbar-collapse text-right" id="navbarNav"> <ul class="navbar-nav ml-auto flex-nowrap"> <li class="nav-item "> <a class="nav-link" href="/">Home</a> </li> <li class="nav-item active"> <a class="nav-link" href="/syndiffix">SynDiffix<span class="sr-only">(current)</span></a> </li> </ul> </div> </div> </nav> <progress id="progress" value="0"> <div class="progress-container"> <span class="progress-bar"></span> </div> </progress> </header> <div class="container mt-5"> <div class="post"> <header class="post-header"> <h1 class="post-title">SynDiffix</h1> <p class="post-description"></p> </header> <article> <p><strong>SynDiffix</strong> is an open-source Python package for generating statistically-accurate and strongly anonymous synthetic data from structured data. Compared to existing open-source and proprietary commercial approaches, <strong>SynDiffix</strong> is</p> <ul> <li>many times more accurate,</li> <li>has comparable or better ML efficacy,</li> <li>runs as fast or faster,</li> <li>has stronger anonymization.</li> </ul> <p>A complete description of <strong>SynDiffix</strong>, including its operation, performance, and anonymity, can be found on <a href="https://arxiv.org/abs/2311.09628" rel="external nofollow noopener" target="_blank">arXiv</a>. 
See <a href="https://github.com/diffix/syndiffix" rel="external nofollow noopener" target="_blank">github.com/diffix/syndiffix</a>.</p> <p>Programming with <strong>SynDiffix</strong> can be as easy as:</p> <div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="n">syndiffix</span> <span class="kn">import</span> <span class="n">Synthesizer</span>

<span class="n">df_synthetic</span> <span class="o">=</span> <span class="nc">Synthesizer</span><span class="p">(</span><span class="n">df_original</span><span class="p">).</span><span class="nf">sample</span><span class="p">()</span>
-
</code></pre></div></div> <h2 id="use-cases">Use Cases</h2> <p>Synthetic data has two primary use cases:</p> <ol> <li>Descriptive analytics (histograms, heatmaps, column correlations, basic statistics like counting, averages, standard deviations, and so on).</li> <li>Machine learning (building models, extending datasets, etc.)</li> </ol> <p>While <strong>SynDiffix</strong> serves both use cases, it is especially good at descriptive analytics. The quality of descriptive analytics is many times that of other synthetic data products.</p> <h2 id="usage">Usage</h2> <p>Obtaining this accuracy improvement, however, requires a different usage style compared to other products. The intended usage style of other products is “<em>one size fits all</em>”: a single synthetic dataset serves all use cases. By contrast, with <strong>SynDiffix</strong>, a different <em>tailored</em> synthetic dataset should be produced for each use case.</p> <p><img src="/assets/img/usage.png" alt="SynDiffix usage style" class="img-fluid" style="max-width: 450px; width: 100%;"></p> <p>For instance, suppose the analyst is interested in a heatmap with columns C and E. With other synthetic data products, one would synthesize the complete table, and then make the heatmap with only columns C and E. With <strong>SynDiffix</strong>, one would create a synthetic table consisting of only those two columns and obtain much better results.</p> <p><strong>SynDiffix</strong> has anonymization mechanisms that allow for literally thousands of column combinations without compromising anonymity. 
This is not the case with other products, where anonymity is weakened with each new data synthesis.</p> <h2 id="accuracy">Accuracy</h2> <p>Here are three scatter plots showing the synthetic and real points for a 2-column dataset for <strong>SynDiffix</strong>, the commercial product MostlyAI, and the open-source implementation of CTGAN by SDV.</p> <p><img src="/assets/img/scatter.png" alt="SynDiffix accuracy" class="img-fluid" style="max-width: 600px; width: 100%;"></p> <p>The black dots are the original data and the blue dots are the synthetic overlaid on the original data. SynDiffix is far more accurate. More examples can be found in the <a href="https://arxiv.org/abs/2311.09628" rel="external nofollow noopener" target="_blank">arXiv paper</a>.</p> <h2 id="anonymity">Anonymity</h2> <p>We measured the anonymity of <strong>SynDiffix</strong> and a number of other products using the <a href="https://github.com/statice/anonymeter" rel="external nofollow noopener" target="_blank">Anonymeter</a> tool developed by Statice. The following figure shows the results of measuring the effectiveness of 100s of attacks over multiple different tables (again, see the <a href="https://arxiv.org/abs/2311.09628" rel="external nofollow noopener" target="_blank">arXiv paper</a> for details). Any score below 0.5 can be regarded as having strong anonymity, and scores below 0.2 are very strong. The “noAnon” plot measures attacks against the original data, and is there for calibration.</p> <p><img src="/assets/img/privacy.png" alt="SynDiffix privacy" class="img-fluid" style="max-width: 400px; width: 100%;"></p> <p>As these results show, <strong>SynDiffix</strong> and most of the other products have very strong anonymization with respect to the attack used by Anonymeter.</p> </article> </div> </div> <footer class="sticky-bottom mt-5"> <div class="container"> © Copyright 2023 Open Diffix. 
Powered by <a href="https://jekyllrb.com/" target="_blank" rel="external nofollow noopener">Jekyll</a> with <a href="https://github.com/alshedivat/al-folio" rel="external nofollow noopener" target="_blank">al-folio</a> theme. Hosted by <a href="https://pages.github.com/" target="_blank" rel="external nofollow noopener">GitHub Pages</a>. </div> </footer> <script src="https://cdn.jsdelivr.net/npm/jquery@3.6.0/dist/jquery.min.js" integrity="sha256-/xUj+3OJU5yExlq6GSYGSHk7tPXikynS7ogEvDej/m4=" crossorigin="anonymous"></script> <script src="/assets/js/bootstrap.bundle.min.js"></script> <script src="https://cdn.jsdelivr.net/npm/mdbootstrap@4.20.0/js/mdb.min.js" integrity="sha256-NdbiivsvWt7VYCt6hYNT3h/th9vSTL4EDWeGs5SN3DA=" crossorigin="anonymous"></script> <script defer src="https://cdn.jsdelivr.net/npm/medium-zoom@1.0.8/dist/medium-zoom.min.js" integrity="sha256-7PhEpEWEW0XXQ0k6kQrPKwuoIomz8R8IYyuU1Qew4P8=" crossorigin="anonymous"></script> <script defer src="/assets/js/zoom.js?7b30caa5023af4af8408a472dc4e1ebb"></script> <script defer src="https://unpkg.com/bootstrap-table@1.22.1/dist/bootstrap-table.min.js"></script> <script src="/assets/js/no_defer.js?d633890033921b33e0ceb13d22340a9c"></script> <script defer src="/assets/js/common.js?acdb9690d7641b2f8d40529018c71a01"></script> <script defer src="/assets/js/copy_code.js?c9d9dd48933de3831b3ee5ec9c209cac" type="text/javascript"></script> <script type="text/javascript">window.MathJax={tex:{tags:"ams"}};</script> <script defer type="text/javascript" id="MathJax-script" src="https://cdn.jsdelivr.net/npm/mathjax@3.2.0/es5/tex-mml-chtml.js"></script> <script defer src="https://polyfill.io/v3/polyfill.min.js?features=es6"></script> <script type="text/javascript">function progressBarSetup(){"max"in 
document.createElement("progress")?(initializeProgressElement(),$(document).on("scroll",function(){progressBar.attr({value:getCurrentScrollPosition()})}),$(window).on("resize",initializeProgressElement)):(resizeProgressBar(),$(document).on("scroll",resizeProgressBar),$(window).on("resize",resizeProgressBar))}function getCurrentScrollPosition(){return $(window).scrollTop()}function initializeProgressElement(){let e=$("#navbar").outerHeight(!0);$("body").css({"padding-top":e}),$("progress-container").css({"padding-top":e}),progressBar.css({top:e}),progressBar.attr({max:getDistanceToScroll(),value:getCurrentScrollPosition()})}function getDistanceToScroll(){return $(document).height()-$(window).height()}function resizeProgressBar(){progressBar.css({width:getWidthPercentage()+"%"})}function getWidthPercentage(){return getCurrentScrollPosition()/getDistanceToScroll()*100}const progressBar=$("#progress");window.onload=function(){setTimeout(progressBarSetup,50)};</script> </body> </html>
+
</code></pre></div></div> <h2 id="use-cases">Use Cases</h2> <p>Synthetic data has two primary use cases:</p> <ol> <li>Descriptive analytics (histograms, heatmaps, column correlations, basic statistics like counting, averages, standard deviations, and so on).</li> <li>Machine learning (building models, extending datasets, etc.)</li> </ol> <p>While <strong>SynDiffix</strong> serves both use cases, it is especially good at descriptive analytics. The quality of descriptive analytics is many times that of other synthetic data products.</p> <h2 id="usage">Usage</h2> <p>Obtaining this accuracy improvement, however, requires a different usage style compared to other products. The intended usage style of other products is “<em>one size fits all</em>”: a single synthetic dataset serves all use cases. By contrast, with <strong>SynDiffix</strong>, a different <em>tailored</em> synthetic dataset should be produced for each use case.</p> <p><img src="/assets/img/usage.png" alt="SynDiffix usage style" class="img-fluid" style="max-width: 550px; width: 100%;"></p> <p>For instance, suppose the analyst is interested in a heatmap with columns C and E. With other synthetic data products, one would synthesize the complete table, and then make the heatmap with only columns C and E. With <strong>SynDiffix</strong>, one would create a synthetic table consisting of only those two columns and obtain much better results.</p> <p><strong>SynDiffix</strong> has anonymization mechanisms that allow for literally thousands of column combinations without compromising anonymity. 
This is not the case with other products, where anonymity is weakened with each new data synthesis.</p> <h2 id="accuracy">Accuracy</h2> <p>Here are three scatter plots showing the synthetic and real points for a 2-column dataset for <strong>SynDiffix</strong>, the commercial product MostlyAI, and the open-source implementation of CTGAN by SDV.</p> <p><img src="/assets/img/scatter.png" alt="SynDiffix accuracy" class="img-fluid" style="max-width: 600px; width: 100%;"></p> <p>The black dots are the original data and the blue dots are the synthetic overlaid on the original data. SynDiffix is far more accurate. More examples can be found in the <a href="https://arxiv.org/abs/2311.09628" rel="external nofollow noopener" target="_blank">arXiv paper</a>.</p> <h2 id="anonymity">Anonymity</h2> <p>We measured the anonymity of <strong>SynDiffix</strong> and a number of other products using the <a href="https://github.com/statice/anonymeter" rel="external nofollow noopener" target="_blank">Anonymeter</a> tool developed by Statice. The following figure shows the results of measuring the effectiveness of 100s of attacks over multiple different tables (again, see the <a href="https://arxiv.org/abs/2311.09628" rel="external nofollow noopener" target="_blank">arXiv paper</a> for details). Any score below 0.5 can be regarded as having strong anonymity, and scores below 0.2 are very strong. The “noAnon” plot measures attacks against the original data, and is there for calibration.</p> <p><img src="/assets/img/privacy.png" alt="SynDiffix privacy" class="img-fluid" style="max-width: 400px; width: 100%;"></p> <p>As these results show, <strong>SynDiffix</strong> and most of the other products have very strong anonymization with respect to the attack used by Anonymeter.</p> </article> </div> </div> <footer class="sticky-bottom mt-5"> <div class="container"> © Copyright 2023 Open Diffix. 
Powered by <a href="https://jekyllrb.com/" target="_blank" rel="external nofollow noopener">Jekyll</a> with <a href="https://github.com/alshedivat/al-folio" rel="external nofollow noopener" target="_blank">al-folio</a> theme. Hosted by <a href="https://pages.github.com/" target="_blank" rel="external nofollow noopener">GitHub Pages</a>. </div> </footer> <script src="https://cdn.jsdelivr.net/npm/jquery@3.6.0/dist/jquery.min.js" integrity="sha256-/xUj+3OJU5yExlq6GSYGSHk7tPXikynS7ogEvDej/m4=" crossorigin="anonymous"></script> <script src="/assets/js/bootstrap.bundle.min.js"></script> <script src="https://cdn.jsdelivr.net/npm/mdbootstrap@4.20.0/js/mdb.min.js" integrity="sha256-NdbiivsvWt7VYCt6hYNT3h/th9vSTL4EDWeGs5SN3DA=" crossorigin="anonymous"></script> <script defer src="https://cdn.jsdelivr.net/npm/medium-zoom@1.0.8/dist/medium-zoom.min.js" integrity="sha256-7PhEpEWEW0XXQ0k6kQrPKwuoIomz8R8IYyuU1Qew4P8=" crossorigin="anonymous"></script> <script defer src="/assets/js/zoom.js?7b30caa5023af4af8408a472dc4e1ebb"></script> <script defer src="https://unpkg.com/bootstrap-table@1.22.1/dist/bootstrap-table.min.js"></script> <script src="/assets/js/no_defer.js?d633890033921b33e0ceb13d22340a9c"></script> <script defer src="/assets/js/common.js?acdb9690d7641b2f8d40529018c71a01"></script> <script defer src="/assets/js/copy_code.js?c9d9dd48933de3831b3ee5ec9c209cac" type="text/javascript"></script> <script type="text/javascript">window.MathJax={tex:{tags:"ams"}};</script> <script defer type="text/javascript" id="MathJax-script" src="https://cdn.jsdelivr.net/npm/mathjax@3.2.0/es5/tex-mml-chtml.js"></script> <script defer src="https://polyfill.io/v3/polyfill.min.js?features=es6"></script> <script type="text/javascript">function progressBarSetup(){"max"in 
document.createElement("progress")?(initializeProgressElement(),$(document).on("scroll",function(){progressBar.attr({value:getCurrentScrollPosition()})}),$(window).on("resize",initializeProgressElement)):(resizeProgressBar(),$(document).on("scroll",resizeProgressBar),$(window).on("resize",resizeProgressBar))}function getCurrentScrollPosition(){return $(window).scrollTop()}function initializeProgressElement(){let e=$("#navbar").outerHeight(!0);$("body").css({"padding-top":e}),$("progress-container").css({"padding-top":e}),progressBar.css({top:e}),progressBar.attr({max:getDistanceToScroll(),value:getCurrentScrollPosition()})}function getDistanceToScroll(){return $(document).height()-$(window).height()}function resizeProgressBar(){progressBar.css({width:getWidthPercentage()+"%"})}function getWidthPercentage(){return getCurrentScrollPosition()/getDistanceToScroll()*100}const progressBar=$("#progress");window.onload=function(){setTimeout(progressBarSetup,50)};</script> </body> </html>
