Core Concepts
Privacy Budget
Epsilon (ε) and Delta (δ) are the base parameters that quantify and manage privacy protection in differential privacy. Every Competition has a Privacy Budget. Each query spends Epsilon from the Privacy Budget. The sum of the values spent on each query is the total Epsilon spent, which must stay within the Privacy Budget.
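The accounting described above can be sketched as a small tracker. This is a minimal illustration only, assuming basic sequential composition (Epsilon values simply add up); the class name `PrivacyBudget` and its methods are hypothetical and are not part of any platform API.

```python
class PrivacyBudget:
    """Hypothetical tracker: sums the Epsilon spent by each query."""

    def __init__(self, total_epsilon: float):
        if total_epsilon <= 0:
            raise ValueError("total_epsilon must be positive")
        self.total_epsilon = total_epsilon
        self.spent = 0.0

    @property
    def remaining(self) -> float:
        return self.total_epsilon - self.spent

    def spend(self, epsilon: float) -> None:
        """Record one query's Epsilon, refusing it if the budget would be exceeded."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        # Small tolerance so floating-point rounding does not reject an exact fit.
        if self.spent + epsilon > self.total_epsilon + 1e-12:
            raise RuntimeError("Privacy Budget exceeded")
        self.spent += epsilon


budget = PrivacyBudget(total_epsilon=1.0)
budget.spend(0.4)   # first query
budget.spend(0.5)   # second query
# A third query asking for 0.2 would now raise RuntimeError,
# because only about 0.1 of the budget remains.
```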
Spending Privacy Budget for Your Needs
When spending the Privacy Budget to allocate Epsilon (ε) and Delta (δ) to teams and members, consider the following:
- Understand the parameters: It is essential to understand how Epsilon and Delta affect the utility or accuracy of the data analysis.
- Consider the context of the data usage: The nature of the data and its intended use are crucial in determining ε and δ. Highly sensitive data like health records may require a smaller ε for stronger privacy.
- Define a desired level of privacy: Determine the acceptable level of privacy risk. In scenarios where individual privacy is critical, opting for a smaller ε is advisable.
- Understand the data analysis goals: Consider the required accuracy and specificity of the data analysis results. For broader, less granular insights, a smaller ε may suffice.
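To make the trade-off in the list above concrete, the sketch below uses the Laplace mechanism, a standard way to achieve pure ε-differential privacy (δ = 0) for numeric queries. It is an assumption for illustration: the source does not state which mechanism is actually used, and the function names `laplace_scale` and `noisy_count` are hypothetical. The noise scale is sensitivity / ε, so a smaller ε means more noise and lower accuracy.

```python
import random


def laplace_scale(sensitivity: float, epsilon: float) -> float:
    """Scale of the Laplace noise: grows as epsilon shrinks."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return sensitivity / epsilon


def noisy_count(true_count: float, epsilon: float,
                sensitivity: float = 1.0, rng=None) -> float:
    """Return the count plus Laplace noise (illustrative, not production-grade)."""
    if rng is None:
        rng = random.Random()
    scale = laplace_scale(sensitivity, epsilon)
    # The difference of two i.i.d. exponential draws is Laplace-distributed.
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return true_count + noise


rng = random.Random(0)
for eps in (0.1, 1.0, 10.0):
    print(f"epsilon={eps}: noise scale={laplace_scale(1.0, eps)}, "
          f"released count={noisy_count(1000, eps, rng=rng):.1f}")
```

With a sensitivity of 1, ε = 0.1 gives a noise scale of 10, while ε = 10 gives a scale of 0.1, which is why stronger privacy costs accuracy.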
Selecting suitable Epsilon (ε) and Delta (δ) values is a critical decision that impacts the balance between data utility and privacy. Depending on the scenario, consider the following factors:
- Data sensitivity: Highly sensitive data needs lower ε and δ values so that it is more strictly protected.
- Population size: Larger datasets typically have better signal-to-noise ratios, which mitigates the effect of the noise on the resulting insights.
- Data access frequency: Frequent access to data for analysis might require more restrictive settings to maintain privacy over time, especially if data is reused in a new analysis, because the Privacy Budget spent on each analysis compounds.
- Collaborative environments: When multiple parties are involved, consider the cumulative privacy risk and adjust ε and δ accordingly.
- Caching and budgeting: Similar questions are often asked, and some can be answered from others, such as deriving the mean from a sum and a count that were already released. Caching queries and their responses, and reusing them across multiple queries, minimises the net privacy loss.
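The caching point in the list above can be sketched as follows. Differential privacy is closed under post-processing, so a mean computed from an already released noisy sum and noisy count spends no further Epsilon. Everything here is a hypothetical illustration: the class `CachingAnalyst`, the Laplace noise, and the clipping of values to `[0, upper_bound]` are assumptions, not the platform's implementation.

```python
import random


class CachingAnalyst:
    """Hypothetical helper: answers each noisy query once, then reuses it."""

    def __init__(self, values, total_epsilon, upper_bound, seed=None):
        # Clip values to [0, upper_bound] so the sum has bounded sensitivity.
        self.values = [min(max(v, 0.0), upper_bound) for v in values]
        self.upper_bound = upper_bound
        self.remaining = total_epsilon
        self.cache = {}
        self.rng = random.Random(seed)

    def _laplace(self, scale):
        # The difference of two i.i.d. exponential draws is Laplace-distributed.
        return self.rng.expovariate(1 / scale) - self.rng.expovariate(1 / scale)

    def _answer(self, name, true_value, sensitivity, epsilon):
        if name in self.cache:
            return self.cache[name]  # cache hit: no Epsilon is spent
        if epsilon > self.remaining:
            raise RuntimeError("Privacy Budget exceeded")
        self.remaining -= epsilon
        self.cache[name] = true_value + self._laplace(sensitivity / epsilon)
        return self.cache[name]

    def noisy_sum(self, epsilon):
        return self._answer("sum", sum(self.values), self.upper_bound, epsilon)

    def noisy_count(self, epsilon):
        return self._answer("count", len(self.values), 1.0, epsilon)

    def noisy_mean(self, epsilon_each):
        # Post-processing of released answers; spends Epsilon only if the
        # sum or count has not been released yet.
        return self.noisy_sum(epsilon_each) / self.noisy_count(epsilon_each)


analyst = CachingAnalyst([10.0] * 1000, total_epsilon=1.0, upper_bound=100.0, seed=0)
analyst.noisy_sum(0.4)
analyst.noisy_count(0.4)
mean = analyst.noisy_mean(0.4)  # served from the cache, budget unchanged
```

Asking for the mean as a third, independent query would have spent Epsilon again; reusing the cached sum and count leaves the remaining budget untouched.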