Guides

OpenDP

OpenDP is a library for privacy-preserving data analysis. It provides a wide range of functions and methods that protect sensitive data while still enabling meaningful analysis.

This page explains how to get started with op_opendp and make the most of its features.

To use op_opendp, import the library as shown in the following code block:

%%ag
import op_opendp

API Reference

Currently, AG's op_opendp supports all the functionality of OpenDP 0.8.0, maintaining the same module names and function signatures.

Creating Pipelines

OpenDP offers numerous measurements and transformations that can be combined into differentially private pipelines. The code block below builds two pipelines that count the elements of a dataset using op_opendp:

%%ag
from op_opendp.transformations import make_count, make_sum
from op_opendp.measurements import make_base_discrete_laplace
from op_opendp.domains import atom_domain, vector_domain
from op_opendp.metrics import symmetric_distance, absolute_distance
from op_opendp.mod import enable_features
enable_features("contrib")
input_domain = vector_domain(atom_domain(T=float))
input_metric = symmetric_distance()

# pipeline_1 is a bare transformation: it adds no noise, so it is not differentially private
pipeline_1 = make_count(input_domain, input_metric, TO=int)
eps = 0.5
# pipeline_2 chains the count into discrete Laplace noise with scale 1/eps, which makes it a differentially private measurement
pipeline_2 = (
    make_count(input_domain, input_metric, TO=int)
    >> make_base_discrete_laplace(atom_domain(T=int), absolute_distance(T=int), 1.0 / eps)
)
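To make the noise step concrete, here is a minimal pure-Python sketch of a noisy count. It does not use op_opendp or OpenDP, and it is not how the library implements make_base_discrete_laplace. The helper names `sample_discrete_laplace` and `noisy_count` are hypothetical, and the sketch assumes a count has sensitivity 1, so noise with scale 1/eps is added.

```python
import math
import random


def sample_discrete_laplace(scale: float, rng: random.Random) -> int:
    """Draw integer noise with P(k) proportional to exp(-|k| / scale).

    Illustrative only: naive floating-point sampling like this is not safe
    for real releases, which is one reason to rely on a vetted library.
    """
    q = math.exp(-1.0 / scale)

    def geometric() -> int:
        # Number of failures before the first success, with P(G >= k) = q**k.
        u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
        return math.floor(math.log(u) / math.log(q))

    # The difference of two i.i.d. geometric variables is discrete Laplace.
    return geometric() - geometric()


def noisy_count(data: list, eps: float, rng: random.Random) -> int:
    """Count the elements and add noise with scale 1/eps (sensitivity 1)."""
    return len(data) + sample_discrete_laplace(1.0 / eps, rng)


rng = random.Random(0)  # fixed seed only so the sketch is reproducible
release = noisy_count([1.0, 2.5, 3.0, 4.2], eps=0.5, rng=rng)
```

A smaller eps gives a larger scale and therefore noisier counts; averaged over many draws, the noise is centered on zero.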

Executing the Pipelines

The op_opendp library includes safeguards to ensure that only differentially private pipelines are executed, so it refuses to run non-differentially private pipelines such as pipeline_1 from the example above. To run a pipeline, pass it to the run_pipeline method provided by op_opendp, whose signature is shown below.

op_opendp.run_pipeline(
  pipeline: op_opendp.Measurement, 
  data: PrivateSeries | PrivateDataFrame | List, 
  target_delta: float, 
  d_in_type: int | float
)

The parameters are:

pipeline: a differentially private pipeline (it should be of the type op_opendp.Measurement).
data: data to run the pipeline on.
target_delta: the $\delta$ used when converting a zCDP guarantee into an $(\epsilon, \delta)$ budget.
d_in_type: used to cast sensitivity to the selected type (int or float).
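To illustrate why a target_delta is needed, the sketch below applies the classic conversion bound: a mechanism satisfying $\rho$-zCDP also satisfies $(\epsilon, \delta)$-DP with $\epsilon = \rho + 2\sqrt{\rho \ln(1/\delta)}$. This is an assumption for illustration; op_opendp may use a different, tighter conversion internally, and the function name `zcdp_to_approx_dp_epsilon` is hypothetical.

```python
import math


def zcdp_to_approx_dp_epsilon(rho: float, target_delta: float) -> float:
    """Classic bound: rho-zCDP implies (eps, delta)-DP with
    eps = rho + 2 * sqrt(rho * ln(1 / delta)).
    """
    if rho < 0 or not 0 < target_delta < 1:
        raise ValueError("need rho >= 0 and 0 < target_delta < 1")
    return rho + 2.0 * math.sqrt(rho * math.log(1.0 / target_delta))


# The same zCDP guarantee reported at two different target deltas.
eps_at_1e5 = zcdp_to_approx_dp_epsilon(rho=0.5, target_delta=1e-5)
eps_at_1e9 = zcdp_to_approx_dp_epsilon(rho=0.5, target_delta=1e-9)
```

Choosing a smaller target_delta yields a larger epsilon for the same pipeline, so the two values trade off against each other.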

Budgeting Pipelines

Additionally, to find out how much privacy budget a pipeline would consume, call budget_usage. It takes the same arguments as run_pipeline and computes the budget from the sensitivity of the data.

```python
op_opendp.budget_usage(
  pipeline: op_opendp.Measurement, 
  data: PrivateSeries | PrivateDataFrame | List, 
  target_delta: float, 
  d_in_type: int | float
)
```
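As a rough mental model of what a budget figure means, the sketch below computes the pure-DP cost of Laplace-type noise as sensitivity divided by scale. It is not the budget_usage implementation; the helper `laplace_epsilon` is hypothetical, and the values AG reports may differ depending on how it derives the sensitivity of your data.

```python
def laplace_epsilon(d_in: float, scale: float) -> float:
    """Pure-DP cost of Laplace-type noise: eps = d_in / scale."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    return d_in / scale


eps = 0.5
# A count changes by at most 1 when one row is added or removed (d_in = 1).
cost_one_row = laplace_epsilon(d_in=1, scale=1.0 / eps)
# If one individual can contribute two rows, the same noise costs twice as much.
cost_two_rows = laplace_epsilon(d_in=2, scale=1.0 / eps)
```

With scale 1/eps and sensitivity 1, the cost equals eps, which is why the counting pipeline earlier on this page uses `1./eps` as its noise scale.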

## Converting OpenDP Pipelines to OP_OpenDP

In many scenarios, you may want to convert an existing OpenDP pipeline or example to the private version (OP_OpenDP) for AG.

Let's start by taking a chained transformation example from the [OpenDP documentation](https://docs.opendp.org/en/v0.3.0/user/combinator-constructors.html#combinator-constructors).

```python
# OpenDP Code
from opendp.transformations import make_sum
from opendp.measurements import make_base_geometric
from opendp.combinators import make_chain_mt
from opendp.mod import enable_features
from opendp.domains import vector_domain, atom_domain
from opendp.metrics import symmetric_distance, absolute_distance
enable_features("contrib")
input_domain = vector_domain(atom_domain(T=int, bounds=(0, 1)))
input_metric = symmetric_distance()
# call a constructor to produce a transformation
bounded_sum = make_sum(input_domain,input_metric)

# call a constructor to produce a measurement
base_geometric = make_base_geometric(atom_domain(T=int), absolute_distance(T=int),scale=1.0)
noisy_sum = make_chain_mt(base_geometric, bounded_sum)

# invoke the chained measurement's function
dataset = [0, 0, 1, 1, 0, 1, 1, 1]
release = noisy_sum(dataset)

print(release)
```

To convert this pipeline to the private version, first change the import names from opendp to op_opendp.

%%ag
from op_opendp.transformations import make_sum
from op_opendp.measurements import make_base_geometric
from op_opendp.combinators import make_chain_mt
from op_opendp.domains import vector_domain, atom_domain
from op_opendp.metrics import symmetric_distance, absolute_distance

Next, import the op_opendp.run_pipeline function, which runs the pipeline we define.

%%ag
from op_opendp import run_pipeline
from op_opendp.mod import enable_features
enable_features("contrib")  # some OpenDP features are part of contrib and need to be enabled

Everything else in the code can be written exactly as it was defined in OpenDP.

%%ag
bounded_sum = make_sum(input_domain=vector_domain(atom_domain(T=int, bounds=(0, 1))), input_metric=symmetric_distance())

base_geometric = make_base_geometric(atom_domain(T=int), absolute_distance(T=int), scale=1.0)
noisy_sum = make_chain_mt(base_geometric, bounded_sum)
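Note the argument order: make_chain_mt takes the measurement first and the transformation second, whereas the `>>` operator used in pipeline_2 earlier reads left to right. The toy sketch below mimics that relationship in plain Python. The `Stage` class and `chain` function are hypothetical stand-ins, not OpenDP or op_opendp types, and a deterministic offset replaces the noise so the result is reproducible.

```python
from typing import Callable


class Stage:
    """Toy stand-in for a transformation or measurement (hypothetical)."""

    def __init__(self, fn: Callable):
        self.fn = fn

    def __call__(self, arg):
        return self.fn(arg)

    def __rshift__(self, later: "Stage") -> "Stage":
        # self >> later: run self first, then feed its output to later.
        return chain(later, self)


def chain(later: Stage, earlier: Stage) -> Stage:
    """Mirror make_chain_mt's argument order: the later stage comes first."""
    return Stage(lambda arg: later(earlier(arg)))


bounded_sum_toy = Stage(sum)               # stands in for make_sum(...)
add_offset = Stage(lambda x: x + 100)      # deterministic stand-in for the noise step

dataset = [0, 0, 1, 1, 0, 1, 1, 1]
via_chain = chain(add_offset, bounded_sum_toy)(dataset)
via_shift = (bounded_sum_toy >> add_offset)(dataset)
```

Both spellings build the same composed function, so either style can be kept when porting a pipeline.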

Then, to execute the pipeline, call run_pipeline on your own data.

%%ag
release = run_pipeline(noisy_sum, YOUR_DATASET, YOUR_TARGET_DELTA, YOUR_D_IN_TYPE)

Finally, to review the result, use ag_print.

%%ag
ag_print(release)

Putting the previous steps together yields the following converted code:

%%ag
from op_opendp.transformations import make_sum
from op_opendp.measurements import make_base_geometric
from op_opendp.domains import atom_domain, vector_domain
from op_opendp.metrics import symmetric_distance, absolute_distance
from op_opendp.combinators import make_chain_mt
from op_opendp.mod import enable_features
from op_opendp import run_pipeline
enable_features("contrib")
input_domain = vector_domain(atom_domain(T=int, bounds=(0, 1)))
input_metric = symmetric_distance()
bounded_sum = make_sum(input_domain, input_metric)

base_geometric = make_base_geometric(atom_domain(T=int), absolute_distance(T=int), scale=1.0)
noisy_sum = make_chain_mt(base_geometric, bounded_sum)

import pandas as pd
from op_pandas import PrivateSeries

data = PrivateSeries(pd.Series([0, 0, 1, 1, 0, 1, 1, 1]))

ag_print(run_pipeline(noisy_sum, data, None, None))

Resources

For detailed information about OpenDP's function signatures and methods, refer to the official OpenDP documentation. The documentation provides comprehensive guidance on how to use each library feature effectively.
