OpenDP
OpenDP is a library for privacy-preserving data analysis based on differential privacy. It provides transformations, measurements, and combinators for building pipelines that protect sensitive data while still enabling meaningful analysis.
This page explains how to get started with op_opendp, AG's private version of OpenDP, and how to make the most of its features. To use op_opendp, import the library as shown in the following code block:
%%ag
import op_opendp
API Reference
Currently, AG's op_opendp supports all the functionality of OpenDP, keeping the same module names and function signatures.
Creating Pipelines
OpenDP offers numerous transformations and measurements that can be chained to construct differentially private pipelines. The code block below uses op_opendp to count the elements of a dataset, first without and then with differential privacy:
%%ag
from op_opendp.transformations import make_count
from op_opendp.measurements import make_base_discrete_laplace
# Transformation only: an exact count, which is NOT differentially private
pipeline_1 = make_count(TIA=float)
eps = 0.5
# Measurement: the count (sensitivity 1) plus discrete Laplace noise with scale 1/eps, which IS differentially private
pipeline_2 = make_count(TIA=float) >> make_base_discrete_laplace(1. / eps)
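To make the difference between pipeline_1 and pipeline_2 concrete, the plain-Python sketch below mimics what pipeline_2 computes: the exact count, whose sensitivity is 1 because adding or removing one record changes it by at most 1, plus discrete Laplace noise with scale 1/eps. It does not use op_opendp, and the helper names (`sample_discrete_laplace`, `noisy_count`) are hypothetical. Unlike OpenDP, which relies on exact samplers, this sketch uses floating-point randomness and must not be used to protect real data.

```python
import math
import random
from typing import Sequence


def sample_discrete_laplace(scale: float, rng: random.Random) -> int:
    """Sample Z with P(Z = k) proportional to exp(-|k| / scale).

    The difference of two i.i.d. geometric variables (failures before the
    first success, with P(G >= k) = exp(-k / scale)) has exactly this
    distribution. Floating-point illustration only, not a secure sampler.
    """
    def geometric() -> int:
        u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
        # Inverse-CDF sampling: P(floor(-scale * ln U) >= k) = exp(-k / scale)
        return math.floor(-scale * math.log(u))

    return geometric() - geometric()


def noisy_count(data: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """Hypothetical stand-in for make_count(...) >> make_base_discrete_laplace(1 / eps)."""
    true_count = len(data)   # like make_count: sensitivity 1
    scale = 1.0 / epsilon    # scale = sensitivity / epsilon
    return true_count + sample_discrete_laplace(scale, rng)


rng = random.Random(0)
data = [1.5, 2.0, 3.25, 4.0, 5.5]
print(noisy_count(data, epsilon=0.5, rng=rng))
```

A smaller eps gives a larger scale and therefore noisier counts; averaged over many runs, the released values center on the true count.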
Executing the Pipelines
The op_opendp library includes safeguards to ensure that only differentially private pipelines are executed. As a result, op_opendp refuses to execute non-differentially private pipelines such as pipeline_1 from the example above. To run a pipeline, pass it to the run_pipeline function provided by op_opendp, whose signature is shown in the code block below.
op_opendp.run_pipeline(
    pipeline: op_opendp.Measurement,
    data: PrivateSeries | PrivateDataFrame | List,
    target_delta: float,
    d_in_type: int | float
)
The parameters are:
- pipeline: a differentially private pipeline, which must be of type op_opendp.Measurement.
- data: the data to run the pipeline on.
- target_delta: the δ used when converting a zero-concentrated DP (zCDP) privacy loss into an (ε, δ) budget.
- d_in_type: the type (int or float) to which the sensitivity is cast.
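The rule that only measurements may run can be illustrated with a toy model. The `Transformation` and `Measurement` classes and the `run_pipeline_sketch` function below are hypothetical and are not op_opendp's implementation; they only show why chaining a transformation into a measurement (as in pipeline_2) yields something runnable, while a bare transformation (as in pipeline_1) is rejected.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Transformation:
    """Toy transformation: a function with no privacy guarantee on its own."""
    function: Callable[[Any], Any]

    def __rshift__(self, other: Any) -> Any:
        # Transformation >> Measurement yields a Measurement;
        # Transformation >> Transformation yields another Transformation.
        if isinstance(other, Measurement):
            return Measurement(lambda data: other.function(self.function(data)))
        if isinstance(other, Transformation):
            return Transformation(lambda data: other.function(self.function(data)))
        return NotImplemented


@dataclass
class Measurement:
    """Toy measurement: a function whose output is assumed to be noised."""
    function: Callable[[Any], Any]


def run_pipeline_sketch(pipeline: Any, data: Any) -> Any:
    # Mirrors the documented rule: only Measurements may be executed.
    if not isinstance(pipeline, Measurement):
        raise TypeError("only differentially private pipelines (Measurements) can be run")
    return pipeline.function(data)


count = Transformation(len)
add_fake_noise = Measurement(lambda x: x + 0)  # placeholder for a real noise mechanism

pipeline_1 = count                    # transformation only
pipeline_2 = count >> add_fake_noise  # transformation chained into a measurement

print(run_pipeline_sketch(pipeline_2, [1.0, 2.0, 3.0]))  # prints 3
try:
    run_pipeline_sketch(pipeline_1, [1.0, 2.0, 3.0])
except TypeError as err:
    print("rejected:", err)
```

The check is on the type of the chained object, which is why the output of `>>` matters: a chain ending in a measurement is itself a measurement.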
Budgeting Pipelines
If you want to know how much privacy budget a pipeline consumes, use budget_usage, which computes it from the sensitivity of the data. It takes the same arguments as run_pipeline:
op_opendp.budget_usage(
    pipeline: op_opendp.Measurement,
    data: PrivateSeries | PrivateDataFrame | List,
    target_delta: float,
    d_in_type: int | float
)
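As a rough illustration of what budgeting involves, the sketch below evaluates a measurement's privacy map at the input distance d_in (the sensitivity, cast with d_in_type). For a pure-DP measurement such as discrete Laplace noise with scale s, ε = d_in / s. For a zCDP measurement such as Gaussian noise with standard deviation σ, ρ = d_in² / (2σ²), which is then converted to ε for the chosen target_delta using the classic bound ε = ρ + 2√(ρ ln(1/δ)). All function names are hypothetical, and op_opendp may use a tighter conversion than this bound.

```python
import math
from typing import Callable, Type, Union

Number = Union[int, float]


def laplace_privacy_map(scale: float) -> Callable[[Number], float]:
    # Pure DP: epsilon = d_in / scale
    return lambda d_in: d_in / scale


def gaussian_zcdp_map(sigma: float) -> Callable[[Number], float]:
    # zCDP: rho = d_in^2 / (2 * sigma^2)
    return lambda d_in: d_in ** 2 / (2 * sigma ** 2)


def zcdp_to_epsilon(rho: float, delta: float) -> float:
    # Classic conversion (Bun & Steinke, 2016): rho-zCDP implies (eps, delta)-DP.
    return rho + 2 * math.sqrt(rho * math.log(1 / delta))


def budget_usage_sketch(privacy_map: Callable[[Number], float], measure: str,
                        d_in: Number, d_in_type: Type, target_delta: float) -> float:
    """Hypothetical helper: returns the epsilon a pipeline would spend."""
    loss = privacy_map(d_in_type(d_in))  # cast the sensitivity, then apply the map
    if measure == "zCDP":
        return zcdp_to_epsilon(loss, target_delta)
    return loss  # already a pure-DP epsilon


# Count pipeline from earlier: sensitivity 1, scale 1 / 0.5
print(budget_usage_sketch(laplace_privacy_map(2.0), "pureDP", 1, int, 1e-6))  # prints 0.5
# A Gaussian mechanism with sigma = 4 and sensitivity 1, at delta = 1e-6
print(budget_usage_sketch(gaussian_zcdp_map(4.0), "zCDP", 1, float, 1e-6))
```

For the pure-DP count, target_delta has no effect; it only matters when a zCDP loss has to be expressed as an (ε, δ) budget.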
Converting OpenDP Pipelines to OP_OpenDP
In many scenarios, you may want to convert an existing OpenDP pipeline or example into its private op_opendp version for AG.
Let's start with a chaining example from the OpenDP documentation, which chains a transformation into a measurement.
# OpenDP Code
from opendp.trans import make_bounded_sum
from opendp.meas import make_base_geometric
from opendp.comb import make_chain_mt
# call a constructor to produce a transformation
bounded_sum = make_bounded_sum(bounds=(0, 1))
# call a constructor to produce a measurement
base_geometric = make_base_geometric(scale=1.0)
noisy_sum = make_chain_mt(base_geometric, bounded_sum)
# invoke the chained measurement's function
dataset = [0, 0, 1, 1, 0, 1, 1, 1]
release = noisy_sum(dataset)
print(release)
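The privacy cost of noisy_sum follows from chaining the two maps: the bounded sum's stability map turns the number of added or removed records d_in into a change in the sum of at most d_in · max(|L|, |U|), and the geometric mechanism's privacy map turns that into ε = sensitivity / scale. The plain-Python sketch below computes this for bounds=(0, 1) and scale=1.0. The helper names are hypothetical, not the OpenDP API, and the sketch assumes the symmetric-distance setting in which one person adds or removes one record.

```python
from typing import Callable, Tuple


def bounded_sum_stability(bounds: Tuple[float, float]) -> Callable[[int], float]:
    # Adding or removing one record within [L, U] changes the sum by at most max(|L|, |U|).
    lower, upper = bounds
    return lambda d_in: d_in * max(abs(lower), abs(upper))


def geometric_privacy(scale: float) -> Callable[[float], float]:
    # Two-sided geometric (discrete Laplace) noise: epsilon = sensitivity / scale.
    return lambda d_mid: d_mid / scale


def chain_maps(privacy_map: Callable[[float], float],
               stability_map: Callable[[int], float]) -> Callable[[int], float]:
    # Same argument order as make_chain_mt(measurement, transformation):
    # the transformation's map is applied first, then the measurement's.
    return lambda d_in: privacy_map(stability_map(d_in))


noisy_sum_map = chain_maps(geometric_privacy(scale=1.0), bounded_sum_stability((0, 1)))
print(noisy_sum_map(1))  # prints 1.0: one changed record costs epsilon = 1.0
```

Widening the bounds raises the sensitivity, so the same scale would then cost proportionally more ε.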
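To convert this pipeline, first change the imports to their op_opendp equivalents. Note that op_opendp uses the module names transformations, measurements, and combinators rather than the older trans, meas, and comb modules used in this example.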
%%ag
from op_opendp.transformations import make_bounded_sum
from op_opendp.measurements import make_base_geometric
from op_opendp.combinators import make_chain_mt
Next, import the op_opendp.run_pipeline function, which is used to execute the pipeline.
%%ag
from op_opendp import run_pipeline
The pipeline definition itself can be written exactly as in OpenDP.
%%ag
bounded_sum = make_bounded_sum(bounds=(0, 1))
base_geometric = make_base_geometric(scale=1.0)
noisy_sum = make_chain_mt(base_geometric, bounded_sum)
To execute the pipeline, call run_pipeline with your own data instead of invoking noisy_sum directly.
%%ag
release = run_pipeline(noisy_sum, YOUR_DATASET, YOUR_TARGET_DELTA, YOUR_D_IN_TYPE)
Finally, use ag_print, rather than print, to review the result.
%%ag
ag_print(release)
Putting the previous steps together yields the following converted code:
%%ag
from op_opendp.transformations import make_bounded_sum
from op_opendp.measurements import make_base_geometric
from op_opendp.combinators import make_chain_mt
from op_opendp import run_pipeline
# Define the pipeline exactly as in OpenDP
bounded_sum = make_bounded_sum(bounds=(0, 1))
base_geometric = make_base_geometric(scale=1.0)
noisy_sum = make_chain_mt(base_geometric, bounded_sum)
# Execute through run_pipeline instead of calling noisy_sum(dataset)
release = run_pipeline(noisy_sum, YOUR_DATASET, YOUR_TARGET_DELTA, YOUR_D_IN_TYPE)
ag_print(release)
Resources
For detailed information about OpenDP's function signatures and methods, refer to the official OpenDP documentation. The documentation provides comprehensive guidance on how to use each library feature effectively.