API Reference
SmartNoise SQL
SmartNoise SQL is an extension of the SmartNoise framework, which enables privacy-preserving querying of sensitive databases using SQL. The AG's op_snsql
is based on the SmartNoise SQL library with all the methods being differentially private.
This page presents the steps and methods you will use to explore the op_snsql
library features.
Define Privacy Parameters
You need to provide a Privacy object when creating any private SQL connection. The defined privacy configuration applies to all queries executed against that connection. Below, you find an example of setting the Privacy.
class op_snsql.Privacy(epsilon:float=1.0, delta:float=10E-16)
The available parameters of Privacy
are the following:
epsilon
: The epsilon value for each statistic the private SQL connection returns.delta
: If you start working with differential privacy, you can avoid this parameter. Most counts and sums will use a delta of 0, but dimension censoring and the Gaussian mechanism require a delta. Set the Delta to , where is the approximate number of rows in the data source.
Loading Data
After defining the Privacy object, you can load data from the PrivateDataFrame
, using from_df
.
op_snsql.from_df(df,privacy,metadata=None) : _PrivateReader
The available parameters of from_df
are the following:
df: pandas.DataFrame | op_pandas.PrivateDataFrame
: The dataframe to be loaded.privacy: op_snsql.Privacy
: A Privacy object with epsilon, delta, and other privacy properties.metadata: Dict
: This is ignored when aPrivateDataframe
is sent. For a normal DataFrame, you need to send the metadata information with the format used bysnsql
.
The op_snsql.from_df
command returns a private reader class op_snsql._PrivateReader
that executes queries against the loaded data.
Executing Queries
After you load a DataSet, you can execute a query to receive differentially private recordsets.
# reader is an object of op_snsql._PrivateReader obtained when you load a dataset.
reader.execute(query_string, postprocess:bool=True)
The available parameters of reader.execute
are the following:
query_string
: A query string in SQL syntax.postprocess
: If False, the intermediate result will be returned immediately after adding noise and censoring dimensions. All post-processing that does not impact privacy, such as clamping negative counts, LIMIT, HAVING, and ORDER BY, will be skipped.
This method returns a record structured as an array of tuples, where each tuple represents a row, and each item in the tuple is typed. The first row will contain column names.
Estimating Privacy Budget
The get_privacy_cost
method estimates the Epsilon and Delta costs for running a given query. This method doesn’t require running a query or expending the Epsilon or Delta budget.
# reader is an object of op_snsql._PrivateReader obtained when you load a dataset.
reader.get_privacy_cost(query_string)
The available parameters of reader.get_privacy_cost
are the following:
query_string
: The SQL query string or a list of strings to analyse the expected cost.
The get_privacy_cost
method returns a tuple of (Epsilon and Delta) estimating the total privacy cost for running the informed queries.