Skip to main content

API Reference

SmartNoise SQL

SmartNoise SQL is an extension of the SmartNoise framework, which enables privacy-preserving querying of sensitive databases using SQL. The AG's op_snsql is based on the SmartNoise SQL library with all the methods being differentially private.

This page presents the steps and methods you will use to explore the op_snsql library features.

Define Privacy Parameters

You need to provide a Privacy object when creating any private SQL connection. The defined privacy configuration applies to all queries executed against that connection. Below, you find an example of setting the Privacy.

class op_snsql.Privacy(epsilon:float=1.0, delta:float=10E-16)

The available parameters of Privacy are the following:

  • epsilon: The epsilon value for each statistic the private SQL connection returns.
  • delta: If you start working with differential privacy, you can avoid this parameter. Most counts and sums will use a delta of 0, but dimension censoring and the Gaussian mechanism require a delta. Set the Delta to 1/(nsqrt(n))1/(n∗sqrt(n)), where nn is the approximate number of rows in the data source.

Loading Data

After defining the Privacy object, you can load data from the PrivateDataFrame , using from_df.

op_snsql.from_df(df,privacy,metadata=None) : _PrivateReader

The available parameters of from_dfare the following:

  • df: pandas.DataFrame | op_pandas.PrivateDataFrame: The dataframe to be loaded.
  • privacy: op_snsql.Privacy: A Privacy object with epsilon, delta, and other privacy properties.
  • metadata: Dict: This is ignored when a PrivateDataframe is sent. For a normal DataFrame, you need to send the metadata information with the format used by snsql.

The op_snsql.from_df command returns a private reader class op_snsql._PrivateReader that executes queries against the loaded data.

Executing Queries

After you load a DataSet, you can execute a query to receive differentially private recordsets.

# reader is an object of op_snsql._PrivateReader obtained when you load a dataset.
reader.execute(query_string, postprocess:bool=True)

The available parameters of reader.execute are the following:

  • query_string: A query string in SQL syntax.
  • postprocess: If False, the intermediate result will be returned immediately after adding noise and censoring dimensions. All post-processing that does not impact privacy, such as clamping negative counts, LIMIT, HAVING, and ORDER BY, will be skipped.

This method returns a record structured as an array of tuples, where each tuple represents a row, and each item in the tuple is typed. The first row will contain column names.

Estimating Privacy Budget

The get_privacy_cost method estimates the Epsilon and Delta costs for running a given query. This method doesn’t require running a query or expending the Epsilon or Delta budget.

# reader is an object of op_snsql._PrivateReader obtained when you load a dataset.
reader.get_privacy_cost(query_string)

The available parameters of reader.get_privacy_cost are the following:

  • query_string: The SQL query string or a list of strings to analyse the expected cost.

The get_privacy_cost method returns a tuple of (Epsilon and Delta) estimating the total privacy cost for running the informed queries.