Skip to main content

Guides

SmartNoise SQL

SmartNoise SQL is part the SmartNoise SDK and enables privacy-preserving querying of sensitive databases using SQL. It incorporates differential privacy techniques to add noise to query results, ensuring individual privacy while maintaining statistical validity. SmartNoise SQL allows researchers and analysts to extract valuable insights from sensitive data without compromising confidentiality.

Antigranular wraps the snsql library to support SQL queries on PrivateDataframe with minimal change to the existing function signatures and methods. As a result, the transition is as smooth as possible for those who have worked with SmartNoise SQL before.

To use the SmartNoise SQL, you need to import it as presented in the following code block:

%%ag
import op_snsql

Usage

For this example, we will perform an SQL query on the Adult Population Dataset. You can connect to it using your own credentials and logging in to antigranular, or you can also use your own dataframe.

%%ag
# Defining the PrivateDataFrame
df = adult_population_dataset["train_x"]
ag_print(df.metadata)

The Privacy and Reader objects are the primary APIs used by SmartNoise SQL. You can use op_snsql.Privacy to define a Privacy object and set epsilon and delta parameters, while op_snsql.from_df will use a PrivateDataFrame and a op_snsql.Privacy object to create a reader that can be queried.

%%ag
import op_snsql
import pandas as pd

# Define a Privacy object and set epsilon, delta parameters that
# will apply to every query
privacy = op_snsql.Privacy(epsilon=1.0, delta=0.01)

# Define a reader from both your PDF and the Privacy object
reader = op_snsql.from_df(df, privacy=privacy)

You can perform SQL queries as usual using the execute API on a Reader object. For Antigranular datasets, the dataset is accesible through df.table.

%%ag
# Execute your SQL query on this dataset
result = reader.execute("SELECT AVG(age) FROM df.table")
ag_print(result)

You can also estimate the epsilon and delta expenditure of any given query before performing it using get_privacy_cost. This method does not run the query or spend anything from the budget.

%%ag
# Recieve an (epsilon, delta) expected cost tuple
cost = reader.get_privacy_cost("SELECT AVG(age) FROM df.table")
ag_print(cost)

API Reference

Currently, AG supports all the methods and classes within the following directories:

Resources

Below is a list of additional resources useful when using SmartNoise SQL within Antigranular.