Guides
SmartNoise SQL
SmartNoise SQL is part the SmartNoise SDK and enables privacy-preserving querying of sensitive databases using SQL. It incorporates differential privacy techniques to add noise to query results, ensuring individual privacy while maintaining statistical validity. SmartNoise SQL allows researchers and analysts to extract valuable insights from sensitive data without compromising confidentiality.
Antigranular wraps the snsql
library to support SQL queries on PrivateDataframe
with minimal change to the existing function signatures and methods. As a result, the transition is as smooth as possible for those who have worked with SmartNoise SQL before.
To use the SmartNoise SQL, you need to import it as presented in the following code block:
%%ag
import op_snsql
Usage
For this example, we will perform an SQL query on the Adult Population Dataset. You can connect to it using your own credentials and logging in to antigranular, or you can also use your own dataframe.
%%ag
# Defining the PrivateDataFrame
df = adult_population_dataset["train_x"]
ag_print(df.metadata)
The Privacy and Reader objects are the primary APIs used by SmartNoise SQL. You can use op_snsql.Privacy
to define a Privacy object and set epsilon
and delta
parameters, while op_snsql.from_df
will use a PrivateDataFrame
and a op_snsql.Privacy
object to create a reader that can be queried.
%%ag
import op_snsql
import pandas as pd
# Define a Privacy object and set epsilon, delta parameters that
# will apply to every query
privacy = op_snsql.Privacy(epsilon=1.0, delta=0.01)
# Define a reader from both your PDF and the Privacy object
reader = op_snsql.from_df(df, privacy=privacy)
You can perform SQL queries as usual using the execute
API on a Reader
object. For Antigranular datasets, the dataset is accesible through df.table
.
%%ag
# Execute your SQL query on this dataset
result = reader.execute("SELECT AVG(age) FROM df.table")
ag_print(result)
You can also estimate the epsilon
and delta
expenditure of any given query before performing it using get_privacy_cost
. This method does not run the query or spend anything from the budget.
%%ag
# Recieve an (epsilon, delta) expected cost tuple
cost = reader.get_privacy_cost("SELECT AVG(age) FROM df.table")
ag_print(cost)
API Reference
Currently, AG supports all the methods and classes within the following directories:
Resources
Below is a list of additional resources useful when using SmartNoise SQL within Antigranular.
- Official Smartnoise-SQL Documentation
- GitHub repository: Learn more about how differentially private SQL queries are executed.