TensorFlow Privacy
TensorFlow Privacy is a Python library developed by Google that enables training of machine learning models with privacy guarantees, in particular through the implementation of differential privacy. It allows developers to apply differential privacy techniques to protect the training data's privacy without significantly compromising the model's accuracy. This is particularly useful in scenarios where sensitive data is used, ensuring that the model does not inadvertently reveal any specific details about the individuals represented in the training data.
We have developed a custom library, op_tensorflow, designed to integrate tensorflow-privacy with the private data objects available in op_pandas. Users familiar with TensorFlow and Keras will find the APIs of our library intuitive and easy to use.
TensorFlow Privacy support is currently in its beta phase, and only Sequential models can be created in the Antigranular environment. However, you can load any locally created model using the approach described under "Importing locally trained models".
Importing the library
To use op_tensorflow, import the library as shown in the following code block:
%%ag
import op_tensorflow
Creating DP Models
op_tensorflow makes the training of a differentially private model as seamless as possible with the following features:
- It leverages PrivateKerasModel, a wrapper class around tensorflow.keras which converts any standard Keras model to adhere to differentially private training.
- The training is done using a differentially private version of stochastic gradient descent (DP-SGD).
- The resulting differentially private model works seamlessly with all the optimizers and loss functions available in tensorflow.keras.
from op_tensorflow import PrivateKerasModel
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
# Normal Keras model
seqM = Sequential(
    [
        Dense(16, activation="relu", input_shape=(2,)),
        Dense(8, activation="relu"),
        Dense(1, activation="sigmoid"),
    ]
)
# Create the DP Keras model
dp_model = PrivateKerasModel(model=seqM, l2_norm_clip=1, noise_multiplier=1.2)
# Use a standard (non-DP) optimizer directly from keras.
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
# Use a standard (non-DP) loss function directly from keras.
loss = tf.keras.losses.MeanSquaredError()
# PrivateKerasModel uses an API similar to standard Keras
dp_model.compile(
    optimizer=optimizer,
    loss=loss,
    metrics=["accuracy"],
)
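To make the roles of l2_norm_clip and noise_multiplier more concrete, the following is a minimal pure-Python sketch of one DP-SGD gradient computation in its textbook form: each per-example gradient is clipped to an L2 norm of at most l2_norm_clip, the clipped gradients are summed, Gaussian noise with standard deviation noise_multiplier * l2_norm_clip is added, and the result is averaged. The helper names clip_gradient and dp_sgd_gradient and the sample gradients are hypothetical. This is not the op_tensorflow implementation, which may differ in detail.

```python
import math
import random


def clip_gradient(grad, l2_norm_clip):
    """Scale one example's gradient so its L2 norm is at most l2_norm_clip."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, l2_norm_clip / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_gradient(per_example_grads, l2_norm_clip, noise_multiplier, rng):
    """Return the noisy average of the clipped per-example gradients."""
    clipped = [clip_gradient(g, l2_norm_clip) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    # Gaussian noise is calibrated to the clipping bound (an assumption of
    # this sketch, following the usual DP-SGD formulation).
    stddev = noise_multiplier * l2_norm_clip
    noisy = [s + rng.gauss(0.0, stddev) for s in summed]
    return [v / len(clipped) for v in noisy]


rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4], [-6.0, 8.0]]  # made-up per-example gradients
update = dp_sgd_gradient(grads, l2_norm_clip=1.0, noise_multiplier=1.2, rng=rng)
print(update)
```

In this sketch a larger noise_multiplier adds more noise per step, and a smaller l2_norm_clip limits how much any single example can move the weights.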
Training DP Models
We use PrivateDataLoader, a data generator function which enables private objects from op_pandas to be used with PrivateKerasModel.
- The data loader internally uses Poisson sampling, which significantly reduces the privacy costs.
- The sampling rate used in PrivateDataLoader depends on the input batch_size and the sample_size of the features and labels.
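As an illustration of what Poisson sampling means here, the sketch below includes each record in a batch independently with probability batch_size / sample_size. That formula for the sampling rate is our assumption based on the description above, and the function poisson_sample is hypothetical; the internals of PrivateDataLoader are not shown in this guide.

```python
import random


def poisson_sample(num_records, batch_size, rng):
    """Pick each record index independently with probability batch_size / num_records."""
    sampling_rate = batch_size / num_records
    return [i for i in range(num_records) if rng.random() < sampling_rate]


rng = random.Random(42)
# Draw many batches to see how the batch size behaves.
batches = [poisson_sample(num_records=1000, batch_size=4, rng=rng) for _ in range(500)]
sizes = [len(b) for b in batches]
mean_size = sum(sizes) / len(sizes)
print("sampling rate:", 4 / 1000)
print("smallest / largest batch:", min(sizes), max(sizes))
print("mean batch size:", mean_size)
```

With this kind of sampling, batch_size is only the expected batch size: individual batches vary in length and can even be empty.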
%%ag
from op_tensorflow import PrivateDataLoader
# pdf is a PrivateDataFrame object.
X = pdf[["feature1", "feature2"]]
y = pdf["target"]
data_loader = PrivateDataLoader(feature_df=X, label_df=y, batch_size=4)
Once the data_loader is created, we can pass it directly to the DP model we created earlier using the dp_model.fit method.
%%ag
dp_model.fit(x=data_loader, epochs=20, target_delta=1e-5)
# Parameters in effect for this dp_model.fit call:
#   noise_multiplier = 1.2   (from PrivateKerasModel)
#   batch_size       = 4     (from PrivateDataLoader)
#   epochs           = 20    (from the fit call)
#   target_delta     = 1e-5  (from the fit call)
Estimating privacy budgets
Before training, it is advisable to estimate the epsilon that the training parameters of your model would require.
This avoids the unpleasant surprise of accidentally exhausting your budget.
%%ag
from op_tensorflow import get_privacy_budget
# Arguments match the training parameters reported in the output below.
get_privacy_budget(
    sample_size=100000,
    batch_size=4,
    num_epochs=20,
    noise_multiplier=1.2,
    target_delta=1e-5,
)
OUTPUT
=> EPSILON_REQUIRED = 0.2778802396005823 using TARGET_DELTA = 1e-05
Training parameters used :-
SAMPLE_SIZE = 100000
BATCH_SIZE = 4
NUM_EPOCHS = 20
NOISE_MULTIPLIER = 1.2
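To see how these training parameters typically enter a DP-SGD privacy estimate, the sketch below derives the two quantities that privacy accountants usually work with: the per-step sampling rate and the total number of training steps. The function name dp_sgd_accounting_inputs is hypothetical, and whether get_privacy_budget computes exactly these values internally is an assumption on our part. The sketch does not compute epsilon itself.

```python
def dp_sgd_accounting_inputs(sample_size, batch_size, num_epochs):
    """Derive the sampling rate and step count implied by the training setup."""
    sampling_rate = batch_size / sample_size       # chance a record is in a given batch
    steps_per_epoch = sample_size // batch_size    # batches needed to cover the data once
    total_steps = steps_per_epoch * num_epochs
    return sampling_rate, total_steps


# Values taken from the "Training parameters used" output above.
q, steps = dp_sgd_accounting_inputs(sample_size=100000, batch_size=4, num_epochs=20)
print(q, steps)  # 4e-05 500000
```

Broadly, a lower sampling rate, fewer steps, or a higher noise_multiplier each lead to a smaller epsilon for the same target_delta.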
Importing locally trained models
You can save your local model's configuration as a dict or str and send it from your local environment to AG using the private_import method.
model_info = local_model.to_json()
weights = local_model.get_weights()
session.private_import(data=model_info, name='model_info')
session.private_import(data=weights, name='weights')
Now you can use this privately imported information to create a PrivateKerasModel in the AG environment.
%%ag
import tensorflow
from op_tensorflow import PrivateKerasModel
# Rebuild the architecture from its JSON config, then restore the weights
model = tensorflow.keras.models.model_from_json(model_info)
model.set_weights(weights)
dp_model = PrivateKerasModel(model=model, l2_norm_clip=2, noise_multiplier=1.5)
Predicting results from model
You can think of a model as a complex mathematical transformation function, a black box that produces an output for a given input.
- The DP-trained weights are used to predict the output for a particular input using model.predict().
- Since it is a transformation function, private inputs generate private outputs.
# test_pdf is a slice of the feature_df (obtained using op_pandas.test_train_split).
# label_columns sets the column names in result_pdf;
# if omitted, the columns are simply numbered (0, 1, 2, ...), one per label.
result_pdf = dp_model.predict(x=test_pdf, label_columns=["result"])
Resources
For detailed information about TensorFlow Privacy's function signatures and methods, refer to the official TensorFlow Privacy documentation.