

TensorFlow Privacy

TensorFlow Privacy is a Python library developed by Google that enables training of machine learning models with privacy guarantees, in particular through the implementation of differential privacy. It allows developers to apply differential privacy techniques to protect the training data's privacy without significantly compromising the model's accuracy. This is particularly useful in scenarios where sensitive data is used, ensuring that the model does not inadvertently reveal any specific details about the individuals represented in the training data.

We have developed a custom library, op_tensorflow, designed to integrate tensorflow-privacy with the private data objects available in op_pandas. Users familiar with TensorFlow and Keras will find the APIs of our library intuitive and easy to use.

note

TensorFlow Privacy is currently in its beta phase, and only Sequential models can be created in the Antigranular environment. However, you can load any locally created model using the approach described in the Importing locally trained models section below.

Importing the library

To use op_tensorflow, import the library as shown in the following code block:

%%ag
import op_tensorflow

Creating DP Models

op_tensorflow makes the training of a differentially private model as seamless as possible with the following features:

  • It leverages PrivateKerasModel, a wrapper class around tensorflow.keras that converts any standard Keras model into one that trains with differential privacy.
  • The training is done using a differentially private version of stochastic gradient descent (DP-SGD).
  • The resulting differentially private model works seamlessly with all optimizers and loss functions available in tensorflow.keras.
%%ag
from op_tensorflow import PrivateKerasModel
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Normal Keras model
seqM = Sequential(
    [
        Dense(16, activation="relu", input_shape=(2,)),
        Dense(8, activation="relu"),
        Dense(1, activation="sigmoid"),
    ]
)

# Create the DP Keras model.
# l2_norm_clip bounds each per-example gradient norm; noise_multiplier scales
# the Gaussian noise added to the clipped gradients during DP-SGD.
dp_model = PrivateKerasModel(model=seqM, l2_norm_clip=1, noise_multiplier=1.2)

# Use a standard (non-DP) optimizer directly from keras.
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

# Use a standard (non-DP) loss function directly from keras.
loss = tf.keras.losses.MeanSquaredError()

# PrivateKerasModel uses the same compile API as a standard Keras model.
dp_model.compile(
    optimizer=optimizer,
    loss=loss,
    metrics=["accuracy"]
)
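
As noted above, the wrapper accepts any standard tf.keras optimizer and loss. As a hedged illustration (reusing dp_model and the tf import from the block above, with Adam and binary cross-entropy chosen purely as an example):

%%ag
# Illustrative alternative compile call; Adam and BinaryCrossentropy are
# arbitrary examples of standard Keras components, not a recommendation.
dp_model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss=tf.keras.losses.BinaryCrossentropy(),
    metrics=["accuracy"]
)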

Training DP Models

We use PrivateDataLoader, a data generator that enables private objects from op_pandas to be used with PrivateKerasModel.

  • The data loader internally uses Poisson sampling, which significantly reduces the privacy cost.
  • The sampling rate used in PrivateDataLoader depends on the input batch_size and the sample size of the features and labels (see the sketch after the code block below).
%%ag
from op_tensorflow import PrivateDataLoader
# pdf is a PrivateDataFrame object.
X = pdf[["feature1", "feature2"]]
y = pdf["target"]

data_loader = PrivateDataLoader(feature_df=X, label_df=y, batch_size=4)
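
To make the second bullet concrete, here is a rough sketch of the relationship. This is plain arithmetic only, not the library's internal implementation, and the sample size of 100000 is a hypothetical row count:

# Conceptual sketch: the Poisson sampling rate is the expected fraction
# of rows drawn per training step.
sample_size = 100000                         # hypothetical number of rows in pdf
batch_size = 4                               # batch_size passed to PrivateDataLoader
sampling_rate = batch_size / sample_size     # q = 4e-05
steps_per_epoch = sample_size // batch_size  # 25000 steps per epoch
print(f"sampling rate q = {sampling_rate}, steps per epoch = {steps_per_epoch}")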

Once the data_loader is created, we can pass it directly to the dp_model.fit method of the DP model we created earlier.

%%ag
dp_model.fit(x=data_loader, epochs=20, target_delta=1e-5)
"""
model.fit ->
    noise_multiplier = 1.2  (from PrivateKerasModel)
    batch_size = 4          (from PrivateDataLoader)
    epochs = 20             (from the fit call)
    target_delta = 1e-5     (from the fit call)
"""

Estimating privacy budgets

Before training, it is advisable to estimate the epsilon that your chosen training parameters would consume. This avoids unpleasant surprises such as accidentally exhausting your privacy budget.

%%ag
from op_tensorflow import get_privacy_budget
get_privacy_budget(
    sample_size=100000,
    batch_size=4,
    num_epochs=20,
    noise_multiplier=1.2,
    target_delta=1e-5,
)
OUTPUT
=> EPSILON_REQUIRED = 0.2778802396005823 using TARGET_DELTA = 1e-05
Training parameters used :-
SAMPLE_SIZE = 100000
BATCH_SIZE = 4
NUM_EPOCHS = 20
NOISE_MULTIPLIER = 1.2
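
If you want to sanity-check such an estimate locally (outside AG), the open-source tensorflow_privacy package ships a comparable DP-SGD accountant. The sketch below uses the same hypothetical parameters as above; the module path and function name follow the classic TensorFlow Privacy tutorial and may differ in newer releases, which favour compute_dp_sgd_privacy_statement:

# Local sanity check (runs in your local environment, not inside %%ag).
from tensorflow_privacy.privacy.analysis import compute_dp_sgd_privacy

eps, opt_order = compute_dp_sgd_privacy.compute_dp_sgd_privacy(
    n=100000,             # sample size
    batch_size=4,
    noise_multiplier=1.2,
    epochs=20,
    delta=1e-5,
)
print(f"locally estimated epsilon = {eps}")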

Importing locally trained models

You can export your local model's configuration (as a JSON string) and its weights, then send them from your local environment to AG using the private_import method.
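
Here, local_model is assumed to be any Keras model you have already built and trained on your own machine, for example (a hypothetical sketch; the architecture is arbitrary):

# Local environment: a hypothetical Keras model trained outside AG.
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

local_model = Sequential(
    [
        Dense(16, activation="relu", input_shape=(2,)),
        Dense(8, activation="relu"),
        Dense(1, activation="sigmoid"),
    ]
)
# ... compile and fit local_model on your local (non-private) data ...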

model_info = local_model.to_json()
weights = local_model.get_weights()
session.private_import(data=model_info, name='model_info')
session.private_import(data=weights, name='weights')

Now you can use this privately imported information to create a PrivateKerasModel in the AG environment.

%%ag
import tensorflow
from op_tensorflow import PrivateKerasModel

model = tensorflow.keras.models.model_from_json(model_info)
model.set_weights(weights)

dp_model = PrivateKerasModel(model=model, l2_norm_clip=2, noise_multiplier=1.5)

Predicting results from model

You can think of a model as a complex mathematical transformation, a black box that produces an output for a given input.

  • The DP-trained weights are used to predict the output for a particular input using model.predict().
  • Since the model is a transformation function, private inputs generate private outputs.
%%ag
# test_pdf is a slice of the feature DataFrame (obtained using op_pandas.test_train_split).
# label_columns sets the column name(s) in result_pdf;
# if not provided, the labels are simply numbered (0, 1, 2, ...).
result_pdf = dp_model.predict(x=test_pdf, label_columns=["result"])

Resources

For detailed information about TensorFlow Privacy's function signatures and methods, refer to the official TensorFlow Privacy documentation.