API Reference
TensorFlow Privacy
TensorFlow Privacy is a Python library developed by Google that enables training of machine learning models with privacy guarantees, in particular through the implementation of differential privacy. It allows developers to apply differential privacy techniques to protect the training data's privacy without significantly compromising the model's accuracy. This is particularly useful in scenarios where sensitive data is used, ensuring that the model does not inadvertently reveal any specific details about the individuals represented in the training data.
We have developed a custom library, op_tensorflow, designed to integrate tensorflow-privacy with the private data objects available in op_pandas. Users familiar with TensorFlow and Keras will find the APIs of our library intuitive and easy to use.
The sections below contain the in-depth API reference. To understand the usage flow of op_tensorflow, check out the following guide.
PrivateDataLoader
PrivateDataLoader is a data generator that enables private objects from op_pandas to be used seamlessly with PrivateKerasModel.
- The data loader internally uses Poisson sampling, which significantly reduces the privacy cost.
- The sampling rate used in PrivateDataLoader depends on the input batch_size and the sample_size of the features and labels.
from op_tensorflow import PrivateDataLoader
class PrivateDataLoader(
feature_df: PrivateDataFrame | PrivateSeries,
label_df: PrivateDataFrame | PrivateSeries,
batch_size: int
)
- feature_df: Private data object containing the feature data.
- label_df: Private data object containing the label data.
- batch_size: Size of each batch used when training with the data loader.
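The Poisson sampling mentioned for PrivateDataLoader can be illustrated with a minimal, standalone sketch. It does not use op_tensorflow. The helper name poisson_sample is hypothetical, and the sketch assumes the sampling rate is batch_size / sample_size, which is the usual choice but is not confirmed by this reference. Each record is included in a batch independently, so batch sizes vary around batch_size rather than matching it exactly.

```python
import random

def poisson_sample(sample_size, batch_size, rng):
    """Return the record indices of one Poisson-sampled batch (hypothetical helper)."""
    # Assumption: the sampling rate is the expected batch size over the dataset size.
    rate = batch_size / sample_size
    # Every record is kept independently with probability `rate`.
    return [i for i in range(sample_size) if rng.random() < rate]

rng = random.Random(0)
batches = [poisson_sample(sample_size=1000, batch_size=32, rng=rng) for _ in range(200)]
mean_size = sum(len(b) for b in batches) / len(batches)
print(f"average batch size over 200 draws: {mean_size:.1f}")
```

Because the batch size is random, code that consumes such batches should not assume a fixed number of rows per batch.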
PrivateKerasModel
PrivateKerasModel is a wrapper class around tensorflow.keras models that lets any standard Keras model integrate easily with op_pandas and be trained with differential privacy.
from op_tensorflow import PrivateKerasModel
class PrivateKerasModel(
model: Keras.Model | Keras.Sequential,
l2_norm_clip : float,
noise_multiplier : float
)
- model: Any standard Keras model.
- l2_norm_clip: Clipping threshold for the L2 norm of the gradients during model training.
- noise_multiplier: Controls how much noise is added to the gradients during training, so that the model's outputs do not compromise the privacy of the training data. Larger values add more noise.
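To make the roles of l2_norm_clip and noise_multiplier concrete, the following sketch shows the standard DP-SGD gradient step in plain Python: each per-example gradient is clipped to an L2 norm of at most l2_norm_clip, the clipped gradients are summed, Gaussian noise with standard deviation noise_multiplier * l2_norm_clip is added, and the result is averaged. The function name dp_average_gradient is hypothetical, and this illustrates the general technique rather than the internals of PrivateKerasModel.

```python
import math
import random

def dp_average_gradient(per_example_grads, l2_norm_clip, noise_multiplier, rng):
    """Clip, sum, add noise to, and average a batch of per-example gradients."""
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        norm = math.sqrt(sum(g * g for g in grad))
        # Scale down only gradients whose norm exceeds the clipping threshold.
        scale = min(1.0, l2_norm_clip / norm) if norm > 0 else 1.0
        for i, g in enumerate(grad):
            total[i] += g * scale
    # The noise standard deviation grows with both the multiplier and the clip norm.
    stddev = noise_multiplier * l2_norm_clip
    noisy = [t + rng.gauss(0.0, stddev) for t in total]
    return [n / len(per_example_grads) for n in noisy]

grads = [[3.0, 4.0], [0.3, 0.4]]  # L2 norms 5.0 and 0.5
rng = random.Random(0)
no_noise = dp_average_gradient(grads, l2_norm_clip=1.0, noise_multiplier=0.0, rng=rng)
with_noise = dp_average_gradient(grads, l2_norm_clip=1.0, noise_multiplier=1.1, rng=rng)
```

With noise_multiplier set to 0.0 the result is just the average of the clipped gradients, which makes the effect of clipping easy to inspect in isolation before noise is switched on.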
fit
Trains the model for a fixed number of epochs (dataset iterations) in a differentially private manner using DP-SGD.
PrivateKerasModel.fit(
x : PrivateDataLoader | PrivateDataFrame | pandas.DataFrame,
y : PrivateDataFrame | pandas.DataFrame = None,
batch_size=32,
epochs=1,
target_delta=1e-5,
*args,
**kwargs
):
- data: Data can be fed in one of two combinations:
  - x = PrivateDataLoader and y = None
  - x and y are op_pandas/pandas data objects.
- batch_size: Number of samples in each batch within an epoch.
- epochs: Number of training epochs (iterations over the dataset).
- target_delta: The target delta that you want to achieve for the entire training process.
- *args | **kwargs: Refer here for the complete list of arguments.
evaluate
Returns the loss value & metrics values for the model in test mode.
PrivateKerasModel.evaluate(
x : PrivateDataLoader | PrivateDataFrame | pandas.DataFrame,
y : PrivateDataFrame | pandas.DataFrame = None,
*args,
**kwargs
):
- data: Data can be fed in one of two combinations:
  - x = PrivateDataLoader and y = None
  - x and y are op_pandas/pandas data objects.
- *args | **kwargs: Refer here for the complete list of arguments.
predict
Generates output predictions for the input samples.
PrivateKerasModel.predict(
x : PrivateDataLoader | PrivateDataFrame | pandas.DataFrame,
y : PrivateDataFrame | pandas.DataFrame = None,
*args,
**kwargs
):
- data: Data can be fed in one of two combinations:
  - x = PrivateDataLoader and y = None
  - x and y are op_pandas/pandas data objects.
- *args | **kwargs: Refer here for the complete list of arguments.
Blocked methods
The methods below are restricted, mainly because they could cause side effects via file or network access. Others are restricted because they are deprecated in the main repository.
- Deprecated
  - "train_on_batch"
  - "test_on_batch"
  - "predict_on_batch"
  - "fit_generator"
  - "evaluate_generator"
  - "predict_generator"
  - "train_step"
  - "test_step"
  - "predict_step"
- File IO
  - "load_weights"
  - "save_weights"
  - "save"
  - "load_own_variables"
  - "save_own_variables"
  - "export"
get_privacy_budget
Utility function to estimate the epsilon needed for training a model, based on various model parameters and target_delta.
from op_tensorflow import get_privacy_budget
get_privacy_budget(
sample_size : int ,
batch_size : int,
num_epochs : int,
noise_multiplier : float,
target_delta : float
):
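The arguments of get_privacy_budget map onto the two quantities that DP-SGD privacy accounting typically works with: the per-step sampling rate and the total number of training steps. The sketch below derives both in plain Python. The helper accounting_inputs is hypothetical, the numbers are illustrative, and the exact formulas used inside get_privacy_budget are an assumption, not something this reference states.

```python
import math

def accounting_inputs(sample_size, batch_size, num_epochs):
    """Derive the sampling rate and step count from training settings (hypothetical helper)."""
    # Assumption: each step samples records at rate batch_size / sample_size.
    sampling_rate = batch_size / sample_size
    # Assumption: the total number of steps is epochs * (sample_size / batch_size), rounded up.
    steps = math.ceil(num_epochs * sample_size / batch_size)
    return sampling_rate, steps

sampling_rate, steps = accounting_inputs(sample_size=60000, batch_size=256, num_epochs=15)
print(f"sampling rate: {sampling_rate:.5f}, steps: {steps}")
```

For a fixed noise_multiplier and target_delta, a higher sampling rate or a larger number of steps leads to a larger epsilon, so these two values are a quick way to compare training configurations before running the full estimate.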