API Reference
Opacus
Opacus is a library developed by Meta allowing for differentially private training of PyTorch models.
We have developed a custom library, op_opacus, designed to integrate opacus with the private data objects available in op_pandas. Users familiar with PyTorch and Opacus will find the APIs of our library intuitive and easy to use. Below you can find the details of the methods and classes supported by op_opacus.
PrivateDPDataLoader
This is the counterpart of DPDataLoader within Opacus. It allows for the creation of DataLoaders from PrivateDataFrames for training purposes.
During the training step, Poisson sampling is utilised to select samples from the DataLoader.
op_opacus.PrivateDPDataLoader.from_private_dataframe(
    dataset: Union[
        PrivateDataFrame,
        PrivateSeries,
        List[Union[PrivateDataFrame, PrivateSeries]],
    ],
    dtypes: Any | List[Any] = None,
    batch_size: int = 1,
    num_workers=0,
    pin_memory=False,
    drop_last=False,
    timeout=0,
    multiprocessing_context=None,
    generator=None,
    prefetch_factor=2,
    persistent_workers=False,
    pin_memory_device="",
) -> PrivateDPDataLoader:
The available parameters of from_private_dataframe include:
dataset: The dataset or list of datasets to be used for creating the DataLoader.
dtypes: The data type of each PrivateDataFrame, necessary for converting the respective datasets into tensors.
For details on the remaining arguments, please refer to torch.utils.data.DataLoader.
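To illustrate the Poisson sampling mentioned above, here is a minimal standard-library sketch. It is not op_opacus code: `poisson_sample_batch` is a hypothetical helper, and the sample rate of `batch_size / num_samples` is an assumption based on how Poisson sampling is usually configured for DP training. Each sample is included in a batch independently, so `batch_size` acts as an expected size rather than an exact one.

```python
import random


def poisson_sample_batch(num_samples, sample_rate, rng):
    """Return the indices of one batch.

    Every index is included independently with probability sample_rate,
    so the batch length varies from draw to draw.
    """
    return [i for i in range(num_samples) if rng.random() < sample_rate]


rng = random.Random(0)               # seeded only to make the sketch repeatable
num_samples = 1000
batch_size = 50                      # nominal (expected) batch size
sample_rate = batch_size / num_samples

batches = [poisson_sample_batch(num_samples, sample_rate, rng) for _ in range(20)]
batch_lengths = [len(batch) for batch in batches]
print(batch_lengths)                 # lengths scatter around the nominal batch size
```

Because batch lengths vary, training logic should not rely on every batch containing exactly `batch_size` rows.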
PrivacyEngine
The main entry point to the op_opacus API is through the PrivacyEngine, which enables differential privacy during model training.
class op_opacus.PrivacyEngine(accountant: str = "prv")
accountant: Accounting mechanism. Currently supported: "rdp", "prv" and "gdp".
PrivacyEngine.make_private_with_epsilon(
    module: nn.Module,
    optimizer: optim.Optimizer,
    data_loader: PrivateDPDataLoader,
    target_epsilon: float,
    target_delta: float,
    epochs: int,
    max_grad_norm: Union[float, List[float]],
    batch_first: bool = True,
    loss_reduction: str = "mean",
    poisson_sampling: bool = True,
    clipping: str = "flat",
    noise_generator=None,
    grad_sample_mode: str = "hooks",
):
This API attaches the module, optimiser and dataloader to the PrivacyEngine, thereby making them Differentially Private.
It computes the privacy parameters according to a specified Privacy Budget. For additional information, refer to opacus.PrivacyEngine.make_private_with_epsilon.
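The role of max_grad_norm with clipping="flat" can be pictured with a small standard-library sketch. This is an illustration of per-sample gradient clipping as used in DP-SGD, not the op_opacus or Opacus implementation; `clip_per_sample_gradients` is a hypothetical helper, gradients are plain Python lists, and the noise that is added after clipping is left out so the result stays deterministic.

```python
import math


def clip_per_sample_gradients(per_sample_grads, max_grad_norm):
    """Scale each per-sample gradient so its L2 norm is at most max_grad_norm."""
    clipped = []
    for grad in per_sample_grads:
        norm = math.sqrt(sum(g * g for g in grad))
        # Gradients already inside the bound keep a scale of 1.0;
        # the small constant avoids division by zero.
        scale = min(1.0, max_grad_norm / (norm + 1e-6))
        clipped.append([g * scale for g in grad])
    return clipped


grads = [[3.0, 4.0], [0.3, 0.4]]     # L2 norms of about 5.0 and 0.5
clipped = clip_per_sample_gradients(grads, max_grad_norm=1.0)
print(clipped)                       # first gradient is scaled down, second is unchanged
```

Bounding every sample's contribution in this way is what allows calibrated noise to give a differential privacy guarantee.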
PrivateLoss
Within op_opacus, it is crucial to privatise loss objects. This allows for the calculation of average loss per epoch during model training.
op_opacus.make_loss_private(LossClass):
LossClass: Class definition of a PyTorch loss.
Example:
%%ag
from torch import nn
from op_opacus import make_loss_private

PrivateCrossEntropyLoss = make_loss_private(nn.CrossEntropyLoss)
loss_function = PrivateCrossEntropyLoss()  # private loss, so the per-epoch average loss can be shown
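As a rough picture of what "average loss per epoch" means, the sketch below averages a list of per-batch loss values. It is not op_opacus code: `average_epoch_loss` and its `include_nan` flag are hypothetical names, and the assumption that excluded NaNs are simply dropped before averaging only mirrors the idea behind the include_nan_in_loss argument of TrainModel.train.

```python
import math


def average_epoch_loss(batch_losses, include_nan=False):
    """Mean of the per-batch losses of one epoch.

    NaN entries are dropped unless include_nan is True, in which case
    a single NaN makes the whole average NaN.
    """
    if include_nan:
        values = list(batch_losses)
    else:
        values = [v for v in batch_losses if not math.isnan(v)]
    if not values:
        return math.nan
    return sum(values) / len(values)


batch_losses = [0.9, 0.7, math.nan, 0.5]
print(average_epoch_loss(batch_losses))                    # NaN batch ignored
print(average_epoch_loss(batch_losses, include_nan=True))  # average becomes NaN
```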
TrainModel
A helper class that facilitates the training of a PyTorch model in a differentially private manner:
class op_opacus.TrainModel(privacy_engine: PrivacyEngine, loss_function)
privacy_engine: Instance of the op_opacus PrivacyEngine class, which encompasses the DP module, optimiser, and DataLoader.
loss_function: The loss function utilised for model training.
TrainModel.train(train_callable: Callable, verbose: int = 1, include_nan_in_loss: bool = False)
This API is used to train the model. Its parameters are:
train_callable: The function that contains the training logic. It can be called using the following arguments: train_callable(model, optimizer, data_loader_batch, loss_function)
verbose: Sets the verbosity level (0, 1, or 2).
- If verbose is 0, nothing is printed while the model is being trained.
- If verbose is 1, the epoch number and the Privacy Budget spent so far are shown.
- If verbose is 2, the epoch number, the Privacy Budget spent so far, the average loss for the epoch (only if loss_function has been made private) and the time taken for the epoch are shown.
include_nan_in_loss: Boolean indicating whether NaNs should be included in the per-epoch average loss computation; applicable when verbose is set to 2.
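A training function that follows the train_callable(model, optimizer, data_loader_batch, loss_function) argument order might look like the sketch below. It assumes that each batch unpacks into a (features, labels) pair, which depends on how the DataLoader was built, and it uses only the standard PyTorch optimiser and loss methods. Treat it as an illustration rather than the required form.

```python
def train_callable(model, optimizer, data_loader_batch, loss_function):
    """Run one optimisation step on a single batch."""
    # Assumption: the batch holds the model inputs followed by the targets.
    features, labels = data_loader_batch

    optimizer.zero_grad()                      # clear gradients from the previous step
    predictions = model(features)              # forward pass
    loss = loss_function(predictions, labels)  # loss for this batch
    loss.backward()                            # backward pass
    optimizer.step()                           # parameter update
```

The function would then be handed to TrainModel.train as its train_callable argument, for example together with verbose=2.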
ApplyModel
A helper class for obtaining model predictions within the torch.no_grad context.
class op_opacus.ApplyModel(model: nn.Module, privacy_engine: PrivacyEngine)
The user can pass either a PyTorch model or an instance of the op_opacus PrivacyEngine class. This will be used to get the predictions.
ApplyModel.apply_model_private(
    private_data: PrivateDataFrame | PrivateSeries,
    dtype=None,
    output_col_names: list = None,
    eps: float = 0.0,
    output_bounds: dict = None,
) -> PrivateDataFrame:
private_data: Private data to be sent as input to get the predictions of the model.
dtype: Data type of the private data used to create tensors.
output_col_names: Used to name the columns of the output PrivateDataFrame. By default, the columns are named col_{i}, where {i} denotes the ith column.
eps: The Privacy Budget used to calculate the bounds of the output PrivateDataFrame.
output_bounds: A dictionary of the form {'column_name': (min_bound, max_bound)}, containing metadata of the columns for which bounds are already known. No epsilon is spent to calculate the bounds for these columns.
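The output_bounds dictionary can be built with plain Python. The sketch below assumes a three-class classifier whose final layer is a softmax, so every output column is known to lie between 0 and 1; both the class names and that assumption are illustrative.

```python
class_names = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]

# Assumption: the model ends in a softmax, so each output is a probability in [0, 1].
output_bounds = {name: (0.0, 1.0) for name in class_names}
print(output_bounds)
```

Such a dictionary could be passed as output_bounds, together with output_col_names=class_names, when calling apply_model_private.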
ApplyModel.apply_model_public(data: DataFrame, dtype=None) -> DataFrame:
data: Public DataFrame.
dtype: Data type of the public data used to create tensors.
Example:
%%ag
test_model = ApplyModel(privacy_engine=privacy_engine)
out = test_model.apply_model_private(
    test_x,
    dtype=torch.float,
    output_col_names=["Iris-setosa", "Iris-versicolor", "Iris-virginica"],
    eps=1,
)