API Reference

PrivateSeries

The PrivateSeries API is based on pandas.Series, but all of its methods are differentially private. PrivateSeries is available as part of the op_pandas library in Antigranular.

Constructor

The PrivateSeries constructor is as follows:

class op_pandas.PrivateSeries(series = None, metadata = None, categorical_metadata = None)

The PrivateSeries parameters are described below:

  • series: pandas.Series: A Pandas Series, with data consisting of only strings, integers, floats, and booleans.
  • metadata: Tuple(float,float): Metadata containing the bounds of the given Series. The metadata should be in the following form: (bound_low, bound_hi).
    Note: If the Series contains string data, the metadata should not be provided.

  • categorical_metadata: List: Metadata containing information about the categorical data of the given Series. The categorical_metadata should be a list containing all the categories in the Series. The data types for all the elements in the list must be the same.

The code blocks below present two distinct examples of PrivateSeries:

    Series : [10, 20, 30, 40, 10, 42, 54]
    metadata : (0, 60)
    categorical_metadata : None

    Series : ["a", "b", "a", "b", "a", "a"]
    metadata : None
    categorical_metadata : ["a", "b"]
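As a rough sketch, the inputs for the two examples above could be prepared as plain pandas objects before being wrapped. The variable names are illustrative, the sanity checks are not part of op_pandas, and the constructor calls appear only as comments because op_pandas is available only inside Antigranular.

```python
import pandas as pd

# Example 1: numerical data, described by (bound_low, bound_hi).
ages = pd.Series([10, 20, 30, 40, 10, 42, 54])
age_bounds = (0, 60)

# Example 2: string data, described by the full list of its categories.
labels = pd.Series(["a", "b", "a", "b", "a", "a"])
label_categories = ["a", "b"]

# Optional sanity checks on the plain, non-private inputs (illustrative only).
bounds_cover_data = bool(ages.between(*age_bounds).all())
categories_cover_data = set(labels.unique()) <= set(label_categories)

# Inside Antigranular, following the constructor signature above:
# op_pandas.PrivateSeries(series=ages, metadata=age_bounds)
# op_pandas.PrivateSeries(series=labels, categorical_metadata=label_categories)
```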

General Functions

PrivateSeries provides several internal functions you can use when working with series. The PrivateSeries general functions include:

categorical_metadata

The categorical_metadata property returns the categorical metadata of the PrivateSeries.

PrivateSeries.categorical_metadata -> List

copy

The copy() method returns a copy of the PrivateSeries.

PrivateSeries.copy() -> PrivateSeries

describe

The describe() method returns a statistical description of the data in the PrivateSeries.

PrivateSeries.describe(eps, percentiles = None, include = None, exclude = None)

The available parameters of describe() are the following:

  • eps: float: The epsilon provided to the differentially private calculation. The eps value must be >=0.
  • percentiles: list-like of numbers, optional: The percentiles to include in the output. All should fall between 0 and 1. The default is [.25, .5, .75], which returns the 25th, 50th, and 75th percentiles.
  • include: ‘all’, list-like of dtypes or None (default), optional: This option is ignored for Series.
  • exclude: list-like of dtypes or None (default), optional: A list of data types to omit from the result. The available options are as follows:
    • A list-like of dtypes : Excludes the provided data types from the result.
      • To exclude numeric types submit numpy.number.
      • To exclude object columns submit the data type numpy.object.
      • Strings can also be used in the style of select_dtypes (e.g. df.describe(exclude=['O'])).
      • To exclude pandas’ categorical columns, use category.
    • None (default): The result will exclude nothing.

dropna

The dropna() method removes missing values from a PrivateSeries.

PrivateSeries.dropna(axis=0)

The available parameters of dropna() are the following:

  • axis: boolean {index (0), columns (1)}, default = 0: Not applied in Series.

dtypes

The dtypes property returns the data type information of the PrivateSeries.

PrivateSeries.dtypes

isnull

The isnull() method detects missing values for an array-like object.

PrivateSeries.isnull() -> PrivateSeries

isna

The isna() method detects missing values for an array-like object.

PrivateSeries.isna() -> PrivateSeries

isin

The isin() method checks whether each element in the PrivateSeries is contained in values.

PrivateSeries.isin(values)

The available parameters of isin() are the following:

  • values: PrivateDataFrame: The PrivateDataFrame against which each element in the Series is checked for containment.

make_categorical

This method makes the series categorical.

PrivateSeries.make_categorical(categories, inplace=False)

The available parameters of make_categorical() are the following:

  • categories: List: The categories to be used in the categorical metadata.
  • inplace: bool, default = False: If True, the operation will modify the data in place.

make_series_non_categorical

This method makes the series non-categorical.

PrivateSeries.make_series_non_categorical(output_bounds: tuple = None, eps: float = 0.0)

The available parameters of make_series_non_categorical() are the following:

  • output_bounds: tuple: The output bounds to use when a categorical series contains numerical values. If output bounds for a numerical series aren’t provided, epsilon is spent to estimate them.
  • eps: float: The epsilon spent to estimate the output bounds of a numerical series.

map

This method maps values of a PrivateSeries according to an input mapping or function.

PrivateSeries.map(arg, eps = 0, output_bounds = None, output_categories = None)

The available parameters of map() are the following:

  • arg: callable, mapping, pd.Series or PrivateSeries: The mapping to apply. If arg is a mapping (dictionary) and the series is categorical, every category in the metadata must have a mapping.
  • eps: float, default = 0: The epsilon provided to the differentially private calculation. The eps value must be >=0. It’s used to estimate the output bounds.
  • output_bounds: Tuple[float, float]: The bounds of the output. If not provided, epsilon is spent to estimate the bounds of the mapped values.
  • output_categories: List: The output categories, if the current series is categorical. If not provided, they are calculated using arg.
Info: If the input is a callable, it should return a single value when applied to each element. The output of the callable should be string, int, float, boolean, or datetime.

It's important to note that if the callable is a function, it will execute within an isolated environment with mypy strict mode enabled. The function must adhere to the following constraints:

  • The function can accept only one argument: the individual element it is applied to.
  • Proper type annotations should be present within the function definition. To utilize datetime and regex, import datetime and re to enable their type annotations. For additional examples, access the Pandas quickstart guide.
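The constraints above can be illustrated with a few small functions that take exactly one argument and carry full type annotations. This is a minimal sketch: the function names are hypothetical, and the commented map() call only mirrors the signature documented above with illustrative bounds.

```python
import re
from datetime import datetime


def age_group(age: int) -> str:
    """Map a single integer element to a string label."""
    if age < 18:
        return "minor"
    return "adult"


def extract_year(value: str) -> int:
    """Parse an ISO date string element and return its year."""
    return datetime.strptime(value, "%Y-%m-%d").year


def has_digit(value: str) -> bool:
    """Return True if the element contains at least one digit."""
    return re.search(r"\d", value) is not None


# Hypothetical use inside Antigranular (bounds chosen for illustration):
# private_series.map(extract_year, output_bounds=(1900, 2100))
```

Each function returns a single string, int, or boolean per element, which matches the output types listed above.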

metadata

The metadata property returns the metadata (bounds) of a numerical series.

PrivateSeries.metadata -> tuple

The code block below presents an example of how to use metadata:

>>> train_x.metadata
(0, 60)

notnull

The notnull() method detects non-missing values for an array-like object.

PrivateSeries.notnull() -> PrivateSeries

notna

The notna() method detects existing (non-missing) values.

PrivateSeries.notna() -> PrivateSeries

one_hot_encoding

This method performs one-hot encoding on the PrivateSeries.

PrivateSeries.one_hot_encoding(prefix=None, prefix_sep="_") -> PrivateDataFrame

The available parameters of one_hot_encoding() are the following:

  • prefix: str, default None: Prefix to use for the column names.
  • prefix_sep: str, default "_": Separator to use between the prefix and the column name.
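To show how prefix and prefix_sep typically combine into column names, here is a non-private analogue using pandas.get_dummies, which has parameters with the same names. Whether PrivateSeries.one_hot_encoding() names its columns in exactly this way is an assumption, and the data is made up.

```python
import pandas as pd

colors = pd.Series(["red", "blue", "red"])

# Each distinct value becomes one column named <prefix><prefix_sep><value>.
encoded = pd.get_dummies(colors, prefix="color", prefix_sep="_")
column_names = list(encoded.columns)
```

Every row of the result has exactly one active column, which is what makes the encoding "one-hot".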

rename

This method changes the name of the PrivateSeries.

PrivateSeries.rename(name:str) -> PrivateSeries

size

The size() method returns the differentially private number of elements in the PrivateSeries.

PrivateSeries.size(eps: float = 0) -> int

The available parameters of size() are the following:

  • eps: float: The epsilon provided to the differentially private calculation. The eps value must be >=0.

sample_with_sensitivity

The sample_with_sensitivity() method returns a random sample of items from the PrivateSeries, so that the sensitivity (how many times a user can be present in the dataset) is capped.

PrivateSeries.sample_with_sensitivity(max_sensitivity) -> PrivateSeries

The available parameters of sample_with_sensitivity() are the following:

  • max_sensitivity: int: The maximum number of times a user can be present in the dataset.
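The idea of capping how often a user appears can be sketched with plain pandas. This is only an illustration of the concept: the user_id column is hypothetical, and how PrivateSeries tracks users internally is not described here.

```python
import pandas as pd

records = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 3],
    "value": [5, 7, 9, 4, 6, 8],
})
max_sensitivity = 2

# Shuffle the rows, then keep at most max_sensitivity rows per user.
capped = (
    records.sample(frac=1, random_state=0)
    .groupby("user_id")
    .head(max_sensitivity)
)
rows_per_user = capped["user_id"].value_counts()
```

After the cap, no single user contributes more than max_sensitivity rows, which limits how much any one user can influence later statistics.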

unique

The unique() method returns the unique values in the PrivateSeries.

PrivateSeries.unique() -> PrivateSeries

where

The where() method replaces the values of the rows where the condition evaluates to False.

PrivateSeries.where(cond, other = None, inplace = False, axis = None, level = None)

The available parameters of where() are the following:

  • cond: boolean PrivateSeries/PrivateDataFrame, Series/DataFrame, or array-like: The condition, which should evaluate to True or False for each element.
    • If True, keep the original value.
    • If False, replace it with the corresponding value from other.
  • other: None: Currently, changing other isn’t supported.
  • inplace: bool, default False: Indicates whether the operation should modify the data in place.
  • axis: int, default None: This parameter isn’t used for Series. Defaults to 0.
  • level: int, default None: Alignment level if needed.

The method returns a PrivateSeries with the result, or None if the inplace parameter is set to True.
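The keep-or-replace behaviour of cond can be seen in a non-private pandas example. In plain pandas, entries where cond is False become NaN when other is left unset; which replacement value PrivateSeries uses is not stated here, so treat this only as an illustration of the condition logic.

```python
import pandas as pd

values = pd.Series([10, 20, 30, 40])
cond = values > 20  # boolean Series: False, False, True, True

# Entries where cond is True are kept; the others are replaced (NaN in pandas).
kept = values.where(cond)
```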

Basic statistical methods

count

The count() method returns the number of non-missing values in the Series.

PrivateSeries.count(eps = 0)

The available parameters of count() are the following:

  • eps : float, default = 0: The epsilon provided to the differentially private calculation. The eps value must be >=0.

mean

The mean() method returns the mean value of the Series.

PrivateSeries.mean(eps = 0)

The available parameters of mean() are the following:

  • eps : float, default = 0: The epsilon provided to the differentially private calculation. The eps value must be >=0.
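To give some intuition for how eps and the series bounds interact, the following sketch computes a bounded mean with Laplace noise in NumPy. It is a textbook construction under stated assumptions (the number of elements is treated as public, and the function name is hypothetical), not the mechanism op_pandas uses. The sketch itself requires eps > 0.

```python
import numpy as np


def laplace_mean(values, bounds, eps, rng=None):
    """Illustrative bounded mean with Laplace noise (not Antigranular's code)."""
    if eps <= 0:
        raise ValueError("this sketch needs eps > 0")
    if rng is None:
        rng = np.random.default_rng()
    low, high = bounds
    # Clip every value into the declared bounds before averaging.
    clipped = np.clip(np.asarray(values, dtype=float), low, high)
    # Changing one record moves the clipped mean by at most (high - low) / n.
    sensitivity = (high - low) / len(clipped)
    noise = rng.laplace(loc=0.0, scale=sensitivity / eps)
    return float(clipped.mean() + noise)


data = [10, 20, 30, 40, 10, 42, 54]
rng = np.random.default_rng(0)
noisier = laplace_mean(data, (0, 60), eps=0.1, rng=rng)   # small eps, more noise
sharper = laplace_mean(data, (0, 60), eps=10.0, rng=rng)  # large eps, less noise
```

In this construction, a smaller eps or wider bounds both increase the noise scale, which is why tight, realistic bounds matter.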

median

The median() method returns the median of the values in the Series.

PrivateSeries.median(eps = 0)

The available parameters of median() are the following:

  • eps : float, default = 0: The epsilon provided to the differentially private calculation. The eps value must be >=0.

percentile

This method is a differentially private implementation of the percentile method.

PrivateSeries.percentile(p, eps)

The available parameters of percentile() are the following:

  • p: float: The percentile to compute. You must provide a value between 0 and 100.
  • eps : float, default = 0: The epsilon provided to the differentially private calculation. The eps value must be >=0.

quantile

This method is a differentially private implementation of the quantile method.

PrivateSeries.quantile(q, eps)

The available parameters of quantile() are the following:

  • q: float: The quantile to compute. You must provide a value between 0 and 1.
  • eps : float, default = 0: The epsilon provided to the differentially private calculation. The eps value must be >=0.
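The percentile and quantile scales differ only by a factor of 100. The non-private NumPy example below shows the relationship, assuming PrivateSeries follows the same convention, as the documented parameter ranges suggest.

```python
import numpy as np

data = np.array([10, 20, 30, 40, 10, 42, 54])

# The same point of the distribution, expressed on the two scales.
p = 25        # percentile scale, 0 to 100
q = p / 100   # quantile scale, 0 to 1

via_percentile = np.percentile(data, p)
via_quantile = np.quantile(data, q)
```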

standard deviation

The std() method returns the standard deviation of the sample data.

PrivateSeries.std(eps = 0, ddof = 1)

The available parameters of std() are the following:

  • eps : float, default = 0: The epsilon provided to the differentially private calculation. The eps value must be >=0.
  • ddof: int, default 1: Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements. Currently, changing ddof is not supported.

sum

The sum() method adds all values in the Series.

PrivateSeries.sum(eps = 0)

The available parameters of sum() are the following:

  • eps : float, default = 0: The epsilon provided to the differentially private calculation. The eps value must be >=0.

variance

The var() method calculates the variance of the Series.

PrivateSeries.var(eps = 0, ddof = 1)

The available parameters of var() are the following:

  • eps : float, default = 0: The epsilon provided to the differentially private calculation. The eps value must be >=0.
  • ddof: int, default 1: Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements. Currently, changing ddof is not supported.
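The role of ddof in the divisor N - ddof can be checked with a short, non-private NumPy calculation. The data is made up and no noise is added; the sketch only shows the arithmetic behind the default ddof = 1.

```python
import numpy as np

data = np.array([10.0, 20.0, 30.0, 40.0, 10.0, 42.0, 54.0])
n = len(data)
ddof = 1

# Sum of squared deviations from the mean, divided by N - ddof.
manual_var = ((data - data.mean()) ** 2).sum() / (n - ddof)
manual_std = manual_var ** 0.5
```

With ddof = 1 this is the usual sample variance, and its square root is the sample standard deviation.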

Advanced statistical methods

The PrivateSeries advanced statistical methods include:

covariance

The cov() method finds the covariance of two PrivateSeries.

PrivateSeries.cov(other, eps: float, min_periods, ddof = 1)

The available parameters of cov() are the following:

  • other: PrivateSeries: The second PrivateSeries.
  • eps : float, default = 0: The epsilon provided to the differentially private calculation. The eps value must be >=0.
  • min_periods: int, optional: By default, 1 is used. Currently, min_periods tweaking is not supported.
  • ddof: int, default 1: Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements. Currently, changing ddof is not supported.

skew

The skew() method calculates the skew for the PrivateSeries.

PrivateSeries.skew(eps, axis = 0, skipna = True, numeric_only = True)

The available parameters of skew() are the following:

  • eps : float, default = 0: The epsilon provided to the differentially private calculation. The eps value must be >=0.
  • axis: boolean {index (0), columns (1)}, default = 0: Axis for the function to be applied on.
  • skipna: bool, default True: Exclude NA/Null values when computing the result.
  • numeric_only: bool, default None: Include only float, int, and boolean columns. If axis = 0, numeric_only is always assumed to be True. Otherwise, you must specify a value.

Histograms

hist

This method draws a histogram of the PrivateSeries.

PrivateSeries.hist(eps, bins = 10)

The available parameters of hist() are the following:

  • eps: float: The epsilon provided to the differentially private calculation. The eps value must be >=0.
  • bins: int, default 10: Number of histogram bins to be used.
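For intuition about what a differentially private histogram involves, here is a generic sketch that adds Laplace noise to bin counts in NumPy. It is an assumption-laden illustration (hypothetical function name, bin edges derived from known bounds, eps > 0 required), not a description of how hist() is implemented.

```python
import numpy as np


def laplace_histogram(values, bounds, eps, bins=10, rng=None):
    """Illustrative noisy histogram (not Antigranular's code)."""
    if eps <= 0:
        raise ValueError("this sketch needs eps > 0")
    if rng is None:
        rng = np.random.default_rng()
    # Bin edges come from the declared bounds, not from the data itself.
    counts, edges = np.histogram(values, bins=bins, range=bounds)
    # Adding or removing one record changes a single bin count by 1.
    noisy = counts + rng.laplace(loc=0.0, scale=1.0 / eps, size=counts.shape)
    return noisy, edges


data = [10, 20, 30, 40, 10, 42, 54]
noisy_counts, bin_edges = laplace_histogram(
    data, (0, 60), eps=1.0, bins=10, rng=np.random.default_rng(0)
)
```

Noisy counts can be fractional or negative, so a plotting step would usually round or clip them before display.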

hist2d

This method creates a 2D histogram of two PrivateSeries.

PrivateSeries.hist2d(other, eps, bins = 10)

The available parameters of hist2d() are the following:

  • other: PrivateSeries: The second PrivateSeries.
  • eps: float: The epsilon provided to the differentially private calculation. The eps value must be >=0.
  • bins: int, default 10: Number of histogram bins to be used.
