API Reference
PrivateSeries
The PrivateSeries API is based on pandas.Series, but all of its methods are differentially private. PrivateSeries is available as part of the op_pandas library in Antigranular.
Constructor
The PrivateSeries constructor is as follows:
class op_pandas.PrivateSeries(series = None, metadata = None, categorical_metadata = None)
The PrivateSeries parameters are described below:
series: pandas.Series
: A pandas Series whose data consists only of strings, integers, floats, and booleans.
metadata: Tuple(float, float)
: Metadata containing the bounds of the given Series, in the form (bound_low, bound_hi). If the Series contains string data, the metadata should not be provided.
categorical_metadata: List
: Metadata containing information about the categorical data of the given Series. The categorical_metadata should be a list containing all the categories in the Series. All elements in the list must have the same data type.
The code blocks below present two distinct examples of PrivateSeries:
Series : [10, 20, 30, 40, 10, 42, 54]
metadata : (0, 60)
categorical_metadata : None
Series : ["a", "b", "a", "b", "a", "a"]
metadata : None
categorical_metadata: ["a", "b"]
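The rules above for metadata and categorical_metadata can be restated as a small plain-Python check. The helper check_series_metadata below is hypothetical and not part of op_pandas; it only encodes the documented constraints so the two examples can be verified outside Antigranular.

```python
from typing import List, Optional, Sequence, Tuple

Bounds = Tuple[float, float]


def check_series_metadata(
    values: Sequence[object],
    metadata: Optional[Bounds] = None,
    categorical_metadata: Optional[List[object]] = None,
) -> None:
    """Hypothetical checker restating the documented constructor rules."""
    has_strings = any(isinstance(v, str) for v in values)
    if has_strings and metadata is not None:
        raise ValueError("string data must not come with bounds metadata")
    if metadata is not None:
        bound_low, bound_hi = metadata
        if bound_low > bound_hi:
            raise ValueError("metadata must be (bound_low, bound_hi)")
    if categorical_metadata is not None:
        # All categories must share one data type.
        if len({type(c) for c in categorical_metadata}) > 1:
            raise ValueError("categories must all have the same type")
        # The list must name every category that occurs in the data.
        missing = set(values) - set(categorical_metadata)
        if missing:
            raise ValueError(f"categories missing from metadata: {missing}")


# The two examples from this section pass the checks.
check_series_metadata([10, 20, 30, 40, 10, 42, 54], metadata=(0, 60))
check_series_metadata(["a", "b", "a", "b", "a", "a"], categorical_metadata=["a", "b"])
```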
General Functions
PrivateSeries provides several internal functions you can use when working with series. The PrivateSeries general functions include:
categorical_metadata
This property returns the categorical_metadata of the PrivateSeries.
PrivateSeries.categorical_metadata -> List
copy
The copy() method returns a copy of the PrivateSeries.
PrivateSeries.copy() -> PrivateSeries
describe
The describe() method returns a statistical description of the data in the PrivateSeries.
PrivateSeries.describe(eps, percentiles = None, include = None, exclude = None)
The available parameters of describe() are the following:
eps: float
: The epsilon provided to the differentially private calculation. The eps value must be >= 0.
percentiles: list-like of numbers, optional
: The percentiles to include in the output. All should fall between 0 and 1. The default is [.25, .5, .75], which returns the 25th, 50th, and 75th percentiles.
include: 'all', list-like of dtypes or None (default), optional
: This option is ignored for Series.
exclude: list-like of dtypes or None (default), optional
: A list of data types to omit from the result. The available options are as follows:
- A list-like of dtypes: Excludes the provided data types from the result.
  - To exclude numeric types, submit numpy.number.
  - To exclude object columns, submit the data type numpy.object.
  - Strings can also be used in the style of select_dtypes (e.g. df.describe(exclude=['O'])).
  - To exclude pandas' categorical columns, use category.
- None (default): The result will exclude nothing.
dropna
The dropna() method removes missing values from a PrivateSeries.
PrivateSeries.dropna(axis=0)
The available parameters of dropna() are the following:
axis: boolean {index (0), columns (1)}, default = 0
: Not applied in Series.
dtypes
The dtypes property returns the data type information of the PrivateSeries.
PrivateSeries.dtypes
isnull
The isnull() method detects missing values for an array-like object.
PrivateSeries.isnull() -> PrivateSeries
isna
The isna() method detects missing values for an array-like object.
PrivateSeries.isna() -> PrivateSeries
isin
The isin() method checks whether each element in the PrivateSeries is contained in values.
PrivateSeries.isin(values)
The available parameters of isin() are the following:
values: PrivateDataFrame
: The PrivateDataFrame against which each element in the Series is checked for containment.
make_categorical
This method makes the series categorical.
PrivateSeries.make_categorical(categories, inplace=False)
The available parameters of make_categorical() are the following:
categories: List
: The categories to be used in the categorical metadata.
inplace: bool, default = False
: If True, the operation will modify the data in place.
make_series_non_categorical
This method makes the series non-categorical.
PrivateSeries.make_series_non_categorical(output_bounds: tuple = None, eps: float = 0.0)
The available parameters of make_series_non_categorical() are the following:
output_bounds: tuple
: When a series contains numerical values but is categorical, this parameter provides its output bounds. If output bounds for a numerical series aren't provided, epsilon is spent to estimate them.
eps: float
: The epsilon used to estimate the output bounds of a numerical column.
map
This method maps values of a PrivateSeries according to an input mapping or function.
PrivateSeries.map(arg, eps = 0, output_bounds = None, output_categories = None)
The available parameters of map() are the following:
arg: callable, mapping, pd.Series or PrivateSeries
: If arg is a mapping (dictionary) and the series has categorical data, every category in the metadata must have a mapping.
eps: float, default = 0
: The epsilon provided to the differentially private calculation. The eps value must be >= 0. It is used to calculate the bounds.
output_bounds: Tuple[float, float]
: The output bounds. If not provided, epsilon is spent to estimate the bounds of the applied function's output.
output_categories: List
: The output categories, if the current series is categorical. If not provided, they are calculated using arg.
If the input is a callable, it should return a single value when applied to each element. The output of the callable should be string, int, float, boolean, or datetime.
It's important to note that if the callable is a function, it will execute within an isolated environment with mypy strict mode enabled. The function must adhere to the following constraints:
- The function can only accept one argument, which would be the individual element the function is being applied on.
- Proper type annotations should be present within the function definition. To use datetime and regex, import datetime and re to enable their type annotations. For additional examples, see the Pandas quickstart guide.
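As an illustration of the constraints listed above, the functions below each take exactly one argument and carry full type annotations. The names age_band and year_of are made up for this example, and the functions are called directly here rather than through PrivateSeries.map(), which is only available inside Antigranular.

```python
import datetime


def age_band(age: int) -> str:
    """Map a single integer element to a string label."""
    if age < 18:
        return "minor"
    if age < 65:
        return "adult"
    return "senior"


def year_of(day: datetime.date) -> int:
    """Map a single date element to its year; datetime is imported for the annotation."""
    return day.year


# Each function returns one value per element, as the callable contract requires.
print(age_band(42))                        # adult
print(year_of(datetime.date(2024, 3, 1)))  # 2024
```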
metadata
The metadata property returns the metadata (bounds) of a numerical series.
PrivateSeries.metadata -> tuple
The code block below presents an example of how to use metadata:
>> train_x.metadata
(0, 60)
notnull
The notnull() method detects non-missing values for an array-like object.
PrivateSeries.notnull() -> PrivateSeries
notna
The notna() method detects existing (non-missing) values.
PrivateSeries.notna() -> PrivateSeries
one_hot_encoding
This method performs one-hot encoding on the PrivateSeries.
PrivateSeries.one_hot_encoding(prefix=None, prefix_sep="_") -> PrivateDataFrame
The available parameters of one_hot_encoding() are the following:
prefix: str, default None
: Prefix to use for the column names.
prefix_sep: str, default "_"
: Separator to use between the prefix and the column name.
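To make the roles of prefix and prefix_sep concrete, here is a small pure-Python sketch of one-hot encoding over a known category list. It assumes the output columns are named prefix + prefix_sep + category, as in pandas.get_dummies; the function one_hot is illustrative only, and the real method returns a PrivateDataFrame rather than a dict.

```python
from typing import Dict, List, Optional


def one_hot(
    values: List[str],
    categories: List[str],
    prefix: Optional[str] = None,
    prefix_sep: str = "_",
) -> Dict[str, List[int]]:
    """Return one 0/1 column per category (illustrative, not private)."""
    columns: Dict[str, List[int]] = {}
    for category in categories:
        # Column naming is assumed to follow pandas.get_dummies.
        name = category if prefix is None else f"{prefix}{prefix_sep}{category}"
        columns[name] = [1 if v == category else 0 for v in values]
    return columns


encoded = one_hot(["a", "b", "a"], categories=["a", "b"], prefix="grade")
print(encoded)  # {'grade_a': [1, 0, 1], 'grade_b': [0, 1, 0]}
```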
rename
This method changes the name of the PrivateSeries.
PrivateSeries.rename(name:str) -> PrivateSeries
size
The size() method returns the differentially private number of elements in the PrivateSeries.
PrivateSeries.size(eps: float = 0) -> int
The available parameters of size() are the following:
eps: float
: The epsilon provided to the differentially private calculation. The eps value must be >= 0.
sample_with_sensitivity
The sample_with_sensitivity() method returns a random sample of items from the PrivateSeries, so that the sensitivity (how many times a user can be present in the dataset) is capped.
PrivateSeries.sample_with_sensitivity(max_sensitivity) -> PrivateSeries
The available parameters of sample_with_sensitivity() are the following:
max_sensitivity: int
: The maximum number of times a user can be present in the dataset.
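The idea of capping sensitivity can be illustrated without the library: keep at most max_sensitivity randomly chosen rows per user. The sketch below assumes each row carries an explicit user identifier; how PrivateSeries tracks users internally is not described here, so cap_rows_per_user and its input layout are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Optional, Tuple

Row = Tuple[Hashable, float]  # (user id, value); an assumed layout


def cap_rows_per_user(
    rows: List[Row],
    max_sensitivity: int,
    rng: Optional[random.Random] = None,
) -> List[Row]:
    """Keep at most max_sensitivity randomly sampled rows for each user id."""
    rng = rng or random.Random()
    by_user: Dict[Hashable, List[Row]] = defaultdict(list)
    for row in rows:
        by_user[row[0]].append(row)
    sampled: List[Row] = []
    for user_rows in by_user.values():
        k = min(max_sensitivity, len(user_rows))
        sampled.extend(rng.sample(user_rows, k))
    return sampled


rows = [("u1", 10.0), ("u1", 12.0), ("u1", 9.0), ("u2", 7.0)]
capped = cap_rows_per_user(rows, max_sensitivity=2, rng=random.Random(0))
print(len(capped))  # 3: two rows for u1, one for u2
```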
unique
The unique() method returns the unique values in the PrivateSeries.
PrivateSeries.unique() -> PrivateSeries
where
The where() method replaces the values of the rows where the condition evaluates to False.
PrivateSeries.where(cond, other = None, inplace = False, axis = None, level = None)
The available parameters of where() are the following:
cond: bool PrivateSeries/PrivateDataFrame, Series/DataFrame, or array-like
: Defines the condition, which should return True or False.
- If True, keep the original value.
- If False, replace it with the corresponding value from other.
other: None
: Currently, changing other isn't supported.
inplace: bool, default False
: Indicates whether the operation should modify the data in place.
axis: int, default None
: This parameter isn't used for Series. Defaults to 0.
level: int, default None
: Alignment level if needed.
The method returns a PrivateSeries with the result, or None if the inplace parameter is set to True.
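The keep-or-replace rule for cond can be shown on plain lists. This sketch assumes that, with other left unset, replaced entries become missing values (as in pandas); the function where below is a stand-in for illustration, not the PrivateSeries method.

```python
from typing import List, Optional


def where(values: List[float], cond: List[bool]) -> List[Optional[float]]:
    """Keep values[i] where cond[i] is True; otherwise mark it missing (None)."""
    if len(values) != len(cond):
        raise ValueError("values and cond must have the same length")
    return [v if keep else None for v, keep in zip(values, cond)]


ages = [10.0, 20.0, 30.0, 40.0]
mask = [age >= 20 for age in ages]
print(where(ages, mask))  # [None, 20.0, 30.0, 40.0]
```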
Basic statistical methods
count
The count() method returns the number of non-missing values in the Series.
PrivateSeries.count(eps = 0)
The available parameters of count() are the following:
eps: float, default = 0
: The epsilon provided to the differentially private calculation. The eps value must be >= 0.
mean
The mean() method returns the mean value of the Series.
PrivateSeries.mean(eps = 0)
The available parameters of mean() are the following:
eps: float, default = 0
: The epsilon provided to the differentially private calculation. The eps value must be >= 0.
median
The median() method returns the median of the values in the Series.
PrivateSeries.median(eps = 0)
The available parameters of median() are the following:
eps: float, default = 0
: The epsilon provided to the differentially private calculation. The eps value must be >= 0.
percentile
This method is a differentially private implementation of the percentile method.
PrivateSeries.percentile(p, eps)
The available parameters of percentile() are the following:
p: float
: The percentile to compute. You must provide a value between 0 and 100.
eps: float, default = 0
: The epsilon provided to the differentially private calculation. The eps value must be >= 0.
quantile
This method is a differentially private implementation of the quantile method.
PrivateSeries.quantile(q, eps)
The available parameters of quantile() are the following:
q: float
: The quantile to compute. You must provide a value between 0 and 1.
eps: float, default = 0
: The epsilon provided to the differentially private calculation. The eps value must be >= 0.
standard deviation
The std() method returns the standard deviation of the sample data.
PrivateSeries.std(eps = 0, ddof = 1)
The available parameters of std() are the following:
eps: float, default = 0
: The epsilon provided to the differentially private calculation. The eps value must be >= 0.
ddof: int, default 1
: Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements. Changing ddof is not currently supported.
sum
The sum() method adds all values in the Series.
PrivateSeries.sum(eps = 0)
The available parameters of sum() are the following:
eps: float, default = 0
: The epsilon provided to the differentially private calculation. The eps value must be >= 0.
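The role of eps and of the bounds metadata in methods such as sum() can be illustrated with the textbook Laplace mechanism. This is a generic sketch, not Antigranular's implementation: it assumes each user contributes one value, clamps values to (bound_low, bound_hi), and requires eps > 0 because the noise scale is sensitivity / eps. The names clamp, laplace_noise, and private_sum are made up for the example.

```python
import random
from typing import Optional, Sequence, Tuple


def clamp(value: float, bounds: Tuple[float, float]) -> float:
    """Force a value into [bound_low, bound_hi]."""
    low, high = bounds
    return min(max(value, low), high)


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    if scale == 0:
        return 0.0
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_sum(
    values: Sequence[float],
    bounds: Tuple[float, float],
    eps: float,
    rng: Optional[random.Random] = None,
) -> float:
    """Textbook Laplace-mechanism sum; a smaller eps means more noise."""
    if eps <= 0:
        raise ValueError("this sketch needs eps > 0")
    rng = rng or random.Random()
    clamped_total = sum(clamp(v, bounds) for v in values)
    # Adding or removing one user changes the sum by at most max(|low|, |high|).
    sensitivity = max(abs(bounds[0]), abs(bounds[1]))
    return clamped_total + laplace_noise(sensitivity / eps, rng)


data = [10, 20, 30, 40, 10, 42, 54]
# The true sum is 206; the printed value differs from run to run.
print(private_sum(data, bounds=(0, 60), eps=1.0))
```

Tighter bounds reduce the sensitivity and therefore the noise, which is one reason accurate metadata matters for numerical series.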
variance
The var() method calculates the variance of the Series.
PrivateSeries.var(eps = 0, ddof = 1)
The available parameters of var() are the following:
eps: float, default = 0
: The epsilon provided to the differentially private calculation. The eps value must be >= 0.
ddof: int, default 1
: Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements. Changing ddof is not currently supported.
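For reference, the ddof convention is the one used by pandas, where the divisor is N - ddof. The sketch below is a plain, non-private computation that only shows the effect of ddof; it says nothing about how PrivateSeries.var() adds noise, and the function name variance is chosen for the example.

```python
import statistics
from typing import Sequence


def variance(values: Sequence[float], ddof: int = 1) -> float:
    """Sum of squared deviations divided by N - ddof (non-private)."""
    n = len(values)
    if n - ddof <= 0:
        raise ValueError("need more than ddof values")
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - ddof)


data = [10.0, 20.0, 30.0, 40.0]
# ddof=1 matches the sample variance, ddof=0 the population variance.
assert abs(variance(data, ddof=1) - statistics.variance(data)) < 1e-9
assert abs(variance(data, ddof=0) - statistics.pvariance(data)) < 1e-9
print(variance(data, ddof=1), variance(data, ddof=0))
```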
Advanced statistical methods
The PrivateSeries advanced statistical methods include:
covariance
The cov() method finds the covariance of two PrivateSeries.
PrivateSeries.cov(other, eps: float, min_periods, ddof = 1)
The available parameters of cov() are the following:
other: PrivateSeries
: The second PrivateSeries.
eps: float, default = 0
: The epsilon provided to the differentially private calculation. The eps value must be >= 0.
min_periods: int, optional
: By default, 1 is used. Changing min_periods is not currently supported.
ddof: int, default 1
: Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements. Changing ddof is not currently supported.
skew
The skew() method calculates the skew for the PrivateSeries.
PrivateSeries.skew(eps, axis = 0, skipna = True, numeric_only = True)
The available parameters of skew() are the following:
eps: float, default = 0
: The epsilon provided to the differentially private calculation. The eps value must be >= 0.
axis: boolean {index (0), columns (1)}, default = 0
: Axis for the function to be applied on.
skipna: bool, default True
: Exclude NA/Null values when computing the result.
numeric_only: bool, default None
: Include only float, int, and boolean columns. If axis = 0, numeric_only is always assumed to be True. Otherwise, you must specify a value.
Histograms
hist
This method draws a histogram of the PrivateSeries.
PrivateSeries.hist(eps, bins = 10)
The available parameters of hist() are the following:
eps: float
: The epsilon provided to the differentially private calculation. The eps value must be >= 0.
bins: int, default 10
: Number of histogram bins to be used.
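How eps and bins interact in a differentially private histogram can be sketched with the standard approach of adding Laplace noise to each bin count. This is a generic illustration under stated assumptions (equal-width bins spanning the series bounds, one value per user, eps > 0), not the library's implementation; bin_counts and noisy_histogram are hypothetical names.

```python
import random
from typing import List, Optional, Sequence, Tuple


def bin_counts(values: Sequence[float], bounds: Tuple[float, float], bins: int = 10) -> List[int]:
    """Exact counts in equal-width bins across [bound_low, bound_hi]."""
    low, high = bounds
    if bins < 1 or high <= low:
        raise ValueError("need bins >= 1 and bound_low < bound_hi")
    width = (high - low) / bins
    counts = [0] * bins
    for v in values:
        v = min(max(v, low), high)  # clamp into the bounds
        index = min(int((v - low) / width), bins - 1)  # the top edge goes in the last bin
        counts[index] += 1
    return counts


def noisy_histogram(
    values: Sequence[float],
    bounds: Tuple[float, float],
    eps: float,
    bins: int = 10,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Add Laplace(1/eps) noise to every bin count."""
    if eps <= 0:
        raise ValueError("this sketch needs eps > 0")
    rng = rng or random.Random()
    # One user changes a single bin count by 1, so the noise scale is 1 / eps.
    return [
        c + rng.expovariate(eps) - rng.expovariate(eps)
        for c in bin_counts(values, bounds, bins)
    ]


data = [10, 20, 30, 40, 10, 42, 54]
print(bin_counts(data, bounds=(0, 60), bins=6))  # [0, 2, 1, 1, 2, 1]
print(noisy_histogram(data, bounds=(0, 60), eps=1.0, bins=6))  # varies per run
```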
hist2d
This method creates a 2D histogram of two PrivateSeries.
PrivateSeries.hist2d(other, eps, bins = 10)
The available parameters of hist2d() are the following:
other: PrivateSeries
: The second PrivateSeries.
eps: float
: The epsilon provided to the differentially private calculation. The eps value must be >= 0.
bins: int, default 10
: Number of histogram bins to be used.