
Pandas

Pandas is a popular open-source data manipulation and analysis library for Python. It provides data structures and functions to efficiently handle and manipulate structured data, such as tables or time series. Pandas offers powerful tools for data cleaning, transformation, filtering, merging, and aggregation. It is widely used in data science, machine learning, and other domains for data preprocessing and analysis tasks, making it a valuable tool for working with structured data in Python.

We have created op_pandas, a differentially private version of the pandas library. It lets you work with private dataframes and private series, which can be used to perform statistical analyses with differential privacy guarantees. The API methods are designed to mirror pandas closely, so you should need little adjustment if you have used pandas before.

Import the library as follows:

%%ag
from op_pandas import PrivateDataFrame, PrivateSeries

Important point to note:

Caching is enabled for the op_pandas library.

Caching is intended to reduce epsilon spend for everyone. Suppose you want to describe a PrivateDataFrame with epsilon = 1. The first time you run the describe method, it costs 1 epsilon. If you run the same method again with epsilon <= 1, it won't cost any epsilon, so your overall epsilon usage is lower. A call with a larger epsilon than was previously charged is charged in full.

The following example shows this with the count method:

priv_df.count(eps=1)    # epsilon = 1 charged
priv_df.count(eps=0.2)  # epsilon not charged
priv_df.count(eps=1)    # epsilon not charged
priv_df.count(eps=2)    # epsilon = 2 charged
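
To make the accounting rule easier to follow, here is a minimal sketch that models it in plain Python. The EpsilonCache class and its charge method are hypothetical names invented for this illustration. They are not part of op_pandas, and the library's actual caching may differ in detail (for instance, in how it decides that two calls are "the same"). The sketch assumes a call is free when the same query has already been charged at an equal or larger epsilon, and is otherwise charged in full.

```python
class EpsilonCache:
    """Toy model of the caching rule described above (not op_pandas internals)."""

    def __init__(self):
        # query name -> largest epsilon already charged for that query
        self.max_eps = {}
        # total epsilon charged so far
        self.spent = 0.0

    def charge(self, query, eps):
        """Return the epsilon charged for this call under the assumed rule."""
        eps = float(eps)
        if eps <= self.max_eps.get(query, 0.0):
            # Assumed: a result at this epsilon or higher is cached, so no charge.
            return 0.0
        # Assumed: a larger epsilon is charged in full and becomes the new maximum.
        self.max_eps[query] = eps
        self.spent += eps
        return eps


cache = EpsilonCache()
# Replay the four count calls from the example above.
charges = [cache.charge("count", e) for e in (1, 0.2, 1, 2)]
print(charges)      # [1.0, 0.0, 0.0, 2.0]
print(cache.spent)  # 3.0
```

Under these assumptions the four calls cost 3 epsilon in total instead of 4.2, which is the saving that caching is meant to provide.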

API Reference

Resources