Description

Many applications of machine learning, such as human health research, involve processing private or sensitive information. Privacy concerns may impose significant hurdles to collaboration in scenarios where multiple sites hold data and the goal is to estimate properties jointly across all datasets. Conventional differentially private decentralized algorithms can provide strong privacy guarantees, but the utility/accuracy of the joint estimates may be poor when the datasets at each site are small. In this work, we propose a new framework, Correlation Assisted Private Estimation (CAPE), for designing privacy-preserving decentralized algorithms with much better accuracy guarantees in an honest-but-curious model. We show that CAPE can be employed in a range of decentralized computations common in machine learning.
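The core idea behind CAPE-style schemes can be illustrated with a minimal sketch: each site releases its local value with two noise terms, a correlated term generated so that it sums to zero across sites, and a smaller independent term. Each individual release carries full-variance noise, but the correlated terms cancel at the aggregator. The function name `cape_aggregate` and the zero-sum construction below are illustrative assumptions, not the paper's exact protocol, and calibrating `sigma` to a target (epsilon, delta) is outside this sketch.

```python
import numpy as np

def cape_aggregate(local_values, sigma, rng):
    """Illustrative sketch of correlated-noise aggregation.

    Each site s releases local_values[s] + e[s] + g[s], where the e[s]
    are correlated (zero-sum) across sites and the g[s] are small
    independent Gaussian terms. The aggregator averages the releases;
    only the small g-noise survives in the joint estimate.
    """
    S = len(local_values)
    # Correlated noise: i.i.d. draws with the mean subtracted, so sum(e) == 0.
    e = rng.normal(0.0, sigma, size=S)
    e -= e.mean()
    # Smaller independent noise kept at each site (variance sigma^2 / S).
    g = rng.normal(0.0, sigma / np.sqrt(S), size=S)
    released = np.asarray(local_values) + e + g
    return released.mean()
```

With S sites, the aggregate sees noise variance on the order of sigma^2 / S^2 instead of sigma^2 / S, which is the accuracy gain the conventional independent-noise approach forgoes.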
Matrix and tensor factorizations are key components of many decentralized processing pipelines that involve joint subspace learning. In this work, we focus on principal component analysis, independent component analysis, canonical correlation analysis, and orthogonal tensor decomposition. Conventional decentralized differentially private factorization schemes suffer from excessive noise, which leads to sub-optimal subspace/feature learning. We demonstrate that the CAPE framework fits these problems naturally and remedies the excessive-noise issue. More specifically, we develop decentralized algorithms for these matrix and tensor factorization problems and show that, under certain conditions, they achieve the same utility as a centralized algorithm with access to all datasets across sites.
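To make the noise issue concrete, here is a minimal sketch of differentially private PCA via the Gaussian mechanism: perturb the sample second-moment matrix with a symmetric Gaussian noise matrix, then take the top-k eigenvectors. The function name `dp_pca_sketch` is hypothetical, and `sigma` stands in for the noise level implied by the chosen privacy parameters; in the decentralized setting each site's release needs its own such noise, which is exactly what degrades the learned subspace.

```python
import numpy as np

def dp_pca_sketch(X, k, sigma, rng):
    """Sketch of Gaussian-mechanism PCA: eigendecompose a noisy
    second-moment matrix. sigma is assumed pre-calibrated to the
    target privacy level (calibration is outside this sketch)."""
    n, d = X.shape
    A = X.T @ X / n                     # d x d second-moment matrix
    E = rng.normal(0.0, sigma, size=(d, d))
    E = (E + E.T) / 2.0                 # symmetrize the noise matrix
    vals, vecs = np.linalg.eigh(A + E)  # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:k]    # indices of the k largest
    return vecs[:, top]                 # d x k basis of the subspace
```

The larger sigma must be, the more the noisy eigenvectors rotate away from the true principal subspace; CAPE lets the aggregated computation behave as if sigma were at the much smaller centralized level.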
Finally, we employ the CAPE framework to propose an algorithm for solving generalized optimization problems in decentralized settings. We provide a tighter characterization of the functional mechanism and propose modifications so that it can be incorporated into the CAPE framework. Our proposed decentralized functional mechanism is suited for privacy-preserving computation of virtually any continuous and differentiable cost function in the decentralized setting.
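The functional mechanism's key move is to perturb the coefficients of a (polynomial expansion of the) cost function rather than the data or the output, and then minimize the perturbed objective. A minimal sketch for one-dimensional least squares, where the loss is exactly a polynomial in the parameter w, looks as follows; the function name and the Laplace scale `lam_noise` are illustrative, and relating `lam_noise` to epsilon via the coefficient sensitivity is outside this sketch.

```python
import numpy as np

def functional_mechanism_1d(x, y, lam_noise, rng):
    """Sketch of the functional mechanism for 1-D least squares.

    sum_i (y_i - w * x_i)^2 = (sum x_i^2) w^2 - 2 (sum x_i y_i) w + const,
    so perturbing the two data-dependent coefficients privatizes the
    whole objective; we then return its minimizer.
    """
    a = np.sum(x * x) + rng.laplace(0.0, lam_noise)  # coefficient of w^2
    b = np.sum(x * y) + rng.laplace(0.0, lam_noise)  # half the (negated) linear coefficient
    a = max(a, 1e-6)     # keep the perturbed objective convex
    return b / a         # argmin of a*w^2 - 2*b*w
```

For higher-degree or non-polynomial losses the same recipe applies to a truncated polynomial approximation of the cost, which is where a tighter characterization of the mechanism pays off.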
For all of our proposed algorithms, we present empirical results demonstrating that they outperform existing state-of-the-art approaches and are competitive with non-private algorithms in many scenarios of interest. Our results indicate that meaningful privacy can be attained without sacrificing much performance, by virtue of better algorithm design.