Privacy



The availability of massive amounts of data, coupled with high-performance cloud computing platforms, has driven significant progress in artificial intelligence and, in particular, machine learning and optimization. Indeed, much of the scientific and technological growth in recent years, including in computer vision, natural language processing, transportation, and health, has been driven by large-scale data sets that provide a strong basis for improving existing algorithms and developing new ones. However, because of their large-scale and longitudinal collection, these data sets raise significant privacy concerns. They often reveal sensitive personal information that can be exploited, without the knowledge and/or consent of the individuals involved, for purposes including monitoring, discrimination, and illegal activities. The AAAI Workshop on Privacy-Preserving Artificial Intelligence

Privacy-Preserving Machine Learning (PPML) Techniques


Many privacy-enhancing techniques concentrate on allowing multiple input parties to collaboratively train ML models without releasing their private data in its original form. This is mainly achieved with cryptographic approaches or with differentially private data release (perturbation techniques). Differential privacy is especially effective in preventing membership inference attacks. Privacy Preserving Machine Learning: Threats and Solutions | Mohammad Al-Rubaie - Iowa State University

Multiparty Computation (MPC) enables computation on data from different providers/parties such that the participating parties gain no additional information about each other's inputs beyond what can be learned from the public output of the algorithm. In other words, with parties Alice, Bob and Casper, all three have access to the output, but it is not possible for, e.g., Alice to learn the plain data that Bob and Casper provided. Secure Multiparty Computation — Enabling Privacy-Preserving Machine Learning | Florian Apfelbeck - Medium
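
A minimal sketch of the idea, assuming a simple additive secret-sharing scheme over a prime field; the party names, input values, and modulus are illustrative, not a production protocol. Each party splits its input into random shares, hands one share to every party, and only the sum of all inputs is ever reconstructed.

```python
import random

PRIME = 2**61 - 1  # field modulus; all arithmetic is done mod this prime

def share(value, n_parties):
    """Split a value into n additive shares that sum to the value mod PRIME."""
    shares = [random.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

# Private inputs of Alice, Bob and Casper (illustrative values).
inputs = {"Alice": 25, "Bob": 40, "Casper": 17}

# Each party splits its input into three shares and sends one share to each party.
distributed = {name: share(v, 3) for name, v in inputs.items()}

# Party i locally adds the i-th share it received from everyone (a "partial result").
partial_sums = [sum(distributed[name][i] for name in inputs) % PRIME for i in range(3)]

# Combining the partial results reveals only the sum, not any individual input.
total = sum(partial_sums) % PRIME
print(total)  # 82
```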

Cryptographic Approaches


When an ML application requires data from multiple input parties, cryptographic protocols can be utilized to perform ML training/testing on encrypted data. In many of these techniques, better efficiency is achieved by having data owners contribute their encrypted data to computation servers, which reduces the problem to a secure two- or three-party computation setting. In addition to increased efficiency, such approaches have the benefit of not requiring the input parties to remain online. Privacy Preserving Machine Learning: Threats and Solutions | Mohammad Al-Rubaie - Iowa State University

Homomorphic Encryption


Fully homomorphic encryption enables computation on encrypted data, with operations such as addition and multiplication that can be used as a basis for more complex arbitrary functions. Due to the high cost of frequently bootstrapping the ciphertext (refreshing the ciphertext because of accumulated noise), additive homomorphic encryption schemes are mostly used in PPML approaches. Such schemes only enable addition of encrypted values and multiplication by a plaintext. Privacy Preserving Machine Learning: Threats and Solutions | Mohammad Al-Rubaie - Iowa State University
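
The additive property can be illustrated with a toy Paillier-style scheme. This is a sketch with tiny, insecure hard-coded primes purely to show that multiplying ciphertexts adds the underlying plaintexts and that exponentiating a ciphertext by a constant multiplies the plaintext; a real deployment would use a vetted library and much larger keys.

```python
import math, random

# Toy Paillier keypair with tiny, insecure primes (for illustration only).
p, q = 293, 433
n = p * q
n2 = n * n
g = n + 1
lam = math.lcm(p - 1, q - 1)
# mu = (L(g^lam mod n^2))^-1 mod n, where L(x) = (x - 1) // n
mu = pow((pow(g, lam, n2) - 1) // n, -1, n)

def encrypt(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return ((pow(c, lam, n2) - 1) // n * mu) % n

c1, c2 = encrypt(15), encrypt(27)
# Multiplying ciphertexts adds the plaintexts ...
print(decrypt(c1 * c2 % n2))    # 42
# ... and exponentiation by a plaintext constant multiplies the plaintext.
print(decrypt(pow(c1, 3, n2)))  # 45
```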

Garbled Circuits


Assuming a two-party setup with Alice and Bob wanting to obtain the result of a function computed on their private inputs, Alice can convert the function into a garbled circuit and send this circuit along with her garbled input. Bob obtains the garbled version of his input from Alice without her learning anything about Bob's private input (e.g., using oblivious transfer). Bob can now evaluate the garbled circuit on his garbled input to obtain the result of the required function (and can optionally share it with Alice). Some PPML approaches combine additive homomorphic encryption with garbled circuits. Privacy Preserving Machine Learning: Threats and Solutions | Mohammad Al-Rubaie - Iowa State University
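
A sketch of garbling a single AND gate may make the flow concrete. It is a toy construction (hash-based row encryption, illustrative 16-byte labels, and no oblivious transfer step), not the full protocol described above.

```python
import os, random
from hashlib import sha384

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

# Alice (the garbler) picks two random 16-byte labels per wire: one for 0, one for 1.
labels = {w: (os.urandom(16), os.urandom(16)) for w in ("a", "b", "out")}

# For every input combination, encrypt the matching output label under the hash of
# the two input labels; a block of trailing zero bytes marks the correct row.
table = []
for x in (0, 1):
    for y in (0, 1):
        key = sha384(labels["a"][x] + labels["b"][y]).digest()  # 48-byte keystream
        plaintext = labels["out"][x & y] + bytes(16)
        table.append(xor(key, plaintext))
random.shuffle(table)  # hide which row corresponds to which inputs

# Bob (the evaluator) holds exactly one label per input wire, e.g. a=1, b=1
# (in a real protocol he would obtain his own label via oblivious transfer).
la, lb = labels["a"][1], labels["b"][1]
for row in table:
    candidate = xor(sha384(la + lb).digest(), row)
    if candidate[16:] == bytes(16):           # correct row found
        out_label = candidate[:16]
        print(out_label == labels["out"][1])  # True: encodes AND(1, 1) = 1
```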

Secret Sharing


A method for distributing a secret among multiple parties, with each one holding a "share" of the secret. Individual shares are of no use on their own; however, when the shares are combined, the secret can be reconstructed. With threshold secret sharing, not all shares are required to reconstruct the secret, only "t" of them, where "t" is the threshold. In one setting, multiple input parties generate "shares" of their private data and send these shares to a set of non-colluding computation servers. Each server computes a "partial result" from the shares it received. Finally, a results party (or a proxy) receives these partial results and combines them to find the final result. Privacy Preserving Machine Learning: Threats and Solutions | Mohammad Al-Rubaie - Iowa State University
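
A small sketch of a "t"-of-"n" threshold scheme in the style of Shamir secret sharing; the field modulus, secret, and share counts below are arbitrary choices for the example.

```python
import random

PRIME = 2**61 - 1  # all arithmetic is over this prime field

def make_shares(secret, t, n):
    """Split `secret` into n shares; any t of them reconstruct it."""
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(t - 1)]
    poly = lambda x: sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME
    return [(x, poly(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 recovers the secret."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

shares = make_shares(secret=123456789, t=3, n=5)
print(reconstruct(shares[:3]))   # any 3 of the 5 shares recover 123456789
print(reconstruct(shares[1:4]))  # a different subset of 3 works too
```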

Secure Processors


While initially introduced to ensure the confidentiality and integrity of sensitive code against unauthorized access by rogue software at higher privilege levels, Intel SGX processors are being utilized in privacy-preserving computation. Ohrimenko et al. developed data-oblivious ML algorithms for neural networks, SVM, k-means clustering, decision trees and matrix factorization that are based on SGX processors. The main idea involves having multiple data owners collaborate to perform one of the above-mentioned ML tasks, with the computation party running the ML task in an SGX-enabled data center. An adversary can control all the hardware and software in the data center except for the SGX processors used for the computation. In this system, each data owner independently establishes a secure channel with the enclave (containing the code and data), authenticates itself, verifies the integrity of the ML code in the cloud, and securely uploads its private data to the enclave. After all the data is uploaded, the ML task is run by the secure processor, and the output is sent to the results parties over secure authenticated channels. Privacy Preserving Machine Learning: Threats and Solutions | Mohammad Al-Rubaie - Iowa State University

Perturbation Approaches


Differential privacy (DP) techniques resist membership inference attacks by adding random noise to the input data, to iterations of a certain algorithm, or to the algorithm's output. While most DP approaches assume a trusted aggregator of the data, local differential privacy allows each input party to add the noise locally, thus requiring no trusted server. Finally, dimensionality reduction perturbs the data by projecting it to a lower-dimensional hyperplane to prevent reconstruction of the original data and/or to restrict inference of sensitive information. Privacy Preserving Machine Learning: Threats and Solutions | Mohammad Al-Rubaie - Iowa State University

Differential Privacy (DP)


Differential privacy is a powerful tool for quantifying and solving practical problems related to privacy. Its flexible definition gives it the potential to be applied in a wide range of settings, including machine learning. Understanding Differential Privacy - From Intuitions behind a Theory to a Private AI Application | An Nguyen - Towards Data Science

Differential privacy is a system for publicly sharing information about a dataset by describing the patterns of groups within the dataset while withholding information about individuals in the dataset. Another way to describe differential privacy is as a constraint on the algorithms used to publish aggregate information about a statistical database, one which limits the disclosure of private information about records in the database. For example, differentially private algorithms are used by some government agencies to publish demographic information or other statistical aggregates while ensuring confidentiality of survey responses, and by companies to collect information about user behavior while controlling what is visible even to internal analysts. Roughly, an algorithm is differentially private if an observer seeing its output cannot tell whether a particular individual's information was used in the computation. Differential privacy is often discussed in the context of identifying individuals whose information may be in a database. Although it does not directly refer to identification and reidentification attacks, differentially private algorithms provably resist such attacks. Differential privacy was developed by cryptographers and thus is often associated with cryptography, and it draws much of its language from cryptography. Wikipedia
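
As a concrete sketch, the Laplace mechanism releases a noisy count whose noise scale depends on the query's sensitivity and the privacy budget ε; the data, predicate, and ε below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_count(records, predicate, epsilon):
    """Release a count with Laplace noise; the sensitivity of a count is 1,
    so noise of scale 1/epsilon gives epsilon-differential privacy."""
    true_count = sum(predicate(r) for r in records)
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

# Illustrative database: 1 = has the sensitive attribute, 0 = does not.
database = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
print(dp_count(database, lambda r: r == 1, epsilon=0.5))
# Adding or removing any single record changes the true count by at most 1,
# so an observer cannot confidently tell whether a given individual is present.
```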

Local Differential Privacy


When the input parties do not have enough information to train an ML model, it might be better to utilize approaches that rely on local differential privacy (LDP). With LDP, each input party perturbs its own data and only releases this obscured view of the data. An old and well-known version of local privacy is randomized response (Warner 1965), which provided plausible deniability for respondents to sensitive queries. For example, a respondent flips a fair coin: (a) if "tails", the respondent answers truthfully, and (b) if "heads", they flip a second coin and respond "Yes" if heads and "No" if tails. RAPPOR is a technology for crowdsourcing statistics from end-user client software by applying RR to Bloom filters with strong ε-DP guarantees. RAPPOR is deployed in the Google Chrome web browser, where it permits collecting statistics on client-side values and strings, such as their categories, frequencies, and histograms. By performing RR twice with a memoization step in between, privacy protection is maintained even when multiple responses are collected from the same participant over time. An ML-oriented work, AnonML, utilized the ideas of RR for generating histograms from multiple input parties. AnonML uses these histograms to generate synthetic data on which an ML model can be trained. Like other local DP approaches, AnonML is a good option when no input party has enough data to build an ML model on their own (and there is no trusted aggregator). Privacy Preserving Machine Learning: Threats and Solutions | Mohammad Al-Rubaie - Iowa State University
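
A short sketch of the two-coin randomized response mechanism described above, together with the standard unbiased estimator of the true proportion; the population size and true rate are illustrative simulation parameters.

```python
import random

def randomized_response(truth):
    """Warner-style two-coin mechanism: answer truthfully on tails,
    otherwise answer according to a second fair coin flip."""
    if random.random() < 0.5:        # first coin: tails -> truthful answer
        return truth
    return random.random() < 0.5     # heads -> random "Yes"/"No"

# Simulate 100,000 respondents, 30% of whom truly hold the sensitive attribute.
n = 100_000
true_rate = 0.30
reports = [randomized_response(random.random() < true_rate) for _ in range(n)]

# P(report "Yes") = 0.5 * true_rate + 0.25, so invert that to estimate the rate.
observed = sum(reports) / n
estimate = (observed - 0.25) / 0.5
print(round(estimate, 3))  # close to 0.30, yet any single answer is deniable
```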

Dimensionality Reduction (DR)


Dimensionality reduction perturbs the data by projecting it to a lower-dimensional hyperplane. Such a transformation is lossy, and it was suggested by Liu et al. that it would enhance privacy, since retrieving the exact original data from a reduced-dimension version would not be possible (the possible solutions are infinite because the number of equations is less than the number of unknowns). Hence, Liu et al. proposed using a random matrix to reduce the dimensions of the input data. Since a random matrix might decrease the utility, other approaches used both unsupervised and supervised DR techniques such as principal component analysis (PCA), discriminant component analysis (DCA), and multidimensional scaling (MDS). These approaches try to find the best projection matrix for utility purposes while relying on the reduced dimensionality to enhance privacy. Since an approximation of the original data can still be obtained from the reduced dimensions, some approaches, e.g. Jiang et al., combine dimensionality reduction with DP to achieve differentially private data publishing. While some entities might seek total hiding of their data, DR has another benefit for privacy: for datasets whose samples carry two labels, a utility label and a privacy label, Kung proposes a DR method that enables the data owner to project her data in a way that maximizes the accuracy of learning the utility labels while decreasing the accuracy of learning the privacy labels. Privacy Preserving Machine Learning: Threats and Solutions | Mohammad Al-Rubaie - Iowa State University
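
A brief sketch of privacy-motivated dimensionality reduction with a random projection matrix, in the spirit of the random-matrix approach described above; the data shapes, seed, and target dimension are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)

# Original data: 100 records with 50 attributes each (illustrative).
X = rng.normal(size=(100, 50))

# Random projection to 10 dimensions: with fewer equations (10) than
# unknowns (50) per record, the exact original rows cannot be recovered.
R = rng.normal(size=(50, 10)) / np.sqrt(10)
X_reduced = X @ R

print(X_reduced.shape)  # (100, 10) -- this is what would be released

# Pairwise distances are roughly preserved, so the projected data remains
# useful for many ML tasks despite the lossy transformation.
d_orig = np.linalg.norm(X[0] - X[1])
d_proj = np.linalg.norm(X_reduced[0] - X_reduced[1])
print(round(d_orig, 2), round(d_proj, 2))
```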

General Data Protection Regulations (GDPR)
