Privacy
- Case Studies
- Blockchain
- Bias and Variances
- Ethics
- OpenMined
- Facial recognition has to be regulated to protect the public, says AI report | Will Knight - MIT Technology Review
- Screening; Passenger, Luggage, & Cargo
- Other Challenges in Artificial Intelligence
- Data Security and Privacy | IBM Research
The availability of massive amounts of data, coupled with high-performance cloud computing platforms, has driven significant progress in artificial intelligence and, in particular, machine learning and optimization. Indeed, much scientific and technological growth in recent years, including in computer vision, natural language processing, transportation, and health, has been driven by large-scale data sets which provide a strong basis to improve existing algorithms and develop new ones. However, because of their large-scale and longitudinal collection, archiving these data sets raises significant privacy concerns. They often reveal sensitive personal information that can be exploited, without the knowledge and/or consent of the involved individuals, for various purposes including monitoring, discrimination, and illegal activities. The AAAI Workshop on Privacy-Preserving Artificial Intelligence
Privacy Preserving - Machine Learning (PPML) Techniques
- Secure and Private AI by Facebook AI | UDACITY ... introduction of three cutting-edge technologies for privacy-preserving AI: Federated Learning, Differential Privacy, and Encrypted Computation. Learn how to extend PyTorch with the tools necessary to train AI models that preserve user privacy.
- Perfectly Privacy-Preserving AI | Patricia Thaine
Many privacy-enhancing techniques concentrated on allowing multiple input parties to collaboratively train ML models without releasing their private data in its original form. This was mainly performed by utilizing cryptographic approaches, or differentially-private data release (perturbation techniques). Differential privacy is especially effective in preventing membership inference attacks. Privacy Preserving Machine Learning: Threats and Solutions | Mohammad Al-Rubaie - Iowa State University
Multiparty Computation (MPC) enables computation on data from different providers/parties, such that the participating parties gain no additional information about each other’s inputs, except what can be learned from the public output of the algorithm. In other words, when we have the parties Alice, Bob and Casper, all three have access to the output. However, it is not possible for, e.g., Alice to learn the plain data that Bob and Casper provided. Secure Multiparty Computation — Enabling Privacy-Preserving Machine Learning | Florian Apfelbeck - Medium
- DeepSecure: Scalable Provably-Secure Deep Learning | B. Rouhani, M. S. Riazi, and F. Koushanfar
- SecureML: A System for Scalable Privacy-Preserving Machine Learning | Payman Mohassel & Yupeng Zhang
- MiniONN: Oblivious Neural Network Predictions via MiniONN transformations | J. Liu, M. Juuti, Y. Lu and N. Asokan
- ABY3: A Mixed Protocol Framework for Machine Learning | Payman Mohassel & Peter Rindal
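The Alice, Bob and Casper scenario can be illustrated with a secure sum built on additive shares. This is only a minimal sketch: all three parties are simulated in one process, the parties are assumed to follow the protocol honestly, and the names `MODULUS`, `make_shares` and `secure_sum` are made up for this illustration rather than taken from any of the systems listed above.

```python
import secrets

MODULUS = 2**61 - 1  # public modulus (illustrative choice); all arithmetic is mod this value


def make_shares(value, n_parties):
    """Split `value` into n additive shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_inputs):
    """Simulate every party locally and return the public sum of the inputs."""
    n = len(private_inputs)
    # Step 1: each party splits its input and would send share j to party j.
    distributed = [make_shares(x, n) for x in private_inputs]
    # Step 2: party j adds up the shares it received, one from each party.
    partial_sums = [
        sum(distributed[i][j] for i in range(n)) % MODULUS for j in range(n)
    ]
    # Step 3: the partial sums are published and combined into the output.
    return sum(partial_sums) % MODULUS


inputs = {"Alice": 52, "Bob": 17, "Casper": 31}
total = secure_sum(list(inputs.values()))
print(total)  # the only value every party learns
```

Each party only ever sees uniformly random-looking shares of the others' inputs, so the published total is the only new information it gains, which matches the guarantee described above.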
Cryptographic Approaches
When a certain ML application requires data from multiple input parties, cryptographic protocols could be utilized to perform ML training/testing on encrypted data. In many of these techniques, achieving better efficiency involved having data owners contribute their encrypted data to the computation servers, which would reduce the problem to a secure two/three party computation setting. In addition to increased efficiency, such approaches have the benefit of not requiring the input parties to remain online. Privacy Preserving Machine Learning: Threats and Solutions | Mohammad Al-Rubaie - Iowa State University
Homomorphic Encryption
Fully homomorphic encryption enables computation on encrypted data, with operations such as addition and multiplication that can be used as a basis for more complex arbitrary functions. Due to the high cost associated with frequently bootstrapping the ciphertext (refreshing the ciphertext because of the accumulated noise), additive homomorphic encryption schemes were mostly used in PPML approaches. Such schemes only enable addition operations on encrypted data, and multiplication by a plaintext. Privacy Preserving Machine Learning: Threats and Solutions | Mohammad Al-Rubaie - Iowa State University
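To make the two operations of an additive scheme concrete, here is a toy sketch of the Paillier cryptosystem, one well-known additive homomorphic scheme (the text above does not name a specific scheme, so this choice is an assumption). The key is built from two small Mersenne primes purely for readability and offers no real security; the function names are illustrative, and `pow(x, -1, n)` requires Python 3.8 or later.

```python
import math
import secrets


def keygen(p=2**31 - 1, q=2**61 - 1):
    """Toy Paillier key pair from two known primes (far too small for real use)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse of lam mod n
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt an integer 0 <= m < n under public key n (generator g = n + 1)."""
    n_sq = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    # With g = n + 1, g^m mod n^2 simplifies to 1 + m*n.
    return ((1 + m * n) * pow(r, n, n_sq)) % n_sq


def decrypt(n, private_key, c):
    lam, mu = private_key
    x = pow(c, lam, n * n)
    return ((x - 1) // n) * mu % n


def add_encrypted(n, c1, c2):
    """Ciphertext of m1 + m2: multiply the two ciphertexts."""
    return (c1 * c2) % (n * n)


def mul_by_plaintext(n, c, k):
    """Ciphertext of k * m: raise the ciphertext to the plaintext power k."""
    return pow(c, k, n * n)


n, private_key = keygen()
encrypted_sum = add_encrypted(n, encrypt(n, 20), encrypt(n, 22))
encrypted_product = mul_by_plaintext(n, encrypt(n, 7), 6)
print(decrypt(n, private_key, encrypted_sum), decrypt(n, private_key, encrypted_product))
```

Note that there is no function for multiplying two ciphertexts together: that is exactly the operation an additive scheme lacks and a fully homomorphic scheme provides.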
Garbled Circuits
Assuming a two-party setup with Alice and Bob wanting to obtain the result of a function computed on their private inputs, Alice can convert the function into a garbled circuit, and send this circuit along with her garbled input. Bob obtains the garbled version of his input from Alice without her learning anything about Bob’s private input (e.g., using oblivious transfer). Bob can now use his garbled input with the garbled circuit to obtain the result of the required function (and can optionally share it with Alice). Some PPML approaches combined additive homomorphic encryption with garbled circuits. Privacy Preserving Machine Learning: Threats and Solutions | Mohammad Al-Rubaie - Iowa State University
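The exchange between Alice and Bob can be sketched for a circuit consisting of a single AND gate. This is a simplified teaching construction, not an optimized or hardened scheme: the hash-based row encryption, the all-zero validity tag, and the names `garble_gate` and `evaluate_gate` are assumptions made for this example, and the oblivious transfer step is replaced by a plain dictionary lookup.

```python
import hashlib
import secrets

LABEL_BYTES = 16
TAG = bytes(LABEL_BYTES)  # all-zero tag lets the evaluator recognise the right row


def _pad(label_a, label_b):
    """Derive a 32-byte one-time pad from two input-wire labels."""
    return hashlib.sha256(label_a + label_b).digest()


def _xor(x, y):
    return bytes(a ^ b for a, b in zip(x, y))


def garble_gate(gate_fn):
    """Alice's side: build a garbled table and random labels for a 2-input gate."""
    labels = {
        wire: (secrets.token_bytes(LABEL_BYTES), secrets.token_bytes(LABEL_BYTES))
        for wire in ("a", "b", "out")
    }
    table = []
    for bit_a in (0, 1):
        for bit_b in (0, 1):
            out_label = labels["out"][gate_fn(bit_a, bit_b)]
            pad = _pad(labels["a"][bit_a], labels["b"][bit_b])
            table.append(_xor(pad, out_label + TAG))
    secrets.SystemRandom().shuffle(table)  # hide which row belongs to which inputs
    return table, labels


def evaluate_gate(table, label_a, label_b):
    """Bob's side: decrypt the single row that his two labels unlock."""
    pad = _pad(label_a, label_b)
    for row in table:
        plain = _xor(pad, row)
        if plain[LABEL_BYTES:] == TAG:
            return plain[:LABEL_BYTES]
    raise ValueError("no row decrypted correctly")


table, labels = garble_gate(lambda a, b: a & b)
alice_bit, bob_bit = 1, 1
# Alice sends the label for her own bit; Bob's label would arrive via
# oblivious transfer, which is only simulated by this lookup.
out_label = evaluate_gate(table, labels["a"][alice_bit], labels["b"][bob_bit])
result = labels["out"].index(out_label)  # decoding the output label
print(result)
```

Bob only ever holds one label per wire, so he can open exactly one row of the table and learns nothing about the rows for other input combinations.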
Secret Sharing
Secret sharing is a method for distributing a secret among multiple parties, with each one holding a “share” of the secret. Individual shares are of no use on their own; however, when the shares are combined, the secret can be reconstructed. With threshold secret sharing, not all the “shares” are required to reconstruct the secret, but only “t” of them (“t” refers to the threshold). In one setting, multiple input parties can generate “shares” of their private data, and send these shares to a set of non-colluding computation servers. Each server could compute a “partial result” from the “shares” it received. Finally, a results party (or a proxy) can receive these partial results, and combine them to find the final result. Privacy Preserving Machine Learning: Threats and Solutions | Mohammad Al-Rubaie - Iowa State University
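Threshold secret sharing can be sketched with Shamir's scheme, one standard way to obtain the "t of n" property (the text above does not name a scheme, so this choice is an assumption). The secret is the constant term of a random polynomial of degree t - 1, and each share is one point on that polynomial. The field prime and function names below are illustrative.

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime; all shares live in the field of integers mod PRIME


def split_secret(secret, n_shares, threshold):
    """Return n_shares points on a random polynomial f of degree threshold-1 with f(0) = secret."""
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def f(x):
        y = 0
        for c in reversed(coeffs):  # Horner's rule
            y = (y * x + c) % PRIME
        return y

    return [(x, f(x)) for x in range(1, n_shares + 1)]


def reconstruct(shares):
    """Recover f(0) by Lagrange interpolation over the given shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


shares = split_secret(123456789, n_shares=5, threshold=3)
print(reconstruct(shares[:3]))   # any 3 shares recover the secret
print(reconstruct(shares[2:]))   # a different set of 3 shares works too
```

With fewer than `threshold` shares the interpolation still returns a number, but it is unrelated to the secret, which is why individual shares are of no use on their own.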
Secure Processors
While initially introduced to ensure the confidentiality and integrity of sensitive code from unauthorized access by rogue software at higher privilege levels, Intel SGX processors are being utilized in privacy-preserving computation. Ohrimenko et al. [14] developed data-oblivious ML algorithms for neural networks, SVM, k-means clustering, decision trees and matrix factorization that are based on SGX processors. The main idea involves having multiple data owners collaborate to perform one of the above-mentioned ML tasks, with the computation party running the ML task in an SGX-enabled data center. An adversary can control all the hardware and software in the data center except for the SGX processors used for computation. In this system, each data owner independently establishes a secure channel with the enclave (containing the code and data), authenticates themselves, verifies the integrity of the ML code in the cloud, and securely uploads its private data to the enclave. After all the data is uploaded, the ML task is run by the secure processor, and the output is sent to the results parties over secure authenticated channels. Privacy Preserving Machine Learning: Threats and Solutions | Mohammad Al-Rubaie - Iowa State University
Perturbation Approaches
Differential privacy (DP) techniques resist membership inference attacks by adding random noise to the input data, to iterations in a certain algorithm, or to the algorithm output. While most DP approaches assume a trusted aggregator of the data, local differential privacy allows each input party to add the noise locally, thus requiring no trusted server. Finally, dimensionality reduction perturbs the data by projecting it to a lower dimensional hyperplane to prevent reconstructing the original data, and/or to restrict inference of sensitive information. Privacy Preserving Machine Learning: Threats and Solutions | Mohammad Al-Rubaie - Iowa State University
Differential Privacy (DP)
Differential privacy is a powerful tool for quantifying and solving practical problems related to privacy. Its flexible definition gives it the potential to be applied in a wide range of applications, including Machine Learning applications. Understanding Differential Privacy - From Intuitions behind a Theory to a Private AI Application | An Nguyen - Towards Data Science
Differential privacy is a system for publicly sharing information about a dataset by describing the patterns of groups within the dataset while withholding information about individuals in the dataset. Another way to describe differential privacy is as a constraint on the algorithms used to publish aggregate information about a statistical database which limits the disclosure of private information of records whose information is in the database. For example, differentially private algorithms are used by some government agencies to publish demographic information or other statistical aggregates while ensuring confidentiality of survey responses, and by companies to collect information about user behavior while controlling what is visible even to internal analysts. Roughly, an algorithm is differentially private if an observer seeing its output cannot tell if a particular individual's information was used in the computation. Differential privacy is often discussed in the context of identifying individuals whose information may be in a database. Although it does not directly refer to identification and reidentification attacks, differentially private algorithms provably resist such attacks. Differential privacy was developed by cryptographers and thus is often associated with cryptography, and draws much of its language from cryptography. Wikipedia
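A common way to publish an aggregate under differential privacy is the Laplace mechanism: compute the true answer, then add noise scaled to how much one individual can change it. The sketch below applies it to a counting query, where adding or removing one record changes the count by at most 1. The function names and sample data are illustrative, and Python's `random` module is used only for readability; it is not a cryptographically secure noise source.

```python
import random


def laplace_noise(scale, rng=random):
    """Sample Laplace(0, scale) as the difference of two exponential samples."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(records, predicate, epsilon, rng=random):
    """Epsilon-DP counting query: true count plus Laplace noise of scale 1/epsilon."""
    true_count = sum(1 for record in records if predicate(record))
    sensitivity = 1.0  # one individual changes a count by at most 1
    return true_count + laplace_noise(sensitivity / epsilon, rng)


ages = [23, 35, 41, 29, 62, 57, 33, 48, 19, 71]  # made-up survey data
noisy = private_count(ages, lambda age: age >= 40, epsilon=0.5)
print(noisy)  # differs from the true count of 5 by random noise
```

A smaller `epsilon` means more noise and stronger privacy; a larger one gives more accurate answers with a weaker guarantee.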
Local Differential Privacy
When the input parties do not have enough information to train an ML model, it might be better to utilize approaches that rely on local differential privacy (LDP). With LDP, each input party would perturb their data, and only release this obscured view of the data. An old and well-known version of local privacy is randomized response (RR) (Warner 1965), which provided plausible deniability for respondents to sensitive queries. For example, a respondent would flip a fair coin: (a) if “tails”, the respondent answers truthfully, and (b) if “heads”, then flip a second coin, and respond “Yes” if heads, and “No” if tails. RAPPOR [22] is a technology for crowdsourcing statistics from end-user client software by applying RR to Bloom filters with strong 𝜀-DP guarantees. RAPPOR is deployed in the Google Chrome web browser, and it permits collecting statistics on client-side values and strings, such as their categories, frequencies, and histograms. By performing RR twice with a memoization step in between, privacy protection is maintained even when multiple responses are collected from the same participant over time. An ML-oriented work, AnonML [23], utilized the ideas of RR for generating histograms from multiple input parties. AnonML utilizes these histograms to generate synthetic data on which an ML model can be trained. Like other local DP approaches, AnonML is a good option when no input party has enough data to build an ML model on their own (and there is no trusted aggregator). Privacy Preserving Machine Learning: Threats and Solutions | Mohammad Al-Rubaie - Iowa State University
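The two-coin randomized response procedure can be simulated directly, together with the correction an analyst applies to the noisy answers. Under this procedure a respondent says "Yes" with probability 0.5·p + 0.25 when the true "Yes" rate is p, so p can be estimated as 2·(observed rate - 0.25). The function names and the simulated population are illustrative.

```python
import random


def randomized_response(truth, rng=random):
    """One respondent's answer under the two-coin procedure."""
    if rng.random() < 0.5:      # first coin is tails: answer truthfully
        return truth
    return rng.random() < 0.5   # first coin is heads: second coin decides the answer


def estimate_true_rate(responses):
    """Invert P(yes) = 0.5 * p + 0.25 to estimate the true rate p."""
    observed = sum(responses) / len(responses)
    return 2 * (observed - 0.25)


rng = random.Random(2024)
population = [True] * 30_000 + [False] * 70_000   # true "Yes" rate is 30%
responses = [randomized_response(answer, rng) for answer in population]
print(estimate_true_rate(responses))  # close to 0.30, without trusting any single answer
```

Any individual "Yes" is three times as likely from a true "Yes" as from a true "No" (0.75 versus 0.25), which corresponds to 𝜀 = ln 3, yet the aggregate rate is still recoverable.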
Dimensionality Reduction (DR)
Dimensionality reduction (DR) perturbs the data by projecting it to a lower dimensional hyperplane. Such a transformation is lossy, and it was suggested by Liu et al. [24] that it would enhance privacy, since retrieving the exact original data from a reduced-dimension version would not be possible (the possible solutions are infinite as the number of equations is less than the number of unknowns). Hence, Liu et al. [24] proposed to use a random matrix to reduce the dimensions of the input data. Since a random matrix might decrease the utility, other approaches used both unsupervised and supervised DR techniques such as principal component analysis (PCA), discriminant component analysis (DCA), and multidimensional scaling (MDS). These approaches try to find the best projection matrix for utility purposes, while relying on the reduced dimensionality aspect to enhance privacy. Since an approximation of the original data can still be obtained from the reduced dimensions, some approaches, e.g. Jiang et al. [25], combined dimensionality reduction with DP to achieve differentially-private data publishing. While some entities might seek total hiding of their data, DR has another benefit for privacy. For datasets that have samples with two labels, a utility label and a privacy label, Kung [26] proposes a DR method that enables the data owner to project her data in a way that maximizes the accuracy of learning for the utility labels, while decreasing the accuracy for learning the privacy labels. Privacy Preserving Machine Learning: Threats and Solutions | Mohammad Al-Rubaie - Iowa State University
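The random-matrix idea can be sketched in a few lines: a k×d matrix with k < d maps each d-dimensional record to k numbers, giving k equations in d unknowns. This is a generic Gaussian random projection written for illustration and is not claimed to reproduce the exact construction in the cited work; the function names and the 1/√k scaling are assumptions.

```python
import random


def random_projection_matrix(k, d, rng=random):
    """k x d matrix of Gaussian entries, scaled by 1/sqrt(k)."""
    return [[rng.gauss(0, 1) / k ** 0.5 for _ in range(d)] for _ in range(k)]


def project(matrix, x):
    """Map a d-dimensional record x to k dimensions (one dot product per row)."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in matrix]


rng = random.Random(7)
d, k = 8, 3
matrix = random_projection_matrix(k, d, rng)   # kept private by the data owner
record = [5.1, 3.5, 1.4, 0.2, 7.0, 3.2, 4.7, 1.4]  # made-up feature vector
released = project(matrix, record)
print(len(record), "->", len(released))
```

Because only 3 numbers are released for 8 unknowns, infinitely many records are consistent with the released vector, although, as noted above, an approximation may still be recoverable.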
General Data Protection Regulation (GDPR)