Difference between revisions of "Privacy"
(28 intermediate revisions by the same user not shown) | |||
Line 2: | Line 2: | ||
|title=PRIMO.ai | |title=PRIMO.ai | ||
|titlemode=append | |titlemode=append | ||
− | |keywords=artificial, intelligence, machine, learning, models | + | |keywords=ChatGPT, artificial, intelligence, machine, learning, GPT-4, GPT-5, NLP, NLG, NLC, NLU, models, data, singularity, moonshot, Sentience, AGI, Emergence, Moonshot, Explainable, TensorFlow, Google, Nvidia, Microsoft, Azure, Amazon, AWS, Hugging Face, OpenAI, Tensorflow, OpenAI, Google, Nvidia, Microsoft, Azure, Amazon, AWS, Meta, LLM, metaverse, assistants, agents, digital twin, IoT, Transhumanism, Immersive Reality, Generative AI, Conversational AI, Perplexity, Bing, You, Bard, Ernie, prompt Engineering LangChain, Video/Image, Vision, End-to-End Speech, Synthesize Speech, Speech Recognition, Stanford, MIT |description=Helpful resources for your journey with artificial intelligence; videos, articles, techniques, courses, profiles, and tools |
− | |description=Helpful resources for your journey with artificial intelligence; videos, articles, techniques, courses, profiles, and tools | + | |
+ | <!-- Google tag (gtag.js) --> | ||
+ | <script async src="https://www.googletagmanager.com/gtag/js?id=G-4GCWLBVJ7T"></script> | ||
+ | <script> | ||
+ | window.dataLayer = window.dataLayer || []; | ||
+ | function gtag(){dataLayer.push(arguments);} | ||
+ | gtag('js', new Date()); | ||
+ | |||
+ | gtag('config', 'G-4GCWLBVJ7T'); | ||
+ | </script> | ||
}} | }} | ||
− | [ | + | [https://www.youtube.com/results?search_query=privacy+machine+learning+artificial+intelligence YouTube search...] |
− | [ | + | [https://www.google.com/search?q=privacy+machine+learning+artificial+intelligence ...Google search] |
− | * [[ | + | * [[Risk, Compliance and Regulation]] ... [[Ethics]] ... [[Privacy]] ... [[Law]] ... [[AI Governance]] ... [[AI Verification and Validation]] |
− | * | + | * [[Cybersecurity]] ... [[Open-Source Intelligence - OSINT |OSINT]] ... [[Cybersecurity Frameworks, Architectures & Roadmaps | Frameworks]] ... [[Cybersecurity References|References]] ... [[Offense - Adversarial Threats/Attacks| Offense]] ... [[National Institute of Standards and Technology (NIST)|NIST]] ... [[U.S. Department of Homeland Security (DHS)| DHS]] ... [[Screening; Passenger, Luggage, & Cargo|Screening]] ... [[Law Enforcement]] ... [[Government Services|Government]] ... [[Defense]] ... [[Joint Capabilities Integration and Development System (JCIDS)#Cybersecurity & Acquisition Lifecycle Integration| Lifecycle Integration]] ... [[Cybersecurity Companies/Products|Products]] ... [[Cybersecurity: Evaluating & Selling|Evaluating]] |
− | + | * [[Policy]] ... [[Policy vs Plan]] ... [[Constitutional AI]] ... [[Trust Region Policy Optimization (TRPO)]] ... [[Policy Gradient (PG)]] ... [[Proximal Policy Optimization (PPO)]] | |
+ | * [[Blockchain]] | ||
+ | * [[Data Science]] ... [[Data Governance|Governance]] ... [[Data Preprocessing|Preprocessing]] ... [[Feature Exploration/Learning|Exploration]] ... [[Data Interoperability|Interoperability]] ... [[Algorithm Administration#Master Data Management (MDM)|Master Data Management (MDM)]] ... [[Bias and Variances]] ... [[Benchmarks]] ... [[Datasets]] | ||
* [[OpenMined]] | * [[OpenMined]] | ||
− | * [ | + | * [https://www.technologyreview.com/s/612552/facial-recognition-has-to-be-regulated-to-protect-the-public-says-ai-report Facial recognition has to be regulated to protect the public, says AI report | Will Knight - MIT Technology Review] |
− | |||
* [[Other Challenges]] in Artificial Intelligence | * [[Other Challenges]] in Artificial Intelligence | ||
+ | * [https://www.research.ibm.com/haifa/projects/imt/AI%20Privacy/index.shtml Data Security and Privacy | ][[IBM|IBM Research]] | ||
− | The availability of massive amounts of data, coupled with high-performance cloud computing platforms, has driven significant progress in artificial intelligence and, in particular, machine learning and optimization. Indeed, much scientific and technological growth in recent years, including in computer vision, natural language processing, transportation, and health, has been driven by large-scale data sets which provide a strong basis to improve existing algorithms and develop new ones. However, due to their large-scale and longitudinal collection, archiving these data sets raises significant privacy concerns. They often reveal sensitive personal information that can be exploited, without the knowledge and/or consent of the involved individuals, for various purposes including monitoring, discrimination, and illegal activities. [ | + | The availability of massive amounts of data, coupled with high-performance cloud computing platforms, has driven significant progress in artificial intelligence and, in particular, machine learning and optimization. Indeed, much scientific and technological growth in recent years, including in computer vision, natural language processing, transportation, and health, has been driven by large-scale data sets which provide a strong basis to improve existing algorithms and develop new ones. However, due to their large-scale and longitudinal collection, archiving these data sets raises significant privacy concerns. They often reveal sensitive personal information that can be exploited, without the knowledge and/or consent of the involved individuals, for various purposes including monitoring, discrimination, and illegal activities. [https://www2.isye.gatech.edu/~fferdinando3/cfp/PPAI20/ The AAAI Workshop on Privacy-Preserving Artificial Intelligence] |
{|<!-- T --> | {|<!-- T --> | ||
Line 23: | Line 34: | ||
|| | || | ||
<youtube>7zbNu4tFEtw</youtube> | <youtube>7zbNu4tFEtw</youtube> | ||
− | <b> | + | <b>The Security and Privacy Implications of AI and Machine Learning (SHA2017) |
− | </b><br> | + | </b><br>What will the recent rapid progress in machine learning and AI mean for the fields of computer security and privacy? This talk gives a tour of some answers, and some unanswered questions. It will discuss new types of attacks and surveillance that are becoming possible with modern neural networks, and some new research problems that the computer security community should be working on. #MachineLearning #Privacy |
|} | |} | ||
|<!-- M --> | |<!-- M --> | ||
Line 31: | Line 42: | ||
|| | || | ||
<youtube>DnzS2ht_ZtI</youtube> | <youtube>DnzS2ht_ZtI</youtube> | ||
− | <b> | + | <b>AWS re:Invent 2018: Data Privacy & Governance in the Age of Big Data (GPSTEC303) |
− | </b><br> | + | </b><br>Come to this session to learn a new approach in reducing risk and costs while increasing productivity, organizational alacrity, and customer experience, resulting in a competitive advantage and assorted revenue growth. We share how a de-identified data lake on [[Amazon]] AWS can help you comply with [[Privacy#General Data Protection Regulations (GDPR)|General Data Protection Regulation (GDPR)]] and California Consumer Protection Act requirements by solving the issue at its causal element. Complete Title: AWS re:Invent 2018: Data Privacy & Governance in the Age of Big Data: Deploy a De-Identified Data Lake (GPSTEC303) |
|} | |} | ||
|}<!-- B --> | |}<!-- B --> | ||
Line 40: | Line 51: | ||
|| | || | ||
<youtube>yG4JL0ZRmi4</youtube> | <youtube>yG4JL0ZRmi4</youtube> | ||
− | <b> | + | <b>The coming privacy crisis on the Internet of Things | Alasdair Allan | TEDxExeterSalon |
− | </b><br> | + | </b><br>Mark Zuckerberg may have declared that privacy is no longer a social norm, but Alasdair Allan believes a backlash is coming. In this talk, he explores how the internet has changed the concept of ownership and explains why the Internet of Things – combined with the EU’s General Data Protection Regulation – could soon determine whether we have any privacy at all. Alasdair is a scientist, author, hacker and journalist. --- |
+ | TEDxExeterSalon: From driverless cars to diagnosis of medical imaging, artificial intelligence is being heralded as the next industrial revolution. But how does AI relate to us in all our glorious complex humanity? Our first TEDxExeterSalon explored the ways in which we’ll interact with algorithmic intelligence in the near future. TEDxExeter: Now in our 7th year, TEDxExeter is Exeter’s Ideas Festival with global reach, licensed by TED and organised by volunteers who are passionate about spreading great ideas in our community. More information: https://www.tedxexeter.com | ||
|} | |} | ||
|<!-- M --> | |<!-- M --> | ||
Line 47: | Line 59: | ||
{| class="wikitable" style="width: 550px;" | {| class="wikitable" style="width: 550px;" | ||
|| | || | ||
− | <youtube> | + | <youtube>4zrU54VIK6k</youtube> |
− | <b> | + | <b>Privacy Preserving AI (Andrew Trask) | MIT Deep Learning Series |
− | </b><br> | + | </b><br>Lecture by [[Creatives#Andrew Trask|Andrew Trask]] in January 2020, part of the MIT Deep Learning Lecture Series. Website: https://deeplearning.mit.edu Slides: https://bit.ly/38jzide Playlist: https://bit.ly/deep-learning-playlist LINKS: |
+ | Andrew Twitter: https://twitter.com/iamtrask [[OpenMined]]: https://www.openmined.org/ Grokking Deep Learning (book): https://bit.ly/2RsxlUZ | ||
+ | OUTLINE: 0:00 - Introduction 0:54 - Privacy preserving AI talk overview 1:28 - Key question: Is it possible to answer questions using data we cannot see? 5:56 - Tool 1: remote execution 8:44 - Tool 2: search and example data 11:35 - Tool 3: differential privacy 28:09 - Tool 4: secure multi-party computation 36:37 - Federated learning 39:55 - AI, privacy, and society 46:23 - Open data for science 50:35 - Single-use accountability 54:29 - End-to-end encrypted services 59:51 - Q&A: privacy of the diagnosis 1:02:49 - Q&A: removing bias from data when data is encrypted 1:03:40 - Q&A: regulation of privacy 1:04:27 - Q&A: [[OpenMined]] 1:06:16 - Q&A: encryption and nonlinear functions 1:07:53 - Q&A: path to adoption of privacy-preserving technology 1:11:44 - Q&A: recommendation systems | ||
|} | |} | ||
|}<!-- B --> | |}<!-- B --> | ||
Line 57: | Line 71: | ||
|| | || | ||
<youtube>EYRdIwhTDWU</youtube> | <youtube>EYRdIwhTDWU</youtube> | ||
− | <b> | + | <b>Big data, artificial intelligence, machine learning and data protection |
− | </b><br> | + | </b><br>In July 2017, the Office of the Australian Information Commissioner hosted a national conference entitled 'Data + Privacy Asia Pacific' at the ICC Sydney. Plenary 3: Big data, artificial intelligence, machine learning and data protection |
+ | Presentation by Simon Entwisle, Deputy Commissioner (Operations), Information Commissioner’s Office UK | ||
|} | |} | ||
|<!-- M --> | |<!-- M --> | ||
Line 65: | Line 80: | ||
|| | || | ||
<youtube>X9wJu8bzXLY</youtube> | <youtube>X9wJu8bzXLY</youtube> | ||
− | <b> | + | <b>How to Engineer Privacy Rights in the World of Artificial Intelligence |
− | </b><br> | + | </b><br>RSA Conference Jeewon Serrato, Counsel, Global Head of Privacy & Data Protection Practice, Shearman & Sterling LLP Steven Lee, Managing Director, Alvarez & Marsal John Leon, Co-Founder, President, CTO, ORock Technologies Do human rights attach to artificial intelligence? With the EU General Data Protection Regulation going into effect on May 25, 2018, this panel will discuss what kinds of data subject rights European residents will have to object to and inquire about personal data that is being collected and processed by AI and what it means to be a privacy engineer in meeting these compliance challenges. Learning Objectives: 1: Understand how human rights apply to the AI world. 2: Know the legal requirements engineers should be building into AI products. 3: Explore a compliance checklist that can help mitigate risk to the company. |
|} | |} | ||
|}<!-- B --> | |}<!-- B --> | ||
Line 74: | Line 89: | ||
|| | || | ||
<youtube>ypykT4tqIjc</youtube> | <youtube>ypykT4tqIjc</youtube> | ||
− | <b> | + | <b>An Introduction to Private Machine Learning - Singapore [[Python]] User Group |
− | </b><br> | + | </b><br>Speaker: Satish Shankar This talk will introduce the essential concepts from cryptography necessary to build AI systems that use sensitive data and yet protect our privacy. Specifically, we will cover concepts from secure multi-party computation (MPC) and how they can be used to build machine learning algorithms. Why does this matter? This matters because we as a society are struggling to balance the benefits of data-driven systems and the privacy risks they create. Building any [[Machine Learning (ML)]] or analytics model necessitates the collection of data. If this data is sensitive or personal, it inevitably turns into a honeypot for hackers. At a societal level, we are responding to this issue by introducing more regulation such as the GDPR. Instead of regulations, it is possible to use cryptography to protect our data and still analyze it: This talk shows how. About: Shankar leads the machine learning and AI efforts for Manulife’s innovation labs. He works on quantitative investment and insurance, drawing on a wide range of fields from machine learning, natural language processing, differential privacy, encryption, and more. He is particularly interested in the intersection of [[blockchains]], distributed systems and privacy in machine learning. |
|} | |} | ||
|<!-- M --> | |<!-- M --> | ||
Line 82: | Line 97: | ||
|| | || | ||
<youtube>39hNjnhY7cY</youtube> | <youtube>39hNjnhY7cY</youtube> | ||
− | <b> | + | <b>Privacy in Data Science |
− | </b><br> | + | </b><br>[[Creatives#Siraj Raval|Siraj Raval]] Learning the tools that preserve user privacy is going to become an increasingly important skillset for all aspiring data scientists to learn in the coming months. Legal frameworks like GDPR are being proposed all around the world as people realize how valuable their data is, so data scientists need to accept that they'll have to handle data differently than in the past. In this video, I'll demo 3 important privacy techniques: differential privacy, secure multi-party computation, and federated learning. We'll use these techniques to train a model built with Python to predict diabetes while keeping user data anonymous. Enjoy! [https://github.com/OpenMined/PySyft/tree/master/examples/tutorials Code for this video] Please Subscribe! And Like. And comment. That's what keeps me going. Want more education? Connect with me here: Twitter: https://twitter.com/sirajraval instagram: https://www.instagram.com/sirajraval |
+ | [[Meta|Facebook]]: https://www.facebook.com/sirajology More learning resources: https://www.openmined.org/ https://mortendahl.github.io/ | ||
|} | |} | ||
|}<!-- B --> | |}<!-- B --> | ||
== Privacy Preserving - Machine Learning (PPML) Techniques == | == Privacy Preserving - Machine Learning (PPML) Techniques == | ||
− | [ | + | [https://www.youtube.com/results?search_query=Secure+Multiparty+Computation+Privacy+Preserving+Machine+Learning+PPML+Techniques Youtube search...] |
− | [ | + | [https://www.google.com/search?q=Secure+Multiparty+Computation+Privacy+Preserving+Machine+Learning+PPML+Techniques ...Google search] |
− | * [ | + | * [https://www.udacity.com/course/secure-and-private-ai--ud185 Secure and Private AI by] [[Meta|Facebook]] AI | UDACITY ... introduction of three cutting-edge technologies for privacy-preserving AI: Federated Learning, Differential Privacy, and Encrypted Computation. Learn how to extend PyTorch with the tools necessary to train AI models that preserve user privacy. |
− | * [ | + | * [https://towardsdatascience.com/perfectly-privacy-preserving-ai-c14698f322f5 Perfectly Privacy-Preserving AI | Patricia Thaine] |
− | Many privacy-enhancing techniques concentrated on allowing multiple input parties to collaboratively train ML models without releasing their private data in its original form. This was mainly performed by utilizing cryptographic approaches, or differentially-private data release (perturbation techniques). Differential privacy is especially effective in preventing membership inference attacks. [ | + | Many privacy-enhancing techniques concentrated on allowing multiple input parties to collaboratively train ML models without releasing their private data in its original form. This was mainly performed by utilizing cryptographic approaches, or differentially-private data release (perturbation techniques). Differential privacy is especially effective in preventing membership inference attacks. [https://arxiv.org/ftp/arxiv/papers/1804/1804.11238.pdf Privacy Preserving Machine Learning: Threats and Solutions | Mohammad Al-Rubaie - Iowa State University] |
− | Multiparty Computation (MPC) enables computation on data from different providers/parties, such that the other participating parties gain no additional information about each others’ inputs, except what can be learned from the public output of the algorithm. In other words, when we have the parties Alice, Bob and Casper, all three have access to the output. However, it is not possible for, e.g., Alice to know the plain data Bob and Casper provided. [ | + | Multiparty Computation (MPC) enables computation on data from different providers/parties, such that the other participating parties gain no additional information about each others’ inputs, except what can be learned from the public output of the algorithm. In other words, when we have the parties Alice, Bob and Casper, all three have access to the output. However, it is not possible for, e.g., Alice to know the plain data Bob and Casper provided. [https://medium.com/@apfelbeck.florian/secure-multiparty-computation-enabling-privacy-preserving-machine-learning-ffef396b8ca2 Secure Multiparty Computation — Enabling Privacy-Preserving Machine Learning | Florian Apfelbeck - Medium] |
− | * DeepSecure [ | + | * DeepSecure [https://arxiv.org/abs/1705.08963 DeepSecure: Scalable Provably-Secure Deep Learning | B. Rouhani, M. S. Riazi, and F. Koushanfar] |
− | * SecureML [ | + | * SecureML [https://eprint.iacr.org/2017/396.pdf SecureML: A System for Scalable Privacy-Preserving Machine Learning | Payman Mohassel & Yupeng Zhang] |
− | * MiniONN [ | + | * MiniONN [https://eprint.iacr.org/2017/452.pdf Oblivious Neural Network Predictions via MiniONN transformations | J. Liu, M. Juuti, Y. Lu and N. Asokan] |
− | * ABY3 [ | + | * ABY3 [https://dl.acm.org/doi/10.1145/3243734.3243760 ABY3: A Mixed Protocol Framework for Machine Learning | Payman Mohassel & Peter Rindal] |
=== Cryptographic Approaches === | === Cryptographic Approaches === | ||
− | [ | + | [https://www.youtube.com/results?search_query=Cryptographic+Approaches+machine+learning+ML Youtube search...] |
− | [ | + | [https://www.google.com/search?q=Cryptographic+Approaches+machine+learning+ML ...Google search] |
− | When a certain ML application requires data from multiple input parties, cryptographic protocols could be utilized to perform ML training/testing on encrypted data. In many of these techniques, achieving better efficiency involved having data owners contribute their encrypted data to the computation servers, which would reduce the problem to a secure two/three party computation setting. In addition to increased efficiency, such approaches have the benefit of not requiring the input parties to remain online. [ | + | When a certain ML application requires data from multiple input parties, cryptographic protocols could be utilized to perform ML training/testing on encrypted data. In many of these techniques, achieving better efficiency involved having data owners contribute their encrypted data to the computation servers, which would reduce the problem to a secure two/three party computation setting. In addition to increased efficiency, such approaches have the benefit of not requiring the input parties to remain online. [https://arxiv.org/ftp/arxiv/papers/1804/1804.11238.pdf Privacy Preserving Machine Learning: Threats and Solutions | Mohammad Al-Rubaie - Iowa State University] |
==== Homomorphic Encryption ==== | ==== Homomorphic Encryption ==== | ||
− | [ | + | [https://www.youtube.com/results?search_query=Homomorphic+Encryption+machine+learning+ML Youtube search...] |
− | [ | + | [https://www.google.com/search?q=Homomorphic+Encryption+machine+learning+ML ...Google search] |
− | Fully homomorphic encryption enables computation on encrypted data, with operations such as addition and multiplication that can be used as a basis for more complex arbitrary functions. Due to the high cost associated with frequently bootstrapping the ciphertext (refreshing the ciphertext because of the accumulated noise), additive homomorphic encryption schemes were mostly used in PPML approaches. Such schemes only enable addition operations on encrypted data, and multiplication by a plaintext. [ | + | Fully homomorphic encryption enables computation on encrypted data, with operations such as addition and multiplication that can be used as a basis for more complex arbitrary functions. Due to the high cost associated with frequently bootstrapping the ciphertext (refreshing the ciphertext because of the accumulated noise), additive homomorphic encryption schemes were mostly used in PPML approaches. Such schemes only enable addition operations on encrypted data, and multiplication by a plaintext. [https://arxiv.org/ftp/arxiv/papers/1804/1804.11238.pdf Privacy Preserving Machine Learning: Threats and Solutions | Mohammad Al-Rubaie - Iowa State University] |
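As a rough illustration of what "addition on encrypted data, and multiplication by a plaintext" can look like, the following is a minimal sketch of a Paillier-style additive scheme. Paillier is chosen here only as a familiar example of an additively homomorphic scheme; the source does not name it. The function names (<code>keygen</code>, <code>encrypt</code>, <code>decrypt</code>, <code>add_encrypted</code>, <code>mul_by_plain</code>) and the small fixed primes are assumptions made for illustration, and the parameters are far too small to be secure.

```python
import math
import secrets

def keygen(p=2**31 - 1, q=2**61 - 1):
    """Toy key generation from two fixed Mersenne primes (illustrative, NOT secure)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # decryption constant, valid for generator g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt an integer 0 <= m < n under public key n, using g = n + 1."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random blinding value in [1, n-1]
        if math.gcd(r, n) == 1:
            break
    # (n + 1)^m mod n^2 equals 1 + m*n, so no big exponentiation is needed for g^m
    return ((1 + m * n) * pow(r, n, n2)) % n2

def decrypt(n, private_key, c):
    lam, mu = private_key
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % (n * n)

def mul_by_plain(n, c, k):
    """Raising a ciphertext to k multiplies the underlying plaintext by k."""
    return pow(c, k, n * n)

if __name__ == "__main__":
    n, priv = keygen()
    total = add_encrypted(n, encrypt(n, 20), encrypt(n, 22))
    scaled = mul_by_plain(n, encrypt(n, 7), 6)
    print(decrypt(n, priv, total), decrypt(n, priv, scaled))
```

Note that the party performing <code>add_encrypted</code> and <code>mul_by_plain</code> needs only the public value <code>n</code>; it never sees the plaintexts. Multiplying two ciphertexts' plaintexts together is not possible in this sketch, which matches the limitation described above.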
==== Garbled Circuits ==== | ==== Garbled Circuits ==== | ||
− | [ | + | [https://www.youtube.com/results?search_query=Garbled+Circuits+machine+learning+ML Youtube search...] |
− | [ | + | [https://www.google.com/search?q=Garbled+Circuits+machine+learning+ML ...Google search] |
− | Assuming a two-party setup with Alice and Bob wanting to obtain the result of a function computed on their private inputs, Alice can convert the function into a garbled circuit, and send this circuit along with her garbled input. Bob obtains the garbled version of his input from Alice without her learning anything about Bob’s private input (e.g., using oblivious transfer). Bob can now use his garbled input with the garbled circuit to obtain the result of the required function (and can optionally share it with Alice). Some PPML approaches combined additive homomorphic encryption with Garbled circuits. [ | + | Assuming a two-party setup with Alice and Bob wanting to obtain the result of a function computed on their private inputs, Alice can convert the function into a garbled circuit, and send this circuit along with her garbled input. Bob obtains the garbled version of his input from Alice without her learning anything about Bob’s private input (e.g., using oblivious transfer). Bob can now use his garbled input with the garbled circuit to obtain the result of the required function (and can optionally share it with Alice). Some PPML approaches combined additive homomorphic encryption with Garbled circuits. [https://arxiv.org/ftp/arxiv/papers/1804/1804.11238.pdf Privacy Preserving Machine Learning: Threats and Solutions | Mohammad Al-Rubaie - Iowa State University] |
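The two-party flow described above can be sketched for a single AND gate. This is a toy under stated assumptions, not a production protocol: the hash-based row encryption, the all-zero validity tag, and the names <code>garble_and_gate</code>, <code>evaluate</code> and <code>run_and</code> are illustrative choices, and the oblivious transfer step is only simulated by handing Bob the label for his bit directly.

```python
import hashlib
import secrets

LABEL_BYTES = 16
TAG = bytes(LABEL_BYTES)  # all-zero tag that marks a correctly decrypted row

def _pad(label_a, label_b):
    """Derive a 32-byte one-time pad from a pair of input wire labels."""
    return hashlib.sha256(label_a + label_b).digest()

def _xor(x, y):
    return bytes(a ^ b for a, b in zip(x, y))

def garble_and_gate():
    """Alice's side: pick random labels for each wire value and build the garbled table."""
    labels = {
        wire: (secrets.token_bytes(LABEL_BYTES), secrets.token_bytes(LABEL_BYTES))
        for wire in ("a", "b", "out")
    }
    table = []
    for bit_a in (0, 1):
        for bit_b in (0, 1):
            out_label = labels["out"][bit_a & bit_b]
            pad = _pad(labels["a"][bit_a], labels["b"][bit_b])
            table.append(_xor(pad, out_label + TAG))
    secrets.SystemRandom().shuffle(table)  # hide which row belongs to which inputs
    return labels, table

def evaluate(table, label_a, label_b):
    """Bob's side: only the row matching his two labels decrypts to a valid tag."""
    pad = _pad(label_a, label_b)
    for row in table:
        plain = _xor(pad, row)
        if plain[LABEL_BYTES:] == TAG:
            return plain[:LABEL_BYTES]
    raise ValueError("no row decrypted correctly")

def run_and(alice_bit, bob_bit):
    labels, table = garble_and_gate()
    alice_label = labels["a"][alice_bit]  # Alice sends her garbled input
    bob_label = labels["b"][bob_bit]      # ASSUMPTION: stands in for oblivious transfer
    out_label = evaluate(table, alice_label, bob_label)
    return labels["out"].index(out_label)  # decode the output label back to a bit

if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, run_and(a, b))
```

In a real protocol Bob would obtain <code>bob_label</code> through oblivious transfer so that Alice does not learn his bit, and larger functions would be expressed as circuits of many such gates.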
==== Secret Sharing ==== | ==== Secret Sharing ==== | ||
− | [ | + | [https://www.youtube.com/results?search_query=Secret+Sharing+machine+learning+ML Youtube search...] |
− | [ | + | [https://www.google.com/search?q=Secret+Sharing+machine+learning+ML ...Google search] |
A method for distributing a secret among multiple parties, with each one holding a “share” of the secret. Individual shares are of no use on their own; however, when the shares are combined, the secret can be reconstructed. With threshold secret sharing, not all the “shares” are required to reconstruct the secret; but only “t” of them (“t” refers to threshold). | A method for distributing a secret among multiple parties, with each one holding a “share” of the secret. Individual shares are of no use on their own; however, when the shares are combined, the secret can be reconstructed. With threshold secret sharing, not all the “shares” are required to reconstruct the secret; but only “t” of them (“t” refers to threshold). | ||
− | In one setting, multiple input parties can generate “shares” of their private data, and send these shares to a set of non-colluding computation servers. Each server could compute a “partial result” from the “shares” it received. Finally, a results’ party (or a proxy) can receive these partial results, and combine them to find the final result. [ | + | In one setting, multiple input parties can generate “shares” of their private data, and send these shares to a set of non-colluding computation servers. Each server could compute a “partial result” from the “shares” it received. Finally, a results’ party (or a proxy) can receive these partial results, and combine them to find the final result. [https://arxiv.org/ftp/arxiv/papers/1804/1804.11238.pdf Privacy Preserving Machine Learning: Threats and Solutions | Mohammad Al-Rubaie - Iowa State University] |
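The setting above (input parties, non-colluding servers, partial results, a results party) can be sketched with additive secret sharing to compute a sum. This is a minimal sketch under assumptions: it uses the simple n-of-n additive scheme rather than the threshold variant, the modulus <code>PRIME</code> and all function names are hypothetical, and the "servers" are plain Python lists rather than separate machines.

```python
import secrets

PRIME = 2**61 - 1  # public modulus; all arithmetic is done modulo this prime

def share(secret, n_servers):
    """Split a secret into n_servers random-looking shares that sum to it mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_servers - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares

def server_partial_sum(received_shares):
    """Each server adds up the single share it received from every input party."""
    return sum(received_shares) % PRIME

def reconstruct(partial_results):
    """The results party combines the servers' partial results."""
    return sum(partial_results) % PRIME

def secure_sum(private_inputs, n_servers=3):
    per_party = [share(x, n_servers) for x in private_inputs]
    partials = [
        server_partial_sum([party_shares[s] for party_shares in per_party])
        for s in range(n_servers)
    ]
    return reconstruct(partials)

if __name__ == "__main__":
    salaries = [52000, 61000, 48000]  # one private value per input party
    print(secure_sum(salaries))
```

Any single server sees only uniformly random values, so it learns nothing about an individual input unless all servers collude. Threshold schemes such as Shamir's relax the requirement that every share be present for reconstruction.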
==== Secure Processors ==== | ==== Secure Processors ==== | ||
− | [ | + | [https://www.youtube.com/results?search_query=Secure+Processors+machine+learning+ML Youtube search...] |
− | [ | + | [https://www.google.com/search?q=Secure+Processors+machine+learning+ML ...Google search] |
− | While initially introduced to ensure the confidentiality and integrity of sensitive code from unauthorized access by rogue software at higher privilege levels, Intel SGX processors are being utilized in privacy-preserving computation. Ohrimenko et al. developed data-oblivious ML algorithms for neural networks, SVM, k-means clustering, decision trees and matrix factorization that are based on SGX processors. The main idea involves having multiple data owners collaborate to perform one of the above-mentioned ML tasks with the computation party running the ML task on an SGX-enabled data center. An adversary can control all the hardware and software in the data center except for the SGX processors used for computation. In this system, each data owner independently establishes a secure channel with the enclave (containing the code and data), authenticates themselves, verifies the integrity of the ML code in the cloud, and securely uploads its private data to the enclave. After all the data is uploaded, the ML task is run by the secure processor, and the output is sent to the results’ parties over secure authenticated channels. [ | + | While initially introduced to ensure the confidentiality and integrity of sensitive code from unauthorized access by rogue software at higher privilege levels, Intel SGX processors are being utilized in privacy-preserving computation. Ohrimenko et al. developed data-oblivious ML algorithms for neural networks, SVM, k-means clustering, decision trees and matrix factorization that are based on SGX processors. The main idea involves having multiple data owners collaborate to perform one of the above-mentioned ML tasks with the computation party running the ML task on an SGX-enabled data center. An adversary can control all the hardware and software in the data center except for the SGX processors used for computation. 
In this system, each data owner independently establishes a secure channel with the enclave (containing the code and data), authenticates themselves, verifies the integrity of the ML code in the cloud, and securely uploads its private data to the enclave. After all the data is uploaded, the ML task is run by the secure processor, and the output is sent to the results’ parties over secure authenticated channels. [https://arxiv.org/ftp/arxiv/papers/1804/1804.11238.pdf Privacy Preserving Machine Learning: Threats and Solutions | Mohammad Al-Rubaie - Iowa State University] |
=== Perturbation Approaches === | === Perturbation Approaches === | ||
− | [ | + | [https://www.youtube.com/results?search_query=Perturbation+Approaches+machine+learning+ML Youtube search...] |
− | [ | + | [https://www.google.com/search?q=Perturbation+Approaches+machine+learning+ML ...Google search] |
− | Differential privacy (DP) techniques resist membership inference attacks by adding random noise to the input data, to iterations in a certain algorithm, or to the algorithm output. While most DP approaches assume a trusted aggregator of the data, local differential privacy allows each input party to add the noise locally; thus, requiring no trusted server. Finally, dimensionality reduction perturbs the data by projecting it to a lower dimensional hyperplane to prevent reconstructing the original data, and/or to restrict inference of sensitive information. [ | + | Differential privacy (DP) techniques resist membership inference attacks by adding random noise to the input data, to iterations in a certain algorithm, or to the algorithm output. While most DP approaches assume a trusted aggregator of the data, local differential privacy allows each input party to add the noise locally; thus, requiring no trusted server. Finally, dimensionality reduction perturbs the data by projecting it to a lower dimensional hyperplane to prevent reconstructing the original data, and/or to restrict inference of sensitive information. [https://arxiv.org/ftp/arxiv/papers/1804/1804.11238.pdf Privacy Preserving Machine Learning: Threats and Solutions | Mohammad Al-Rubaie - Iowa State University] |
==== Differential Privacy (DP) ==== | ==== Differential Privacy (DP) ==== | ||
− | [ | + | [https://www.youtube.com/results?search_query=Differential+Privacy+machine+learning+ML Youtube search...] |
− | [ | + | [https://www.google.com/search?q=Differential+Privacy+machine+learning+ML ...Google search] |
− | Differential privacy is a powerful tool for quantifying and solving practical problems related to privacy. Its flexible definition gives it the potential to be applied in a wide range of applications, including Machine Learning applications. [ | + | Differential privacy is a powerful tool for quantifying and solving practical problems related to privacy. Its flexible definition gives it the potential to be applied in a wide range of applications, including Machine Learning applications. [https://towardsdatascience.com/understanding-differential-privacy-85ce191e198a Understanding Differential Privacy - From Intuitions behind a Theory to a Private AI Application | An Nguyen - Towards Data Science] |
− | Differential privacy is a system for publicly sharing information about a dataset by describing the patterns of groups within the dataset while withholding information about individuals in the dataset. Another way to describe differential privacy is as a constraint on the algorithms used to publish aggregate information about a statistical database which limits the disclosure of private information of records whose information is in the database. For example, differentially private algorithms are used by some government agencies to publish demographic information or other statistical aggregates while ensuring confidentiality of survey responses, and by companies to collect information about user behavior while controlling what is visible even to internal analysts. Roughly, an algorithm is differentially private if an observer seeing its output cannot tell if a particular individual's information was used in the computation. Differential privacy is often discussed in the context of identifying individuals whose information may be in a database. Although it does not directly refer to identification and reidentification attacks, differentially private algorithms provably resist such attacks. Differential privacy was developed by cryptographers and thus is often associated with cryptography, and draws much of its language from cryptography. [ | + | Differential privacy is a system for publicly sharing information about a dataset by describing the patterns of groups within the dataset while withholding information about individuals in the dataset. Another way to describe differential privacy is as a constraint on the algorithms used to publish aggregate information about a statistical database which limits the disclosure of private information of records whose information is in the database. 
For example, differentially private algorithms are used by some government agencies to publish demographic information or other statistical aggregates while ensuring confidentiality of survey responses, and by companies to collect information about user behavior while controlling what is visible even to internal analysts. Roughly, an algorithm is differentially private if an observer seeing its output cannot tell if a particular individual's information was used in the computation. Differential privacy is often discussed in the [[context]] of identifying individuals whose information may be in a database. Although it does not directly refer to identification and reidentification attacks, differentially private algorithms provably resist such attacks. Differential privacy was developed by cryptographers and thus is often associated with cryptography, and draws much of its language from cryptography. [https://en.wikipedia.org/wiki/Differential_privacy Wikipedia] |
==== Local Differential Privacy ==== | ==== Local Differential Privacy ==== | ||
− | [ | + | [https://www.youtube.com/results?search_query=Local+Differential+Privacy+machine+learning+ML Youtube search...] |
− | [ | + | [https://www.google.com/search?q=Local+Differential+Privacy+machine+learning+ML ...Google search] |
− | When the input parties do not have enough information to train a ML model, it might be better to utilize approaches that rely on local differential privacy (LDP). With LDP, each input party would perturb their data, and only release this obscure view of the data. An old, and well-known version of local privacy is randomized response (Warner 1965), which provided plausible deniability for respondents to sensitive queries. For example, a respondent would flip a fair coin: (a) if “tails”, the respondent answers truthfully, and (b) if “heads”, then flip a second coin, and respond “Yes” if heads, and “No” if tails. RAPPOR 22 is a technology for crowdsourcing statistics from end-user client software by applying RR to Bloom filters with strong 𝜀-DP guarantees. RAPPOR is deployed in Google Chrome web browser, and it permits collecting statistics on client-side values and strings, such as their categories, frequencies, and histograms. By performing RR twice with a memoization step in between, privacy protection is maintained even when multiple responses are collected from the same participant over time. A ML oriented work, AnonML23, utilized the ideas of RR for generating histograms from multiple input parties. AnonML utilizes these histograms to generate synthetic data on which a ML model can be trained. Like other local DP approaches, AnonML is a good option when no input party has enough data to build a ML model on their own (and there is no trusted aggregator). [ | + | When the input parties do not have enough information to train a ML model, it might be better to utilize approaches that rely on local differential privacy (LDP). With LDP, each input party would perturb their data, and only release this obscure view of the data. An old, and well-known version of local privacy is randomized response (Warner 1965), which provided plausible deniability for respondents to sensitive queries. 
For example, a respondent would flip a fair coin: (a) if “tails”, the respondent answers truthfully, and (b) if “heads”, then flip a second coin, and respond “Yes” if heads, and “No” if tails. RAPPOR 22 is a technology for crowdsourcing statistics from end-user client software by applying RR to Bloom filters with strong 𝜀-DP guarantees. RAPPOR is deployed in Google Chrome web browser, and it permits collecting statistics on client-side values and strings, such as their categories, frequencies, and histograms. By performing RR twice with a memoization step in between, privacy protection is maintained even when multiple responses are collected from the same participant over time. A ML oriented work, AnonML23, utilized the ideas of RR for generating histograms from multiple input parties. AnonML utilizes these histograms to generate synthetic data on which a ML model can be trained. Like other local DP approaches, AnonML is a good option when no input party has enough data to build a ML model on their own (and there is no trusted aggregator). [https://arxiv.org/ftp/arxiv/papers/1804/1804.11238.pdf Privacy Preserving Machine Learning: Threats and Solutions | Mohammad Al-Rubaie - Iowa State University] |
==== Dimensionality Reduction (DR) ==== | ==== Dimensionality Reduction (DR) ==== | ||
− | [ | + | [https://www.youtube.com/results?search_query=Dimensionality+Reduction+DR+machine+learning+ML Youtube search...] |
− | [ | + | [https://www.google.com/search?q=Dimensionality+Reduction+DR+machine+learning+ML ...Google search] |
− | perturbs the data by projecting it to a lower dimensional hyperplane. Such transformation is lossy, and it was suggested by Liu et al.24 that it would enhance the privacy, since retrieving the exact original data from a reduced dimension version would not be possible (the possible solutions are infinite as the number of equations is less than the number of unknowns). Hence, Liu et al.24 proposed to use a random matrix to reduce the dimensions of the input data. Since a random matrix might decrease the utility, other approaches used both unsupervised and supervised DR techniques such as principal component analysis (PCA), discriminant component analysis (DCA), and multidimensional scaling (MDS). These approaches try to find the best | + | perturbs the data by projecting it to a lower dimensional hyperplane. Such transformation is lossy, and it was suggested by Liu et al.24 that it would enhance the privacy, since retrieving the exact original data from a reduced dimension version would not be possible (the possible solutions are infinite as the number of equations is less than the number of unknowns). Hence, Liu et al.24 proposed to use a random matrix to reduce the dimensions of the input data. Since a random matrix might decrease the utility, other approaches used both unsupervised and supervised DR techniques such as principal component analysis (PCA), discriminant component analysis (DCA), and multidimensional scaling (MDS). These approaches try to find the best [[Dimensional Reduction#Projection |Projection]] matrix for utility purposes, while relying on the reduced dimensionality aspect to enhance the privacy. Since an approximation of the original data can still be obtained from the reduced dimensions, some approaches, e.g. Jiang et al.25, combined dimensionality reduction with DP to achieve differentially-private data publishing. While some entities might seek total hiding of their data, DR has another benefit for privacy. 
For datasets that have samples with two labels: a utility label and a privacy label, Kung26 proposes a DR method to enable the data owner to project her data in a way that enables maximizing the accuracy of learning for the utility labels, while decreasing the accuracy for learning the privacy labels. [https://arxiv.org/ftp/arxiv/papers/1804/1804.11238.pdf Privacy Preserving Machine Learning: Threats and Solutions | Mohammad Al-Rubaie - Iowa State University] |
= General Data Protection Regulations (GDPR) = | = General Data Protection Regulations (GDPR) = | ||
− | [ | + | [https://www.youtube.com/results?search_query=privacy+GDPR+ML+AI+deep+learning+artificial+intelligence YouTube search...] |
− | [ | + | [https://www.google.com/search?q=privacy+GDPR+ML+deep+learning+artificial+intelligence ...Google search] |
− | * [ | + | * [https://gdpr-info.eu/ General Data Protection Regulation (GDPR)] |
+ | {|<!-- T --> | ||
+ | | valign="top" | | ||
+ | {| class="wikitable" style="width: 550px;" | ||
+ | || | ||
<youtube>eH4y01-W8lA</youtube> | <youtube>eH4y01-W8lA</youtube> | ||
+ | <b>Using Machine Learning to Meet General Data Protection Regulations (Cloud Next '18) | ||
+ | </b><br>General Data Protection Regulation states that European citizens have the right to request for access to their personal data, and that photos of themselves are considered personal data. With a large database of photos, companies run the risk of running afoul of regulations which can cost many millions in fines. During this session, we will demonstrate how we use Machine Learning to solve this problem at different levels through facial recognition models and categorization of unstructured data. Subscribe to the Google Cloud channel! → https://bit.ly/NextSub | ||
+ | |} | ||
+ | |<!-- M --> | ||
+ | | valign="top" | | ||
+ | {| class="wikitable" style="width: 550px;" | ||
+ | || | ||
<youtube>RLEtyfmsfs4</youtube> | <youtube>RLEtyfmsfs4</youtube> | ||
+ | <b>What is the impact of GDPR on AI and Machine Learning? | ||
+ | </b><br>AI and Privacy Under European Data Protection Law, Friends or Foes? Michel Jaccard, Founder & Partner of id est avocats SwissAI Machine Learning Meetup 2018.09.24 1. What is the European Approach to Privacy? 2. What are the GDPR core provisions? 3. What are the rights of data subjects under GDPR? 4. What are the guidelines for data treatment under GDPR? Bio: Michel Jaccard is the founder and partner of id est avocats, an award-winning boutique law firm focusing on delivering strategic and expert advice to successful startups, innovative companies and global brands in the fields of technology, media, intellectual property, privacy, and cybersecurity. Michel was listed among the “300 most influential personalities” in Switzerland by Bilan Magazine. https://www.linkedin.com/in/jaccard Abstract: Machine learning and AI require a lot of data to improve and be usable. European Union’s new data protection and privacy regulation puts strict limits on the use of personal data and requires increased transparency in any processing. Will regulation and technology collide or is there a bright future for European based AI companies complying with the GDPR? ## Organizers ## SwissAI Machine Learning Meetup is one of the largest AI meetups in Switzerland, with regular meetings and great speakers invited from academia and industry. For more information and future events visit https://www.SwissAI.org | ||
+ | |} | ||
+ | |}<!-- B --> | ||
+ | {|<!-- T --> | ||
+ | | valign="top" | | ||
+ | {| class="wikitable" style="width: 550px;" | ||
+ | || | ||
<youtube>8pJR72sLzyk</youtube> | <youtube>8pJR72sLzyk</youtube> | ||
+ | <b>AI, GDPR and the limits of automated processing | CogX 2019 | ||
+ | </b><br>Kathryn Corrick; Data Privacy Specialist Corrick, Wales and Partners LLP Simon McDougall; Executive Director, Technology Policy and Innovation Information Commissioner's Office Dr. Sandra Wachter; Turing Fellow, The Alan Turing Institute Roger Taylor; Chair, Centre for Data [[Ethics]] and Innovation Centre for Data [[Ethics]] & Innovation Lord Tim Clement Jones; Co-Chair, All-Party Parliamentary Group on Artificial Intelligence CogX is hosted by Charlie Muirhead Co-Founder and CEO, and Co-Founder Tabitha Goldstaub. Find out more at: https://cogx.co/ | ||
+ | |} | ||
+ | |<!-- M --> | ||
+ | | valign="top" | | ||
+ | {| class="wikitable" style="width: 550px;" | ||
+ | || | ||
<youtube>FSvkxQ4ofdc</youtube> | <youtube>FSvkxQ4ofdc</youtube> | ||
− | <youtube> | + | <b>Machine Learning Interpretability in the GDPR Era - Gregory C. Antell (BigML) |
+ | </b><br>Gregory Antell explores the definition of interpretability in ML, the trade-offs with complexity and performance, and surveys the major methods used to interpret and explain ML models in the GDPR era. Data scientist and product manager with broad experience in applied machine learning and natural language processing. Gregory's main efforts at BigML are divided between data science consulting with enterprise customers and advancing the usability, adoption, and performance of the BigML machine learning platform. Previously, he was the Data Science Lead of Insight Data Science in Boston, where Gregory mentored over 100 PhD-educated Fellows in developing compelling and unique data science and machine learning projects and also spearheaded and developed partnerships with top data science teams throughout Boston and New England. | ||
+ | |} | ||
+ | |}<!-- B --> | ||
+ | {|<!-- T --> | ||
+ | | valign="top" | | ||
+ | {| class="wikitable" style="width: 550px;" | ||
+ | || | ||
+ | <youtube>ZeRCMaVGtqg</youtube> | ||
+ | <b>CPDP 2019: Regulating artificial intelligence - is the GDPR enough? | ||
+ | </b><br>Chair: Ian Brown, Research ICT Africa (UK) Speakers: Paul Nemitz, DG JUST (EU); Mireille Hildebrandt, VUB-LSTS (BE); Ben Zevenbergen, Princeton University (US) The [[development]] of Artificial Intelligence/Machine Learning tools often depends on vast quantities of data - frequently personal data as defined by the GDPR. Given the extensive limits and controls applied by the GDPR (and the fundamental rights to privacy and data protection underpinning it in EU law), will the developing interpretation of these laws by national courts and the Court of Justice fully protect EU residents’ rights - or will further ex ante regulation be required? In this Oxford Union-style debate, three leading experts will speak for 10 minutes each for and against the motion that “This House believes the GDPR will not be enough to regulate Artificial Intelligence”. They will then debate the motion with the audience, before a final vote is held. | ||
+ | |} | ||
+ | |<!-- M --> | ||
+ | | valign="top" | | ||
+ | {| class="wikitable" style="width: 550px;" | ||
+ | || | ||
+ | <youtube>DMzhTY891io</youtube> | ||
+ | <b>Great Models with Great Privacy: Optimizing ML and AI Under GDPR with Sim Simeonov & Slater Victoroff
+ | </b><br>The General Data Protection Regulation (GDPR), which came into effect on May 25, 2018, establishes strict guidelines for managing personal and sensitive data, backed by stiff penalties. GDPR’s requirements have forced some companies to shut down services and others to flee the EU market altogether. GDPR’s goal to give consumers control over their data and, thus, increase consumer trust in the digital ecosystem is laudable. | ||
+ | However, there is a growing feeling that GDPR has dampened innovation in machine learning & AI applied to personal and/or sensitive data. After all, ML & AI are hungry for rich, detailed data and sanitizing data to improve privacy typically involves redacting or fuzzing inputs, which multiple studies have shown can seriously affect model quality and predictive power. While this is technically true for some privacy-safe modeling techniques, it’s not true in general. The root cause of the problem is two-fold. First, most data scientists have never learned how to produce great models with great privacy. Second, most companies lack the systems to make privacy-safe machine learning & AI easy. This talk will challenge the implicit assumption that more privacy means worse predictions. Using practical examples from production environments involving personal and sensitive data, the speakers will introduce a wide range of techniques–from simple hashing to advanced [[embedding]]s–for high-accuracy, privacy-safe model [[development]]. Key topics include pseudonymous ID generation, semantic scrubbing, structure-preserving data fuzzing, task-specific vs. task-independent sanitization and ensuring downstream privacy in multi-party collaborations. Special attention will be given to Spark-based production environments. About: [[Databricks]] provides a unified data analytics platform, powered by Apache Spark™, that accelerates innovation by unifying data science, engineering and business. Website: https://databricks.com | ||
+ | |} | ||
+ | |}<!-- B --> |
Latest revision as of 19:11, 16 August 2023
YouTube search... ...Google search
- Risk, Compliance and Regulation ... Ethics ... Privacy ... Law ... AI Governance ... AI Verification and Validation
- Cybersecurity ... OSINT ... Frameworks ... References ... Offense ... NIST ... DHS ... Screening ... Law Enforcement ... Government ... Defense ... Lifecycle Integration ... Products ... Evaluating
- Policy ... Policy vs Plan ... Constitutional AI ... Trust Region Policy Optimization (TRPO) ... Policy Gradient (PG) ... Proximal Policy Optimization (PPO)
- Blockchain
- Data Science ... Governance ... Preprocessing ... Exploration ... Interoperability ... Master Data Management (MDM) ... Bias and Variances ... Benchmarks ... Datasets
- OpenMined
- Facial recognition has to be regulated to protect the public, says AI report | Will Knight - MIT Technology Review
- Other Challenges in Artificial Intelligence
- Data Security and Privacy | IBM Research
The availability of massive amounts of data, coupled with high-performance cloud computing platforms, has driven significant progress in artificial intelligence and, in particular, machine learning and optimization. Indeed, much scientific and technological growth in recent years, including in computer vision, natural language processing, transportation, and health, has been driven by large-scale data sets which provide a strong basis to improve existing algorithms and develop new ones. However, due to their large-scale and longitudinal collection, archiving these data sets raises significant privacy concerns. They often reveal sensitive personal information that can be exploited, without the knowledge and/or consent of the involved individuals, for various purposes including monitoring, discrimination, and illegal activities. The AAAI Workshop on Privacy-Preserving Artificial Intelligence
Privacy Preserving - Machine Learning (PPML) Techniques
- Secure and Private AI by Facebook AI | UDACITY ... introduction of three cutting-edge technologies for privacy-preserving AI: Federated Learning, Differential Privacy, and Encrypted Computation. Learn how to extend PyTorch with the tools necessary to train AI models that preserve user privacy.
- Perfectly Privacy-Preserving AI | Patricia Thaine
Many privacy-enhancing techniques concentrated on allowing multiple input parties to collaboratively train ML models without releasing their private data in its original form. This was mainly performed by utilizing cryptographic approaches, or differentially-private data release (perturbation techniques). Differential privacy is especially effective in preventing membership inference attacks. Privacy Preserving Machine Learning: Threats and Solutions | Mohammad Al-Rubaie - Iowa State University
Multiparty Computation (MPC) enables computation on data from different providers/parties, such that the other participating parties gain no additional information about each other’s inputs, except what can be learned from the public output of the algorithm. In other words, when we have the parties Alice, Bob and Casper, all three have access to the output. However, it is not possible for, e.g., Alice to know the plain data Bob and Casper provided. Secure Multiparty Computation — Enabling Privacy-Preserving Machine Learning | Florian Apfelbeck - Medium
- DeepSecure DeepSecure: Scalable Provably-Secure Deep Learning | B. Rouhani, M. S. Riazi, and F. Koushanfar
- SecureML SecureML: A System for Scalable Privacy-Preserving Machine Learning | Payman Mohassel & Yupeng Zhang
- MiniONN Oblivious Neural Network Predictions via MiniONN transformations | J. Liu, M. Juuti, Y. Lu and N. Asokan
- ABY3 ABY3: A Mixed Protocol Framework for Machine Learning | Payman Mohassel & Peter Rindal
Cryptographic Approaches
When a certain ML application requires data from multiple input parties, cryptographic protocols could be utilized to perform ML training/testing on encrypted data. In many of these techniques, achieving better efficiency involved having data owners contribute their encrypted data to the computation servers, which would reduce the problem to a secure two/three party computation setting. In addition to increased efficiency, such approaches have the benefit of not requiring the input parties to remain online. Privacy Preserving Machine Learning: Threats and Solutions | Mohammad Al-Rubaie - Iowa State University
Homomorphic Encryption
Fully homomorphic encryption enables computation on encrypted data, with operations such as addition and multiplication that can be used as the basis for more complex arbitrary functions. Due to the high cost associated with frequently bootstrapping the ciphertext (refreshing the ciphertext because of the accumulated noise), additive homomorphic encryption schemes were mostly used in PPML approaches. Such schemes only enable addition operations on encrypted data, and multiplication by a plaintext. Privacy Preserving Machine Learning: Threats and Solutions | Mohammad Al-Rubaie - Iowa State University
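To make "addition on encrypted data, and multiplication by a plaintext" concrete, here is a minimal sketch of an additively homomorphic scheme in the style of Paillier. The choice of Paillier, the tiny hard-coded primes, and all function names are assumptions made for illustration; the source does not name a specific scheme, and these parameters are far too small to be secure.

```python
import math
import random


def keygen(p=1789, q=1861):
    """Toy key generation from two small primes (illustration only, NOT secure)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1                      # common simplifying choice of generator
    mu = pow(lam, -1, n)           # with g = n + 1, mu is the inverse of lam mod n
    return (n, g), (lam, mu)


def encrypt(pub, m, rng=random):
    """Encrypt an integer 0 <= m < n under the public key."""
    n, g = pub
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:    # r must be invertible mod n
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2


def decrypt(pub, priv, c):
    """Recover the plaintext from ciphertext c."""
    n, _ = pub
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


def add_encrypted(pub, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    n, _ = pub
    return (c1 * c2) % (n * n)


def mul_plain(pub, c, k):
    """Raising a ciphertext to a plaintext power k multiplies the plaintext by k."""
    n, _ = pub
    return pow(c, k, n * n)


if __name__ == "__main__":
    pub, priv = keygen()
    total = add_encrypted(pub, encrypt(pub, 12), encrypt(pub, 30))
    print(decrypt(pub, priv, total))                               # 42
    print(decrypt(pub, priv, mul_plain(pub, encrypt(pub, 7), 6)))  # 42
```

The party doing the arithmetic only ever handles ciphertexts and the public key; multiplying two ciphertexts together, by contrast, is not supported, which is the limitation the paragraph above describes.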
Garbled Circuits
Assuming a two-party setup with Alice and Bob wanting to obtain the result of a function computed on their private inputs, Alice can convert the function into a garbled circuit, and send this circuit along with her garbled input. Bob obtains the garbled version of his input from Alice without her learning anything about Bob’s private input (e.g., using oblivious transfer). Bob can now use his garbled input with the garbled circuit to obtain the result of the required function (and can optionally share it with Alice). Some PPML approaches combined additive homomorphic encryption with Garbled circuits. Privacy Preserving Machine Learning: Threats and Solutions | Mohammad Al-Rubaie - Iowa State University
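The garbling and evaluation steps can be sketched for a single AND gate. This is a toy under stated assumptions: the label length, the SHA-256 based pad, the all-zero tag used to recognise the correct row, and every function name are illustrative choices, and the oblivious transfer by which Bob would obtain his input label is deliberately left out.

```python
import hashlib
import secrets

LABEL_LEN = 16
TAG = bytes(LABEL_LEN)  # all-zero tag that marks a correctly decrypted row


def _pad(label_a, label_b):
    """Derive a 32-byte pad from the two input labels."""
    return hashlib.sha256(label_a + label_b).digest()


def _xor(x, y):
    return bytes(a ^ b for a, b in zip(x, y))


def garble_and_gate():
    """Alice's side: pick two random labels per wire and encrypt the truth table."""
    labels = {
        wire: (secrets.token_bytes(LABEL_LEN), secrets.token_bytes(LABEL_LEN))
        for wire in ("a", "b", "out")
    }
    table = []
    for bit_a in (0, 1):
        for bit_b in (0, 1):
            out_label = labels["out"][bit_a & bit_b]
            pad = _pad(labels["a"][bit_a], labels["b"][bit_b])
            table.append(_xor(pad, out_label + TAG))
    secrets.SystemRandom().shuffle(table)  # hide which row belongs to which inputs
    return labels, table


def evaluate(table, label_a, label_b):
    """Bob's side: holding one label per input wire, recover one output label."""
    pad = _pad(label_a, label_b)
    for row in table:
        plain = _xor(pad, row)
        if plain[LABEL_LEN:] == TAG:
            return plain[:LABEL_LEN]
    raise ValueError("no table row decrypted with these labels")


if __name__ == "__main__":
    labels, table = garble_and_gate()
    # Alice's input is 1, Bob's input is 1; Bob only ever sees opaque labels.
    out = evaluate(table, labels["a"][1], labels["b"][1])
    print(out == labels["out"][1])
```

Bob learns a single output label and cannot read the other three rows, because each row is masked by a pad that depends on a pair of labels he does not hold.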
Secret Sharing
A method for distributing a secret among multiple parties, with each one holding a “share” of the secret. Individual shares are of no use on their own; however, when the shares are combined, the secret can be reconstructed. With threshold secret sharing, not all the “shares” are required to reconstruct the secret; but only “t” of them (“t” refers to threshold). In one setting, multiple input parties can generate “shares” of their private data, and send these shares to a set of non-colluding computation servers. Each server could compute a “partial result” from the “shares” it received. Finally, a results’ party (or a proxy) can receive these partial results, and combine them to find the final result. Privacy Preserving Machine Learning: Threats and Solutions | Mohammad Al-Rubaie - Iowa State University
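Threshold secret sharing as described above can be sketched with Shamir's polynomial scheme. The choice of Shamir's construction, the prime modulus, and the function names are assumptions for illustration; the source does not say which scheme a given PPML system uses.

```python
import secrets

PRIME = 2**61 - 1  # a Mersenne prime; all arithmetic is done modulo this value


def make_shares(secret, t, n):
    """Split `secret` into n shares such that any t of them reconstruct it."""
    # Random polynomial of degree t-1 whose constant term is the secret.
    coeffs = [secret % PRIME] + [secrets.randbelow(PRIME) for _ in range(t - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation of the polynomial at x
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def reconstruct(shares):
    """Lagrange interpolation at x = 0 recovers the constant term (the secret)."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


if __name__ == "__main__":
    shares = make_shares(123456, t=3, n=5)
    print(reconstruct(shares[:3]))   # 123456
    print(reconstruct(shares[2:]))   # 123456
```

Because the shares are points on a polynomial, adding two parties' shares pointwise yields valid shares of the sum of their secrets. That linearity is what lets each computation server produce a "partial result" without seeing any input in the clear.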
Secure Processors
While initially introduced to protect the confidentiality and integrity of sensitive code against unauthorized access by rogue software at higher privilege levels, Intel SGX processors are being utilized in privacy-preserving computation. Ohrimenko et al. [14] developed data-oblivious ML algorithms for neural networks, SVM, k-means clustering, decision trees and matrix factorization that are based on SGX processors. The main idea involves having multiple data owners collaborate to perform one of the above-mentioned ML tasks with the computation party running the ML task on an SGX-enabled data center. An adversary can control all the hardware and software in the data center except for the SGX processors used for computation. In this system, each data owner independently establishes a secure channel with the enclave (containing the code and data), authenticates themselves, verifies the integrity of the ML code in the cloud, and securely uploads its private data to the enclave. After all the data is uploaded, the ML task is run by the secure processor, and the output is sent to the results’ parties over secure authenticated channels. Privacy Preserving Machine Learning: Threats and Solutions | Mohammad Al-Rubaie - Iowa State University
Perturbation Approaches
Differential privacy (DP) techniques resist membership inference attacks by adding random noise to the input data, to iterations in a certain algorithm, or to the algorithm output. While most DP approaches assume a trusted aggregator of the data, local differential privacy allows each input party to add the noise locally, thus requiring no trusted server. Finally, dimensionality reduction perturbs the data by projecting it to a lower dimensional hyperplane to prevent reconstructing the original data, and/or to restrict inference of sensitive information. Privacy Preserving Machine Learning: Threats and Solutions | Mohammad Al-Rubaie - Iowa State University
Differential Privacy (DP)
Differential privacy is a powerful tool for quantifying and solving practical problems related to privacy. Its flexible definition gives it the potential to be applied in a wide range of applications, including Machine Learning applications. Understanding Differential Privacy - From Intuitions behind a Theory to a Private AI Application | An Nguyen - Towards Data Science
Differential privacy is a system for publicly sharing information about a dataset by describing the patterns of groups within the dataset while withholding information about individuals in the dataset. Another way to describe differential privacy is as a constraint on the algorithms used to publish aggregate information about a statistical database which limits the disclosure of private information of records whose information is in the database. For example, differentially private algorithms are used by some government agencies to publish demographic information or other statistical aggregates while ensuring confidentiality of survey responses, and by companies to collect information about user behavior while controlling what is visible even to internal analysts. Roughly, an algorithm is differentially private if an observer seeing its output cannot tell if a particular individual's information was used in the computation. Differential privacy is often discussed in the context of identifying individuals whose information may be in a database. Although it does not directly refer to identification and reidentification attacks, differentially private algorithms provably resist such attacks. Differential privacy was developed by cryptographers and thus is often associated with cryptography, and draws much of its language from cryptography. Wikipedia
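One standard way to publish an aggregate under this constraint is the Laplace mechanism: add noise with scale equal to the query's sensitivity divided by the privacy parameter ε. The sketch below applies it to a counting query, whose sensitivity is 1 because adding or removing one person changes the count by at most 1. The function name, the example data, and the ε value are hypothetical.

```python
import random


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of records matching `predicate` (Laplace mechanism)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for record in records if predicate(record))
    scale = 1.0 / epsilon  # sensitivity of a counting query is 1
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


if __name__ == "__main__":
    ages = [23, 35, 41, 52, 67, 70, 18, 44]  # hypothetical survey responses
    print(private_count(ages, lambda age: age >= 40, epsilon=0.5))
```

A smaller ε means a larger noise scale and stronger privacy, at the cost of a less accurate published count. Repeating the query spends additional privacy budget, so averaging many noisy answers is not a free way to recover the exact value.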
Local Differential Privacy
When the input parties do not have enough information to train an ML model, it might be better to utilize approaches that rely on local differential privacy (LDP). With LDP, each input party would perturb their data, and only release this obscured view of the data. An old and well-known version of local privacy is randomized response (RR; Warner 1965), which provided plausible deniability for respondents to sensitive queries. For example, a respondent would flip a fair coin: (a) if “tails”, the respondent answers truthfully, and (b) if “heads”, then flip a second coin, and respond “Yes” if heads, and “No” if tails. RAPPOR [22] is a technology for crowdsourcing statistics from end-user client software by applying RR to Bloom filters with strong 𝜀-DP guarantees. RAPPOR is deployed in the Google Chrome web browser, and it permits collecting statistics on client-side values and strings, such as their categories, frequencies, and histograms. By performing RR twice with a memoization step in between, privacy protection is maintained even when multiple responses are collected from the same participant over time. An ML-oriented work, AnonML [23], utilized the ideas of RR for generating histograms from multiple input parties. AnonML utilizes these histograms to generate synthetic data on which an ML model can be trained. Like other local DP approaches, AnonML is a good option when no input party has enough data to build an ML model on their own (and there is no trusted aggregator). Privacy Preserving Machine Learning: Threats and Solutions | Mohammad Al-Rubaie - Iowa State University
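The two-coin randomized response procedure can be sketched directly. Under it, a respondent reports “Yes” with probability 0.5·p + 0.25, where p is the true fraction of “Yes” answers, so the aggregator can invert that relation to estimate p without trusting any single report. The function names and the simulated population are hypothetical.

```python
import random


def randomized_response(truth, rng):
    """Report a possibly randomized answer for one respondent."""
    if rng.random() < 0.5:      # first coin is tails: answer truthfully
        return truth
    return rng.random() < 0.5   # first coin is heads: the second coin decides


def estimate_true_rate(reports):
    """Invert P(report Yes) = 0.5 * p + 0.25 to estimate the true Yes rate p."""
    observed = sum(reports) / len(reports)
    return 2.0 * (observed - 0.25)


if __name__ == "__main__":
    rng = random.Random(7)
    # Hypothetical population in which 30% would truthfully answer "Yes".
    truths = [rng.random() < 0.3 for _ in range(100_000)]
    reports = [randomized_response(t, rng) for t in truths]
    print(round(estimate_true_rate(reports), 2))
```

Any individual report is “Yes” with probability 0.75 if the truth is “Yes” and 0.25 otherwise. That ratio of 3 is what bounds how much a single answer reveals, and it corresponds to ε = ln 3.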
Dimensionality Reduction (DR)
Dimensionality reduction (DR) perturbs the data by projecting it to a lower dimensional hyperplane. Such a transformation is lossy, and Liu et al. [24] suggested that it would enhance privacy, since retrieving the exact original data from a reduced dimension version would not be possible (the possible solutions are infinite as the number of equations is less than the number of unknowns). Hence, Liu et al. [24] proposed to use a random matrix to reduce the dimensions of the input data. Since a random matrix might decrease the utility, other approaches used both unsupervised and supervised DR techniques such as principal component analysis (PCA), discriminant component analysis (DCA), and multidimensional scaling (MDS). These approaches try to find the best projection matrix for utility purposes, while relying on the reduced dimensionality aspect to enhance privacy. Since an approximation of the original data can still be obtained from the reduced dimensions, some approaches, e.g. Jiang et al. [25], combined dimensionality reduction with DP to achieve differentially-private data publishing. While some entities might seek total hiding of their data, DR has another benefit for privacy. For datasets that have samples with two labels, a utility label and a privacy label, Kung [26] proposes a DR method that lets the data owner project her data in a way that maximizes the accuracy of learning for the utility labels, while decreasing the accuracy for learning the privacy labels. Privacy Preserving Machine Learning: Threats and Solutions | Mohammad Al-Rubaie - Iowa State University
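The random-matrix idea can be sketched as a plain random projection from d dimensions down to k < d. The Gaussian entries, the 1/√k scaling, and the function names are assumptions for illustration and are not taken from the cited papers.

```python
import random


def random_projection_matrix(k, d, rng):
    """Build a k x d matrix of scaled Gaussian entries (requires k < d)."""
    if not 0 < k < d:
        raise ValueError("need 0 < k < d for a dimension-reducing projection")
    scale = 1.0 / k ** 0.5
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]


def project(matrix, x):
    """Map a d-dimensional record to k dimensions: y = R x."""
    return [sum(r * v for r, v in zip(row, x)) for row in matrix]


if __name__ == "__main__":
    rng = random.Random(0)
    R = random_projection_matrix(k=3, d=8, rng=rng)
    record = [5.1, 3.5, 1.4, 0.2, 7.0, 3.2, 4.7, 1.4]  # hypothetical feature vector
    print(len(project(R, record)))  # 3
```

With k equations and d unknowns, infinitely many inputs map to the same released vector, which is the argument for privacy given above. As the section also notes, an approximation of the original can still be recovered, so this alone gives no formal guarantee.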
General Data Protection Regulations (GDPR)