Security, Privacy and Information Theory

IEEE CSF 2024 Workshop
July 8, 2024
Enschede, The Netherlands

CSF Registration


Protect-IT seeks to attract studies on security and privacy for machine learning (ML) from an information-theoretic standpoint. The accuracy and efficiency of ML systems depend on large datasets, which often contain highly sensitive personal information. This strong dependence on personal data jeopardizes the privacy and security of Internet users who contribute, knowingly or not, to these online statistical datasets. Protect-IT aims to bring together the typical attendees of CSF, who have expertise in the theory of cryptography and (algorithmic) fairness, with researchers in information theory to study, develop, and evaluate privacy, security, and fairness attacks against ML, along with defense strategies to counter them. We put information theory at the heart of this endeavor and call for contributions grounded in information-theoretic concepts and principles, aiming to enrich preliminary research efforts and to achieve widespread adoption.

Call For Papers & Important Dates

Download Full CFP

Submission deadline: May 13, 2024, 23:59 (Anywhere on Earth), extended from May 04, 2024
Notification of acceptance: June 07, 2024, extended from June 04, 2024

Submission Instructions

We welcome two types of submissions: extended abstracts and posters. Extended abstracts must be at most 4 pages long excluding references and adhere to the CSF format. We encourage submissions of work that is new to the data privacy, security, and information theory community, as well as work currently under review elsewhere or recently published in privacy and security venues. The workshop will not have formal proceedings, but authors of accepted abstracts can choose to publish their work on the workshop's webpage or to provide a link to arXiv.

Program


Morning Session
8:30–9:30 Registration
9:30–10:30 Invited talk: Josep Domingo-Ferrer — The accuracy, security, and privacy conflict in machine learning
10:30–11:00 Coffee break
11:00–12:30 Session 1: Federated learning and secure computation
11:00–11:30 Bayes’ capacity as a measure for reconstruction attacks in federated learning.
Natasha Fernandes (Macquarie University), Sayan Biswas (EPFL), Annabelle McIver (Macquarie University), Parastoo Sadeghi (UNSW Canberra), Pedro Faustini (Macquarie University), Mark Dras (Macquarie University) and Catuscia Palamidessi (Inria and Ecole Polytechnique)
Within the machine learning community, reconstruction attacks are a principal attack of concern and have been identified even in federated learning, which was designed with privacy preservation in mind. In federated learning, it has been shown that an adversary with knowledge of the machine learning architecture is able to infer the exact value of a training element given an observation of the weight updates performed during stochastic gradient descent. In response to these threats, the privacy community recommends the use of differential privacy in the stochastic gradient descent algorithm, termed DP-SGD. However, DP has not yet been formally established as an effective countermeasure against reconstruction attacks. In this paper, we formalise the reconstruction threat model using the information-theoretic framework of quantitative information flow. We show that the Bayes' capacity, related to the Sibson mutual information of order infinity, represents a tight upper bound on the leakage of the DP-SGD algorithm to an adversary interested in performing a reconstruction attack. We provide empirical results demonstrating the effectiveness of this measure for comparing mechanisms against reconstruction threats.
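For readers unfamiliar with the measure: the multiplicative Bayes' capacity of a channel matrix is the sum of its column maxima, and its logarithm equals the Sibson mutual information of order infinity. The following is an illustrative sketch of that formula only, not the authors' code or experiments:

```python
import math

def bayes_capacity(channel):
    """Multiplicative Bayes' capacity of a row-stochastic channel matrix
    (rows = secrets, columns = observations): the sum over observations
    of the column-wise maximum probability. Its log2 is the min-entropy
    leakage under the best (uniform-maximizing) prior."""
    num_cols = len(channel[0])
    return sum(max(row[y] for row in channel) for y in range(num_cols))

# Perfectly leaky channel: the observation identifies the secret exactly.
leaky = [[1.0, 0.0], [0.0, 1.0]]
# Useless channel: the observation is independent of the secret.
flat = [[0.5, 0.5], [0.5, 0.5]]

print(bayes_capacity(leaky))                 # 2.0
print(math.log2(bayes_capacity(leaky)))      # 1.0 bit of min-entropy leakage
print(bayes_capacity(flat))                  # 1.0
print(math.log2(bayes_capacity(flat)))       # 0.0 bits: nothing leaks
```

A mechanism like DP-SGD induces such a channel from training sets to observable weight updates, and comparing capacities compares worst-case reconstruction leakage.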
11:30–12:00 Verifiable cross-silo federated learning.
Aleksei Korneev (INRIA Lille, University of Lille) and Jan Ramon (INRIA Lille)
Federated Learning (FL) is a widespread approach that allows training machine learning (ML) models with data distributed across multiple devices. In cross-silo FL, which often appears in domains like healthcare or finance, the number of participants is moderate, and each party typically represents a well-known organization. However, malicious agents may still attempt to disturb the training procedure in order to obtain certain benefits, for example, a biased result or a reduction in computational load. While one can easily detect a malicious agent when the data used for training is public, the problem becomes much more acute when it is necessary to maintain the privacy of the training dataset. To address this issue, there has recently been growing interest in developing verifiable protocols, in which one can check that parties do not deviate from the training procedure and perform computations correctly. In this paper, we conduct a comprehensive analysis of such protocols and fit them into a taxonomy. We compare the efficiency and threat models of various approaches. We then identify research gaps and discuss potential directions for future scientific work.
12:00–12:30 Overview on Secure Comparison.
Quentin Sinh (INRIA Lille) and Jan Ramon (INRIA Lille)
Introduced by Yao’s Millionaires’ problem, Secure Comparison (SC) allows parties to compare two secrets in a privacy-preserving manner. This article gives an overview of the different SC techniques in various settings such as Secret Sharing (SS) or Homomorphic Encryption (HE).
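To illustrate the secret-sharing setting the abstract mentions, here is a toy sketch of two-party additive sharing over a public prime modulus; the modulus and the wrap-around sign trick are assumptions of this sketch, not part of the abstract, and real SC protocols compare the shared values interactively without ever reconstructing them:

```python
import random

P = 2_147_483_647  # public prime modulus (chosen for this sketch)

def share(x):
    """Split x into two additive shares mod P; each share alone is
    uniformly distributed and reveals nothing about x."""
    r = random.randrange(P)
    return r, (x - r) % P

def reconstruct(s1, s2):
    return (s1 + s2) % P

a1, a2 = share(1000)   # Alice's wealth, secret-shared
b1, b2 = share(2500)   # Bob's wealth, secret-shared

# Here we reconstruct only to check correctness of the sharing itself.
diff = (reconstruct(a1, a2) - reconstruct(b1, b2)) % P
# For values below P/2, a negative difference wraps above P/2,
# so the "sign bit" of diff decides the comparison.
print(diff > P // 2)   # True, i.e. Alice has less than Bob
```

An actual SC protocol extracts that sign bit from the shares themselves, which is precisely where the technical difficulty, and the variety of SS- and HE-based techniques, lies.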
12:30–14:00 Lunch break
Afternoon Session
14:00–15:00 Invited talk: Jan Ramon — Applying differential privacy theory in practical applications
15:00–15:30 Coffee break
15:30–17:00 Session 2: Differential privacy and security attacks
15:30–16:00 Node injection link stealing attack.
Oualid Zari (EURECOM), Javier Parra-Arnau (Universitat Politècnica de Catalunya), Ayşe Ünsal (EURECOM) and Melek Önen (EURECOM)
In this paper, we present a stealthy and effective attack that exposes privacy vulnerabilities in Graph Neural Networks (GNNs) by inferring private links within graph-structured data. Focusing on dynamic GNNs, we propose to inject new nodes and attach them to a particular target node to infer its private edge information. Our approach significantly enhances the F1 score of the attack beyond the current state-of-the-art benchmarks. Specifically, for the Twitch dataset, our method improves the F1 score by 23.75%, and for the Flickr dataset, it records a remarkable improvement, where the new performance is more than three times better than the state-of-the-art. We also propose and evaluate defense strategies based on differentially private (DP) mechanisms relying on a newly defined DP notion, which, on average, reduce the effectiveness of the attack by approximately 71.9% while only incurring a minimal average utility loss of about 3.2%.
16:00–16:30 Secure latent Dirichlet allocation.
Thijs Veugen (Netherlands Organisation for Applied Scientific Research), Vincent Dunning (Netherlands Organisation for Applied Scientific Research), Michiel Marcus (Netherlands Organisation for Applied Scientific Research) and Bart Kamphorst (Netherlands Organisation for Applied Scientific Research)
Topic modeling refers to a popular set of techniques used to discover hidden topics that occur in a collection of documents. These topics can, for example, be used to categorize documents or label text for further processing. One popular topic modeling technique is Latent Dirichlet Allocation (LDA). In topic modeling scenarios, the documents are often assumed to be in one, centralized dataset. However, sometimes documents are held by different parties, and contain privacy- or commercially-sensitive information that cannot be shared. We present a novel, decentralized approach to train an LDA model securely without having to share any information about the content of the documents with the other parties. We preserve the privacy of the individual parties using secure multi-party computation (MPC), achieving similar accuracy compared to an (insecure) centralised approach. With 1024-bit Paillier keys, a topic model with 5 topics and 3000 words can be trained in around 16 hours. Furthermore, we show that the solution scales linearly in the total number of words and the number of topics.
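For context, the property such MPC pipelines exploit is Paillier's additive homomorphism: multiplying two ciphertexts yields an encryption of the sum of the plaintexts. A toy sketch with tiny primes follows (illustration only, nowhere near the 1024-bit keys used in the paper, and not secure):

```python
import math
import random

# Toy Paillier parameters -- tiny primes for illustration only.
p, q = 11, 13
n = p * q
n2 = n * n
lam = math.lcm(p - 1, q - 1)   # Carmichael's lambda(n), using g = n + 1
mu = pow(lam, -1, n)           # decryption constant: lam^{-1} mod n

def encrypt(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    # With g = n + 1, g^m = 1 + m*n (mod n^2)
    return (1 + m * n) * pow(r, n, n2) % n2

def decrypt(c):
    # L(x) = (x - 1) // n, then multiply by mu mod n
    return (pow(c, lam, n2) - 1) // n * mu % n

def add(c1, c2):
    # Additive homomorphism: ciphertext product encrypts plaintext sum.
    return c1 * c2 % n2

c = add(encrypt(42), encrypt(17))
print(decrypt(c))  # 59
```

In the decentralized LDA setting, parties can thus aggregate encrypted word counts without any party seeing the others' documents.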
16:30–17:00 Probabilistic parallel composition theorems for differential privacy.
Àlex Miranda-Pascual (Karlsruhe Institute of Technology, Universitat Politècnica de Catalunya), Javier Parra-Arnau (Universitat Politècnica de Catalunya) and Thorsten Strufe (Karlsruhe Institute of Technology)
In this short abstract, we present new composition results for (epsilon,delta)-DP that go beyond classical parallel composition, namely probabilistic parallel composition. In this new composition scenario, the mechanisms take as input disjoint subsets of the initial database, as in parallel composition, but the input subsets are chosen randomly instead of deterministically. We provide two theorems with different ways to randomly select the inputs: the first, defined for unbounded DP, samples each record into a single input according to a fixed distribution; the second, defined for bounded DP, samples subsets of fixed size uniformly. Notably, these new composition methods improve privacy by introducing uncertainty and allow us to obtain lower privacy parameters than those obtained by the classical parallel composition results. We believe these new techniques can be useful for the future design of DP mechanisms.
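For contrast, classical deterministic parallel composition, which the probabilistic variants above refine by randomizing the partition step, can be sketched as follows; the partitioning function and dataset are illustrative assumptions, not from the abstract:

```python
import random

def laplace(scale):
    """Sample Laplace(0, scale) as a difference of two exponentials."""
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

def noisy_partition_counts(records, key, epsilons):
    """Classical parallel composition: each record lands in exactly one
    disjoint bucket, so releasing all noisy bucket counts satisfies
    max(epsilons)-DP rather than sum(epsilons)-DP."""
    buckets = {k: 0 for k in epsilons}
    for r in records:
        buckets[key(r)] += 1
    # Laplace mechanism per bucket: count queries have sensitivity 1.
    return {k: buckets[k] + laplace(1 / epsilons[k]) for k in epsilons}

ages = [23, 35, 41, 67, 29, 52]
eps = {"young": 0.5, "old": 1.0}
counts = noisy_partition_counts(ages, lambda a: "young" if a < 40 else "old", eps)
print(max(eps.values()))  # 1.0: the overall guarantee under parallel composition
```

The probabilistic theorems replace the deterministic `key` with a random assignment of records to subsets, and the added uncertainty is what yields privacy parameters below this max bound.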


Workshop chairs

  • Ayşe Ünsal, EURECOM
  • Javier Parra-Arnau, Universitat Politècnica de Catalunya

Publications chair

  • Melek Önen, EURECOM


  • Víctor Rubio Jornet, Universitat Politècnica de Catalunya

Program Committee

  • Aurélien Bellet, INRIA
  • Josep Domingo-Ferrer, University of Rovira i Virgili
  • Sébastien Gambs, Université du Québec à Montréal
  • Cédric Gouy-Pailler, CEA
  • Emre Gürsoy, Koç University
  • Arun Padakandla, EURECOM
  • Jan Ramon, INRIA
  • Vicenç Torra, Umeå University
  • Weizhi Meng, Technical University of Denmark