Security, Privacy and Information Theory Workshop (Protect-IT) (IEEE CSF 2024 Workshop)

Schedule

Morning Session
8:30 - 9:30	Registration
9:30-10:30	Invited talk: Josep Domingo-Ferrer — The accuracy, security, and privacy conflict in machine learning
10:30 - 11:00	Coffee break
11:00 - 12:30	Session 1: Federated learning and secure computation
11:00–11:30	Bayes’ capacity as a measure for reconstruction attacks in federated learning. Natasha Fernandes (Macquarie University), Sayan Biswas (EPFL), Annabelle McIver (Macquarie University), Parastoo Sadeghi (UNSW Canberra), Pedro Faustini (Macquarie University), Mark Dras (Macquarie University) and Catuscia Palamidessi (Inria and Ecole Polytechnique)
Within the machine learning community, reconstruction attacks are a principal attack of concern and have been identified even in federated learning, which was designed with privacy-preservation in mind. In federated learning, it has been shown that an adversary with knowledge of the machine learning architecture is able to infer the exact value of a training element given an observation of the weight updates performed during stochastic gradient descent. In response to these threats the privacy community recommends the use of differential privacy in the stochastic gradient descent algorithm, termed DP-SGD. However, DP has not yet been formally established as an effective counter-measure against reconstruction attacks. In this paper we formalise the reconstruction threat model using the information-theoretic framework of quantitative information flow. We show that the Bayes' capacity, related to the Sibson mutual information of order infinity, represents a tight upper bound on the leakage of the DP-SGD algorithm to an adversary interested in performing a reconstruction attack. We provide empirical results demonstrating the effectiveness of this measure for comparing mechanisms against reconstruction threats.
11:30–12:00	Verifiable cross-silo federated learning. Aleksei Korneev (INRIA Lille, University of Lille) and Jan Ramon (INRIA Lille)
Federated Learning (FL) is a widespread approach that allows training machine learning (ML) models with data distributed across multiple devices. In cross-silo FL, which often appears in domains like healthcare or finance, the number of participants is moderate, and each party typically represents a well-known organization. However, malicious agents may still attempt to disturb the training procedure in order to obtain certain benefits, for example, a biased result or a reduction in computational load. While one can easily detect a malicious agent when data used for training is public, the problem becomes much more acute when it is necessary to maintain the privacy of the training dataset. To address this issue, there is recently growing interest in developing verifiable protocols, where one can check that parties do not deviate from the training procedure and perform computations correctly. In this paper, we conduct a comprehensive analysis of such protocols, and fit them in a taxonomy. We perform a comparison of the efficiency and threat models of various approaches. We next identify research gaps and discuss potential directions for future scientific work.
12:00–12:30	Overview on Secure Comparison. Quentin Sinh (INRIA Lille) and Jan Ramon (INRIA Lille)
Introduced by Yao’s Millionaires’ problem, Secure Comparison (SC) allows parties to compare two secrets in a privacy-preserving manner. This article gives an overview of the different SC techniques in various settings such as Secret Sharing (SS) or Homomorphic Encryption (HE).
12:30–14:00	Lunch break
Afternoon Session
14:00–15:00	Invited talk: Jan Ramon — Applying differential privacy theory in practical applications
15:00–15:30	Coffee break
15:30 - 17:00	Session 2: Differential privacy and security attacks
15:30–16:00	Node injection link stealing attack. Oualid Zari (EURECOM), Javier Parra-Arnau (Universitat Politècnica de Catalunya), Ayşe Ünsal (EURECOM) and Melek Önen (EURECOM)
In this paper, we present a stealthy and effective attack that exposes privacy vulnerabilities in Graph Neural Networks (GNNs) by inferring private links within graph-structured data. Focusing on dynamic GNNs, we propose to inject new nodes and attach them to a particular target node to infer its private edge information. Our approach significantly enhances the F1 score of the attack beyond the current state-of-the-art benchmarks. Specifically, for the Twitch dataset, our method improves the F1 score by 23.75%, and for the Flickr dataset, it records a remarkable improvement, where the new performance is more than three times better than the state-of-the-art. We also propose and evaluate defense strategies based on differentially private (DP) mechanisms relying on a newly defined DP notion, which, on average, reduce the effectiveness of the attack by approximately 71.9% while only incurring a minimal average utility loss of about 3.2%.
16:00–16:30	Secure latent dirichlet allocation. Thijs Veugen (Netherlands Organisation for Applied Scientific Research), Vincent Dunning (Netherlands Organisation for Applied Scientific Research), Michiel Marcus (Netherlands Organisation for Applied Scientific Research) and Bart Kamphorst (Netherlands Organisation for Applied Scientific Research)
Topic modeling refers to a popular set of techniques used to discover hidden topics that occur in a collection of documents. These topics can, for example, be used to categorize documents or label text for further processing. One popular topic modeling technique is Latent Dirichlet Allocation (LDA). In topic modeling scenarios, the documents are often assumed to be in one, centralized dataset. However, sometimes documents are held by different parties, and contain privacy- or commercially-sensitive information that cannot be shared. We present a novel, decentralized approach to train an LDA model securely without having to share any information about the content of the documents with the other parties. We preserve the privacy of the individual parties using secure multi-party computation (MPC), achieving similar accuracy compared to an (insecure) centralised approach. With $1024$-bit Paillier keys, a topic model with $5$ topics and $3000$ words can be trained in around $16$ hours. Furthermore, we show that the solution scales linearly in the total number of words and the number of topics.
16:30–17:00	Probabilistic parallel composition theorems for differential privacy. Àlex Miranda-Pascual (Karlsruhe Institute of Technology, Universitat Politècnica de Catalunya), Javier Parra-Arnau (Universitat Politècnica de Catalunya ) and Thorsten Strufe (Karlsruhe Institute of Technology)
In this short abstract, we present new composition results for (epsilon,delta)-DP that go even further, namely, probabilistic parallel composition. In this new composition scenario, the mechanisms take as input disjoint subsets of the initial database, as in parallel composition, but where the input subsets are chosen randomly instead of deterministically. We provide two theorems with different ways to randomly select the inputs: The first, defined for unbounded DP, samples each record into a single input according to a fixed distribution; while the second, defined for bounded DP, samples subsets of fixed size uniformly. Notably, these new composition methods improve privacy by introducing uncertainty, and allow us to obtain lower privacy parameters than those obtained by the classical parallel composition results. We believe these new techniques can be useful for the future design of DP mechanisms.

Security, Privacy and Information Theory

Scope

Call For Papers & Important Dates

Submission Instructions

Invited Speakers

Schedule

Organization

Workshop chairs

Publications chair

Organizers

Program Committee