Measuring the leakage of a black-box using Machine Learning

Giovanni Cherubin

We consider the problem of measuring the information leakage of a generic system, seen as a black-box. The black-box takes secret inputs and returns outputs according to unknown underlying distributions; we are interested in determining how much of the secret input an adversary can predict given the corresponding output. This formulation captures a wide class of attacks, ranging from side channels and traffic analysis to membership inference.
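For concreteness, one standard way to quantify "how much an adversary can predict" (an illustrative choice made here; the talk considers leakage measures more generally) is via the Bayes risk, the smallest error any adversary can achieve when guessing the secret S from the output O:

    R^* = 1 - \mathbb{E}_{O}\Big[\max_{s} \Pr(S = s \mid O)\Big],

and to report leakage as the gap between R^* and the error of guessing without observing O, namely R^\pi = 1 - \max_{s} \Pr(S = s); the smaller R^* is relative to R^\pi, the more the output reveals about the secret.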

Approaches to measuring a black-box's leakage have historically been based on the principles of classical statistics; unfortunately, the methods derived from these principles do not scale to most real-world systems (e.g., black-boxes whose output space is very large or even continuous), and have therefore seen limited application.

In this talk, we re-frame this problem within a Machine Learning framework. This: i) gives us access to a large set of practical tools for measuring a black-box's leakage, which scale to very large systems, and ii) allows us to better scope what can be done in the future (e.g., establishing impossibility results). The tools presented here were introduced in this context last year by the author, and were used to derive the first generic security bounds for a major class of traffic analysis attacks (Website Fingerprinting). We will give a formal introduction to these tools, discuss their applications and theoretical limitations, and highlight open questions.
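To make the re-framing concrete, below is a minimal sketch (in Python, using scikit-learn) of how a classifier can serve as a leakage estimator. Everything here is an illustrative assumption rather than the specific method from the talk: the hypothetical system black-box, the uniform prior over secrets, and the choice of a k-NN rule, whose held-out error is a classical estimator of the Bayes risk.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def estimate_leakage(system, secrets, n_samples=10_000, k=5):
    """Estimate the leakage of `system` by comparing a k-NN classifier's
    held-out error against the no-observation (random-guessing) baseline."""
    # Sample the black-box: draw secrets uniformly, record the outputs.
    s = np.random.choice(secrets, size=n_samples)
    X = np.array([system(si) for si in s])
    X_train, X_test, s_train, s_test = train_test_split(X, s, test_size=0.3)

    # The held-out error of a k-NN rule converges to the Bayes risk R*
    # as the number of samples grows (with k growing suitably slowly).
    clf = KNeighborsClassifier(n_neighbors=k).fit(X_train, s_train)
    bayes_risk_estimate = 1.0 - clf.score(X_test, s_test)

    # Error of the best guess made without seeing any output
    # (a uniform prior over secrets is assumed here).
    prior_error = 1.0 - 1.0 / len(secrets)

    # Additive leakage: how much observing the output reduces the
    # adversary's error. 0 means no leakage; prior_error means the
    # secret is fully recoverable from the output.
    return prior_error - bayes_risk_estimate

For example, on a toy black-box whose output is the secret plus Gaussian noise:

rng = np.random.default_rng(0)
leakage = estimate_leakage(lambda s: [s + rng.normal(scale=1.0)],
                           secrets=[0, 1, 2, 3])
print(leakage)

Note that this sketch already illustrates the scaling advantage mentioned above: the k-NN estimate needs no explicit model of the output distribution, so it applies unchanged to continuous or high-dimensional outputs.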

Further info: https://giocher.com