The best way to defeat the virus: Unite!
Crises are turning points in history. Most of us are facing a worldwide crisis of this magnitude for the first time. It is the test of our lifetime. Something that unites us all.
Right now, it is important to keep workplaces viable, hospitals efficient, and government agencies functioning. But that’s not everything. It is just as important to take care of the elderly lady who no longer gets visitors; the stressed-out family with kids in a three-room apartment; the nurse who was overworked even before the crisis. That’s the spirit of “we’re all in this together”. And only with this spirit can we prevail as a society.
How can we succeed? The answer is easy: By bringing all needs, all capabilities, and all good will together to organize ourselves and our efforts in the best possible way. A joint effort of many to create the biggest and most efficient network of help Europe has ever seen.
A joint effort of 500 million people.
How we are different
Times of crisis have one advantage. They make us think differently and straightforwardly. The pandemic affects all of us. So the first job is to find out who needs support and who can provide support.
Establishing this can be done in a totally novel way: We can just ask. If 500 million Europeans make it clear what they need and what they can contribute, challenges can be addressed much better than by just assuming.
This approach also makes sure that interest groups are cut out. Medicine, technology, public service and everyone else need to work together for a common goal. And the goal is to overcome the crisis with the power of everybody. With a society that proves its unity, strength and compassion.
We propose implementing a solution using the following architectural outline:
The architecture consists of three constituent parts: a public alert and information system to reliably inform citizens during a crisis; a “private data vault” used to securely store user data; and an aggregation engine to process data from multiple users without revealing any of their private information. The vision is far-reaching and wide-ranging, but not obvious at first glance, nor intuitive for everybody, so please take some time to engage with it.
Private Data Vault
An information system needs to leverage private user data to bring relevant information to users. However, users should be enabled to always own their private data and not be forced to share it with third parties just to receive relevant information. The Private Data Vault (or Vault) is the component in the architecture that enables users to own their data. Users install Vaults on their personal device. Vaults are responsible for storing and managing user data in a secure way on user devices. Personally identifiable user data never leaves the Vault. Whenever any kind of summarized data or statistics leaves the Vault, it only happens with the user’s consent.
Vaults provide an extensible infrastructure for working with data. This is achieved by designing the Vault as a system that can be extended with plugins. We refer to plugins added to the Vault as Data Abilities. The Vault provides a runtime for executing Data Abilities and the infrastructure for storing and accessing data securely. Data Abilities are responsible for adding concrete data to the Vault and for reacting to that data.
For example, a Data Ability is responsible for storing the list of locations visited by the user. Another Data Ability collects health data from the user through daily health questionnaires. Filtering relevant incoming alerts based on user data is another Data Ability. In other words, it is the responsibility of the Vault to filter messages that come from the outside based on data from within the Vault.
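The plugin model described above could be sketched as follows. This is a minimal illustration under stated assumptions, not the project’s implementation: the names `Vault`, `DataAbility`, `on_event`, `LocationHistory` and `HealthQuestionnaire`, as well as the event format, are hypothetical, and the in-memory dictionary merely stands in for the secured on-device storage a real Vault would need.

```python
from dataclasses import dataclass, field
from typing import Any, Protocol


class DataAbility(Protocol):
    """Hypothetical plugin interface; the source text does not define one."""

    name: str

    def on_event(self, vault: "Vault", event: dict[str, Any]) -> None: ...


@dataclass
class Vault:
    """Owns the storage and dispatches events to registered Data Abilities."""

    _store: dict[str, list[Any]] = field(default_factory=dict)
    _abilities: list[DataAbility] = field(default_factory=list)

    def register(self, ability: DataAbility) -> None:
        self._abilities.append(ability)

    def append(self, key: str, value: Any) -> None:
        # Placeholder storage: a real Vault would encrypt data at rest.
        self._store.setdefault(key, []).append(value)

    def read(self, key: str) -> list[Any]:
        return list(self._store.get(key, []))

    def dispatch(self, event: dict[str, Any]) -> None:
        # Every ability sees every event and decides whether to react.
        for ability in self._abilities:
            ability.on_event(self, event)


class LocationHistory:
    """Stores the list of regions visited by the user."""

    name = "location-history"

    def on_event(self, vault: Vault, event: dict[str, Any]) -> None:
        if event.get("type") == "location":
            vault.append("locations", event["region"])


class HealthQuestionnaire:
    """Stores the answers of the daily health questionnaire."""

    name = "health-questionnaire"

    def on_event(self, vault: Vault, event: dict[str, Any]) -> None:
        if event.get("type") == "questionnaire":
            vault.append("health", event["answers"])


vault = Vault()
vault.register(LocationHistory())
vault.register(HealthQuestionnaire())
vault.dispatch({"type": "location", "region": "DE-BE"})
vault.dispatch({"type": "questionnaire", "answers": {"fever": False}})
```

In this sketch the Vault knows nothing about locations or health data; all domain knowledge lives in the abilities, which is what keeps the core small and extensible.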
In summary, Vaults decentralise user data, so there is no single point of attack to expose data of multiple users. Vaults process information on the device where the data is stored, instead of taking private data from a user’s Vault and uploading it to a server to perform the computation. These Vaults provide a secure runtime on top of which multiple Data Abilities run and provide concrete ways to manage various kinds of data and implement multiple types of functionality based on the user’s data.
Security of the Private Data Vault is paramount. There are two main challenges regarding the Private Data Vault:
- Ensure that user data is stored securely inside the Vault and that no other apps running on the device can access it.
- Ensure that no private data is sent by the Vault to external services. Only anonymized data can leave the Vault, and even that data can only leave with the user’s consent.
Public Alert & Information System
The Private Data Vault consumes and filters messages that come from somewhere. That somewhere is the Public Alert & Information System (PAIS).
The PAIS is responsible for broadcasting information that can be filtered and contextualized by the Vaults. In this way the disseminated information becomes highly targeted for users in a privacy-compliant way.
There are many kinds of information that are relevant for crisis informatics. An alert is one of them. Alerts are most often information authorized by governments or similar institutions. Currently, these alerts are handled through general communication channels, like mainstream media. Instead, the PAIS broadcasts messages from many places to user devices and lets those devices determine which information is relevant. This filtering is performed locally by the Vault, based on the personal and private information it holds. Hence, the determination is made without the user’s private data leaving their device.
For example, an alert can come from a government, but it can also come from a local authority. They are both alerts but they are not relevant for everyone. The countrywide alert is relevant for everyone in the country, but it might also be relevant for someone who wants to travel to that country. At the same time, an alert from a local authority is relevant only for the people from that region. Disseminating information through general communication channels, like mainstream media, is like using a megaphone. It is useful to attract everyone’s attention, but it can only be used effectively for a few things because otherwise it is merely noise. However, a crisis, especially like this one, requires localized information that can be actionable locally. The PAIS is a megaphone used all the time, but only the relevant ears are hearing it through the filtering that happens on the Vaults.
The PAIS can also disseminate other kinds of information from other stakeholders. For example, a local shop can push an alert that there is a line to get in, so that other shoppers can safely wait at home for the foot traffic to slow. This is, of course, relevant information that can technically be transmitted following the same mechanism as the alert. However, it is not as sensitive and has to happen through a different, less invasive channel, following other authorization mechanisms than the alerts.
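The countrywide versus local example can be made concrete with a short sketch of on-device filtering. It assumes, purely for illustration, that alerts carry a hierarchical region code such as "DE" or "DE-BE" and that the Vault keeps a private set of regions the user lives in or plans to visit; the names `Alert`, `is_relevant` and `filter_alerts` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    alert_id: str
    region: str  # assumed hierarchical code: "DE" (country), "DE-BE" (region)
    text: str


def is_relevant(alert: Alert, private_regions: set[str]) -> bool:
    """An alert is relevant if it targets one of the user's regions
    or a larger area that contains one of them."""
    return any(
        region == alert.region or region.startswith(alert.region + "-")
        for region in private_regions
    )


def filter_alerts(broadcast: list[Alert], private_regions: set[str]) -> list[Alert]:
    # Runs on the device: private_regions is never sent anywhere.
    return [alert for alert in broadcast if is_relevant(alert, private_regions)]


broadcast = [
    Alert("a1", "DE", "Countrywide contact restrictions"),
    Alert("a2", "DE-BE", "Local testing station opened"),
    Alert("a3", "FR-75", "Local curfew"),
]

resident = filter_alerts(broadcast, {"DE-BY"})
traveller = filter_alerts(broadcast, {"DE-BY", "FR-75"})
```

Every device receives the same three alerts; only the locally computed result differs, so the broadcaster learns nothing about where a user lives or travels.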
The PAIS must have a distributed architecture, with one or more nodes potentially in every EU country. Governments and institutions connect to a selected node and publish new alerts there. The PAIS then replicates the published alerts through all the other nodes. User devices can connect to any of the nodes to download new information.
The distributed architecture comes with several advantages. It can be easily adapted to different national structures. It still works, even if part of the internet infrastructure is down. It can also be reused for different purposes, e.g. for a large corporation informing employees about the current corona situation. It is always possible to keep governmental and other channels separated and adaptable for situations.
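A toy model of this publish, replicate and download cycle might look as follows. It assumes simple flooding between directly connected nodes, which is only one of many possible replication strategies; the class `PaisNode` and its methods are hypothetical names, and networking, authentication and persistence are left out.

```python
class PaisNode:
    """In-memory stand-in for one PAIS node."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.peers: list["PaisNode"] = []
        self.alerts: dict[str, str] = {}  # alert_id -> text

    def connect(self, other: "PaisNode") -> None:
        self.peers.append(other)
        other.peers.append(self)

    def publish(self, alert_id: str, text: str) -> None:
        if alert_id in self.alerts:
            return  # already seen: this check stops replication loops
        self.alerts[alert_id] = text
        for peer in self.peers:
            peer.publish(alert_id, text)

    def download(self, known_ids: set[str]) -> dict[str, str]:
        """Return only the alerts the device does not have yet."""
        return {k: v for k, v in self.alerts.items() if k not in known_ids}


berlin, paris, madrid = PaisNode("berlin"), PaisNode("paris"), PaisNode("madrid")
berlin.connect(paris)
paris.connect(madrid)

# An authority publishes at one node; a device downloads from another.
berlin.publish("a1", "Stay at home")
new_for_device = madrid.download(known_ids=set())
```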
The first thing to note about the PAIS server is that the data on the server is already public. This implies that there is no privacy concern for that public data. However, there still are security concerns. Namely:
- Distribution of false data due to human error. The principle of segregation of duties should be applied on the editorial side to avoid publishing false material by mistake. The data distributed by the server must always be accurate and come from approved sources. The client should be able to authenticate the server or even individual messages through cryptographic means.
- Distribution of false or malicious data due to a compromise of the server, or attacks on the overall infrastructure. All common security controls must be implemented to ensure security of the server and related infrastructure (security hardening, penetration testing, etc.).
- Denial of service due to a compromise of the server and related infrastructure or due to other attacks (DDoS attack, DNS hijacking, etc.). The system must be highly available, scalable and resilient to denial-of-service attacks.
- Redaction of alerts. The system needs to assume that an alert could have been published by mistake or with inaccurate information. Since up-to-date and correct information is paramount during a crisis, the PAIS should handle redactions or removals of alerts reliably and in a timely manner, including for alerts already downloaded to user devices.
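For the last point, removing an alert that has already reached user devices, one possible approach is to broadcast a removal message and let each device apply it locally. The sketch below assumes this design; the names `Message` and `LocalAlertStore` and the message kinds "publish" and "remove" are hypothetical. It also remembers removed identifiers so that a removal arriving before its alert still takes effect.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    kind: str  # assumed message kinds: "publish" or "remove"
    alert_id: str
    text: str = ""


class LocalAlertStore:
    """Alerts held on the device, kept in sync with removals."""

    def __init__(self) -> None:
        self._alerts: dict[str, str] = {}
        self._removed: set[str] = set()

    def apply(self, message: Message) -> None:
        if message.kind == "remove":
            # Remember the id so a late or repeated publish is ignored.
            self._removed.add(message.alert_id)
            self._alerts.pop(message.alert_id, None)
        elif message.kind == "publish" and message.alert_id not in self._removed:
            self._alerts[message.alert_id] = message.text

    def visible(self) -> dict[str, str]:
        return dict(self._alerts)


store = LocalAlertStore()
store.apply(Message("publish", "a1", "Curfew starts at 18:00"))
store.apply(Message("remove", "a1"))           # published by mistake
store.apply(Message("remove", "a2"))           # removal arrives first
store.apply(Message("publish", "a2", "stale"))  # ignored
store.apply(Message("publish", "a3", "Curfew starts at 20:00"))
```

In this sketch a corrected alert is published under a new identifier (`a3`) rather than by editing the removed one.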
To maintain users’ privacy, the PAIS relies on the fact that all Vaults will receive all alerts relevant for a certain area (the European Union in our case). For this to work, the volume of the data should be reasonably low to ensure it is practical for users to download it on a daily basis.
Aggregation Engine
When making decisions on a macro scale, aggregations of data from many users can be very useful. Each individual piece of data, however, resides inside a user’s Vault. To create these aggregations, users should not be forced to give up their data to a third party. Instead, the aggregation engines perform these aggregations without revealing the individual data of any user. Only the final result of the aggregation is communicated and potentially stored.
For example, let’s consider that a city has asked citizens to stay at home to reduce the spread of infection. To get an overview of whether the measures are effective, the city officials would benefit from an aggregation of people’s movements. However, this aggregation should be obtained without compromising the privacy of individuals. To obtain this, the city officials could send a computation to the Vaults that relies on the location history of users, but only sends back the number of kilometres travelled by the user per day in the last week. The city officials receive an aggregation telling them how many kilometres people travelled on average every day in the last week. The results could be even more detailed: For example, they could also show the number of people that travelled a given distance. This information provides valuable feedback for public officials without revealing any personal data of any users. The data remains safely in the user’s Vault.
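The kilometres-per-day example can be sketched in two steps: a computation that runs inside each Vault, and an averaging step over the reported totals. The function names and the data layout (a dictionary mapping a day to a list of latitude/longitude points) are assumptions made for this illustration, and straight-line haversine distance between recorded points is a simplification of real movement.

```python
from math import asin, cos, radians, sin, sqrt
from statistics import mean

EARTH_RADIUS_KM = 6371.0


def haversine_km(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def daily_km(history: dict[str, list[tuple[float, float]]]) -> dict[str, float]:
    """Runs inside a Vault: raw points go in, only daily totals come out."""
    return {
        day: sum(haversine_km(p, q) for p, q in zip(points, points[1:]))
        for day, points in history.items()
    }


def average_km_per_day(reports: list[dict[str, float]]) -> dict[str, float]:
    """Runs outside the Vaults on the reported totals only."""
    days = sorted({day for report in reports for day in report})
    return {day: mean(r[day] for r in reports if day in r) for day in days}


stayed_home = daily_km({"mon": [(52.52, 13.40), (52.52, 13.40)]})
went_out = daily_km({"mon": [(0.0, 0.0), (1.0, 0.0)]})  # one degree of latitude
city_view = average_km_per_day([stayed_home, went_out])
```

Note that even a daily total is derived from personal data; the sketch shows only the data flow, not the additional protection discussed for the aggregation step itself.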
This is just an example. Other examples could include: How frequently do users go shopping? How often do people use public transport? Do users frequent public places less often?
The Aggregation Engine focuses on providing differential privacy: It enables sharing information about a dataset by describing the patterns of groups within the dataset while withholding information about individuals in the dataset. To support the Aggregation Engine, Vaults implement a Data Ability that takes a given computation of interest to a research group, public institution or government. This computation runs inside a user’s Vault and returns only aggregated results to the interested parties. The data of the users never leaves their Vaults. Only aggregated statistics about that data are exposed.
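The text does not say which differential privacy mechanism is intended. As an illustration, the sketch below uses the Laplace mechanism, a standard choice for counting queries: noise with scale 1/epsilon is added to a count, since adding or removing one user changes a count by at most 1. The function names and the choice of epsilon are assumptions.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample from Laplace(0, scale).

    The difference of two independent exponential draws with mean `scale`
    follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Release a count with epsilon-differential privacy.

    A counting query has sensitivity 1, so the noise scale is 1 / epsilon.
    Smaller epsilon means more noise and stronger privacy.
    """
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(0)  # seeded only to make the illustration repeatable
# E.g. "how many users used public transport today?"
noisy = private_count(true_count=1200, epsilon=0.5, rng=rng)
```

A production system should use a cryptographically secure noise source and track the privacy budget spent across repeated queries; both are omitted here.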
Users are also at all times in full control of what is happening in their Vaults. Users can:
- See what computation a third party is interested in running in their Vaults,
- Get informed about why this third party is interested in running that computation on their data,
- Know what pieces of data that computation will use from their Vaults, and
- See the result that the computation is going to return from their Vaults to the outside world.
At any point a user can choose to stop the computation.
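These controls could be reflected in the shape of a computation request. In the sketch below, a request declares who is asking, why, and which fields it needs; the Vault passes only those fields to the computation and shows the result to the user before anything is sent. `ComputationRequest`, `run_with_consent` and the `approve` callback are hypothetical names, and stopping a computation while it is running is not modelled.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class ComputationRequest:
    requester: str                  # who wants to run the computation
    purpose: str                    # why they want to run it
    fields: tuple[str, ...]         # which pieces of Vault data it may read
    compute: Callable[[dict[str, Any]], Any]


def run_with_consent(
    request: ComputationRequest,
    vault_data: dict[str, Any],
    approve: Callable[[ComputationRequest, Any], bool],
) -> Optional[Any]:
    """Run the computation locally; release the result only if approved."""
    # The computation sees the declared fields and nothing else.
    visible = {key: vault_data[key] for key in request.fields if key in vault_data}
    result = request.compute(visible)
    # The user reviews requester, purpose, fields and the concrete result.
    return result if approve(request, result) else None


vault_data = {
    "shopping_trips": ["2020-03-30", "2020-04-02", "2020-04-04"],
    "health": {"fever": False},
}
request = ComputationRequest(
    requester="City health office",
    purpose="Estimate how often people go shopping",
    fields=("shopping_trips",),
    compute=lambda data: len(data["shopping_trips"]),
)

released = run_with_consent(request, vault_data, approve=lambda req, res: True)
withheld = run_with_consent(request, vault_data, approve=lambda req, res: False)
```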
Extracting data from Vaults in this way is crucial in a crisis like the current one as it allows public institutions to make better decisions.
The greatest challenge of the Aggregation Engine is to block third parties from getting individual user data. That can be achieved by thoroughly reviewing all computations that will be sent to the Vaults and also allowing users to review those computations and see what results will be exposed from their Vault.
One path to achieving this is through federated learning solutions: the overall aggregation logic is split into subparts that are distributed to each device, each device executes the logic locally and sends back a summary that does not reveal the private data. This summary is then aggregated centrally.
However, there has to be a component of the system that takes the individual results from the Vaults and aggregates them. That component becomes a single point of failure for privacy, as it can potentially know which results came from which Vaults. To address this, the component does not store any data beyond what is necessary for the duration of the computation. It is crucial that aggregation is done on this system component in a way that does not expose individual results. A further way to achieve this is through Secure Multiparty Computation approaches.
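As one illustration of the Secure Multiparty Computation idea, the sketch below uses additive secret sharing to compute a sum. It assumes a setup the text does not describe: instead of one aggregator there are several that do not collude, each Vault splits its value into random shares, and each aggregator only ever sees one share per Vault. The names `make_shares` and `secure_sum` and the modulus are choices made for this example; values must be non-negative integers smaller than the modulus.

```python
import secrets

MODULUS = 2**61 - 1  # all arithmetic is done modulo this number


def make_shares(value: int, n: int) -> list[int]:
    """Split `value` into n shares that sum to it modulo MODULUS.

    Any n - 1 of the shares are uniformly random and reveal nothing.
    """
    shares = [secrets.randbelow(MODULUS) for _ in range(n - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(values: list[int], n_aggregators: int = 3) -> int:
    """Simulate Vaults sending one share to each aggregator."""
    per_aggregator = [0] * n_aggregators
    for value in values:  # one iteration per Vault
        for j, share in enumerate(make_shares(value, n_aggregators)):
            # Aggregator j only ever sees the j-th share of each Vault.
            per_aggregator[j] = (per_aggregator[j] + share) % MODULUS
    # Combining the partial sums reveals the total, not the inputs.
    return sum(per_aggregator) % MODULUS


# Kilometres travelled today, as reported by three Vaults.
total = secure_sum([3, 12, 7])
```

The combined result is the exact sum, so this technique protects individual contributions from the aggregators but does not by itself provide differential privacy for the published total.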