Are you a researcher? Start here!
Are you a data-provider? Start here!
Do you want to host a Vantage server? Start here!
The growing complexity of cancer diagnosis and treatment requires data sets that are larger than those currently available in a single hospital or even in cancer registries. However, sharing patient data is difficult because of patient privacy and data protection requirements. Privacy-preserving distributed learning technology has the potential to overcome these limitations.
The general idea behind distributed learning is that sites share a (statistical) model and its parameters instead of sharing sensitive data: each site runs computations on its local data store and returns only aggregated statistics. In this setting, organisations can collaborate by exchanging aggregated statistics while the underlying data stays safely on site and undisclosed.
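As an illustration of this idea, the following minimal sketch computes a pooled mean and variance from per-site aggregates only. It is not taken from Vantage: the names `LocalSummary`, `summarize` and `combine` are hypothetical, and the example assumes each site is willing to disclose its record count, sum and sum of squares.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class LocalSummary:
    """Aggregated statistics that leave a site (hypothetical structure)."""
    count: int
    total: float
    total_sq: float


def summarize(values: Iterable[float]) -> LocalSummary:
    """Runs at a site: reduces raw local records to three aggregates."""
    values = list(values)
    return LocalSummary(
        count=len(values),
        total=sum(values),
        total_sq=sum(v * v for v in values),
    )


def combine(summaries: List[LocalSummary]) -> Tuple[float, float]:
    """Runs centrally: pools the aggregates into a global mean and
    population variance without ever seeing individual records."""
    n = sum(s.count for s in summaries)
    if n == 0:
        raise ValueError("no records in any site summary")
    mean = sum(s.total for s in summaries) / n
    variance = sum(s.total_sq for s in summaries) / n - mean ** 2
    return mean, variance


# Each site summarizes its own data; only the summaries are exchanged.
site_a = summarize([1.0, 2.0, 3.0])
site_b = summarize([4.0, 5.0])
global_mean, global_variance = combine([site_a, site_b])
```

The pooled result equals what a single analysis over all five records would give, yet the central party only ever handles three numbers per site. Real deployments typically need additional safeguards (for example, minimum counts per site), which this sketch leaves out.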
Collaboration through distributed learning requires an infrastructure. The open source software Vantage provides this infrastructure. Conceptually, it consists of the following parts:
A central server that coordinates communication with the nodes;
One or more nodes that execute algorithms (encapsulated in Docker images) and return the output;
Organisations that are interested in collaborating with each other;
Collaborations between organisations;
Users (i.e. researchers) that instruct the nodes which algorithms to execute and the parameters to use;
A Docker registry that functions as a database of algorithms.
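To make the relationship between these parts more concrete, here is a toy, in-memory model of them. It is a conceptual sketch only and does not reflect the actual Vantage API: the classes `Server` and `Node`, the `REGISTRY` dictionary, the method names and the image name `example/count-and-sum` are all hypothetical, and plain Python functions stand in for Docker images.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# An "algorithm" takes the node's local data plus parameters and
# returns aggregated output (stand-in for a Docker image).
Algorithm = Callable[[List[float], dict], dict]

# Stand-in for the Docker registry: maps an image name to an algorithm.
REGISTRY: Dict[str, Algorithm] = {
    "example/count-and-sum": lambda data, params: {
        "count": len(data),
        "sum": sum(data),
    },
}


@dataclass
class Node:
    """Belongs to one organisation and holds its local data."""
    organisation: str
    local_data: List[float]

    def run(self, image: str, params: dict) -> dict:
        # Look up the algorithm and execute it on site;
        # only the algorithm's output is returned.
        algorithm = REGISTRY[image]
        return algorithm(self.local_data, params)


@dataclass
class Server:
    """Coordinates communication with the nodes of each collaboration."""
    collaborations: Dict[str, List[Node]] = field(default_factory=dict)

    def create_task(self, collaboration: str, image: str, params: dict) -> Dict[str, dict]:
        # A user instructs the nodes, via the server, which algorithm
        # to execute and with which parameters.
        nodes = self.collaborations[collaboration]
        return {node.organisation: node.run(image, params) for node in nodes}


server = Server(collaborations={
    "demo": [
        Node("Hospital A", [1.0, 2.0, 3.0]),
        Node("Hospital B", [4.0, 5.0]),
    ],
})
results = server.create_task("demo", "example/count-and-sum", {})
```

In this model the user never touches `local_data` directly; the only path to the data is a task that names an algorithm from the registry, which mirrors the division of roles listed above.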