A simple demonstration of a privacy-preserving de-centralised genotype imputation workflow
| Published in: | bioRxiv (Jan 15, 2025) |
|---|---|
| Main author: | |
| Other authors: | , , , |
| Published: | Cold Spring Harbor Laboratory Press |
| Subjects: | |
| Abstract: | Several recent studies have examined privacy and data-sharing restrictions in the context of servers for imputing missing genotypes. The most typical imputation pipelines compare a whole-genome sequenced haplotype reference panel with genotyped study individuals, who have missing data to be imputed. Two datasets from separate sources therefore come together in one informatic environment, where relatively complicated statistical models are applied, specifically hidden Markov models. We give a short review of the current literature in this domain and observe three prevalent strategies: complicated data encryption, technical solutions for securing computation environments, and rearrangements of haplotype data in an effort to provide anonymisation. We embarked on a thought experiment to provide a potential fourth type of solution, which federates the different internal tasks within the statistical methods used for imputation. This idea is relevant because there is currently motivation in Europe for federated analysis platforms that make combined inferences across multiple genomic data resources. Federating the tasks allows very simple manipulations to protect sensitive individual-level data, and these manipulations enable imputation algorithms to run to completion on simple plain-text files. We illustrate how such a federated imputation server could be put in place and provide associated code, including a simple implementation of the Li-Stephens haplotype mosaic model to impute missing genotypes. We name our general framework ANONYMP, for anonymised imputation. We demonstrate the concept on simulated data generated with msprime. We show that dividing the calculations required for statistical imputation between several sites is a valuable new avenue in the field of privacy-preserving imputation server development. Competing Interest Statement: The authors have declared no competing interest. |
|---|---|
| ISSN: | 2692-8205 |
| DOI: | 10.1101/2025.01.13.632689 |
| Source: | Biological Science Database |
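
The abstract refers to a simple implementation of the Li-Stephens haplotype mosaic model for imputing missing genotypes. As a rough illustration of that general technique only, the sketch below runs a textbook forward-backward pass over a toy reference panel in plain Python. It is not the authors' ANONYMP code and does not attempt the federated split between sites. The function name `li_stephens_impute`, the `switch` and `error` parameters and their default values, and the toy data are all assumptions made for this example.

```python
def li_stephens_impute(panel, target, switch=0.05, error=0.01):
    """Posterior probability of allele 1 at every site of `target`.

    panel  : list of K reference haplotypes, each a list of 0/1 alleles
    target : list of 0/1 alleles, with None marking a missing genotype
    switch : assumed per-site probability of re-drawing the copied haplotype
    error  : assumed probability that the target differs from the copied allele
    """
    K, M = len(panel), len(target)

    def emit(k, m):
        # Missing observations carry no information about the hidden state.
        if target[m] is None:
            return 1.0
        return 1.0 - error if panel[k][m] == target[m] else error

    def normalise(values):
        total = sum(values)
        return [v / total for v in values]

    # Forward pass, normalised at each site to avoid numerical underflow.
    fwd = [normalise([emit(k, 0) / K for k in range(K)])]
    for m in range(1, M):
        prev = fwd[-1]  # sums to 1, so the switch term is simply switch / K
        fwd.append(normalise(
            [emit(k, m) * ((1 - switch) * prev[k] + switch / K) for k in range(K)]
        ))

    # Backward pass with the same transition structure.
    bwd = [[1.0] * K for _ in range(M)]
    for m in range(M - 2, -1, -1):
        nxt = [emit(k, m + 1) * bwd[m + 1][k] for k in range(K)]
        total = sum(nxt)
        bwd[m] = normalise(
            [(1 - switch) * nxt[k] + switch * total / K for k in range(K)]
        )

    # Combine both passes into a per-site posterior over copied haplotypes,
    # then average the panel alleles under that posterior.
    dosages = []
    for m in range(M):
        post = normalise([fwd[m][k] * bwd[m][k] for k in range(K)])
        dosages.append(sum(post[k] * panel[k][m] for k in range(K)))
    return dosages


# Toy data (hypothetical): three reference haplotypes over five sites.
panel = [
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 1, 0],
]
target = [1, 1, None, 1, 1]  # the third site is missing
dosages = li_stephens_impute(panel, target)
imputed = [round(d) if t is None else t for d, t in zip(dosages, target)]
print(imputed)
```

Each hidden state is the reference haplotype being copied at a site. The transition used here keeps the current haplotype with probability `1 - switch` and otherwise re-draws uniformly from the panel, which is one common parameterisation of the model rather than the only one.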