Testing the performance of the imputation of MHC region in large datasets when using different reference panels
Elena Locatelli; Mirko Treccani; Cristina Patuzzo; Laura Veschetti; Elisa de Tomi; Dramane Dagnogo; Martina Gallinaro; Chiara Stefani; Donato Zipeto; Stefano Tamburin; Giovanni Malerba
2022-01-01
Abstract
The major histocompatibility complex (MHC) contains a group of genes (~260 genes in ~4 Mb), including the HLA-C gene, that are involved in the immune response and in several inflammatory disorders. The IPD-IMGT/HLA database currently reports more than 4000 different HLA-C alleles. Given the highly polymorphic nature of the region, genome-wide association studies (GWAS) generally either skip it or examine only a small subset of its polymorphic sites. Imputation may help recover additional information on this region, but successful imputation of the MHC region requires a reference panel with detailed information. The main goal of this study is to investigate whether imputation with appropriate reference panels can effectively increase the number of polymorphic sites of the MHC region available for association with complex traits. We compared imputation performance for the MHC region using 3 different reference panels, accessed through the Michigan and TOPMed imputation servers: TOPMed-r2, 1000 Genomes (Phase 3, v5), and the novel four-digit multi-ethnic HLA panel (v1, 2021). Five datasets with more than 1000 individuals each underwent imputation. We then focused on the imputation results for the part of the MHC region surrounding the HLA-C gene (hg19, chr6:31234948-31241032). The number of imputed markers differed between reference panels: 482 with 1000G, 365 with TOPMed, and 1272 with the HLA panel. Of note, the HLA panel yielded more imputed markers than the other two. We then selected the 104 markers imputed by all 3 reference panels. In addition, 162 markers were imputed only by the 1000G panel, 194 only by TOPMed, and 998 only by the HLA panel. Preliminary comparisons showed high concordance of genotype calls across the 3 reference panels. Imputation quality was measured by the R-squared (R2) value, with markers stratified into 3 groups according to minor allele frequency (MAF). The 104 common markers showed high R2 values (>0.96). As expected, in the other marker groups the mean R2 values were lower for markers with MAF<0.1 (>0.65 in 1000G, 0.15-0.20 in TOPMed, >0.40 in the HLA panel). In conclusion, imputation with a dedicated HLA panel can produce much more high-quality information for the MHC region than general-purpose reference panels.
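The comparison described in the abstract (markers shared by all panels, markers specific to one panel, and mean R2 per MAF group) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' pipeline: the function names, the toy marker IDs, the toy R2 values, and the second MAF cut-off (0.3) are all assumptions; only the MAF<0.1 threshold appears in the abstract.

```python
from statistics import mean


def overlap_summary(panels):
    """Return (markers shared by every panel, markers unique to each panel)."""
    common = set.intersection(*panels.values())
    unique = {}
    for name, markers in panels.items():
        # Union of the markers seen in all the other panels.
        others = set().union(*(m for n, m in panels.items() if n != name))
        unique[name] = markers - others
    return common, unique


def maf_group(maf, edges=(0.1, 0.3)):
    """Assign a MAF value to one of three groups.

    Only the MAF<0.1 cut-off comes from the abstract; the 0.3 edge is a
    placeholder assumption.
    """
    low, high = edges
    if maf < low:
        return f"MAF<{low}"
    if maf < high:
        return f"{low}<=MAF<{high}"
    return f"MAF>={high}"


def mean_r2_by_maf(records, edges=(0.1, 0.3)):
    """Average R2 within each MAF group; records are (maf, r2) pairs."""
    grouped = {}
    for maf, r2 in records:
        grouped.setdefault(maf_group(maf, edges), []).append(r2)
    return {group: mean(values) for group, values in grouped.items()}


# Toy data, invented for illustration only (not the study's markers).
panels = {
    "1000G": {"rs1", "rs2", "rs3", "rs4"},
    "TOPMed": {"rs1", "rs2", "rs5"},
    "HLA": {"rs1", "rs3", "rs6", "HLA_C*07:02"},
}
common, unique = overlap_summary(panels)

records = [(0.05, 0.40), (0.08, 0.60), (0.20, 0.90), (0.45, 0.98)]
r2_means = mean_r2_by_maf(records)
```

In practice the marker sets and the (MAF, R2) pairs would be read from the per-panel imputation output restricted to the HLA-C interval; how markers are matched across panels (by position and alleles, or by ID) is a design choice the abstract does not specify.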
Attached file: Locatelli_E_SIGU2022.pdf (open access; type: other attached material; license: public domain; 3.6 MB, Adobe PDF)