A Brazilian dataset is adding diversity to the international pool. It includes whole-genome sequences from 1,171 highly admixed Brazilians. The data was compiled by researchers affiliated with the Human Genome and Stem-Cell Research Center (HUG-CELL) at the University of São Paulo’s Institute of Biosciences (IB-USP) and is posted to the Online Archive of Brazilian Mutations (ABraOM).
The researchers say this is Latin America’s largest genomic dataset of older people, created to detect mutations that cause genetic diseases in this population or that play a key role in healthy aging. An analysis of the genomic data is reported this month in Nature Communications.
The considerable genetic diversity of the study sample contrasts with international genomic databases, whose coverage is predominantly of people with European ancestry.
The researchers made several important findings. For example, some subjects carry genetic variants classified in European databases as pathogenic but do not manifest the diseases associated with these mutations.
“One of the hypotheses we’re raising to explain this is that genetic variants hitherto classified as pathogenic may be expressed differently according to whether the individual’s genetic background is European or of mixed ancestry, for example,” said Michael Naslavsky, a professor at IB-USP and first author of the article.
The dataset includes over 76 million variants, of which ~2 million are absent from large public databases, the researchers report. They added that “WGS enables identification of ~2,000 previously undescribed mobile element insertions without previous description, nearly 5 Mb of genomic segments absent from the human genome reference, and over 140 alleles from HLA genes absent from public resources.”
The subjects in this study had an average age of 71 and were unrelated. They were selected by researchers at the University of São Paulo’s School of Public Health (FSP-USP) from the cohort recruited for the longitudinal study of Health, Wellbeing and Aging (Saúde, Bem-estar e Envelhecimento, SABE).
SABE investigates the health and living conditions of people aged 60 and over in São Paulo and six other cities of Latin America and the Caribbean via interviews, assessments, and medical examinations. “The study is representative of the elderly population of São Paulo because it’s based on the municipal census of the city and includes people in all income groups,” said Mayana Zatz, a professor at IB-USP and principal investigator at HUG-CELL.
Since a draft of the article was first posted as a preprint in 2020, many researchers in Brazil and abroad have accessed the data, according to the authors. Individual-level datasets are already available to the scientific community for download on request, and, as noted, aggregate data has been posted to ABraOM.
The subjects were selected for whole-genome sequencing because they had passed the age at which clinical manifestations of several aging-related diseases, such as Alzheimer’s and Parkinson’s, typically begin.
Analysis of the seniors’ genomes also enabled the researchers to detect novel variants of genes in the human leukocyte antigen (HLA) complex, which encodes proteins that allow the body to recognize pathogens and that regulate the immune system. These genes are known to be the most variable and diverse in the human genome and are therefore hard to analyze.
The researchers created reference imputation panels for the whole genome and for HLA alleles, which improved imputation accuracy, allowing them to locate more than 140 alleles from HLA genes never described before. Findings such as these are germane to studies on susceptibility or resistance to infection by a wide array of pathogens, including SARS-CoV-2.