The National Institutes of Health’s (NIH) All of Us Research Program has released nearly a quarter million whole genome sequences for broad research use. To date, the program has information from more than 413,450 participants. In addition to the whole genome sequences, this includes data from surveys, electronic health records, physical measurements, and Fitbit devices.
The researchers report that about 45 percent of the data was donated by people who self-identify with a racial or ethnic group that has been historically underrepresented in medical research.
All of Us opened for enrollment in May 2018 and includes people 18 and older from more than 340 sites across the U.S. Its goal is to “build a large research cohort of one million or more Americans that will provide the platform for expanding our knowledge of precision medicine approaches and that will benefit the nation for many years to come,” according to its initial planning report.
“Now, through a partnership with participants, researchers, and diverse communities across the country, we are seeing incredible progress towards powering scientific discoveries that can lead to a healthier future for all of us,” said Josh Denny, MD, MS, chief executive officer of the program.
The group released its first genomic data set, nearly 100,000 whole genome sequences, in March 2022. At the time, Denny said, “Until now, over 90% of participants from large genomics studies have been of European descent. The lack of diversity in research has hindered scientific discovery.”
Since then, the group has returned results to its first set of participants and released findings about COVID-19, among other milestones.
The team describes the dataset as the “world’s largest and most diverse dataset of its kind paving the way to help advance health equity and uncover health care approaches better tailored to people’s genes, lifestyles, and environments.”
There are other additions as well. For example, Fitbit device data will now include information on sleep, in addition to activity, step count, and heart rate. Used alongside participants’ electronic health record data, sleep data could help researchers study how sleep patterns affect overall health and disease progression, including for conditions such as heart disease, high blood pressure, diabetes, depression, and dementia.
Registered researchers can access these data through the program’s cloud-based platform, the Researcher Workbench, to study genetic variation and other research questions.
The data can also be used, for example, to develop personalized health-related DNA results for participants. The group reports that so far more than 25,000 participants have requested one or more of these reports, which detail whether they have an increased risk for specific health conditions or a potential for reactions to certain medications.
More than 5,000 researchers have registered to use the Researcher Workbench. Available data types include:
- More than 413,350 survey responses,
- More than 337,500 physical measurements,
- More than 312,900 genotyping arrays,
- More than 287,000 electronic health records (EHRs),
- More than 245,350 whole genome sequences,
- More than 15,600 Fitbit records, and
- More than 1,000 long-read whole genome sequences.