The new initiative, led by Debbie Cheng, targets the growing scale and complexity of health data, removing systemic barriers to data discovery and cross-disciplinary collaboration.
This story was originally published on the Center for Health Data Science’s web page.
The health sector is generating unprecedented volumes of digital data—from clinical records to genomics and population health. But more data has not meant easier access for researchers. Fragmented systems, regulatory complexity, and technical barriers continue to slow progress.
To address these challenges, the School of Public Health’s Center for Health Data Science (CHDS) has launched DataHub, a new initiative designed to transform how researchers discover, access, and work with health data.
DataHub acts as both a gateway and a guide. It connects researchers to high-value datasets, drawing on the center’s rich data inventory and governance expertise to facilitate responsible and effective access and use. It also supports AI-enabled research by helping ensure that the underlying data are curated, traceable, and organized in ways that allow them to work across systems and disciplines.
“Researchers are often working with complex data, but the systems for discovering, accessing, and connecting those data have not evolved nearly as quickly,” says Debbie Cheng, executive director of CHDS and professor of biostatistics. “DataHub is designed to help lower those barriers and make it easier for researchers to work with data in rigorous, collaborative, and responsible ways.”
That foundation includes detailed metadata, standardized curation practices, and clear documentation of data provenance: a verifiable record of where data come from and how they have been managed. Together, these elements help ensure that AI models and the insights they produce are grounded in reliable, well-understood data.
Reducing Barriers to Discovery
While DataHub can support emerging AI and data science applications, its impact extends across health research more broadly. “Researchers often spend enormous amounts of time trying to identify, access, and understand complex datasets,” says Cheng. “DataHub is designed to make that process easier, more transparent, and more reliable so researchers can focus on the science.”
By combining a curated catalog of impactful datasets and resources with built-in guidance on data use agreements, regulatory requirements, and ethical considerations, DataHub reduces uncertainty and shortens the time needed to begin new projects.
DataHub draws on the expertise of SPH’s Biostatistics and Epidemiology Data Analytics Center (BEDAC), which has supported academic, government, and industry research since 1984. Over four decades, BEDAC has developed deep expertise and capabilities in data management, statistical analysis, and secure computing for thousands of studies. DataHub translates this institutional knowledge into an accessible, scalable infrastructure that reduces the technical and administrative burdens that slow research and expands access to BEDAC’s rigorous, hands-on support.
Enabling Convergence Across Disciplines
DataHub also supports a broader push toward convergence research at Boston University, bringing together disciplines such as medicine, public health, engineering, and data science to tackle complex health challenges. This work depends on connecting data and methods across systems shaped by different standards, technologies, and regulatory constraints. By lowering these barriers, DataHub makes it easier to study how multiple factors intersect, from neighborhood conditions and environmental exposures to long-term health outcomes and healthcare use.
Privacy, security, and compliance are core to DataHub. For supported environments such as the All of Us Research Program, appropriate access controls and regulatory pathways are already in place. For other data sources, DataHub helps researchers understand and navigate the requirements for access and responsible use. DataHub also assists teams with preparing Institutional Review Board (IRB) and data access documentation, advises on technical controls and de-identification strategies, and connects investigators with institutional resources that support secure and responsible data use.
“Many of today’s most important health challenges require researchers to work across disciplines and data systems that have historically remained separate,” says Yannis Paschalidis, director of the Hariri Institute for Computing. “DataHub can connect data, expertise, and researchers in ways that support more collaborative and convergent research.”
For the Center for Health Data Science, the initiative reflects a broader mission. “Ultimately, our goal is to help researchers use complex data to answer important questions and generate insights that can improve health,” says Cheng. “We hope DataHub will enable new discoveries and help translate data into meaningful public health impact.”
Collaborating with DataHub
DataHub is expanding its efforts across Boston University to help researchers more easily discover, connect, and responsibly use complex data for convergent health research. CHDS is collaborating with developers and stewards of large, high‑value datasets to build a shared, well‑documented, and interoperable ecosystem of data resources that supports cross‑disciplinary research. By improving data discoverability and aligning metadata standards, DataHub helps investigators more readily connect data across domains such as health, environment, and social systems. The initiative also helps investigators meet NSF and NIH data management and sharing requirements by providing guidance on documentation, curation, compliance, and long‑term stewardship.
Investigators interested in sharing datasets, exploring collaborations, or learning more about DataHub are encouraged to contact CHDS at chdatascience@bu.edu. DataHub also offers consultation, training, and collaborative support alongside its technical infrastructure.
About the Center for Health Data Science
The School of Public Health established CHDS in 2024 to advance interdisciplinary health data science research, training, and practice to improve population health. The center brings together longstanding interdisciplinary expertise in biostatistics, epidemiology, environmental health, data science, and related fields across Boston University. CHDS supports collaborative research, education, training, and the development of data-driven approaches to address complex public health and biomedical challenges.