Department of Biostatistics at Columbia University

Biostatistics is the science of developing and applying statistical methods for quantitative studies in biomedicine, health, and population sciences.

As one of the nation’s premier centers of biostatistical research pertaining to clinical trials, brain imaging, cancer, mental health, and more, the Department of Biostatistics at Columbia’s Mailman School offers students a myriad of opportunities for advanced study. Faculty in the Department of Biostatistics work at the frontier of public health, leading research teams that investigate some of today’s most pressing health issues. Recruited from top universities around the world, the faculty bring to the School a wealth of experience that informs their research and teaching.

06/13/2025

UPDATE: Location moved to Hammer 322!

Friday, June 13th, at 12pm in Hammer 322, Margaret Gacheru will give a public presentation of her dissertation research titled "Multimodal Data Analysis using Latent Variables with Applications in Psychiatry and Neuroscience". Please join us to see what excellent work she has been doing!

05/29/2025

Congratulations to all the award winners of the 2025 Biostatistics Department Awards! We’re so proud of the outstanding work you’ve done.

Chair’s Award for Outstanding Master’s Student: Hongyi Chen, MPH '25, Zhuodiao Kuang, MS '25 & Xinyi Shang, MS '25
Sanford Bolton-John Fertig Award: Melanie Mayer, PhD '24
Joseph L. Fleiss Award: Wenbo Fei, PhD '25
Teaching Assistant Award: Ryan Wei & Qin Huang, MS '25


05/07/2025

This Thursday, May 8th at 11:45am in the ARB 8th Floor Auditorium, is our last Levin Lecture of the semester! Dr. Yingying Wei of The Chinese University of Hong Kong will join us to present her research, "Meta-clustering of Gene Expression Data". We hope you'll join us!

Abstract:
Traditional meta-analyses pool effect sizes across studies to improve statistical power. Likewise, there is growing interest in joint clustering across datasets to identify disease subtypes for bulk gene expression data and to discover cell types for single-cell RNA-sequencing (scRNA-seq) data. Unfortunately, due to the prevalence of technical batch effects, directly clustering samples from multiple gene expression datasets can lead to wrong results. Therefore, in the past several years, there has been very active research on the integration of multiple gene expression datasets. However, the discussion on when multiple gene expression datasets can be integrated for joint clustering is lacking. Obviously, if different subtypes are assayed in distinct batches, then meta-clustering would be impossible no matter what types of machine learning or statistical methods are used.

In this talk, I will present our Batch-effects-correction-with-Unknown-Subtypes (BUS) framework. BUS adjusts batch effects explicitly, groups samples that share similar characteristics into subtypes, and identifies genes that distinguish subtypes, all with linear-order computational complexity. The BUS framework can be adapted to perform meta-clustering for bulk gene expression data, scRNA-seq data collected from a single biological condition, and scRNA-seq data collected from multiple biological conditions, respectively. The proofs of model identifiability for the corresponding models provide insights on when multiple gene expression datasets can be integrated for meta-clustering and guidelines on experimental designs. Simulation studies and real data analyses show the advantages of our proposed models over state-of-the-art methods, especially when performing differential inference for scRNA-seq data collected from multiple conditions.

05/06/2025

Last Friday, our 2025 graduating MS & MPH students presented their Practicum/APEx projects to fellow students and faculty. It was a day filled with learning that showcased their impressive skills and the difference they are already making in the real world. Congratulations, Class of 2025! Next up: the Biostatistics Department Graduation Celebration this Friday!

05/05/2025

Congratulations to the Biostatistics Class of 2025! On Friday, May 9th from 3pm - 5pm in the Haven Ballroom we will celebrate your hard work, resilience, and all you've accomplished. Join us for food, fun, and one last hurrah with your department! Here's to the next chapter and the impact you'll make in public health and beyond!

04/30/2025

This Thursday, May 1st, Dr. Qing Pan of the George Washington University, Milken Institute School of Public Health will join us to give a Levin Lecture titled "Predictions of Advanced Adenoma and High-Risk Pregnancies in Longitudinal Screening Studies". We hope you'll join us at 11:45am in the ARB Hess Commons to learn about her great research!

Abstract:
Panel count data is common in cancer screening. In the context of colorectal cancer screening, our work focuses on the prediction of the probability of advanced adenoma conditional on patient-level risk factors and/or event history. We implement the joint frailty model proposed by Huang et al. (2006), which involves a non‐stationary Poisson process for recurrent adenoma events and informative screening time using semi‐parametric Cox models correlated by a latent frailty variable. Coefficients and baseline intensity functions are estimated through estimating equations. The subject-specific frailty value is estimated by the borrow‐strength method (Huang and Wang 2004). In addition, marginal models for the adenoma and screening events are also applicable when average covariate effects on the population level are of interest. Predictions of individual risks based on the marginal model and predictions based on the frailty models for patients with or without screening history are compared. When a patient’s screening history is available and sufficient adenoma events are observed, the predictions based on the frailty model with estimated subject‐specific frailty are superior. However, in the cases of early censoring when adenoma events are not observed for most patients or screening history is not available, the prediction based on the marginal model has better performance. For future screening, the individualized screening intervals based on the dynamic predictions of advanced adenoma risks will detect adenomas earlier with shorter lag times between adenoma occurrences compared to the current practice of fixed screening intervals for all.

In a separate project, machine learning and deep learning models to identify pregnancies with elevated risks of adverse outcomes are compared. A novel GRU model that accommodates both static and time-varying information and allows interactions between these two kinds of covariates through additional attention layers provides better performance. Contributions of various types of covariates (questionnaires, blood tests, and ultrasound) to the prediction accuracy are compared for clinical practice in low- and middle-income countries.

04/28/2025

Friday, May 2nd from 12pm - 5pm in the Hammer Building is our 2025 Practicum/APEx Symposium!

Join us to learn from the Class of 2025 about the culmination of their academic pursuits at Columbia. They will present their real-world projects created in partnership with public health organizations and research initiatives that showcase the breadth and depth of Biostatistics in action.

From age-related changes in brain dynamic function to salary transparency and the gender divide in NYC, you can view the full titles and abstracts at the link below.

All CUIMC and Mailman School students, faculty, staff, and affiliates are welcome to this celebration of the Class of 2025!

https://www.publichealth.columbia.edu/academics/departments/biostatistics/news-events/2025-biostatistics-practicum-apex-symposium

04/28/2025

To close out the semester, we have two more Levin Lectures! Join us on Thursday, May 1st and Thursday, May 8th to cap off the academic year with learning.

Thursday, May 1st: Qing Pan, PhD - "Predictions of Advanced Adenoma and High-Risk Pregnancies in Longitudinal Screening Studies"

Thursday, May 8th: Yingying Wei, PhD - "Meta-clustering of Gene Expression Data"

04/27/2025

On Tuesday (4/29) from 4-5pm, the Functional Data Analysis Working Group is excited to host Dr. Ofer Harel from the University of Connecticut for a talk titled “A two-stage classification for dealing with unseen clusters in the testing data.” Join us in ARB 627 or via Zoom!

Abstract:
Classification is an important statistical tool that has grown in importance since the emergence of the data science revolution. However, a training data set that does not capture all underlying population subgroups (or clusters) will result in biased estimates or misclassification. In this presentation, we introduce a statistical and computational solution to a possible bias in classification when implemented on estimated population clusters. An unseen-cluster problem denotes the case in which the training data does not contain all underlying clusters in the population. Such a scenario may occur for various reasons, such as sampling errors, selection bias, or emerging and disappearing population clusters. Once an unseen-cluster problem occurs, a testing observation will be misclassified because a classification rule based on the sample cannot capture a cluster not observed in the training data (sample). To overcome such issues, we suggest a two-stage classification method to ameliorate the unseen-cluster problem in classification. We suggest a test to identify the unseen-cluster problem and demonstrate the performance of the two-stage tailored classifier using simulations and a public data example. (This is joint work with Jung Wun Lee.)

04/24/2025

This Friday, April 25th, Minghe Wang's series on "Understanding and Addressing Covariate Shift in Transfer Learning" continues with Part 2 for the TRAIL4Health Brown Bag Learning Series. You can join in at ARB 627 & over Zoom from 12pm - 1pm.

Abstract:
Building on the foundational concepts of the covariate shift problem introduced in the first lecture, this session will explore advanced covariate shift adjustment methods that estimate importance weights as a whole rather than separately. We will cover Kernel Mean Matching, Kullback-Leibler Divergence Minimization, and least-squares-based techniques. These methods offer improved stability and efficiency in high-dimensional settings, making them especially relevant for modern healthcare datasets. The lecture will include hands-on examples to illustrate how these approaches can be implemented and evaluated in practice. By the end of this session, participants will be more familiar with the covariate shift problem and the application of importance weighting methods.

04/23/2025

Tomorrow, Thursday, April 24th, Dr. Amita Manatunga will present a Levin Lecture on "Model-free Framework for Evaluating the Reliability of a New Device with Multiple Imperfect Reference Standards" from 11:45am - 1:00pm in the ARB 8th Floor Auditorium. All are welcome to come learn with us!

Abstract: A common practice for establishing the reliability of a new computer-aided diagnostic (CAD) device is to evaluate how well its clinical measurements agree with those of a gold standard test. However, in many clinical studies, a gold standard is unavailable, and one needs to aggregate information from multiple imperfect reference standards for evaluation. A key challenge here is the heterogeneity in diagnostic accuracy across different reference standards, which may lead to biased evaluation of a device if improperly accounted for during the aggregation process. We propose an intuitive and easy-to-use statistical framework for evaluation of a device by assessing agreement between its measurements and the weighted sum of measurements from multiple imperfect reference standards, where weights representing relative reliability of each reference standard are determined by a model-free, unsupervised inductive procedure. Specifically, the inductive procedure recursively assigns higher weights to reference standards whose assessments are more consistent with each other and form a majority opinion, while assigning lower weights to those with greater discrepancies. Unlike existing methods, our approach does not require any modeling assumptions or external data to quantify heterogeneous accuracy levels of reference standards. It only requires specifying an appropriate agreement index used for weight assignment and device evaluation. The framework is applied to evaluate a CAD device for kidney obstruction by comparing its diagnostic ratings with those of multiple nuclear medicine physicians.

This is joint work with Ying Cui, Qi Yu and Jeong H. Jang.

04/17/2025

This Friday, April 18th, is another chance to join the TRAIL4Health Brown Bag Learning Series! Our own Biostatistics Master's student, Minghe Wang, will present a talk on "Understanding and Addressing Covariate Shift in Transfer Learning" in ARB 627 & over Zoom from 12pm - 1pm.

Abstract:
Covariate shift refers to systematic differences in the distributions of input covariates between training (source) and deployment (target) settings. This presents a significant challenge when combining data from multiple sources or deploying pre-trained models to real-world healthcare applications. In the first lecture of this tutorial, we will introduce key concepts related to transfer learning and domain adaptation, followed by simple solutions for correcting covariate shifts. We will explore techniques for estimating density ratios, including kernel density estimation and histogram-based methods. Real-world examples using the MIMIC-III dataset will demonstrate how covariate shift impacts model performance. The subsequent lectures will build on this introduction with hands-on labs that cover more advanced reweighting approaches, such as kernel mean matching, discriminative learning, and techniques that go beyond simple reweighting. This series will consist of approximately four lectures.
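For readers new to the topic, the histogram-based density ratio idea mentioned in the abstract can be sketched roughly as follows. This is only an illustrative sketch and not material from the lecture: the function name `histogram_density_ratio`, the bin count, the add-one smoothing, and the simulated one-dimensional data are all assumptions made for the example.

```python
import random


def histogram_density_ratio(source, target, n_bins=10, smoothing=1.0):
    """Estimate importance weights p_target(x) / p_source(x) for each source point.

    Hypothetical helper for illustration: both samples are binned on a shared
    equal-width grid, and the ratio of (smoothed) bin frequencies is used as
    the density ratio estimate.
    """
    lo = min(min(source), min(target))
    hi = max(max(source), max(target))
    # Fall back to a width of 1.0 if every value is identical.
    width = (hi - lo) / n_bins or 1.0

    def bin_index(x):
        # Clamp so that x == hi falls into the last bin.
        return min(int((x - lo) / width), n_bins - 1)

    def bin_probs(sample):
        # Additive smoothing keeps every bin probability strictly positive.
        counts = [smoothing] * n_bins
        for x in sample:
            counts[bin_index(x)] += 1
        total = sum(counts)
        return [c / total for c in counts]

    p_source = bin_probs(source)
    p_target = bin_probs(target)
    # Equal-width bins: the ratio of bin probabilities equals the density ratio.
    return [p_target[bin_index(x)] / p_source[bin_index(x)] for x in source]


# Simulated example: the target covariate is shifted relative to the source.
rng = random.Random(0)
source = [rng.gauss(0.0, 1.0) for _ in range(2000)]
target = [rng.gauss(1.0, 1.0) for _ in range(2000)]

weights = histogram_density_ratio(source, target)
# Reweighting the source sample should pull its mean toward the target mean.
weighted_mean = sum(w * x for w, x in zip(weights, source)) / sum(weights)
print(f"unweighted source mean: {sum(source) / len(source):.2f}")
print(f"reweighted source mean: {weighted_mean:.2f}")
```

Histogram estimates like this are easy to reason about in one dimension but degrade quickly as the number of covariates grows, which is one motivation for the more advanced reweighting approaches the series goes on to cover.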

Address

722 W 168th Street
New York, NY
10032
