Department of Biostatistics at Columbia University

Biostatistics is the science of developing and applying statistical methods for quantitative studies in biomedicine, health, and population sciences.

As one of the nation’s premier centers of biostatistical research pertaining to clinical trials, brain imaging, cancer, mental health, and more, the Department of Biostatistics at Columbia’s Mailman School offers students a myriad of opportunities for advanced study. Faculty in the Department of Biostatistics work at the frontier of public health, leading research teams that investigate some of today’s most pressing health issues. Recruited from top universities around the world, the faculty bring to the School a wealth of experience that informs their research and teaching.

10/07/2025

This Thursday, October 9th, KC Gary Chan, PhD of the University of Washington School of Public Health will present a Levin Lecture on “Robust and efficient semiparametric inference for the stepped wedge design” from 11:45am - 1:00pm over Zoom. You can find the link on the Fall 2025 Departmental Lectures page. All are welcome to come learn with us!

Abstract:
Stepped wedge designs (SWDs) are increasingly used to evaluate longitudinal cluster-level interventions but pose substantial challenges for valid inference. Because crossover times are randomized, intervention effects are intrinsically confounded with secular time trends, while heterogeneous cluster effects, complex correlation structures, baseline covariate imbalances, and unreliable standard errors from few clusters further complicate statistical inference. We propose a unified semiparametric framework for estimating possibly time-varying intervention effects in SWDs that directly addresses these issues. A nonstandard development of semiparametric efficiency theory is required to accommodate correlated observations within clusters, non-identically distributed outcomes across clusters due to varying cluster-period sizes, and weakly dependent treatment assignments that are hallmarks of SWDs. The resulting estimator of treatment contrast is consistent and asymptotically normal even under misspecification of the covariance structure and control cluster-period means, and achieves the semiparametric efficiency bound when both are correctly specified. To facilitate inference for trials with few clusters, we introduce a permutation-based procedure to better capture finite-sample variability and a leave-one-out correction to mitigate plug-in bias. We further discuss how effect modification can be naturally incorporated, and imbalanced precision variables can be accommodated via a simple adjustment closely related to post-stratification, a novel connection of independent interest. Simulations and application to a public health trial demonstrate the robustness and efficiency of the proposed method relative to standard approaches.

10/01/2025

We are excited to invite our Biostatistics students, faculty, and staff to a Pumpkin Painting Event on Wednesday, October 8th, from 4:00 PM to 5:00 PM in the ARB 6th Floor Lobby. This event will be a fantastic opportunity to unleash your creativity, enjoy some seasonal fun, and connect with fellow students.

All materials will be provided, and no prior painting experience is necessary. Just bring your enthusiasm and a smile!

Please RSVP by October 3rd via the link sent to your Columbia email so that we have enough supplies for everyone.

We look forward to seeing you there and celebrating the autumn season together!

09/30/2025

This Thursday, October 2nd, Natalie Dean, PhD of Emory University Rollins School of Public Health will present a Levin Lecture on “Challenges in Estimating Vaccine Effectiveness Against Progression to Severe Disease” from 11:45am - 1:00pm over Zoom. You can find the link on the Fall 2025 Departmental Lectures page. All are welcome to come learn with us!

Abstract:
Vaccines can reduce an individual’s risk of infection and their risk of progression to disease given infection. The latter effect is less commonly estimated but is relevant for risk communication and vaccine impact modeling. Using a motivating example from the COVID-19 literature, we note how vaccine effectiveness against progression can appear to increase over time in settings where true biological strengthening is unlikely. We use mathematical modeling to demonstrate how this phenomenon can occur when there is an underlying vulnerable subpopulation with poor vaccine response against infection and progression. As a result, the earliest infections are among those with the weakest protection against disease. We describe a modeling framework to link underlying immunology and post-vaccination outcomes that we use to further examine this problem. This work highlights methodological challenges in isolating a vaccine’s effect on progression to severe disease after infection.

09/26/2025

Next week is the start of a new month and that means more Departmental Levin Lectures!

You can view all the abstracts, upcoming lectures in Fall 2025, and find Zoom links on our Departmental Lectures webpage. We hope to see you there!

Thursday, October 2nd: Natalie Dean, PhD - “Challenges in Estimating Vaccine Effectiveness Against Progression to Severe Disease”

Thursday, October 9th: KC Gary Chan, PhD - “Robust and Efficient Semiparametric Inference for the Stepped Wedge Design”

Thursday, October 16th*: Andrew An Chen, PhD - “Methodological Considerations in Applying Brain Charts to New Samples” (*This lecture will be in-person only in ARB Hess Commons)

Thursday, October 23rd: GuanNan Wang, PhD - "Boosting Biomedical Imaging Analysis via Distributed Functional Regression and Synthetic Surrogates"

Thursday, October 30th*: Bibhas Chakraborty, PhD - "Innovative Trial Designs in Mobile Health Using Reinforcement Learning" (*This lecture will begin at 9am ET)

09/23/2025

We are excited for the first FDAWG (Functional Data Analysis Working Group) Meeting today! Join us from 4 - 5pm on Zoom or in ARB room 627 to hear Dr. Johan Vagelius of Uppsala University give a talk titled “Functional mixed models for time-dependent PET.”

Abstract:
The simplified reference tissue model (SRTM) is widely used for PET receptor quantification but assumes constant kinetic parameters. Existing time-varying extensions often impose fixed response shapes or rely on voxelwise fits that forego hierarchical pooling. We propose a functional mixed-effects formulation that models the apparent efflux as a smooth function of time, decomposed into group-level smooths (fixed functional effects) and subject-specific smooth deviations (random functions). Using a common time grid and a Gaussian-process kernel, we formulate the SRTM in a function-on-scalar mixed model that supports direct inference on group differences in this function with principled uncertainty quantification.
We place a Gaussian prior on the fixed-effects coefficients and integrate out the random functional effects to obtain a marginal likelihood. Conditioning on variance/smoothing hyperparameters, the posterior of the fixed effects is Gaussian. Computationally, inference requires inverting only matrices whose dimensions are determined by the number of covariates and the time-grid size, rather than the full data covariance. This enables scalable estimation of time-dependent PET processes.

09/16/2025

This Thursday, September 18th, Jingyi Jessica Li, PhD will present a Levin Lecture on “Nullstrap: A Simple, High-Power, and Fast Framework for FDR Control in Variable Selection for Diverse High-Dimensional Models” from 11:45am - 1:00pm over Zoom. Find the Zoom link via the Fall 2025 Departmental Lectures page on our website. All are welcome to come learn with us!

Abstract:
Balancing false discovery rate (FDR) control with high statistical power is a central challenge in high-dimensional variable selection. Existing methods often degrade data through knockoffs or splitting, leading to power loss. We propose Nullstrap, a framework that controls FDR without altering the original data. Nullstrap generates synthetic null data by fitting a null model under the null hypothesis and applies the same estimation to both original and synthetic datasets. This parallel structure resembles the likelihood ratio test, serving as its numerical analog. A data-driven correction procedure adjusts null estimates, enabling variable selection with theoretical guarantees: asymptotic FDR control at any desired level and power converging to one. Nullstrap is fast, stable, and broadly applicable across linear, generalized linear, Cox, and graphical models. Simulations indicate that Nullstrap maintains robust FDR control and outperforms the knockoff filter and data splitting in power (0.95 vs. 0.50 and 0.70) and efficiency (≈ 30×). While all three methods are randomized, Nullstrap is more stable (Jaccard 0.98 vs. 0 and 0.42). In a triple-omics time-to-labor dataset, the knockoff filter and data splitting fail to identify variables in most of 70 runs with different random seeds, whereas Nullstrap consistently selects predictors, achieves > 90% predictive accuracy, and is three orders of magnitude faster.

09/09/2025

This Thursday, September 11th, Bingxin Zhao, PhD will present a Levin Lecture on “Resampling-based Pseudo-training in Genomic Predictions” from 11:45am - 1:00pm in the ARB 8th Floor Auditorium. This week, the lecture is in-person only. All are welcome to come learn with us!

Abstract:
In this talk, I will present a resampling-based pseudo-training framework for genomic prediction that enables model development using only summary-level data. We show that generating pseudo-training and validation statistics from summary results achieves asymptotic equivalence to conventional training while avoiding the need for individual-level datasets. Simulations and real data applications suggest that pseudo-training performs comparably to standard approaches with large datasets and substantially better when tuning data are limited. We highlight two platforms built on this framework: PennPRS (https://pennprs.org/), a cloud-based computing infrastructure supporting large-scale, no-code polygenic risk score training with purely summary data resources, and GCB-Hub (https://www.gcbhub.org/), which applies pseudo-training to proteome-wide association studies for protein-disease mapping and drug discovery. Together, these advances demonstrate how resampling-based pseudo-training methods can broaden accessibility, scalability, and impact of genomic prediction across diverse biomedical research settings.

09/08/2025

Happy start of the fall semester! We’re celebrating with the first T-Time this Wednesday, September 11th, from 4-5pm.

Come hang out in the department and meet your fellow classmates & faculty! Whether you're a returning student or brand new to campus, this is your chance to connect and set the tone for an amazing semester ahead!

All Biostatistics faculty, staff, and students are welcome and encouraged to come.

09/05/2025

We were excited to welcome our newest students for the first week of classes of Fall 2025! Here's to a great semester!

09/04/2025

Today, Thursday, September 4th, James Zou, PhD will present a Levin Lecture on “Computational Biology in the Age of AI Agents” from 11:45am - 1:00pm over Zoom. Find the Zoom link via the Fall 2025 Departmental Lectures page on our website. All are welcome to come learn with us!

Abstract:
AI agents—large language models equipped with tools and reasoning capabilities—are emerging as powerful research enablers. This talk will explore how computational biology is particularly well-positioned to benefit from rapid advances in agentic AI. I’ll first introduce the Virtual Lab—a collaborative team of AI scientist agents conducting in silico research meetings to tackle open-ended research projects. As an example application, the Virtual Lab designed new nanobody binders to recent Covid variants that we experimentally validated. Then I will present CellVoyager, a data science agent that analyzes complex genomics data to derive new insights. I will conclude by discussing limits of agents and a roadmap for human researcher-AI collaboration.

09/03/2025

It's the start of a new school year and that means our Departmental Levin Lecture series starts this week! This year, many of our lectures will be available via Zoom.

You can check our website to view all the upcoming lectures in Fall 2025 and find Zoom links. We hope to see you there!

Thursday, September 4th: James Zou, PhD - “Computational Biology in the Age of AI Agents”

Thursday, September 11th: Bingxin Zhao, PhD - “Resampling-based Pseudo-training in Genomic Predictions”

Thursday, September 18th: Jingyi Jessica Li, PhD - “Nullstrap: A Simple, High-Power, and Fast Framework for FDR Control in Variable Selection for Diverse High-Dimensional Models”

Thursday, September 25th: Brian Caffo, PhD - Talk Title TBA

06/13/2025

UPDATE: Location moved to Hammer 322!

Friday, June 13th, at 12pm in Hammer 322, Margaret Gacheru will give a public presentation of her dissertation research titled, "Multimodal Data Analysis using Latent Variables with Applications in Psychiatry and Neuroscience". Please join us to see what excellent work she has been doing!

Address

722 W 168th Street
New York, NY
10032
