Adaptive Dependent Data Models via Graph-Informed Shrinkage and Sparsity

Project Abstract/Summary

This research project will advance statistical modeling and computing strategies for dependent data. Dependent data are widespread and important. Many socioeconomic, cultural, and political data are measured on spatial areal units, most economic data are time-ordered and co-dependent, and modern monitoring systems record social, environmental, and economic exposure data at near-continuous resolutions. However, this growing abundance of dependent data has outpaced the development of statistical methods and algorithms for such data. The project will develop new statistical tools that will allow researchers to extract reliable information and make decisions from such dependent data. The methods to be developed will be motivated by specific, timely, and important problems in the following areas: local elections and redistricting; inflation modeling and forecasting; spatial pattern extraction for economic, health, and urban data; and modeling of monitoring and exposure data. The project will provide training and mentoring for undergraduate and graduate students, develop publicly available software and visualization tools, and showcase local, state, and federal government data.

This research project will develop new statistical tools to adequately capture a broad array of data dependencies, provide computational scalability for massive datasets, and leverage the dependence structures for more adaptive and localized estimation, uncertainty quantification, and imputation of missing data. Unmodeled dependence renders inferences suboptimal or invalid, resulting in underpowered analyses and erroneous conclusions. In addition, dependent data are often high-dimensional with substantial missingness, leading to significant computational and statistical challenges. Within a Bayesian framework, the project will simultaneously integrate the dependence in (i) the model for the signal to provide smoothness and regularization, (ii) the accompanying shrinkage or sparsity prior for enhanced local adaptivity, and (iii) the computational and numerical strategies for scalable posterior inference. Dependence will be encoded as a graph that links together observational units, such as consecutive observations for time-ordered or functional data, adjacent pixels for image or lattice data, and neighboring areal units for spatial data, among many other examples. This graph-based formulation will lay the foundation to unify and advance a broad collection of models, shrinkage and sparsity priors, and inference algorithms for dependent data. The tools to be developed will be customized for a variety of settings, including trend estimation and imputation, shrinkage or sparsity priors, graph-informed regression analysis, factor models, and discrete data, among others.
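
As a rough illustration of the graph-based formulation described above, the following Python sketch encodes dependence as an edge list (consecutive observations for time-ordered data, adjacent pixels for lattice data) and uses the resulting graph Laplacian to smooth a noisy signal. This is not the project's method: the fixed Gaussian smoothness penalty is a simple stand-in for the adaptive shrinkage and sparsity priors the abstract refers to, and the function names (chain_edges, grid_edges, graph_smooth) and the tuning value lam are hypothetical choices made only for this example.

```python
# Illustrative sketch only; names and the penalty form are assumptions,
# not taken from the project.

def chain_edges(n):
    """Edges linking consecutive observations (time-ordered data)."""
    return [(i, i + 1) for i in range(n - 1)]

def grid_edges(rows, cols):
    """Edges linking adjacent pixels on a rows x cols lattice."""
    edges = []
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:
                edges.append((i, i + 1))      # right-hand neighbor
            if r + 1 < rows:
                edges.append((i, i + cols))   # neighbor below
    return edges

def laplacian(n, edges):
    """Dense graph Laplacian L = D - A for an undirected edge list."""
    L = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        L[i][i] += 1.0
        L[j][j] += 1.0
        L[i][j] -= 1.0
        L[j][i] -= 1.0
    return L

def solve(A, b):
    """Gaussian elimination with partial pivoting (adequate for small n)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    x = [0.0] * n
    for k in reversed(range(n)):
        s = sum(M[k][c] * x[c] for c in range(k + 1, n))
        x[k] = (M[k][n] - s) / M[k][k]
    return x

def graph_smooth(y, edges, lam):
    """Minimize sum_i (y_i - x_i)^2 + lam * sum_{(i,j)} (x_i - x_j)^2.

    The minimizer solves the linear system (I + lam * L) x = y.
    """
    n = len(y)
    L = laplacian(n, edges)
    A = [[(1.0 if i == j else 0.0) + lam * L[i][j] for j in range(n)]
         for i in range(n)]
    return solve(A, y)

# Toy time-ordered signal that oscillates between 0 and 1.
y = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
x = graph_smooth(y, chain_edges(len(y)), lam=2.0)
```

Swapping chain_edges for grid_edges (or any other edge list, such as neighboring areal units) changes the dependence structure without changing the estimator, which is the appeal of a graph-based encoding. Because a single lam applies to every edge, the amount of smoothing here is the same everywhere; making it vary locally is the kind of adaptivity the shrinkage and sparsity priors in the abstract are meant to provide.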

This award reflects NSF’s statutory mission and has been deemed worthy of support through evaluation using the Foundation’s intellectual merit and broader impacts review criteria.

Principal Investigator

Daniel Kowal, Cornell University, Ithaca, NY

Co-Principal Investigators

None listed

Funders

National Science Foundation

Funding Amount

$226,874.00

Project Start Date

07/01/2024

Project End Date

07/31/2026

Will the project remain active for the next two years?

The project has more than two years remaining

Source: National Science Foundation

Please be advised that recent changes in federal funding schemes may have impacted the project’s scope and status.

Updated: April 2025