Projects in Data Complexity - Soul Searching for Grad School

Explore projects in Data Complexity!

Projects in data complexity focus on developing and applying advanced methodological approaches to manage and interpret complex data structures commonly encountered in contemporary research. This includes working with high-dimensional data, nested or hierarchical designs, longitudinal and time-series data, missing or incomplete data, and data from multiple sources or modalities. Researchers in this area explore innovations in statistical modeling, machine learning, data integration, and visualization to enhance analytic precision and interpretability. Projects may also address challenges such as nonlinearity, measurement error, and dynamic processes over time. By advancing tools that can accommodate the richness and intricacy of real-world data, these projects support more robust, nuanced, and actionable research findings across disciplines.

“Complex polygon 2-4-5-bipartite graph” by Tomruen is licensed under Creative Commons Attribution-Share Alike 4.0 International via Wikimedia Commons

Learning Decision Rules in Shifting Environments
Project Abstract/Summary: This research project will develop new methods and software for data-driven decision making under environmental shift. Most work in data-driven decision making fundamentally relies on the stability of the statistical environment. Available tools for data-driven decision making generally assume that future environments in which decision rules will be deployed resemble the past environment where data was collected. Many real-world applications, however, display significant environmental shift due to various factors. The newly developed methods will expand researchers’ ability to apply data-driven decision making to systems with environmental shift. Possible application areas range from medical settings and social programs to…
The Item Bank Calibration and Replenishment for Computerized Adaptive Testing in Small Scale Assessments: Method, Theory, and Application
Project Abstract/Summary: This research project will advance statistical estimation methods for computerized adaptive testing (CAT) item bank calibration and replenishment in small-scale assessments. CAT has emerged as a powerful assessment tool and has been applied in educational testing, quality-of-life measurement, health-related measurement, and testing in industrial settings. Unlike the traditional paper-and-pencil test, CAT allows for personalized assessment. However, the application of CAT remains limited in small-scale assessment scenarios, such as classrooms or daily business routines. This project will develop a series of statistical estimation methods, theories, algorithms, and software aimed at accelerating…
CAREER: Flexible Record Linkage through Realistic Modeling of Dependent, Missing, and Updating Data
Project Abstract/Summary: This CAREER research project will develop flexible Bayesian record linkage models to enable more accurate linking of data sets without unique identifiers. Unleashing the potential of increasingly ubiquitous data to solve grand problems from conservation biology to demography and population estimation often requires linking multiple data sources. Record linkage is the process of resolving duplicates in partially overlapping sets of records from noisy data sources without a unique identifier. A statistical model-based approach is attractive in that it allows for uncertainty quantification in the linkage. Current approaches to record linkage, however, struggle to account for data with dependence,…
HNDS-R: Improving Data Integration Techniques
Project Abstract/Summary: This research project will develop methodologies to address the critical challenges researchers face when merging datasets without unique identifiers. As datasets have become more abundant and diverse, researchers are seeking ways to combine data from multiple sources to tackle important societal questions. A frequent obstacle to linking diverse datasets is the lack of a unique identifier, such as a Social Security number. This leads to uncertainty in linking records across datasets and to computational complexity, because many record pairs must be compared without prior knowledge of the correspondence between records. This project will develop easy-to-use, computationally efficient, and accurate…
Adaptive Dependent Data Models via Graph-Informed Shrinkage and Sparsity
Project Abstract/Summary: This research project will advance statistical modeling and computing strategies for dependent data. Dependent data are widespread and important. Many socioeconomic, cultural, and political data are measured on spatial areal units, most economic data are time-ordered and co-dependent, and modern monitoring systems record social, environmental, and economic exposure data at near-continuous resolutions. However, this growing abundance of dependent data has outpaced the development of statistical methods and algorithms for such data. The project will develop new statistical tools that will allow researchers to extract reliable information and make decisions from such dependent data. The methods to be developed…