I am interested in a wide variety of questions in statistics, applied and theoretical. Take a look at the list below, check out the accompanying papers, and let me know if any of them spark your interest.
The inferential properties of linear regression models are well understood and widely used in the sciences. While algorithmic models, such as regression trees and their extensions, are now widely used for prediction, their inferential capabilities remain underdeveloped. Can we compare the variable importances in a random forest model to the betas of a regression model? How can we calculate the equivalent of a p-value? See Leo Breiman's seminal 2001 paper for a starting point.
A key step in a statistical analysis is the fitting of a statistical model to a particular data set. This usually amounts to maximizing a likelihood function or posterior density. Lavine et al. (2015) propose a branch-and-bound style algorithm that explores the shape of these functions while ensuring that no global maxima are missed. There are many open problems related to this algorithm. How does the complexity scale in n and p? What can be learned from a WHIM analysis compared to a traditional MLE analysis? Ask me for a draft of Lavine (2016) for an overview.
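To see why missing a global maximum is a real concern, here is a minimal Python sketch of a likelihood with more than one mode. It is not the algorithm from Lavine et al.; it only evaluates a Cauchy location log-likelihood on a grid. The data values, the fixed scale of 1, and the helper names `cauchy_loglik`, `grid_points`, and `local_modes` are all made up for illustration.

```python
import math

def cauchy_loglik(theta, data, scale=1.0):
    """Log-likelihood of a Cauchy location parameter theta (scale held fixed)."""
    return sum(
        -math.log(math.pi * scale * (1.0 + ((x - theta) / scale) ** 2))
        for x in data
    )

def grid_points(lo, hi, steps):
    """Evenly spaced grid of steps + 1 points on [lo, hi]."""
    return [lo + (hi - lo) * i / steps for i in range(steps + 1)]

def local_modes(thetas, vals):
    """Interior grid points whose value is strictly higher than both neighbours."""
    return [
        thetas[i]
        for i in range(1, len(thetas) - 1)
        if vals[i] > vals[i - 1] and vals[i] > vals[i + 1]
    ]

# Hypothetical data: two well-separated clusters, the right one slightly larger.
data = [-5.2, -4.8, -5.0, 4.9, 5.1, 5.0, 5.3]

thetas = grid_points(-10.0, 10.0, 2000)
vals = [cauchy_loglik(t, data) for t in thetas]

modes = local_modes(thetas, vals)                 # every local maximum on the grid
theta_hat = thetas[vals.index(max(vals))]         # the global maximizer on the grid

print("local modes:", modes)
print("global maximizer:", theta_hat)
```

A hill-climbing optimizer started near the left cluster would stop at the lower mode. An exhaustive grid avoids that in one dimension but becomes infeasible as the number of parameters grows, which is one way to motivate a search that bounds the function over regions instead.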
The use of fingerprinting as a method of personal identification is now well established. Can we improve identification by explicitly modeling the print deformation that occurs when someone presses their finger onto a surface? This work is an extension of the 2015/2016 forensics workshop at the Statistical and Applied Mathematical Sciences Institute (SAMSI).
Will Jones (Reed '15) constructed several models for the quality of bike routes across the city of Portland using data from an app called Ride Report. His analysis used hierarchical logistic regression models as well as a missing data model to shed light on strong correlates of ride quality. The next step in the analysis is to explicitly consider the spatial aspect of the data by constructing a spatial model over the road network. See Will's GitHub repo for details.
Philip Stalworth (Reed '15) investigated how clusters can be detected in data that have been sampled via k-tree sampling. The data at hand record the locations of pitcher plants across several bogs in New England. His thesis concludes with ideas on how to weave together several techniques to accomplish this goal. See Philip's GitHub repo for details.