Andrew Bray

I am interested in a wide variety of questions in statistics, applied and theoretical. Take a look at the list below, check out the accompanying papers, and let me know if any of them spark your interest.

Inference for Trees

The inferential properties of linear regression models are well understood and widely used in the sciences. While algorithmic models, such as regression trees and their extensions, are now widely used for prediction, their inferential capabilities remain underdeveloped. Can we compare the variable importances in a random forest model to the betas of a regression model? How can we calculate the equivalent of a p-value? See Leo Breiman's seminal 2001 paper for a starting point.

WHIM Algorithm (Where It Matters)

A key step in a statistical analysis is fitting a statistical model to a particular data set. This usually amounts to maximizing a likelihood function or posterior density. Lavine et al. (2015) propose a branch-and-bound style algorithm that explores the shape of these functions while ensuring that no global maxima are missed. There are many open problems related to this algorithm. How does the complexity scale in n and p? What can be learned from a WHIM analysis compared to a traditional MLE analysis? Ask me for a draft of Lavine 2016 for an overview.

Forensic Statistics

The use of fingerprinting as a method of personal identification is now well established. Can we improve identification by explicitly modeling the print deformation that occurs when someone presses their finger onto a surface? This work is an extension of the 2015/2016 forensics workshop at the Statistical and Applied Mathematical Sciences Institute (SAMSI).

Spatial Models over Networks

Will Jones (Reed '15) constructed several models for the quality of bike routes across the city of Portland using data from an app called Ride Report. His analysis used hierarchical logistic regression models as well as a missing data model to shed light on strong correlates of ride quality. The next step in the analysis is to explicitly consider the spatial aspect of the data by constructing a spatial model over the road network. See Will's GitHub repo for details.

Cluster Detection for Sampled Point Process Data

Philip Stalworth (Reed '15) investigated how clusters can be detected in data that have been sampled via k-tree sampling. The data on hand record the locations of pitcher plants across several bogs in New England. His thesis concludes with ideas on how to weave together several techniques to accomplish this goal. See Philip's GitHub repo for details.

Social Network Models for Valued Data

There is a rapidly growing body of research on statistical models for network data in which the ties between nodes are binary (either there is a tie or there is not). In many settings, however, the tie is valued; i.e., it can take on any value within an interval. Michael Weiss (Reed '15) studied the social behavior of orcas in Puget Sound and noted the proportion of time that each pair of whales spent in one another's presence. His analysis sets a hard threshold and converts the valued data into binary data to allow the use of the well-developed binary models. There are many open questions about how these valued network models can be parameterized and estimated, along with the natural application of these models to Michael's orca data. You can find a copy of Michael's thesis in the library or in my office.
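
To make the thresholding step concrete, here is a minimal sketch in Python. The association values, the 0.3 cutoff, and the function name binarize are all made up for illustration; they are not taken from Michael's thesis or his data.

```python
# Hypothetical association matrix: entry [i][j] is the proportion of
# observed time that whales i and j spent together (made-up values).
association = [
    [0.0, 0.8, 0.1],
    [0.8, 0.0, 0.4],
    [0.1, 0.4, 0.0],
]


def binarize(weights, threshold):
    """Return a 0/1 adjacency matrix with a 1 wherever the tie value
    between two distinct nodes is strictly greater than the threshold."""
    n = len(weights)
    return [
        [1 if i != j and weights[i][j] > threshold else 0 for j in range(n)]
        for i in range(n)
    ]


# The cutoff of 0.3 is an arbitrary choice for this illustration.
adjacency = binarize(association, threshold=0.3)
print(adjacency)
```

In this toy example the ties valued 0.8 and 0.4 both become a 1, so the binary network can no longer distinguish a strong association from a moderate one. That loss of information is one motivation for modeling the valued ties directly.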