Exploring the Gap between Informal Mental and Formal Statistical Models

Pfister H, Wattenberg M, Beyer J, and Nobre C.

Harvard Data Science Review, 2021.

Hullman and Gelman argue for unifying visual exploratory data analysis (EDA) and confirmatory data analysis (CDA), with the idea that a synthesis of these two perspectives can lead people to more robust, reliable conclusions. Although EDA is sometimes viewed as a ‘model-free’ activity, Hullman and Gelman suggest we need a better understanding of the role that models play in this process. From a descriptive perspective, it seems likely that people do have a prior mental model as they approach a data set; from a normative point of view, there may be better and worse ways of using these implicit models. As a first step, they point to a need for a theory of graphical inference during EDA rooted in Bayesian inference. We find their arguments compelling and believe that developing a new theory for EDA provides exciting avenues for future research. One significant challenge is that the informal mental models that people use during EDA may not easily translate to formal statistical models. In what follows, we discuss some of the differences between the two types of models, how studying these differences could be fruitful, and how the resulting theory might affect future visual analytics tools.