Natural features are often continuous, but many models of human learning and categorization involve discrete-valued (e.g., Boolean) features. Discretization is well known to be beneficial in machine learning, leading to faster and sometimes more accurate learning. Yet there has been little research on how human learners discretize continuous features. This dissertation investigates human discretization, focusing on two specific hypotheses. The first is that discretization of a continuous parameter depends on the shape of the probability distribution underlying it, and principally on the presence of "modes," or separable peaks, in the distribution. The second is that humans create clear distinctions between discretized feature values rather than probabilistic boundaries.
Subjects were presented with items whose feature values were drawn from a mixture of Gaussian distributions, and a free sorting task was used to assess whether subjects spontaneously discretized the feature in a way that reflected the underlying mixture. Three factors were varied: the relative locations of the two component Gaussians, their separation as measured by Cohen's d (the ratio of the distance between the components' means to their common standard deviation), and the number of items drawn from the overall mixture. Each of these factors influenced the way subjects discretized the features, and further analysis showed that the mixtures subjects appeared to estimate were more sharply separated (higher Cohen's d) than the original distribution. This study suggests that human featural discretization involves a process akin to the estimation of mixture components in the environment, but that the separation among the components is systematically overestimated to create "cleaner" divisions than are truly present—a phenomenon that might be termed hyperdiscretization.
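The stimulus construction described above can be sketched in a few lines of code. This is an illustrative reconstruction, not the materials actually used in the study: the parameter values, equal mixture weights, and function names below are assumptions chosen only to show how Cohen's d governs the separation of a two-component Gaussian mixture.

```python
import random

def mixture_sample(n, mu1, mu2, sigma, seed=0):
    """Draw n feature values from an equal-weight mixture of two
    Gaussians, N(mu1, sigma^2) and N(mu2, sigma^2)."""
    rng = random.Random(seed)
    return [rng.gauss(mu1 if rng.random() < 0.5 else mu2, sigma)
            for _ in range(n)]

def cohens_d(mu1, mu2, sigma):
    """Separation of the two components: the distance between
    their means divided by their common standard deviation."""
    return abs(mu2 - mu1) / sigma

# Components one standard deviation apart (d = 1.0) produce a nearly
# unimodal item set; components three apart (d = 3.0) look bimodal.
items_weak = mixture_sample(24, mu1=0.0, mu2=1.0, sigma=1.0)
items_strong = mixture_sample(24, mu1=0.0, mu2=3.0, sigma=1.0)
print(cohens_d(0.0, 3.0, 1.0))  # → 3.0
```

On this construction, the hyperdiscretization finding amounts to subjects behaving as if the generating mixture had a larger d than the one actually used to sample the items.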