TY - JOUR
TI - Fixation selection for categorical target searches in real world scenes
DO - https://doi.org/doi:10.7282/t3-pw4n-2t53
PY - 2019
AB - Computational models have seen widespread success in predicting fixation locations for visual search tasks that use artificial stimuli, such as Gabors in 1/f noise (Najemnik & Geisler, 2005), but comparatively little success in predicting fixation locations for visual search tasks with natural stimuli. Critically, previous approaches have neither accounted for the effects of our foveated visual system nor implemented decision rules to actually select a sequence of fixations (Torralba, Oliva, Castelhano, & Henderson, 2006; Ehinger, Hidalgo-Sotelo, Torralba, & Oliva, 2009). Here we present a Bayesian model of fixation selection in visual search tasks using natural images. The model used two known sources of information to select fixations: scene context and features that look similar to the target (termed target-relevant features here). Scene context functioned as a prior over possible target locations, while target-relevant features acted as the likelihood function (as in Ehinger et al., 2009). Scene context was measured using GIST (Torralba et al., 2006) and target-relevant features were measured using Histograms of Oriented Gradients (Dalal & Triggs, 2005). We represented scene context with a mixture of Gaussians and target-relevant features with a multivariate Gaussian distribution. The model selected new fixations using either a maximum a posteriori (MAP) or entropy limit minimization (ELM) rule. To compare the fixations selected by the models with human behavior, we tested human observers on a pedestrian search task in natural images. Prior to the search task, a visibility map was measured using data from human observers in a detection task. The visibility map was then used to degrade the target-relevant feature information in our model simulations, representing the effects of foveation.
We found evidence that human observers do use scene context and target-relevant features as sources of information to guide their fixations in natural scenes. Additionally, fixations selected by human observers were more consistent with the ELM decision rule than with the MAP decision rule. We close by noting the limitations of the models and discussing potential extensions.
KW - Psychology
KW - Eye tracking
KW - Bayesian model
LA - English
ER -
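
The abstract describes combining a scene-context prior with a target-relevant likelihood that is degraded by a visibility map, then choosing the next fixation with the MAP rule. The following is only a minimal sketch of that idea on a toy grid, not the authors' implementation. The function names, the toy arrays, and in particular the way visibility blends the likelihood toward a flat value are all assumptions made for illustration. The ELM rule is not shown because the abstract does not specify its form.

```python
import numpy as np


def posterior_over_locations(prior, likelihood, visibility):
    """Posterior over target locations on a grid (hypothetical helper).

    Assumption: visibility is in [0, 1] per location, and low visibility
    pulls the likelihood toward its mean, i.e. toward carrying no information.
    """
    flat = likelihood.mean()
    degraded = visibility * likelihood + (1.0 - visibility) * flat
    unnormalized = prior * degraded  # Bayes' rule up to a constant
    return unnormalized / unnormalized.sum()


def select_fixation_map(posterior):
    """MAP rule: fixate the location with the highest posterior probability."""
    return np.unravel_index(np.argmax(posterior), posterior.shape)


# Toy 2x2 scene: context favors the bottom right, features favor the top left.
prior = np.array([[0.1, 0.2], [0.3, 0.4]])
likelihood = np.array([[0.9, 0.1], [0.1, 0.1]])

# Full visibility everywhere: the feature evidence dominates.
posterior_full = posterior_over_locations(prior, likelihood, np.ones((2, 2)))
fixation_full = select_fixation_map(posterior_full)

# Zero visibility everywhere: only the scene-context prior remains.
posterior_none = posterior_over_locations(prior, likelihood, np.zeros((2, 2)))
fixation_none = select_fixation_map(posterior_none)
```

In this toy case the selected location moves from the top-left cell to the bottom-right cell when the feature information is fully degraded, which shows how foveation can shift reliance onto the prior.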