Wu, Tong. Novel methods for predicting timing and attention in human-agent interaction through application-driven scenarios. Retrieved from https://doi.org/doi:10.7282/t3-236f-pd67
Description: This thesis explores the critical aspects of timing and attention in developing advanced interactive systems. It investigates the capabilities of multimodal sensing systems and speech agents in fostering more proactive services, where the AI agent initiates natural interactions with the user. The thesis acknowledges that current interactive agents, such as Siri, Alexa, and Google Assistant, are predominantly reactive and require the user to initiate interaction via a wake word. It identifies the potential for proactive interaction, which is contingent on the speech generation system's ability to accurately discern suitable moments to engage users. This presents a difficult challenge due to the dynamic nature of human attention and the complexities of various environments. The thesis introduces innovative methods for predicting well-timed, agent-initiated interactions, focusing on harnessing and enhancing human attention during these interactions. The first part of the thesis centers on predicting interaction timing in two prevalent scenarios, driving and dining, using multimodal methods. In the context of human-vehicle interaction, the thesis recognizes the potential of proactive agents and posits that well-designed proactive speech interfaces can be safer for drivers than visual-manual interfaces featuring buttons, knobs, and touchscreens. A multi-sensor fusion model is developed to predict optimal moments to engage drivers in speech interaction. The thesis also considers environments involving multiple individuals. It identifies social dining as an ideal setting for developing state-of-the-art methods for modeling human behaviors. An innovative interleaved model is constructed to predict optimal moments for an assisted-feeding robot to feed the user in social dining settings. The latter part of the thesis emphasizes the significance of human gaze and situational awareness as key indicators of attention levels.
These factors greatly influence the prediction of opportune interaction timings in both driving and social dining scenarios. To effectively measure and enhance human attention levels, the thesis proposes an augmented reality (AR) interface. This interface guides the user's gaze toward relevant points of interest, enhancing their situational awareness and improving the timing and quality of interactions. It also opens up the opportunity to close the loop by incorporating reinforcement learning. The thesis concludes by presenting its methods as a novel approach to enhancing human attention levels in human-agent interactions. These methods pave the way for more efficient and satisfying user experiences. By focusing on the timing of interactions and the attention level of users, the thesis underscores the potential for significant advancements in interactive systems.