Speech has been used as the foundation for many human/machine interactive systems to convey the user's intent to the system. However, other input mechanisms, commonly called modalities, such as gaze, touch, and hand gestures, have been explored as a means of providing more robust interaction in environments where speech alone is not adequate. By combining the inputs from multiple, complementary modalities, none of which is perfectly reliable, a better understanding of the user's true intent can be imparted to the system. In this dissertation, the effectiveness of using gaze (where someone is looking) to aid speech in conveying the user's intent to the machine is explored. To create a speech and gaze integration model, two human factors experiments were conducted to collect data. In the first experiment, the user read a single word displayed on a screen; in the second, the user read a designated word from a menu of words. Speech onset times and the user's gaze patterns were captured and analyzed to understand the timing relations between the two modalities.
A set of gaze/speech features was extracted from the data and used to predict the location of the word that the user read. The best features and the best model for predicting the location of the target word were found through an iterative trial-and-error process. A linear model was able to predict the gaze location of the target as well as any of the non-linear models considered.
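As a rough illustration of the kind of linear predictor described above, the sketch below fits a least-squares model mapping gaze/speech feature vectors to an on-screen target location. The specific feature columns (fixation coordinates near speech onset, speech onset time, fixation duration) and the data values are hypothetical placeholders, not the dissertation's actual feature set.

```python
import numpy as np

# Hypothetical gaze/speech feature matrix: one row per trial.
# Assumed columns: mean fixation x, mean fixation y near speech onset,
# speech onset time (s), and fixation duration (s).
X = np.array([
    [0.21, 0.35, 0.62, 0.30],
    [0.74, 0.33, 0.55, 0.25],
    [0.48, 0.80, 0.71, 0.41],
    [0.25, 0.78, 0.60, 0.33],
])

# Target word locations on screen (normalized x, y), one row per trial.
Y = np.array([
    [0.20, 0.30],
    [0.75, 0.30],
    [0.50, 0.80],
    [0.25, 0.80],
])

# Augment features with a bias term and solve the least-squares problem
# Y ~= [X 1] W for the linear weight matrix W.
X_aug = np.hstack([X, np.ones((X.shape[0], 1))])
W, *_ = np.linalg.lstsq(X_aug, Y, rcond=None)

# Predict the target location for a new trial's feature vector.
x_new = np.array([0.70, 0.36, 0.58, 0.28, 1.0])
print("predicted target location:", x_new @ W)
```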
The linear system representation was then used to create an adaptive model using the Row Action Projection (RAP) technique. The RAP adaptation model was found to predict the user's intent with higher probability than the non-adaptive approaches for the majority of subjects. The RAP model adapted to the speech/gaze patterns of each individual user, as well as to the variation in a single user's interaction behavior over time. It was also found that the feature set used to successfully identify the target in Experiment 1, a simple isolated-word task, differed from that used in Experiment 2, a more complex menu-selection task, suggesting that task complexity is an important consideration in the design of a speech/gaze interface.
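Row Action Projection is a Kaczmarz-style iteration: each new observation (feature vector and observed target) is treated as one row of the linear system, and the current weights are projected toward the hyperplane that row defines, which is what lets the model track an individual user over time. The minimal sketch below shows that update rule; the relaxation parameter, the weight-matrix handling, and the simulated "true" user mapping are illustrative assumptions, not the dissertation's exact formulation.

```python
import numpy as np

def rap_update(W, x, y, relaxation=0.5):
    """One Row Action Projection (Kaczmarz-style) step.

    W : (n_features, n_outputs) current weight matrix
    x : (n_features,) gaze/speech feature vector for the new trial
    y : (n_outputs,) observed target location for that trial
    Moves each column of W toward the hyperplane x @ W[:, j] = y[j].
    """
    norm_sq = x @ x
    if norm_sq == 0.0:
        return W
    residual = y - x @ W                      # prediction error on this trial
    return W + relaxation * np.outer(x, residual) / norm_sq

# Hypothetical online adaptation loop: the weights drift toward each
# user's own speech/gaze mapping as trials arrive one at a time.
rng = np.random.default_rng(0)
true_mapping = np.array([[0.9, 0.1],          # assumed per-user mapping
                         [0.0, 1.0],
                         [0.2, 0.3],
                         [0.1, 0.0]])
W = np.zeros((4, 2))
for _ in range(200):
    x = rng.random(4)                         # assumed feature vector
    y_observed = x @ true_mapping
    W = rap_update(W, x, y_observed)

print("adapted weights:\n", W)
```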
In summary, this dissertation has shown that an adaptive gaze and speech integration model performs better than either speech or gaze alone.