Staff View
Process progress estimation and activity recognition

Descriptive

TitleInfo
Title
Process progress estimation and activity recognition
Name (type = personal)
NamePart (type = family)
Li
NamePart (type = given)
Xinyu
NamePart (type = date)
1990-
DisplayForm
Xinyu Li
Role
RoleTerm (authority = RULIB)
author
Name (type = personal)
NamePart (type = family)
Marsic
NamePart (type = given)
Ivan
DisplayForm
Ivan Marsic
Affiliation
Advisory Committee
Role
RoleTerm (authority = RULIB)
chair
Name (type = corporate)
NamePart
Rutgers University
Role
RoleTerm (authority = RULIB)
degree grantor
Name (type = corporate)
NamePart
School of Graduate Studies
Role
RoleTerm (authority = RULIB)
school
TypeOfResource
Text
Genre (authority = marcgt)
theses
OriginInfo
DateCreated (qualifier = exact)
2018
DateOther (qualifier = exact); (type = degree)
2018-05
CopyrightDate (encoding = w3cdtf); (qualifier = exact)
2018
Place
PlaceTerm (type = code)
xx
Language
LanguageTerm (authority = ISO639-2b); (type = code)
eng
Abstract (type = abstract)
Activity recognition is fundamental to many real-world applications, making it a valuable research topic. For example, activity tracking and decision support are crucial in medical settings, and activity recognition and prediction are critical in smart home applications. In this dissertation, we focus on activity recognition strategies and their applications to real-world problems. Depending on the application scenario, activities can be hierarchically categorized into high-level and low-level activities. A high-level activity may contain one or more low-level activities. For example, if cooking is a high-level activity, it may contain several low-level activities such as preparing, chopping, and stirring. Although studied for decades, several challenges remain for high-level activity recognition, also known as process phase detection. A high-level activity usually has a long duration and consists of several low-level activities. Treating high-level activity recognition as a per-time-instance classification problem overlooks the associations between activities over time. We thus proposed treating high-level activity recognition as a regression problem. Based on this formulation, we implemented a deep learning framework that extracts features from input data and designed a rectified tanh activation function to generate a continuous regression curve between 0 and 1. We used the regression result to represent the overall completeness of the event process. Because the same type of event often follows similar high-level activity processes, we then used a Gaussian mixture model (GMM) that takes the estimated overall completeness to supplement high-level activity recognition. Since the Gaussian mixture model requires that there be no duplication of high-level activities in an event (each activity has to follow a Gaussian distribution), it might not fully represent real-world scenarios. To overcome this limitation, we further proposed using LSTM layers in place of the GMM for high-level activity prediction. We applied our system to four real-world sports and medical datasets, achieving state-of-the-art performance. The system is now running in a trauma room at the Children's National Medical Center, estimating in real time the overall completeness of each trauma resuscitation, its current high-level activity, and the remaining time to completion. Compared to high-level activities, low-level activities are more challenging to recognize, because low-level activity recognition often requires detailed, noise-free sensor data, which is difficult to obtain in real-world scenarios. Many manually crafted features have been proposed to combat the data noise, but these features were often not generalizable and feature selection was often arbitrary. We are the first to propose deep learning with passive RFID data for activity recognition. The automatic feature extraction does not require manual input, making our system transferable and generalizable. We further proposed the RSS-map representation of RFID data, which works well with ConvNet structures by encoding both spatial and temporal associations. Because of the limitations of passive RFIDs, we extended our system from using a single sensor to working with a sensor network. We studied activity recognition with multiple sensor types, including RGB-D cameras, a microphone array, and a passive RFID sensor.
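As an illustration of the completeness-regression idea described above, the following sketch shows how a rectified tanh output head could be written in PyTorch; the class name, layer sizes, and example data are assumptions made for the illustration, not the dissertation's actual code.

import torch
import torch.nn as nn

class CompletenessHead(nn.Module):
    """Hypothetical regression head: a rectified tanh keeps the output in [0, 1),
    interpreted as the overall completeness of the event process."""
    def __init__(self, feature_dim: int = 128):
        super().__init__()
        self.fc = nn.Linear(feature_dim, 1)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # tanh maps to (-1, 1); clamping negative values to zero "rectifies" it,
        # so the prediction stays between 0 (just started) and 1 (complete).
        return torch.clamp(torch.tanh(self.fc(features)), min=0.0)

# Example: features for a batch of 4 time instances from an upstream extractor.
completeness = CompletenessHead()(torch.randn(4, 128))  # shape (4, 1)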
For each sensor type, we were able to build on previously successful activity recognition research. To build a system that makes final decisions based on features extracted from all sensors, we developed a modified slow fusion strategy instead of traditional voting. We built a deep multimodal neural network that has multiple feature-extraction sub-networks for the different input modalities, which feed into a single activity prediction network. The multimodal structure increases overall activity recognition accuracy, but one key problem remains: the features extracted from different sensors contain both useful and misleading information, and the system simply takes all of them because it does not know which features to rely on. To address this issue, we proposed a network that automatically generates “masks” that highlight the important features for video-based activity recognition. Unlike many “attention”-based deep learning frameworks, we used a conditional generative adversarial network (cGAN) for mask generation, because the conditional GAN gives us additional control over the generated masks, whereas we have no control over the attention maps produced by regular attention networks. Our experimental results demonstrate that, given manually generated activity-performer masks as ground truth, the cGAN is able to generate masks that highlight only the activity performer. The activity recognition network with our proposed mask generator achieved performance comparable with other online systems on the published dataset. Though proven applicable, training the cGAN requires a large number of manually generated masks as ground truth, which are not often available in real-world applications. Building on the idea of the cGAN mask generator, we proposed a multimodal deep learning framework with attention that works with multi-sensory input. We proposed feature attention and modality attention for feature extraction and fusion. The network can be fine-tuned by our asynchronous fine-tuning strategy using deep Q-learning. Our experimental results demonstrate that our attention network with deep-reinforcement-learning-based fine-tuning outperforms previous research. The proposed fine-tuning also prevents over-fitting when training a deep network on a small dataset. Finally, we introduce our ongoing work on concurrent activity recognition and outline future work. Concurrent activity performance is common in the real world: a person can drink while watching TV; a medical team can perform multiple tasks simultaneously through different medical personnel. However, recognizing concurrent activities remains an open research topic because it is neither a simple multi-class nor a binary classification problem. We proposed a shared feature extractor to extract features from different input modalities. We then treated concurrent activity recognition as a coding problem and trained a deep auto-encoder to generate a binary code denoting each activity's relevant and irrelevant features for recognition. However, this network was hard to train to convergence because the shared features contain both activity-relevant and irrelevant information. The recognition network easily over-fit to the unrelated features rather than the activity itself. Because the ground-truth labels only indicate whether the recognized activity is correct or incorrect, they disregard the associations between the recognition results and the feature space.
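The multimodal slow-fusion structure described above can be sketched roughly as follows; this is a minimal illustration assuming PyTorch, where the modality dimensions, layer sizes, and class count are invented for the example, and the real sub-networks would be ConvNets or recurrent networks suited to each sensor.

import torch
import torch.nn as nn

class MultimodalActivityNet(nn.Module):
    """Illustrative fusion network: one feature-extraction sub-network per input
    modality (e.g., RGB-D video, audio, RFID), feeding a single prediction head."""
    def __init__(self, modality_dims, hidden=256, num_activities=10):
        super().__init__()
        self.extractors = nn.ModuleList(
            nn.Sequential(nn.Linear(d, hidden), nn.ReLU()) for d in modality_dims
        )
        self.classifier = nn.Sequential(
            nn.Linear(hidden * len(modality_dims), hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_activities),
        )

    def forward(self, inputs):
        # Fuse per-modality features by concatenation before a shared classifier,
        # rather than voting on separate per-sensor decisions.
        feats = [extract(x) for extract, x in zip(self.extractors, inputs)]
        return self.classifier(torch.cat(feats, dim=-1))

# Example with three hypothetical modalities for a batch of 2 samples.
net = MultimodalActivityNet(modality_dims=[512, 128, 64])
logits = net([torch.randn(2, 512), torch.randn(2, 128), torch.randn(2, 64)])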
To address this issue, we further proposed modifying the reinforcement-learning-based plugin that was successfully used in our attention tuning so that it provides additional information for concurrent activity recognition. We first asked humans to provide feedback on whether the system made its decision based on the correct, associated features, and then only partially tuned the network weights based on that feedback.
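A rough sketch of this reward-based partial tuning might look like the following; it is purely illustrative and heavily simplified, assuming PyTorch, and the frozen/updated split, loss form, and dimensions are our assumptions rather than the dissertation's implementation. Only the decision layer is updated, with the gradient step scaled by the human feedback signal.

import torch
import torch.nn as nn

# Only the final decision layer is tuned; feature extractors stay frozen.
decision_layer = nn.Linear(256, 10)  # 10 hypothetical concurrent activities
optimizer = torch.optim.SGD(decision_layer.parameters(), lr=1e-3)

def feedback_update(features, labels, reward):
    """Reward-weighted update: positive feedback (the decision relied on the
    correct, associated features) reinforces the current behaviour;
    negative feedback penalizes it."""
    probs = torch.sigmoid(decision_layer(features))   # multi-label outputs
    loss = reward * nn.functional.binary_cross_entropy(probs, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

feedback_update(torch.randn(1, 256), torch.randint(0, 2, (1, 10)).float(), reward=1.0)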
Subject (authority = RUETD)
Topic
Electrical and Computer Engineering
RelatedItem (type = host)
TitleInfo
Title
Rutgers University Electronic Theses and Dissertations
Identifier (type = RULIB)
ETD
Identifier
ETD_8692
PhysicalDescription
Form (authority = gmd)
electronic resource
InternetMediaType
application/pdf
InternetMediaType
text/xml
Extent
1 online resource (xiv, 107 p. : ill.)
Note (type = degree)
Ph.D.
Note (type = bibliography)
Includes bibliographical references
Note (type = statement of responsibility)
by Xinyu Li
RelatedItem (type = host)
TitleInfo
Title
School of Graduate Studies Electronic Theses and Dissertations
Identifier (type = local)
rucore10001600001
Location
PhysicalLocation (authority = marcorg); (displayLabel = Rutgers, The State University of New Jersey)
NjNbRU
Identifier (type = doi)
doi:10.7282/T3CF9TJH
Genre (authority = ExL-Esploro)
ETD doctoral

Rights

RightsDeclaration (ID = rulibRdec0006)
The author owns the copyright to this work.
RightsHolder (type = personal)
Name
FamilyName
Li
GivenName
Xinyu
Role
Copyright Holder
RightsEvent
Type
Permission or license
DateTime (encoding = w3cdtf); (qualifier = exact); (point = start)
2018-02-27 16:47:11
AssociatedEntity
Name
Xinyu Li
Role
Copyright holder
Affiliation
Rutgers University. School of Graduate Studies
AssociatedObject
Type
License
Name
Author Agreement License
Detail
I hereby grant to the Rutgers University Libraries and to my school the non-exclusive right to archive, reproduce and distribute my thesis or dissertation, in whole or in part, and/or my abstract, in whole or in part, in and from an electronic format, subject to the release date subsequently stipulated in this submittal form and approved by my school. I represent and stipulate that the thesis or dissertation and its abstract are my original work, that they do not infringe or violate any rights of others, and that I make these grants as the sole owner of the rights to my thesis or dissertation and its abstract. I represent that I have obtained written permissions, when necessary, from the owner(s) of each third party copyrighted matter to be included in my thesis or dissertation and will supply copies of such upon request by my school. I acknowledge that RU ETD and my school will not distribute my thesis or dissertation or its abstract if, in their reasonable judgment, they believe all such rights have not been secured. I acknowledge that I retain ownership rights to the copyright of my work. I also retain the right to use all or part of this thesis or dissertation in future works, such as articles or books.
Copyright
Status
Copyright protected
Availability
Status
Open
Reason
Permission or license

Technical

RULTechMD (ID = TECHNICAL1)
ContentModel
ETD
OperatingSystem (VERSION = 5.1)
windows xp
CreatingApplication
Version
1.5
ApplicationName
MiKTeX pdfTeX-1.40.18
DateCreated (point = end); (encoding = w3cdtf); (qualifier = exact)
2018-03-03T01:23:49