LanguageTerm (authority = ISO 639-3:2007); (type = text)
English
Abstract (type = abstract)
Machine learning is common in modern systems security research. Researchers regularly use machine-learning-based models for tasks such as authentication and user identification. Often the practices followed for developing and evaluating a machine learning model that forms the decision logic of these systems are misleading. For example, the maximum accuracy (ACC) can be inflated when the data used to train a model is class-skewed. Additionally, models built on data from small user groups may achieve high performance values but fail to generalize to a larger population. These inflated performance values will lead to unexpected system-level failures.
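The effect of class skew on accuracy can be illustrated with a minimal sketch. The data and the trivial "always reject" model below are hypothetical and are not taken from the thesis; they only show how a model that never accepts the legitimate user can still report a high accuracy.

```python
# Hypothetical, illustrative data: label 1 = legitimate user, 0 = anyone else.
# The split (5 vs. 95 samples) is an assumption chosen to show class skew.
y_true = [1] * 5 + [0] * 95

# A trivial model that rejects every sample, i.e. always predicts 0.
y_pred = [0] * len(y_true)


def accuracy(truth, pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(truth, pred)) / len(truth)


def recall(truth, pred, label):
    """Fraction of samples with true class `label` that were predicted as `label`."""
    preds_for_label = [p for t, p in zip(truth, pred) if t == label]
    return sum(p == label for p in preds_for_label) / len(preds_for_label)


acc = accuracy(y_true, y_pred)       # dominated by the majority class
tpr = recall(y_true, y_pred, 1)      # how often the legitimate user is accepted
tnr = recall(y_true, y_pred, 0)      # how often other users are rejected
balanced_acc = (tpr + tnr) / 2       # averages the per-class rates

print(f"accuracy={acc:.2f} tpr={tpr:.2f} tnr={tnr:.2f} balanced={balanced_acc:.2f}")
```

Under these assumptions the accuracy is 0.95 even though the legitimate user is never accepted, while the balanced accuracy is 0.50, which matches chance for a two-class problem.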
Several metrics are used to evaluate how well a system performs at the task of distinguishing users. Existing metrics are often inadequate because they fail to capture the range of possible contingencies that arise when the measurements that decisions are based on have inherent ambiguities. These ambiguities can result in mistaking one user for another. For authentication or user identification, the consequences of such mistakes are dictated by the target application. Mistakenly granting access to a bank account has significantly different consequences than loading the wrong set of user preferences. Many of the common metrics hide underlying problems within the machine learning models. Models that are not tested with an adequate number of users can fail in surprising ways.
In this PhD thesis, we explore the underlying reasons why the metrics are misleading and why models fail to generalize. We identify the flaws with the metrics and show how some metrics can degrade in performance when assumptions about the number of users are violated. We present surveys of proposals for new authentication or user identification systems from top-tier publication venues. We found that 94% (33/35) of the authentication systems surveyed had reporting flaws and 77% of user identification systems used fewer than 20 participants to validate their system. Finally, we present solutions to these issues in the form of metrics that can be visually checked for flaws and testing methods that can be used to determine when assumptions about population size break down.
Subject (authority = local)
Topic
Machine learning
Subject (authority = RUETD)
Topic
Electrical and Computer Engineering
RelatedItem (type = host)
TitleInfo
Title
Rutgers University Electronic Theses and Dissertations
Identifier (type = RULIB)
ETD
Identifier
ETD_11186
PhysicalDescription
Form (authority = gmd)
InternetMediaType
application/pdf
InternetMediaType
text/xml
Extent
1 online resource (ix, 99 pages) : illustrations
Note (type = degree)
M.S.
Note (type = bibliography)
Includes bibliographical references
RelatedItem (type = host)
TitleInfo
Title
School of Graduate Studies Electronic Theses and Dissertations
Identifier (type = local)
rucore10001600001
Location
PhysicalLocation (authority = marcorg); (displayLabel = Rutgers, The State University of New Jersey)
I hereby grant to the Rutgers University Libraries and to my school the non-exclusive right to archive, reproduce and distribute my thesis or dissertation, in whole or in part, and/or my abstract, in whole or in part, in and from an electronic format, subject to the release date subsequently stipulated in this submittal form and approved by my school. I represent and stipulate that the thesis or dissertation and its abstract are my original work, that they do not infringe or violate any rights of others, and that I make these grants as the sole owner of the rights to my thesis or dissertation and its abstract. I represent that I have obtained written permissions, when necessary, from the owner(s) of each third party copyrighted matter to be included in my thesis or dissertation and will supply copies of such upon request by my school. I acknowledge that RU ETD and my school will not distribute my thesis or dissertation or its abstract if, in their reasonable judgment, they believe all such rights have not been secured. I acknowledge that I retain ownership rights to the copyright of my work. I also retain the right to use all or part of this thesis or dissertation in future works, such as articles or books.