Staff View
Data mining methodologies with uncertain data

Descriptive

TitleInfo
Title
Data mining methodologies with uncertain data
Name (type = personal)
NamePart (type = family)
Tavakkol
NamePart (type = given)
Behnam
NamePart (type = date)
1987-
DisplayForm
Behnam Tavakkol
Role
RoleTerm (authority = RULIB)
author
Name (type = personal)
NamePart (type = family)
Albin
NamePart (type = given)
Susan L.
DisplayForm
Susan L. Albin
Affiliation
Advisory Committee
Role
RoleTerm (authority = RULIB)
chair
Name (type = personal)
NamePart (type = family)
Jeong
NamePart (type = given)
Myong-Kee
DisplayForm
Myong-Kee Jeong
Affiliation
Advisory Committee
Role
RoleTerm (authority = RULIB)
co-chair
Name (type = personal)
NamePart (type = family)
Jafari
NamePart (type = given)
Mohsen
DisplayForm
Mohsen Jafari
Affiliation
Advisory Committee
Role
RoleTerm (authority = RULIB)
internal member
Name (type = personal)
NamePart (type = family)
Guo
NamePart (type = given)
Weihong
DisplayForm
Weihong Guo
Affiliation
Advisory Committee
Role
RoleTerm (authority = RULIB)
internal member
Name (type = personal)
NamePart (type = family)
Cabrera
NamePart (type = given)
Javier
DisplayForm
Javier Cabrera
Affiliation
Advisory Committee
Role
RoleTerm (authority = RULIB)
outside member
Name (type = corporate)
NamePart
Rutgers University
Role
RoleTerm (authority = RULIB)
degree grantor
Name (type = corporate)
NamePart
School of Graduate Studies
Role
RoleTerm (authority = RULIB)
school
TypeOfResource
Text
Genre (authority = marcgt)
theses
OriginInfo
DateCreated (qualifier = exact)
2018
DateOther (qualifier = exact); (type = degree)
2018-05
CopyrightDate (encoding = w3cdtf); (qualifier = exact)
2018
Place
PlaceTerm (type = code)
xx
Language
LanguageTerm (authority = ISO639-2b); (type = code)
eng
Abstract (type = abstract)
Uncertain objects arise in many applications, such as sensor networks, moving object databases, and medical and biological databases, where each feature is represented by multiple observations or by a given or fitted probability density function (PDF). In this dissertation we present a methodology to classify uncertain objects based on a probabilistic distance measure between an uncertain object and a group of uncertain objects. We call this newly proposed measure the object-to-group probabilistic distance measure (OGPDM); by contrast, dozens of probabilistic distance measures (PDMs) for the distance between two PDFs already exist in the literature. To assess the accuracy of the OGPDM, we compare it to existing classifiers, namely the K-Nearest Neighbor (KNN) classifier applied to object means (certain KNN) and the uncertain naïve Bayesian classifier. In addition, we compare OGPDM to an uncertain KNN classifier, proposed here, that uses existing PDMs to measure object-to-object distances and then classifies using KNN. We illustrate the advantages of the proposed OGPDM classifier with both simulated and real data. OGPDM captures the correlation among features within a class. It also accounts for the correlation among features within objects, which most other uncertain data classification approaches do not. Because of the uncertainty inherent in uncertain data objects, their scatter may differ substantially from the scatter of certain data objects. Measures of scatter for uncertain objects have not been defined before. In this dissertation, we define measures of scatter, such as the covariance matrix, the within-class scatter matrix, and the between-class scatter matrix, for uncertain data objects. We also extend the idea of Fisher Linear Discriminant Analysis (LDA) to uncertain objects, and we develop a Kernel Fisher Discriminant for uncertain objects.
The developed Uncertain Fisher LDA produces linear decision boundaries for separating classes of uncertain data objects, while the developed Uncertain Kernel Fisher Discriminants produce nonlinear decision boundaries. The Uncertain Kernel Fisher Discriminants cover two cases: uncertain objects given as PDFs, and uncertain objects given as multiple points. We show through examples that the decision boundaries obtained from the developed uncertain Fisher discriminants appear reasonable for separating classes of uncertain objects. We also compare their classification performance with that of many existing classifiers. The comparison uses simulated scenarios, in which uncertain objects are modeled with skew-normal distributions, and a real-world data set. Clustering validity indices can be used to evaluate the quality of the formed clusters and to determine the correct number of clusters. Applied to the results of clustering algorithms, they validate the performance of those algorithms. In this dissertation, two clustering validity indices for uncertain data, named uncertain Silhouette and Order Statistic, are developed. To the best of our knowledge, no clustering validity index in the literature is designed for uncertain objects and can be used for validating the performance of uncertain clustering algorithms. Our proposed validity indices use probabilistic distance measures to capture the distance between uncertain objects. They outperform existing validity indices for certain data in validating clusters of uncertain data objects, and they are robust to outliers. In particular, the Order Statistic index is a general form of the uncertain Dunn validity index (also developed here). It handles well two situations: a single cluster that is relatively scattered (not compact) compared with the other clusters, and two clusters that are close (not well separated) compared with the other clusters.
Such situations can cause existing clustering validity indices to fail to detect the correct number of clusters.
Subject (authority = RUETD)
Topic
Industrial and Systems Engineering
RelatedItem (type = host)
TitleInfo
Title
Rutgers University Electronic Theses and Dissertations
Identifier (type = RULIB)
ETD
Identifier
ETD_8778
PhysicalDescription
Form (authority = gmd)
electronic resource
InternetMediaType
application/pdf
InternetMediaType
text/xml
Extent
1 online resource (xv, 101 p. : ill.)
Note (type = degree)
Ph.D.
Note (type = bibliography)
Includes bibliographical references
Subject (authority = ETD-LCSH)
Topic
Data mining
Note (type = statement of responsibility)
by Behnam Tavakkol
RelatedItem (type = host)
TitleInfo
Title
School of Graduate Studies Electronic Theses and Dissertations
Identifier (type = local)
rucore10001600001
Location
PhysicalLocation (authority = marcorg); (displayLabel = Rutgers, The State University of New Jersey)
NjNbRU
Identifier (type = doi)
doi:10.7282/T3BZ69G4
Genre (authority = ExL-Esploro)
ETD doctoral

Rights

RightsDeclaration (ID = rulibRdec0006)
The author owns the copyright to this work.
RightsHolder (type = personal)
Name
FamilyName
Tavakkol
GivenName
Behnam
Role
Copyright Holder
RightsEvent
Type
Permission or license
DateTime (encoding = w3cdtf); (qualifier = exact); (point = start)
2018-04-08 14:50:58
AssociatedEntity
Name
Behnam Tavakkol
Role
Copyright holder
Affiliation
Rutgers University. School of Graduate Studies
AssociatedObject
Type
License
Name
Author Agreement License
Detail
I hereby grant to the Rutgers University Libraries and to my school the non-exclusive right to archive, reproduce and distribute my thesis or dissertation, in whole or in part, and/or my abstract, in whole or in part, in and from an electronic format, subject to the release date subsequently stipulated in this submittal form and approved by my school. I represent and stipulate that the thesis or dissertation and its abstract are my original work, that they do not infringe or violate any rights of others, and that I make these grants as the sole owner of the rights to my thesis or dissertation and its abstract. I represent that I have obtained written permissions, when necessary, from the owner(s) of each third party copyrighted matter to be included in my thesis or dissertation and will supply copies of such upon request by my school. I acknowledge that RU ETD and my school will not distribute my thesis or dissertation or its abstract if, in their reasonable judgment, they believe all such rights have not been secured. I acknowledge that I retain ownership rights to the copyright of my work. I also retain the right to use all or part of this thesis or dissertation in future works, such as articles or books.
Copyright
Status
Copyright protected
Availability
Status
Open
Reason
Permission or license

Technical

RULTechMD (ID = TECHNICAL1)
ContentModel
ETD
OperatingSystem (VERSION = 5.1)
windows xp
CreatingApplication
Version
1.5
DateCreated (point = end); (encoding = w3cdtf); (qualifier = exact)
2018-04-10T18:57:23
ApplicationName
Microsoft® Word 2013