Staff View
Exploiting knowledge of uncertainty: induction of classifiers by the incremental combination of probabilistic evidence

Descriptive

Language
LanguageTerm (authority = ISO 639-3:2007); (type = text)
English
Genre (authority = RULIB-FS)
Other
Genre (authority = marcgt)
technical report
PhysicalDescription
InternetMediaType
application/pdf
Extent
1 online resource (196 pages) : illustrations
Note (type = special display note)
Technical report ML-TR-40
Name (type = corporate); (authority = RutgersOrg-School)
NamePart
School of Arts and Sciences (SAS) (New Brunswick)
Name (type = corporate); (authority = RutgersOrg-Department)
NamePart
Computer Science (New Brunswick)
TypeOfResource
Text
TitleInfo
Title
Exploiting knowledge of uncertainty: induction of classifiers by the incremental combination of probabilistic evidence
Abstract (type = abstract)
Addressing noise and uncertainty in training data is an important issue in inductive learning. Inductive learners are necessarily sensitive to noise and uncertainty in training data, since training data constitutes the primary basis for generalization. Some of today's more popular off-the-shelf learners ignore the presence of imperfect data or invoke statistically motivated post-processing to help compensate for its unwanted effects; none exploits specific knowledge of noise or uncertainty. Several research projects have taken a step in this direction by explicitly addressing noise in training data. Unfortunately, these works are limited because they depend on particular models of environmental noise, overly restrictive concept-description languages, and sometimes unrealistic sample complexity. This dissertation describes a knowledge-based approach that uses uncertain reasoning to help overcome these limitations.

In what follows, learning from imperfect data is formulated as the search for a hypothesis with maximum a posteriori probability. Implementing the search as incremental probabilistic-evidence combination extends the range of useful uncertainty models to those described by discrete probability distributions. I built a novel conjunction learner and from it an iterative DNF learner. On standard datasets, where strong knowledge is unavailable, the DNF learner is competitive with conventional learners. In experiments using synthetic data, where strong knowledge is available, the knowledge-based learners are superior to their more familiar, conventional counterparts.

To demonstrate that problem-specific uncertainty models can be engineered and used effectively in practical problems, the evidence-combination approach was used to address a difficult open problem in molecular biology: learning to recognize promoter sequences in E. coli. Earlier efforts notwithstanding, the inherent uncertainty as to the location of the biologically active regions in the raw DNA data actually invalidates the direct application of many standard inductive learning methods. Here, knowledge from molecular biology was used instead to engineer models of three domain uncertainties and a mapping from raw sequence data to a plausible and focused evidence representation. The evidence-combination approach then yields classifiers that are accurate and credible, and the best yet developed for this important problem.
Name (type = personal)
NamePart (type = family)
Norton
NamePart (type = given)
Steven W.
Affiliation
Computer Science (New Brunswick)
Role
RoleTerm (type = text); (authority = marcrt)
author
OriginInfo
DateCreated (encoding = w3cdtf); (qualifier = exact); (keyDate = yes)
1995-05
Name (type = personal)
NamePart (type = family)
Hirsh
NamePart (type = given)
Haym
Affiliation
Computer Science (New Brunswick)
Role
RoleTerm (type = text); (authority = RULIB)
chair
RelatedItem (type = host)
TitleInfo
Title
Computer Science (New Brunswick)
Identifier (type = local)
rucore21032500001
Location
PhysicalLocation (authority = marcorg); (displayLabel = Rutgers, The State University of New Jersey)
NjNbRU
Identifier (type = doi)
doi:10.7282/t3-yhgg-jq36

Rights

RightsDeclaration (AUTHORITY = rightsstatements.org); (TYPE = IN COPYRIGHT); (ID = http://rightsstatements.org/vocab/InC/1.0/)
This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).
Copyright
Status
Copyright protected
Availability
Status
Open
Reason
Permission or license

Technical

RULTechMD (ID = TECHNICAL1)
ContentModel
Document
CreatingApplication
Version
1.4
ApplicationName
GPL Ghostscript 9.07
DateCreated (point = start); (encoding = w3cdtf); (qualifier = exact)
2018-06-06T12:38:20