Staff View
DNA Sequence Classification Using Compression-Based Induction

Descriptive

Language
LanguageTerm (authority = ISO 639-3:2007); (type = text)
English
Genre (authority = RULIB-FS)
Other
Genre (authority = marcgt)
technical report
PhysicalDescription
InternetMediaType
application/pdf
Extent
12 p.
Note (type = special display note)
Technical report lcsr-TR-240
Name (type = corporate); (authority = RutgersOrg-School)
NamePart
School of Arts and Sciences (SAS) (New Brunswick)
Name (type = corporate); (authority = RutgersOrg-Department)
NamePart
Computer Science (New Brunswick)
TypeOfResource
Text
TitleInfo
Title
DNA Sequence Classification Using Compression-Based Induction
Abstract (type = abstract)
Inductive learning methods, such as neural networks and decision trees, have become a popular approach to developing DNA sequence identification tools. Such methods attempt to form models of a collection of training data that can be used to predict future data accurately. The common approach to using such methods on DNA sequence identification problems forms models that depend on the absolute locations of nucleotides and assume independence of consecutive nucleotide locations. This paper describes a new class of learning methods, called compression-based induction (CBI), that is geared towards sequence learning problems such as those that arise when learning DNA sequences. The central idea is to use text compression techniques on DNA sequences as the means for generalizing from sample sequences. The resulting methods form models that are based on the more important relative locations of nucleotides and on the dependence of consecutive locations. They also provide a suitable framework into which biological domain knowledge can be injected into the learning process. We present initial explorations of a range of CBI methods that demonstrate the potential of our methods for DNA sequence identification tasks.
Name (type = personal)
NamePart (type = family)
Loewenstern
NamePart (type = given)
David
Affiliation
Computer Science (New Brunswick)
Role
RoleTerm (type = text); (authority = marcrt)
author
Name (type = personal)
NamePart (type = family)
Hirsh
NamePart (type = given)
Haym
Affiliation
Computer Science (New Brunswick)
Role
RoleTerm (type = text); (authority = marcrt)
author
Name (type = personal)
NamePart (type = family)
Yianilos
NamePart (type = given)
Peter
Affiliation
NEC Research Institute
Role
RoleTerm (type = text); (authority = marcrt)
author
Name (type = personal)
NamePart (type = family)
Noordewier
NamePart (type = given)
Michiel
Affiliation
NEC Research Institute
Role
RoleTerm (type = text); (authority = marcrt)
author
OriginInfo
DateCreated (encoding = w3cdtf); (qualifier = exact); (keyDate = yes)
1995-03
RelatedItem (type = host)
TitleInfo
Title
Computer Science (New Brunswick)
Identifier (type = local)
rucore21032500001
Location
PhysicalLocation (authority = marcorg); (displayLabel = Rutgers, The State University of New Jersey)
NjNbRU
Identifier (type = doi)
doi:10.7282/T3SJ1Q2Q
Back to the top

Rights

RightsDeclaration (AUTHORITY = rightsstatements.org); (TYPE = IN COPYRIGHT); (ID = http://rightsstatements.org/vocab/InC/1.0/)
This Item is protected by copyright and/or related rights.You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use.For other uses you need to obtain permission from the rights-holder(s).
Copyright
Status
Copyright protected
Availability
Status
Open
Reason
Permission or license
Back to the top

Technical

RULTechMD (ID = TECHNICAL1)
ContentModel
Document
CreatingApplication
Version
1.4
ApplicationName
GPL Ghostscript 9.07
DateCreated (point = start); (encoding = w3cdtf); (qualifier = exact)
2018-06-06T12:37:13
DateCreated (point = start); (encoding = w3cdtf); (qualifier = exact)
2018-06-06T12:37:13
Back to the top
Version 8.3.10
Rutgers University Libraries - Copyright ©2019