Fast search methods for biological sequence databases

Descriptive

Language
LanguageTerm (authority = ISO 639-3:2007); (type = text)
English
Genre (authority = RULIB-FS)
Other
Genre (authority = marcgt)
technical report
PhysicalDescription
InternetMediaType
application/pdf
Extent
1 online resource (33 pages)
Note (type = special display note)
Technical report LCSR-TR-217
Name (type = corporate); (authority = RutgersOrg-School)
NamePart
School of Arts and Sciences (SAS) (New Brunswick)
Name (type = corporate); (authority = RutgersOrg-Department)
NamePart
Computer Science (New Brunswick)
TypeOfResource
Text
TitleInfo
Title
Fast search methods for biological sequence databases
Abstract (type = abstract)
Biology researchers have a pressing need for data management technologies that make the storage and retrieval of DNA and protein sequence data accurate and efficient. The volume of data generated by DNA sequencing is already massive and will continue to grow rapidly. Even if current sequence databases are adequate today, they will assuredly become inadequate once far more sequence data has been determined. Future research on sequence databases therefore needs to focus on the organization of information, so that the volume of data that must be searched does not grow linearly with the volume of sequence data discovered.

We propose to develop an index structure and retrieval system for biological sequence databases, called PROXIMAL, that promises to be both efficient and general. This organization of the databases will complement other current efforts in sequence comparison and analysis by providing an infrastructure within which other methods can be used to locate desired sequences efficiently. Our method relies on reference strings to partition the database of sequences. It is efficient because using multiple reference strings for any given distance measure greatly reduces the number of sequences that must be examined, allowing us to locate sequences quickly based on a precomputed metric. It is general because multiple distance measures can be used. These include, at a minimum, basic edit distance with differing gap and mismatch weights, as well as entirely different models of mutation. The only requirement is a metric structure: mainly, that the distance calculations satisfy the triangle inequality. This is a weak requirement, satisfied by many interesting measures, including those currently in wide use for sequence comparison.
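The reference-string idea in the abstract can be illustrated with a generic pivot-filtering sketch. This is not PROXIMAL's actual index structure, which this record does not describe. The names `edit_distance`, `ReferenceIndex`, and `range_search`, the choice to group sequences by their exact vector of distances to the reference strings, and the unit default costs are all illustrative assumptions. Because a metric satisfies d(q, s) >= |d(q, r) - d(r, s)| for every reference string r, a whole group can be skipped when any reference rules it out, and only the surviving candidates need a full distance computation.

```python
# Hypothetical sketch of reference-string (pivot) filtering; not the PROXIMAL design.
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def edit_distance(a: str, b: str, gap: int = 1, mismatch: int = 1) -> int:
    """Weighted edit distance via dynamic programming.

    With positive gap and mismatch costs this distance is a metric,
    so the triangle inequality can be used for pruning.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            sub = 0 if a[i - 1] == b[j - 1] else mismatch
            cur[j] = min(prev[j - 1] + sub,  # match or mismatch
                         prev[j] + gap,      # a[i-1] aligned to a gap
                         cur[j - 1] + gap)   # b[j-1] aligned to a gap
        prev = cur
    return prev[len(b)]


class ReferenceIndex:
    """Partition sequences by their distances to a few reference strings."""

    def __init__(self, sequences: Sequence[str], references: Sequence[str],
                 gap: int = 1, mismatch: int = 1) -> None:
        self.references = list(references)
        self.gap, self.mismatch = gap, mismatch
        # Bucket key (assumed): tuple of distances from a sequence to each reference.
        self.buckets: Dict[Tuple[int, ...], List[str]] = defaultdict(list)
        for s in sequences:
            self.buckets[self._signature(s)].append(s)

    def _dist(self, a: str, b: str) -> int:
        return edit_distance(a, b, self.gap, self.mismatch)

    def _signature(self, s: str) -> Tuple[int, ...]:
        return tuple(self._dist(r, s) for r in self.references)

    def range_search(self, query: str, radius: int) -> Tuple[List[str], int]:
        """Return the sequences within `radius` of `query`, plus the number
        of full query-to-sequence distance computations performed."""
        q_sig = self._signature(query)
        hits: List[str] = []
        verified = 0
        for key, members in self.buckets.items():
            # Triangle inequality: d(q, s) >= |d(q, r) - d(r, s)| for each r,
            # so the whole bucket is skipped if any reference rules it out.
            if any(abs(dq - ds) > radius for dq, ds in zip(q_sig, key)):
                continue
            for s in members:
                verified += 1
                if self._dist(query, s) <= radius:
                    hits.append(s)
        return hits, verified


if __name__ == "__main__":
    db = ["ACGTACGT", "ACGTACGA", "TTTTGGGG", "ACGAACGT", "GGGGCCCC", "ACGT"]
    index = ReferenceIndex(db, references=["ACGTACGT", "TTTTTTTT"])
    hits, verified = index.range_search("ACGTACGG", radius=1)
    print(sorted(hits), verified)  # ['ACGTACGA', 'ACGTACGT'] 3
```

With this toy data, the query lies within distance 1 of two stored sequences, and only three of the six sequences need a full distance computation; the other three are eliminated using the precomputed reference distances alone. Changing `gap` and `mismatch` swaps in a different edit-distance weighting without touching the filtering logic, which mirrors the generality the abstract claims for any metric distance.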
Name (type = personal)
NamePart (type = family)
Ganguly
NamePart (type = given)
Sumit
Affiliation
Computer Science (New Brunswick)
Role
RoleTerm (type = text); (authority = marcrt)
author
Name (type = personal)
NamePart (type = family)
Leichter
NamePart (type = given)
Jerry
Affiliation
Computer Science (New Brunswick)
Role
RoleTerm (type = text); (authority = marcrt)
author
Name (type = personal)
NamePart (type = family)
Noordewier
NamePart (type = given)
Michiel
Affiliation
Computer Science (New Brunswick)
Role
RoleTerm (type = text); (authority = marcrt)
author
OriginInfo
DateCreated (encoding = w3cdtf); (qualifier = exact); (keyDate = yes)
1993-10
RelatedItem (type = host)
TitleInfo
Title
Computer Science (New Brunswick)
Identifier (type = local)
rucore21032500001
Location
PhysicalLocation (authority = marcorg); (displayLabel = Rutgers, The State University of New Jersey)
NjNbRU
Identifier (type = doi)
doi:10.7282/t3-9vhv-cx15

Rights

RightsDeclaration (AUTHORITY = rightsstatements.org); (TYPE = IN COPYRIGHT); (ID = http://rightsstatements.org/vocab/InC/1.0/)
This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).
Copyright
Status
Copyright protected
Availability
Status
Open
Reason
Permission or license

Technical

RULTechMD (ID = TECHNICAL1)
ContentModel
Document
CreatingApplication
Version
1.4
ApplicationName
GPL Ghostscript 9.07
DateCreated (point = start); (encoding = w3cdtf); (qualifier = exact)
2018-06-06T12:36:54
Rutgers University Libraries - Copyright ©2020