Staff View
Filtering techniques for data streams

Descriptive

TitleInfo (displayLabel = Citation Title); (type = uniform)
Title
Filtering techniques for data streams
Name (ID = NAME001); (type = personal)
NamePart (type = family)
Rozenbaum
NamePart (type = given)
Irina
DisplayForm
Irina Rozenbaum
Role
RoleTerm (authority = RULIB)
author
Name (ID = NAME002); (type = personal)
NamePart (type = family)
Muthukrishnan
NamePart (type = given)
Shanmugavelayutham
Affiliation
Advisory Committee
DisplayForm
Shanmugavelayutham Muthukrishnan
Role
RoleTerm (authority = RULIB)
chair
Name (ID = NAME003); (type = personal)
NamePart (type = family)
Marian
NamePart (type = given)
Amelie
Affiliation
Advisory Committee
DisplayForm
Amelie Marian
Role
RoleTerm (authority = RULIB)
internal member
Name (ID = NAME004); (type = personal)
NamePart (type = family)
Martin
NamePart (type = given)
Richard
Affiliation
Advisory Committee
DisplayForm
Richard Martin
Role
RoleTerm (authority = RULIB)
internal member
Name (ID = NAME005); (type = personal)
NamePart (type = family)
Srivastava
NamePart (type = given)
Divesh
Affiliation
Advisory Committee
DisplayForm
Divesh Srivastava
Role
RoleTerm (authority = RULIB)
outside member
Name (ID = NAME006); (type = corporate)
NamePart
Rutgers University
Role
RoleTerm (authority = RULIB)
degree grantor
Name (ID = NAME007); (type = corporate)
NamePart
Graduate School - New Brunswick
Role
RoleTerm (authority = RULIB)
school
TypeOfResource
Text
Genre (authority = marcgt)
theses
OriginInfo
DateCreated (qualifier = exact)
2007
DateOther (qualifier = exact); (type = degree)
2007
Language
LanguageTerm
English
PhysicalDescription
Form (authority = marcform)
electronic
InternetMediaType
application/pdf
InternetMediaType
text/xml
Extent
x, 157 pages
Abstract
With the growth in popularity and complexity of streaming applications, there is a rising need for sophisticated analyses of the massive, high-speed data generated by such applications. Such analyses often need to be performed in near real time, using limited system resources. Under such conditions, it is very important to find an appropriate balance between the efficiency of processing and the accuracy of the produced results. A common technique is to filter the stream with suitable conditions so that the resulting data size is manageable and the analyses remain accurate.
The work presented in this thesis focuses on a number of complex filtering techniques that are of interest in data stream processing in general and in network traffic monitoring in particular. These techniques allow the analyst to define a filtering condition that is more appropriate for the particular query at hand than simple uniform random sampling.
First, we propose a single operator that captures the evaluation pattern common to sampling queries; it can be specialized to implement a wide variety of sophisticated stream sampling algorithms within an operational data stream management system while scaling in performance to line speeds. Additionally, we propose a flow sampling mechanism that integrates the logic of flow aggregation and flow sampling into one procedure that works directly on IP traffic.
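To make the idea of combining flow aggregation with flow sampling concrete, the sketch below uses hash-based flow sampling, a common textbook technique that is not necessarily the mechanism developed in the thesis. It assumes packets arrive as hypothetical `(flow_key, size_in_bytes)` pairs, where the flow key is an IP 5-tuple; the names `keep_flow` and `sample_and_aggregate` are illustrative only. Because the sampling decision depends only on the flow key, every packet of a sampled flow is aggregated and every packet of an unsampled flow is dropped before any per-flow state is created.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical flow key: (src_ip, dst_ip, src_port, dst_port, protocol)
FlowKey = Tuple[str, str, int, int, int]


def keep_flow(key: FlowKey, rate: float) -> bool:
    """Deterministic per-flow sampling decision: hash the key into [0, 1)
    and keep the flow if the value falls below the sampling rate."""
    digest = hashlib.sha1(repr(key).encode()).digest()
    h = int.from_bytes(digest[:8], "big") / 2**64
    return h < rate


def sample_and_aggregate(
    packets: Iterable[Tuple[FlowKey, int]], rate: float
) -> Dict[FlowKey, Tuple[int, int]]:
    """Single pass over packets: aggregate (packet_count, byte_count)
    only for flows selected by keep_flow; other packets create no state."""
    flows: Dict[FlowKey, List[int]] = defaultdict(lambda: [0, 0])
    for key, size in packets:
        if keep_flow(key, rate):
            stats = flows[key]
            stats[0] += 1
            stats[1] += size
    return {k: (v[0], v[1]) for k, v in flows.items()}
```

A sampled flow always appears with its exact packet and byte totals, which is what distinguishes flow sampling from independent per-packet sampling.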
Next, we introduce the notion of the inverse distribution for massive data streams, and present algorithms that draw a uniform sample from the inverse distribution in the presence of inserts and deletes to the stream; such a sample can be used for a variety of summarization and filtering/mining tasks.
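The inverse distribution can be stated concretely: if f(x) is the net count of item x after all inserts and deletes, the inverse distribution gives, for each frequency i, the fraction of distinct items (with nonzero count) whose frequency is i. A uniform sample from it picks a distinct surviving item uniformly at random and reports its frequency. The sketch below computes both exactly by keeping every count in memory, so it only illustrates the definition; the thesis is concerned with drawing such a sample in small space over a massive stream, which this code does not attempt. The update format `(item, +1 or -1)` and the function names are assumptions.

```python
import random
from collections import Counter
from typing import Dict, Iterable, Tuple


def frequencies(updates: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Apply insert (+1) and delete (-1) updates; keep items with nonzero count."""
    f: Counter = Counter()
    for item, delta in updates:
        f[item] += delta
    return {x: c for x, c in f.items() if c != 0}


def inverse_distribution(f: Dict[str, int]) -> Dict[int, float]:
    """For each frequency i, the fraction of distinct items with frequency i."""
    n = len(f)
    if n == 0:
        return {}
    counts = Counter(f.values())
    return {i: c / n for i, c in counts.items()}


def inverse_sample(f: Dict[str, int], rng: random.Random) -> Tuple[str, int]:
    """Uniform sample over distinct surviving items, reported with its frequency."""
    item = rng.choice(sorted(f))
    return item, f[item]
```

For example, the updates a+, b+, c+, c+, d+, d- leave a and b with frequency 1 and c with frequency 2, so two thirds of the distinct items have frequency 1 even though the deleted item d was inserted earlier.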
Another contribution of this thesis is the development of a filter join operator, which makes it feasible to evaluate, in an efficient, stable and accurate manner, a common type of join query that searches high-speed data streams for records matching dynamic criteria. We also present analyses of query transformations that expose the filter join operator in conventional join queries.
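One way to picture a join that searches for records matching dynamic criteria is as a semi-join against a changing set of watched keys. The sketch below is a hypothetical illustration under that reading, not the filter join operator defined in the thesis: criteria arrive as `("filter", key, expiry_time)` events, data arrives as `("record", key, time, payload)` events, and a record is emitted only while its key is an active, unexpired criterion. The event format and the name `filter_join` are assumptions.

```python
from typing import Dict, Iterable, Iterator, Tuple


def filter_join(events: Iterable[tuple]) -> Iterator[Tuple[str, float, str]]:
    """Hypothetical semi-join of a record stream against dynamic criteria.

    ("filter", key, expiry)        -> key is watched until time `expiry`
    ("record", key, time, payload) -> emitted if key is watched at `time`
    """
    active: Dict[str, float] = {}  # watched key -> latest expiry time
    for ev in events:
        if ev[0] == "filter":
            _, key, expiry = ev
            # Extend, never shorten, an existing criterion's lifetime.
            active[key] = max(expiry, active.get(key, expiry))
        elif ev[0] == "record":
            _, key, t, payload = ev
            exp = active.get(key)
            if exp is not None and t <= exp:
                yield key, t, payload
            elif exp is not None:
                # Lazily drop an expired criterion when a record hits it.
                del active[key]
        else:
            raise ValueError(f"unknown event type: {ev[0]!r}")
```

Keeping only the watched keys, rather than buffering either stream, is what keeps per-record work and memory bounded in this simplified model.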
Finally, we study the problem of matching regular expressions that can span multiple data records in a data stream in the presence of stream quality problems, such as duplicate and out-of-order records; we present a number of algorithms that can match regular expressions over multiple data stream records without stream reassembly, by maintaining partial state of the data in the stream.
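The core idea of matching across record boundaries without reassembly is that only the automaton state needs to survive from one record to the next. The sketch below shows this for the simplest regular expression, a literal string, using a Knuth-Morris-Pratt automaton whose state (the number of pattern characters matched so far) is carried between calls to `feed`. It assumes records arrive in order and without duplicates; handling out-of-order and duplicate records, as the thesis does, requires more partial state than this. The class name `StreamingMatcher` is illustrative.

```python
from typing import List


class StreamingMatcher:
    """Finds a literal pattern that may straddle record boundaries,
    keeping only the KMP state between records (no reassembly)."""

    def __init__(self, pattern: str):
        if not pattern:
            raise ValueError("pattern must be non-empty")
        self.p = pattern
        self.fail = self._prefix_function(pattern)
        self.state = 0   # pattern characters matched so far
        self.offset = 0  # total characters consumed from the stream

    @staticmethod
    def _prefix_function(p: str) -> List[int]:
        """pi[i] = length of the longest proper prefix of p[:i+1] that is also its suffix."""
        pi = [0] * len(p)
        k = 0
        for i in range(1, len(p)):
            while k > 0 and p[i] != p[k]:
                k = pi[k - 1]
            if p[i] == p[k]:
                k += 1
            pi[i] = k
        return pi

    def feed(self, record: str) -> List[int]:
        """Consume one record; return stream offsets where matches start."""
        matches = []
        for ch in record:
            while self.state > 0 and ch != self.p[self.state]:
                self.state = self.fail[self.state - 1]
            if ch == self.p[self.state]:
                self.state += 1
            self.offset += 1
            if self.state == len(self.p):
                matches.append(self.offset - len(self.p))
                self.state = self.fail[self.state - 1]  # allow overlapping matches
        return matches
```

Usage: feeding the records `"xxat"`, `"ta"` and `"ck yy attack"` to `StreamingMatcher("attack")` reports nothing for the first two records and the start offsets 2 and 12 for the third, even though the first occurrence is split across three records.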
The ideas presented in this thesis are motivated by actual practical problems that arise in data stream processing, and are further validated by the presented experimental studies.
Note (type = degree)
Ph.D.
Note (type = bibliography)
Includes bibliographical references (p. 145-155).
Subject (ID = SUBJ1); (authority = RUETD)
Topic
Computer Science
Subject (ID = SUBJ2); (authority = ETD-LCSH)
Topic
Streaming technology (Telecommunications)
Subject (ID = SUBJ3); (authority = ETD-LCSH)
Topic
Data transmission systems
Subject (ID = SUBJ4); (authority = ETD-LCSH)
Topic
Database management
RelatedItem (type = host)
TitleInfo
Title
Graduate School - New Brunswick Electronic Theses and Dissertations
Identifier (type = local)
rucore19991600001
Identifier (type = hdl)
http://hdl.rutgers.edu/1782.2/rucore10001600001.ETD.16764
Identifier
ETD_351
Location
PhysicalLocation (authority = marcorg); (displayLabel = Rutgers, The State University of New Jersey)
NjNbRU
Identifier (type = doi)
doi:10.7282/T3GX4BZZ
Genre (authority = ExL-Esploro)
ETD doctoral

Rights

RightsDeclaration (AUTHORITY = GS); (ID = rulibRdec0006)
The author owns the copyright to this work.
Copyright
Status
Copyright protected
Availability
Status
Open
AssociatedEntity (AUTHORITY = rulib); (ID = 1)
Name
Irina Rozenbaum
Role
Copyright holder
Affiliation
Rutgers University. Graduate School - New Brunswick
RightsEvent (AUTHORITY = rulib); (ID = 1)
Type
Permission or license
Detail
Non-exclusive ETD license
AssociatedObject (AUTHORITY = rulib); (ID = 1)
Type
License
Name
Author Agreement License
Detail
I hereby grant to the Rutgers University Libraries and to my school the non-exclusive right to archive, reproduce and distribute my thesis or dissertation, in whole or in part, and/or my abstract, in whole or in part, in and from an electronic format, subject to the release date subsequently stipulated in this submittal form and approved by my school. I represent and stipulate that the thesis or dissertation and its abstract are my original work, that they do not infringe or violate any rights of others, and that I make these grants as the sole owner of the rights to my thesis or dissertation and its abstract. I represent that I have obtained written permissions, when necessary, from the owner(s) of each third party copyrighted matter to be included in my thesis or dissertation and will supply copies of such upon request by my school. I acknowledge that RU ETD and my school will not distribute my thesis or dissertation or its abstract if, in their reasonable judgment, they believe all such rights have not been secured. I acknowledge that I retain ownership rights to the copyright of my work. I also retain the right to use all or part of this thesis or dissertation in future works, such as articles or books.

Technical

Format (TYPE = mime); (VERSION = )
application/x-tar
FileSize (UNIT = bytes)
689664
Checksum (METHOD = SHA1)
0ac400d2ddd64ee4b963c299b966ae522e31be20
ContentModel
ETD
CompressionScheme
other
OperatingSystem (VERSION = 5.1)
windows xp
Format (TYPE = mime); (VERSION = NULL)
application/x-tar