With the growth in popularity and complexity of streaming applications, there is a rising need for sophisticated analyses of the massive, high-speed data that such applications generate. These analyses often must be performed in near real time, using limited system resources. Under such conditions, it is important to balance the efficiency of processing against the accuracy of the results. A common technique is to filter the stream with suitable conditions so that the resulting data size is manageable while the analyses remain accurate.
The work presented in this thesis focuses on a number of complex filtering techniques that are of interest in data stream processing in general and in network traffic monitoring in particular. These techniques allow the analyst to define a filtering condition that is more appropriate for the particular query at hand than simple uniform random sampling.
First, we propose a single operator that captures the evaluation pattern common to sampling queries. It can be specialized to implement a wide variety of sophisticated stream sampling algorithms within an operational data stream management system, and its performance scales to line speeds. Additionally, we propose a flow sampling mechanism that integrates the logic of flow aggregation and flow sampling into one procedure that works directly on IP traffic.
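As a point of reference for what a stream sampling algorithm looks like, the sketch below shows classic reservoir sampling (Algorithm R), which keeps a uniform sample of fixed size in one pass. It is a standard textbook algorithm chosen for illustration, not the operator proposed in the thesis; the function name `reservoir_sample` and its parameters are hypothetical.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Keep a uniform random sample of up to k records from a one-pass stream.

    Illustrative only: this is Algorithm R, not the thesis's sampling operator.
    """
    rng = rng or random.Random()
    reservoir = []
    for n, record in enumerate(stream, start=1):
        if n <= k:
            # The first k records fill the reservoir directly.
            reservoir.append(record)
        else:
            # Record n replaces a random slot with probability k / n.
            j = rng.randrange(n)  # uniform integer in [0, n)
            if j < k:
                reservoir[j] = record
    return reservoir
```

The memory used depends only on `k`, not on the stream length, which is the property that makes this family of algorithms usable on unbounded streams.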
Next, we introduce the notion of the inverse distribution for massive data streams, and present algorithms that draw a uniform sample from the inverse distribution in the presence of inserts and deletes to the stream; such a sample can be used for a variety of summarization and filtering/mining tasks.
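To make the notion concrete, the sketch below assumes the inverse distribution assigns to each count i the fraction of distinct items that currently occur exactly i times. It tracks exact counts under inserts and deletes and draws a uniform sample from the items with non-zero count. This is an exact, memory-hungry illustration of the definition only, not the small-space algorithms of the thesis; the class and method names are hypothetical.

```python
import random
from collections import Counter


class InverseDistribution:
    """Exact bookkeeping of item counts under inserts and deletes (illustrative)."""

    def __init__(self):
        self.freq = Counter()  # item -> current net count (always > 0 if present)

    def insert(self, item):
        self.freq[item] += 1

    def delete(self, item):
        if self.freq[item] <= 0:
            raise ValueError("cannot delete an item that is not present")
        self.freq[item] -= 1
        if self.freq[item] == 0:
            del self.freq[item]  # items with zero count leave the distribution

    def inverse(self):
        """Return {count: fraction of distinct items having exactly that count}."""
        distinct = len(self.freq)
        by_count = Counter(self.freq.values())
        return {count: n / distinct for count, n in by_count.items()}

    def sample(self, rng=random):
        """Pick a distinct item uniformly at random and return (item, count)."""
        item = rng.choice(list(self.freq))
        return item, self.freq[item]
```

For example, after inserting `a, a, b, b, c, d` and deleting `d`, two of the three remaining distinct items have count 2 and one has count 1, so `inverse()` gives a fraction of 2/3 for count 2 and 1/3 for count 1.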
Another contribution of this thesis is the development of a filter join operator, which makes it feasible to evaluate, in an efficient, stable and accurate manner, a common type of join query that searches high-speed data streams for records matching dynamic criteria. We also present analyses of query transformations that expose the filter join operator in conventional join queries.
Finally, we study the problem of matching regular expressions that can span multiple data records in a data stream in the presence of stream quality problems, such as duplicates and out-of-order records. We present a number of algorithms that can match regular expressions over multiple data stream records without stream reassembly, by maintaining partial state of the data in the stream.
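The idea of keeping partial state instead of reassembling the stream can be sketched with a deterministic automaton whose current state is carried from one record to the next. The sketch below assumes records arrive in order and without duplicates, so it does not cover the stream quality problems the thesis addresses; the class `StreamingDFA` and the hand-built table `AB_PLUS_C` (for the pattern `ab+c`) are hypothetical names introduced for illustration.

```python
class StreamingDFA:
    """Run a DFA over a sequence of records, carrying state across records.

    Simplifying assumption: records arrive in order and without duplicates.
    """

    def __init__(self, transitions, start, accepting):
        self.transitions = transitions  # state -> {char: next_state}
        self.start = start
        self.accepting = accepting
        self.state = start              # the only state kept between records
        self.offset = 0                 # characters consumed so far

    def feed(self, record):
        """Consume one record; return stream offsets where a match ends."""
        ends = []
        for ch in record:
            # Characters without an explicit transition reset to the start state.
            self.state = self.transitions[self.state].get(ch, self.start)
            self.offset += 1
            if self.state in self.accepting:
                ends.append(self.offset)
        return ends


# Hand-built DFA that finds the pattern ab+c anywhere in the stream.
AB_PLUS_C = {
    0: {"a": 1},
    1: {"a": 1, "b": 2},
    2: {"a": 1, "b": 2, "c": 3},
    3: {"a": 1},
}
```

Feeding the records `"xxa"`, `"bb"` and `"cx"` in turn reports a match ending at offset 6 while processing the third record, even though no single record contains the whole pattern.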
The ideas presented in this thesis are motivated by practical problems that arise in data stream processing, and are validated by the experimental studies presented.
Note (type = degree)
Ph.D.
Note (type = bibliography)
Includes bibliographical references (p. 145-155).
Subject (ID = SUBJ1); (authority = RUETD)
Topic
Computer Science
Subject (ID = SUBJ2); (authority = ETD-LCSH)
Topic
Streaming technology (Telecommunications)
Subject (ID = SUBJ3); (authority = ETD-LCSH)
Topic
Data transmission systems
Subject (ID = SUBJ4); (authority = ETD-LCSH)
Topic
Database management
RelatedItem (type = host)
TitleInfo
Title
Graduate School - New Brunswick Electronic Theses and Dissertations
PhysicalLocation (authority = marcorg); (displayLabel = Rutgers, The State University of New Jersey)
NjNbRU
Identifier (type = doi)
doi:10.7282/T3GX4BZZ
Genre (authority = ExL-Esploro)
ETD doctoral
Rights
RightsDeclaration (AUTHORITY = GS); (ID = rulibRdec0006)
The author owns the copyright to this work.
Copyright
Status
Copyright protected
Availability
Status
Open
AssociatedEntity (AUTHORITY = rulib); (ID = 1)
Name
Irina Rozenbaum
Role
Copyright holder
Affiliation
Rutgers University. Graduate School - New Brunswick
RightsEvent (AUTHORITY = rulib); (ID = 1)
Type
Permission or license
Detail
Non-exclusive ETD license
AssociatedObject (AUTHORITY = rulib); (ID = 1)
Type
License
Name
Author Agreement License
Detail
I hereby grant to the Rutgers University Libraries and to my school the non-exclusive right to archive, reproduce and distribute my thesis or dissertation, in whole or in part, and/or my abstract, in whole or in part, in and from an electronic format, subject to the release date subsequently stipulated in this submittal form and approved by my school. I represent and stipulate that the thesis or dissertation and its abstract are my original work, that they do not infringe or violate any rights of others, and that I make these grants as the sole owner of the rights to my thesis or dissertation and its abstract. I represent that I have obtained written permissions, when necessary, from the owner(s) of each third party copyrighted matter to be included in my thesis or dissertation and will supply copies of such upon request by my school. I acknowledge that RU ETD and my school will not distribute my thesis or dissertation or its abstract if, in their reasonable judgment, they believe all such rights have not been secured. I acknowledge that I retain ownership rights to the copyright of my work. I also retain the right to use all or part of this thesis or dissertation in future works, such as articles or books.