Description
Title: Evaluating neural networks for multi-layer network feature classification
Date Created: 2023
Other Date: 2023-05 (degree)
Extent: 133 pages : illustrations
Description: Technology advancements over the past decades have led to massive growth in the number and kinds of networked devices. As a result, network traffic has grown by orders of magnitude \cite{SOM_Ting}. This increase motivates traffic classification for various purposes, including network management, quality of service, and threat identification \cite{SOM_Ting, AE_Song, net_traffic_rezaei}. Our research investigates a suite of machine learning algorithms for classification across several network layers, from physical-layer modulation to layer 4 web server traffic. This work uses neural network (NN) classifiers to explore modulation recognition from raw IQ samples and website recognition from packet length and inter-arrival time.
We compare five types of supervised learning NNs, five unsupervised learning (USL) techniques, and a matched filter classifier (MFC). Our supervised learning networks include a convolutional neural network (CNN), a fully connected network (FCN), a long short-term memory network (LSTM), an autoencoder (AE), and our novel binary (BIN) neural network. The unsupervised classification methods comprise the density-based spatial clustering of applications with noise (DBSCAN) algorithm, a self-organizing map (SOM), the k-means clustering algorithm, and autoencoders combined with DBSCAN and with SOM clustering.
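To make the matched filter idea concrete, the following is a minimal sketch of a correlation-based classifier over complex IQ samples. It is not the thesis's implementation: the function names `correlation` and `classify`, the per-class reference templates, and the use of a normalized inner-product magnitude as the score are all assumptions chosen for illustration.

```python
from typing import Dict, List


def correlation(signal: List[complex], template: List[complex]) -> float:
    """Normalized correlation magnitude between a signal and a template.

    Hypothetical helper: returns a value in [0, 1], where 1 means the
    signal matches the template up to a complex scale factor.
    """
    inner = sum(s * t.conjugate() for s, t in zip(signal, template))
    energy_signal = sum(abs(s) ** 2 for s in signal)
    energy_template = sum(abs(t) ** 2 for t in template)
    if energy_signal == 0 or energy_template == 0:
        return 0.0
    return abs(inner) / (energy_signal * energy_template) ** 0.5


def classify(signal: List[complex], templates: Dict[str, List[complex]]) -> str:
    """Pick the class whose reference template correlates best with the signal."""
    return max(templates, key=lambda name: correlation(signal, templates[name]))


# Illustrative (made-up) templates and a noisy observation of the first one.
templates = {
    "a": [1 + 0j, 1 + 0j, -1 + 0j, -1 + 0j],
    "b": [1 + 0j, -1 + 0j, 1 + 0j, -1 + 0j],
}
noisy = [0.9 + 0.1j, 1.1 - 0.1j, -1.0 + 0j, -0.8 + 0.1j]
label = classify(noisy, templates)
```

Because the score is normalized, the decision does not depend on the received signal's amplitude or a constant phase offset, which is one common reason to prefer this form in a sketch like this.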
We examine the classification accuracy, complexity, training time, and inference latency of these approaches to provide a comprehensive comparison of their trade-offs between accuracy and resource usage. We found that although the CNN was the most accurate for all network layers and types of traffic, it had high resource costs. The CNN is robust across a variety of traffic types, while the performance of the other networks deteriorates much more rapidly under adverse conditions, such as noise or additional classes. Moreover, we discovered that the techniques used to generate raw IQ samples affect the classification accuracy of the networks. This implies that care should be taken when synthetically generating modulation signals for simulation or classification purposes. For the modulation data, most supervised networks were able to classify with over 95\% accuracy given the correct test conditions. We discovered that the FCN has slightly lower accuracy than the CNN, but much lower resource utilization, for various forms of modulation data.
We applied our classification methodology to website recognition using a small, novel feature space of only packet length and inter-arrival time. We demonstrate that the networks can discriminate between different websites using this minimal feature space. That is, each website produces a distinctive pattern in these two features alone. Some networks were able to distinguish between 10 websites with over 80\% accuracy. Our results imply that existing forms of traffic obfuscation, such as encryption, virtual private networks, and onion routing, are not sufficient. New forms of maintaining privacy will require randomizing packet sizes and inter-packet timing distributions.
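As an illustration of the two-feature space described above, the sketch below derives a (packet length, inter-arrival time) pair for each packet in a trace. The function name `packet_features` and the input layout of (timestamp, length) tuples are assumptions for this example, not the data format used in the thesis.

```python
from typing import List, Tuple


def packet_features(packets: List[Tuple[float, int]]) -> List[Tuple[int, float]]:
    """Return (length_bytes, inter_arrival_seconds) for each packet.

    `packets` is a hypothetical list of (timestamp_seconds, length_bytes)
    records. The first packet has no predecessor, so its inter-arrival
    time is set to 0.0 by convention in this sketch.
    """
    ordered = sorted(packets, key=lambda p: p[0])  # order by capture time
    features = []
    previous_time = None
    for timestamp, length in ordered:
        gap = 0.0 if previous_time is None else timestamp - previous_time
        features.append((length, gap))
        previous_time = timestamp
    return features


# Made-up trace: three packets captured at 0.0 s, 0.5 s, and 0.75 s.
trace = [(0.0, 100), (0.5, 1500), (0.75, 60)]
features = packet_features(trace)
```

Nothing in this representation depends on packet payloads, which is why encryption alone leaves it intact.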
Note: Ph.D.
Note: Includes bibliographical references
Genre: theses
Language: English
Collection: School of Graduate Studies Electronic Theses and Dissertations
Organization Name: Rutgers, The State University of New Jersey
Rights: The author owns the copyright to this work.