Leveraging user access patterns and cyberinfrastructure to address facility data discovery and access challenges
Simple citation
Qin, Yubo. Leveraging user access patterns and cyberinfrastructure to address facility data discovery and access challenges. Retrieved from https://doi.org/doi:10.7282/t3-yf00-xx47
Description
Title: Leveraging user access patterns and cyberinfrastructure to address facility data discovery and access challenges
Date Created: 2021
Other Date: 2021-05 (degree)
Extent: 1 online resource (xiv, 99 pages)
Description: With the growing number and increasing availability of shared-use facilities, instruments, and observatories, facility data is becoming an essential part of application workflows and a contributor to scientific discoveries in a range of disciplines. However, the corresponding growth in the number of users accessing these facilities, coupled with the expansion in the scale and variety of the data, is making it challenging for these facilities to ensure their data can be discovered, accessed, integrated, and analyzed in a timely manner.
This dissertation proposes to address these challenges by modeling user data query patterns and exploring opportunities to leverage novel cyberinfrastructure (CI) capabilities. Specifically, in this research, we analyze data access traces for two large-scale observatories, the Ocean Observatories Initiative (OOI) and the Geodetic Facility for the Advancement of Geoscience (GAGE), to identify their typical user access patterns in terms of facility location and spatial localities, domain-specific data models, and user associations. We then explore how in-network computing and storage capabilities, such as those provided by the Virtual Data Collaboratory (VDC) cyberinfrastructure platform, can be used along with this knowledge of access patterns to address data discovery and access challenges.
First, we leverage concepts underlying recommender systems, which are extremely effective in e-commerce, to address these data discovery challenges. We analyze data from facilities and identify and model user-query patterns in terms of facility location and spatial localities, domain-specific data models, and user associations. Then, we use this analysis to generate a knowledge graph and develop the collaborative knowledge-aware graph attention network (CKAT) recommendation model, which leverages graph neural networks (GNNs) to explicitly encode the collaborative signals through propagation and combine them with knowledge associations. Moreover, we integrate a knowledge-aware neural attention mechanism to enable the CKAT to pay more attention to key information while reducing irrelevant noise, thereby increasing the accuracy of the recommendations. We apply the proposed model to OOI and GAGE facility datasets and empirically demonstrate that the CKAT can effectively facilitate data discovery, significantly outperforming several compelling state-of-the-art baseline models. To the best of our knowledge, this is the first study that models large-scale data facilities' user-query patterns and leverages knowledge graph techniques to support the discovery of facilities' data.
Second, we design a push-based data delivery framework to address the data access challenge. This framework leverages the CI's emerging in-network capabilities to construct a distributed cache network and exploit the knowledge of user access patterns to develop a hybrid pre-fetching mechanism and data management strategies. It can accelerate user data access by reducing the time users wait for data retrieval: from the end-user perspective, data access is typically local because data are often pre-fetched and cached. We evaluate the framework's performance on a simulation of the VDC platform and compare its data management strategies with the state of the art. The results demonstrate that the framework can significantly improve data delivery performance and reduce network traffic at the observatories' facilities.
Note: Ph.D.
Note: Includes bibliographical references
Genre: theses, ETD doctoral
Language: English
Collection: School of Graduate Studies Electronic Theses and Dissertations
Organization Name: Rutgers, The State University of New Jersey
Rights: The author owns the copyright to this work.