LanguageTerm (authority = ISO 639-3:2007); (type = text)
English
Abstract (type = abstract)
The dissertation examines three distinct methodologies for analyzing data yielded from preclinical experiments:
1) Big data has created new challenges for data analysis, particularly for classification and for forming meaningful groups or clusters of data. Clustering techniques, such as K-means or hierarchical clustering, based on pairwise distances of N objects, are popular methods for performing exploratory analysis on such large datasets. Unfortunately, it is not always possible to apply these methods to big data because of the memory and time required by calculations of order N^2. A work-around is to take a random sample of the large dataset and apply the clustering technique to the reduced dataset; however, this is not a foolproof solution, since the structure of the dataset, particularly at its edges, is not guaranteed to be maintained. In this chapter we propose a new solution through the concept of “data nuggets”. Data nuggets reduce a large dataset to a small collection of nuggets, each containing a center, a weight, and a scale parameter. Once the data are re-expressed as data nuggets, we may apply algorithms that carry out standard statistical methods, such as principal components analysis (PCA), clustering, and classification. We apply the data nugget methodology to a flow cytometry dataset from biopharmaceutical research. We performed weighted PCA and weighted K-means clustering on a dataset containing millions of observations (B-cells), with the objective of finding clusters that characterize cells according to which proteins are active on their surfaces. An R package was also developed to implement these methods.
2) In preclinical drug discovery, experiments are often repeated but not precisely replicated with respect to treatment arms. Further, full datasets are not always immediately accessible, leaving analysts to rely on summary measures such as the sample mean and standard error. If one is only interested in comparing two treatment arms, meta-analysis is a useful tool; however, it restricts the analyst to comparing only two of the potentially numerous treatment arms at a time, and information from experiments lacking these two treatment arms is not used. Mixed treatment comparisons meta-analysis, also known as network meta-analysis, can be used instead to compare all available treatment arms at once. This chapter explains, explores, and compares two existing frequentist methods for network meta-analysis. We focus on sets of experiments with designs typically found in preclinical research. We also use simulations to compare network meta-analysis results with those given by mixed-effects linear models for these types of experiments. An R package was also developed to perform both methods of network meta-analysis.
3) Power calculations for hypothesis tests play a critical role in conducting both clinical and non-clinical trials. Many programs exist to calculate the power of popular hypothesis tests, such as Student's t-test for continuous data or the log-rank test for survival data. Calculating the power of hypothesis tests for ordinal categorical data can be much more complicated. For such data, observations are given as scores on a scale with a small range, typically between three and five points. The data are assumed to follow a multinomial distribution, which can depend on many parameters. We propose a simple yet effective method for defining alternative multinomial distributions and performing power calculations by creating and shifting quantiles of the standard normal distribution. We present simulation results and apply the method to a dataset. An R package was also developed to implement this method.
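As a rough illustration of the quantile-shifting idea in item 3), the Python sketch below turns a null multinomial distribution into cutpoints on the standard normal scale, shifts them by a single amount, and reads off an alternative multinomial distribution. It then estimates power by simulation. This is not the dissertation's R package. The function names, the single shift parameter, the two-arm design, and the tie-corrected Wilcoxon rank-sum test with a normal approximation are all assumptions made for this example. It also assumes every null probability is strictly positive.

```python
import random
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def shifted_multinomial(null_probs, shift):
    """Alternative category probabilities from shifting a latent standard normal.

    Hypothetical helper: cutpoints are the standard normal quantiles of the
    cumulative null probabilities; the latent mean is then moved by `shift`.
    """
    cutpoints = []
    running = 0.0
    for p in null_probs[:-1]:
        running += p
        cutpoints.append(STD_NORMAL.inv_cdf(running))
    # Moving the latent mean by `shift` equals moving each cutpoint by -shift.
    cum_alt = [STD_NORMAL.cdf(c - shift) for c in cutpoints] + [1.0]
    probs, prev = [], 0.0
    for c in cum_alt:
        probs.append(c - prev)
        prev = c
    return probs


def rank_sum_p_value(counts_a, counts_b):
    """Two-sided Wilcoxon rank-sum p-value from category counts.

    Uses midranks, the tie-corrected variance, and a normal approximation.
    """
    n_a, n_b = sum(counts_a), sum(counts_b)
    n = n_a + n_b
    below = 0           # observations in strictly lower categories
    rank_sum_a = 0.0
    tie_term = 0.0
    for a, b in zip(counts_a, counts_b):
        t = a + b
        rank_sum_a += a * (below + (t + 1) / 2)  # midrank of this category
        tie_term += t ** 3 - t
        below += t
    u = rank_sum_a - n_a * (n_a + 1) / 2
    var = n_a * n_b / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:        # every observation fell in one category
        return 1.0
    z = (u - n_a * n_b / 2) / sqrt(var)
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def simulate_power(null_probs, shift, n_per_arm, alpha=0.05, n_sims=2000, seed=1):
    """Monte Carlo power: control arm drawn from the null, treated arm from the shifted distribution."""
    rng = random.Random(seed)
    alt_probs = shifted_multinomial(null_probs, shift)
    k = len(null_probs)
    categories = list(range(k))
    rejections = 0
    for _ in range(n_sims):
        counts_a = [0] * k
        counts_b = [0] * k
        for c in rng.choices(categories, weights=null_probs, k=n_per_arm):
            counts_a[c] += 1
        for c in rng.choices(categories, weights=alt_probs, k=n_per_arm):
            counts_b[c] += 1
        if rank_sum_p_value(counts_a, counts_b) < alpha:
            rejections += 1
    return rejections / n_sims


if __name__ == "__main__":
    null = [0.2, 0.3, 0.3, 0.2]  # illustrative four-point scale, not data from the dissertation
    print(shifted_multinomial(null, 0.5))
    print(simulate_power(null, shift=0.5, n_per_arm=40))
```

A positive shift moves probability toward the higher scores, and a shift of zero returns the null distribution, so the same routine can be used to check the simulated type I error rate.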
Subject (authority = RUETD)
Topic
Statistics and Biostatistics
Subject (authority = LCSH)
Topic
Experiments -- Data processing
RelatedItem (type = host)
TitleInfo
Title
Rutgers University Electronic Theses and Dissertations
Identifier (type = RULIB)
ETD
Identifier
ETD_9643
PhysicalDescription
Form (authority = gmd)
InternetMediaType
application/pdf
InternetMediaType
text/xml
Extent
1 online resource (xi, 139 pages) : illustrations
Note (type = degree)
Ph.D.
Note (type = bibliography)
Includes bibliographical references
RelatedItem (type = host)
TitleInfo
Title
School of Graduate Studies Electronic Theses and Dissertations
Identifier (type = local)
rucore10001600001
Location
PhysicalLocation (authority = marcorg); (displayLabel = Rutgers, The State University of New Jersey)
I hereby grant to the Rutgers University Libraries and to my school the non-exclusive right to archive, reproduce and distribute my thesis or dissertation, in whole or in part, and/or my abstract, in whole or in part, in and from an electronic format, subject to the release date subsequently stipulated in this submittal form and approved by my school. I represent and stipulate that the thesis or dissertation and its abstract are my original work, that they do not infringe or violate any rights of others, and that I make these grants as the sole owner of the rights to my thesis or dissertation and its abstract. I represent that I have obtained written permissions, when necessary, from the owner(s) of each third party copyrighted matter to be included in my thesis or dissertation and will supply copies of such upon request by my school. I acknowledge that RU ETD and my school will not distribute my thesis or dissertation or its abstract if, in their reasonable judgment, they believe all such rights have not been secured. I acknowledge that I retain ownership rights to the copyright of my work. I also retain the right to use all or part of this thesis or dissertation in future works, such as articles or books.