This dissertation studies two important problems that arise in the analysis of Big Data: high dimensionality and the massive size of pertinent samples. In Chapter 1, we develop three novel algorithms for clustering and classification of Big Data. The first is a two-way clustering approach that combines model-based and weighted K-means clustering methods. The two-way approach produces smaller subgroups of binary features of size p or less, so that the number of possible patterns is small enough to be handled efficiently by traditional clustering algorithms. The approach can also operate on weighted reduced data, allowing it to scale to massive sample sizes. In Chapter 2, we derive a weighted probabilistic distance-based clustering technique adjusted for cluster sizes. Chapter 3 introduces an ensemble method, Enriched Random Forest, for high-dimensional data (where n << p, with n the number of observations and p the number of features). This algorithm addresses situations where the dimension is very high but only a small fraction of the features is truly informative.
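The weighted reduced-data idea of Chapter 1 can be sketched as follows: identical binary patterns are collapsed into (unique pattern, count) pairs, and K-means is run on the unique patterns with the counts as observation weights. This is a minimal illustrative sketch, not the dissertation's code; the function name and initialization scheme are assumptions.

```python
# Minimal sketch of weighted K-means on reduced data. Each row of X is a
# unique pattern and w[i] its multiplicity in the original sample, so the
# algorithm touches only the unique rows rather than the full data set.
# Names and details are illustrative assumptions.
import numpy as np

def weighted_kmeans(X, w, k, n_iter=50, seed=0):
    """Cluster rows of X with observation weights w into k clusters."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Assign each point to its nearest center (squared Euclidean distance).
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Update each center as the weighted mean of its members.
        for j in range(k):
            mask = labels == j
            if mask.any():
                centers[j] = np.average(X[mask], axis=0, weights=w[mask])
    return labels, centers
```

Running this on the unique patterns with their counts as weights gives the same centroids as ordinary K-means on the full sample, at a fraction of the cost when many rows repeat.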
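One illustrative reading of Chapter 2's size-adjusted probabilistic distance clustering: a point's membership probability in cluster k is taken proportional to a cluster-size term divided by the distance to that cluster's center. This is a hedged sketch under that assumption, not the dissertation's exact formulation.

```python
# Sketch of probabilistic distance-based membership with a cluster-size
# adjustment: P(cluster k | x) is proportional to sizes[k] / dist(x, c_k).
# The precise adjustment in the dissertation may differ; this only conveys
# the general shape of such a rule.
import numpy as np

def membership_probs(X, centers, sizes):
    """Row-stochastic matrix of membership probabilities for each point."""
    # Pairwise distances from every point to every center (small epsilon
    # avoids division by zero when a point coincides with a center).
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
    raw = sizes[None, :] / d
    return raw / raw.sum(axis=1, keepdims=True)
```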
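The core idea behind an enriched forest in the n << p setting can be sketched as weighting the per-split feature draw: rather than sampling the mtry candidate features uniformly, sample them with probabilities proportional to a per-feature informativeness score. The score used here (an absolute two-sample t-statistic) and all names are illustrative assumptions, not the dissertation's specification.

```python
# Sketch of weighted feature subsampling for an enriched forest: features
# with stronger marginal class separation are more likely to be offered as
# split candidates, so trees are not dominated by the many noise features.
import numpy as np

def feature_weights(X, y):
    """Score each feature by the |t-statistic| between the two classes."""
    a, b = X[y == 0], X[y == 1]
    se = np.sqrt(a.var(axis=0) / len(a) + b.var(axis=0) / len(b)) + 1e-12
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / se

def sample_split_candidates(weights, mtry, rng):
    """Draw mtry distinct features with probability proportional to weight."""
    p = weights / weights.sum()
    return rng.choice(len(weights), size=mtry, replace=False, p=p)
```

In a full forest these candidates would replace the uniform draw inside each node's split search; everything else about the trees stays the same.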