Title: Metadata and Modeling Outputs for study "The Quiet Transformations of Literary Studies". Data
Research genre: Quantitative research; Experimental data; Repurposed data; Research data
Type of item: Dataset
Creator(s): Goldstone, Andrew; Underwood, Ted
Date(s) of creation: 2014
Abstract or summary: New analytical approaches, such as topic modeling, can illuminate subtle transformations, revealing that concepts frequently taken for granted are more variable than scholars have assumed. In this study, the modeled corpus comprised 21,367 JSTOR articles with 13,221 distinct author names, and the result was a 150-topic model.
The four files supporting this study and available here are:
1) vocab.txt: UTF-8 text, one word per line, giving all 98835 word types included in the model. The list of stop words excluded from this vocabulary is given at https://www.ideals.illinois.edu/handle/2142/45709.
2) id_map.txt: UTF-8 text, one string per line, giving the JSTOR ID strings of all 21367 documents included in the model, in the order indexed by the sampling state file.
3) mallet_state.gz (370MB): gzip'd UTF-8 text representing the final sampling state output by MALLET. Each token of the input documents is represented by a single line with six whitespace-delimited fields: document index, document label (unused), token index, word type index, word type as a string, topic index. The word type index is zero-based and corresponds to the order in vocab.txt. The document index is zero-based and corresponds to the order in id_map.txt.
4) metadata.tar.gz (3.9MB): gzip'd tar archive of 8 CSV files containing metadata for the documents modeled. Metadata for documents in the model can be located by matching the "id" column to the IDs given in id_map.txt.
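The six-field layout of the sampling state can be read with a few lines of Python. The following is a minimal sketch, not the authors' own tooling: the function names and default paths are illustrative, the sample IDs in the usage note are made up, and the skipping of lines that begin with "#" is an assumption about header lines that MALLET may write before the token lines.

```python
import gzip
from collections import Counter


def parse_state_line(line):
    """Split one token line into its six whitespace-delimited fields.

    The second field (document label) is unused and is discarded here.
    """
    doc, _label, pos, type_idx, word, topic = line.split()
    return int(doc), int(pos), int(type_idx), word, int(topic)


def doc_topic_counts(lines, doc_ids):
    """Count tokens per topic for each document, keyed by JSTOR ID.

    doc_ids is the content of id_map.txt as a list, so the zero-based
    document index in the state file selects the matching ID.
    """
    counts = {}
    for line in lines:
        # Assumption: blank lines and '#' header lines carry no tokens.
        if not line.strip() or line.startswith("#"):
            continue
        doc, _pos, _type_idx, _word, topic = parse_state_line(line)
        counts.setdefault(doc_ids[doc], Counter())[topic] += 1
    return counts


def read_lines(path):
    """Read a UTF-8 text file with one entry per line."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


def load_counts(state_path="mallet_state.gz", id_map_path="id_map.txt"):
    """Stream the gzip'd state file without unpacking it to disk."""
    doc_ids = read_lines(id_map_path)
    with gzip.open(state_path, "rt", encoding="utf-8") as handle:
        return doc_topic_counts(handle, doc_ids)
```

Calling load_counts() in a directory that holds both files returns a dictionary from JSTOR ID to per-topic token counts. Because the state file is read line by line, the 370MB archive never has to be held in memory at once. The same lookup pattern applies to vocab.txt if the word type index is needed rather than the word string.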
Table of Contents: The CSV archive is structured for convenient access by the data-analysis scripts at http://github.com/agoldst/tmhls, with each CSV file stored in its own directory:
elh_ci_all/citations.CSV,
mlr1905-1970/citations.CSV,
mlr1971-2013/citations.CSV,
modphil_all/citations.CSV,
nlh_all/citations.CSV,
pmla_all/citations.CSV,
res1925-1980/citations.CSV,
res1981-2012/citations.CSV.
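Matching metadata to modeled documents, as described for metadata.tar.gz, amounts to indexing every CSV row by its "id" column and then looking up the IDs from id_map.txt in order. The sketch below shows one way to do this with the Python standard library. It assumes the CSV files are UTF-8 with a header row containing "id" and that stripping surrounding whitespace from IDs is harmless; the function names are illustrative, and the real files may need additional cleanup that the scripts at http://github.com/agoldst/tmhls handle in their own way.

```python
import csv
import io
import tarfile


def index_by_id(rows):
    """Build a lookup from the "id" column to the full metadata row."""
    return {row["id"].strip(): row for row in rows}


def read_metadata(archive_path="metadata.tar.gz"):
    """Collect rows from every CSV file in the archive, keyed by ID."""
    records = {}
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            # The archive stores files as <directory>/citations.CSV.
            if not member.isfile() or not member.name.lower().endswith(".csv"):
                continue
            with tar.extractfile(member) as raw:
                # Assumption: the CSV files are UTF-8 encoded.
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                records.update(index_by_id(csv.DictReader(text)))
    return records


def metadata_in_model_order(doc_ids, records):
    """Return one metadata row (or None) per document index."""
    return [records.get(doc_id) for doc_id in doc_ids]
```

With doc_ids read from id_map.txt, metadata_in_model_order(doc_ids, read_metadata()) yields a list whose position i holds the metadata for document index i in the sampling state, or None where no row matched.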
Data Life Cycle Event(s): Type: Related publication Label: Article discusses experimental outcomes. Note appendix on technical details. Date: 2014 Detail: Goldstone, A., & Underwood, T. "The Quiet Transformations of Literary Studies: What Thirteen Thousand Scholars Could Tell Us", Rutgers University Community Repository, 2014. DOI: http://dx.doi.org/doi:10.7282/T3222RZT
Subjects: Creative works
Rights statement: Copyright for research resources published in RUcore is retained by the copyright holder. By virtue of its appearance in this open access medium, you are free to use this resource, with proper attribution, in educational and other non-commercial settings. Other uses, such as reproduction or republication, may require the permission of the copyright holder.