New analytical approaches, such as topic modeling, can illuminate subtle transformations, revealing that concepts frequently taken for granted are more variable than scholars have assumed. In this study, a 150-topic model was fitted to a corpus of 21,367 JSTOR articles with 13,221 distinct author names.
The four files supporting this study and available here are:
1) vocab.txt: UTF-8 text, one word per line, giving all 98835 word types included in the model. The list of stop words excluded from this vocabulary is given at https://www.ideals.illinois.edu/handle/2142/45709.
2) id_map.txt: UTF-8 text, one string per line, giving the JSTOR ID strings of all 21367 documents included in the model, in the order indexed by the sampling state file.
3) mallet_state.gz (370MB): gzip'd UTF-8 text representing the final sampling state output by MALLET. Each token of the input documents is represented by a single line with six whitespace-delimited fields: document index, document label (unused), token index, word type index, word type as a string, topic index. The word type index is zero-based and corresponds to the order in vocab.txt. The document index is zero-based and corresponds to the order in id_map.txt.
4) metadata.tar.gz (3.9MB): gzip'd tar archive of 8 CSV files containing metadata for the documents modeled. Metadata for a document in the model can be located by matching the "id" column to the IDs given in id_map.txt.
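As an illustration of the six-field layout described for mallet_state.gz, the following is a minimal Python sketch that tallies how many tokens of each document were assigned to each topic. The function names (parse_state_line, doc_topic_counts, load_counts) are hypothetical and are not part of the deposited scripts. The sketch also assumes that any header lines in the state file begin with "#", as MALLET state files normally do; the description above does not state this.

```python
import gzip
from collections import Counter, defaultdict


def parse_state_line(line):
    """Split one token line into its fields, dropping the unused document label."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 whitespace-delimited fields, got {len(fields)}")
    doc, _label, pos, type_index, word, topic = fields
    return int(doc), int(pos), int(type_index), word, int(topic)


def doc_topic_counts(lines):
    """Count tokens per (document index, topic index) over an iterable of lines."""
    counts = defaultdict(Counter)
    for line in lines:
        line = line.strip()
        # Assumption: header/comment lines start with "#"; token lines start with a digit.
        if not line or line.startswith("#"):
            continue
        doc, _pos, _type_index, _word, topic = parse_state_line(line)
        counts[doc][topic] += 1
    return counts


def load_counts(state_path="mallet_state.gz"):
    """Stream the gzip'd state file so the full text is never held in memory."""
    with gzip.open(state_path, "rt", encoding="utf-8") as handle:
        return doc_topic_counts(handle)
```

Because the document index is zero-based and follows the order of id_map.txt, the JSTOR ID of document n is the n-th line (counting from zero) of that file, and a word type index can be checked against vocab.txt in the same way.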
Extension
DescriptiveEvent
Type
Related publication
Label
Article discusses experimental outcomes. Note appendix on technical details.
Goldstone, A., & Underwood, T. "The Quiet Transformations of Literary Studies: What Thirteen Thousand Scholars Could Tell Us", Rutgers University Community Repository, 2014. DOI: http://dx.doi.org/doi:10.7282/T3222RZT
OriginInfo
DateCreated (point = start); (encoding = iso8601)
2014
Place
PlaceTerm (type = code)
TableOfContents
The metadata archive is structured for convenient access by the data-analysis scripts at http://github.com/agoldst/tmhls, with each CSV file stored in its own directory:
elh_ci_all/citations.CSV,
mlr1905-1970/citations.CSV,
mlr1971-2013/citations.CSV,
modphil_all/citations.CSV,
nlh_all/citations.CSV,
pmla_all/citations.CSV,
res1925-1980/citations.CSV,
res1981-2012/citations.CSV.
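Matching document IDs against the "id" column of these files could be sketched in Python as follows. This is an illustration only, not the method used by the tmhls scripts. The function name index_metadata is hypothetical, and the sketch assumes that each CSV file begins with a header row containing an "id" column and is encoded as UTF-8.

```python
import csv
import io
import tarfile


def index_metadata(archive, id_column="id"):
    """Map each document ID to (archive member name, row dict) for every CSV found.

    `archive` is a binary file object opened on the gzip'd tar archive.
    """
    index = {}
    with tarfile.open(fileobj=archive, mode="r:gz") as tar:
        for member in tar:
            # Match the extension case-insensitively, since the files are named citations.CSV.
            if not member.isfile() or not member.name.lower().endswith(".csv"):
                continue
            raw = tar.extractfile(member)
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            # Assumption: the first row of each file is a header naming the columns.
            for row in csv.DictReader(text):
                doc_id = (row.get(id_column) or "").strip()
                if doc_id:
                    index[doc_id] = (member.name, row)
    return index
```

A possible use is to open the archive with open("metadata.tar.gz", "rb"), build the index once, and then look up each line of id_map.txt in it.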
RelatedItem (type = host)
TitleInfo
Title
Literary Studies Topic Modeling: Metadata and Modeling Outputs
Identifier (type = local)
rucore00000002286
Location
PhysicalLocation (authority = marcorg); (displayLabel = Rutgers, The State University of New Jersey)
NjNbRU
Identifier (type = doi)
doi:10.7282/T35B00V9
Rights
RightsDeclaration (AUTHORITY = RU_Research); (ID = RU_Research001)
Copyright for research resources published in RUcore is retained by the copyright holder. By virtue of its appearance in this open access medium, you are free to use this resource, with proper attribution, in educational and other non-commercial settings. Other uses, such as reproduction or republication, may require the permission of the copyright holder.