Staff View
Human Performance on Clustering Web Pages

Descriptive

Language
LanguageTerm (authority = ISO 639-3:2007); (type = text)
English
Genre (authority = RULIB-FS)
Other
Genre (authority = marcgt)
technical report
PhysicalDescription
InternetMediaType
application/pdf
Extent
19 p.
Note (type = special display note)
Technical report DCS-TR-355
Name (authority = RutgersOrg-School); (type = corporate)
NamePart
School of Arts and Sciences (SAS) (New Brunswick)
Name (authority = RutgersOrg-Department); (type = corporate)
NamePart
Computer Science (New Brunswick)
TypeOfResource
Text
TitleInfo
Title
Human Performance on Clustering Web Pages
Abstract (type = abstract)
With the increase in information on the World Wide Web it has become difficult to find desired information quickly without using multiple queries or using a topic-specific search engine. One way to help in the search is by grouping HTML pages together that appear in some way to be related. In order to better understand this task, we performed an initial study of human clustering of web pages, in the hope that it would provide some insight into the difficulty of automating this task. Our results show that subjects did not cluster identically; in fact, on average, any two subjects had little similarity in their web-page clusters. We also found that subjects generally created rather small clusters, and those with access only to URLs created fewer clusters than those with access to the full text of each web page. Generally the overlap of documents between clusters for any given subject increased when given the full text, as did the percentage of documents clustered. When analyzing individual subjects, we found that each had different behavior across queries, both in terms of overlap, size of clusters, and number of clusters. These results provide a sobering note on any quest for a single clearly correct clustering method for web pages.
Name (type = personal)
NamePart (type = family)
Macskassy
NamePart (type = given)
Sofus A.
Affiliation
Computer Science (New Brunswick)
Role
RoleTerm (authority = marcrt); (type = text)
author
Name (type = personal)
NamePart (type = family)
Banerjee
NamePart (type = given)
Arunava
Affiliation
Computer Science (New Brunswick)
Role
RoleTerm (authority = marcrt); (type = text)
author
Name (type = personal)
NamePart (type = family)
Davison
NamePart (type = given)
Brian D.
Affiliation
Computer Science (New Brunswick)
Role
RoleTerm (authority = marcrt); (type = text)
author
Name (type = personal)
NamePart (type = family)
Hirsh
NamePart (type = given)
Haym
Affiliation
Computer Science (New Brunswick)
Role
RoleTerm (authority = marcrt); (type = text)
author
OriginInfo
DateCreated (encoding = w3cdtf); (keyDate = yes); (qualifier = exact)
1998-08
RelatedItem (type = host)
TitleInfo
Title
Computer Science (New Brunswick)
Identifier (type = local)
rucore21032500001
Location
PhysicalLocation (authority = marcorg); (displayLabel = Rutgers, The State University of New Jersey)
NjNbRU
Identifier (type = doi)
doi:10.7282/T3PV6PZ1
Genre (authority = ExL-Esploro)
Technical Documentation
Back to the top

Rights

RightsDeclaration (AUTHORITY = rightsstatements.org); (TYPE = IN COPYRIGHT); (ID = http://rightsstatements.org/vocab/InC/1.0/)
This Item is protected by copyright and/or related rights.You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use.For other uses you need to obtain permission from the rights-holder(s).
Copyright
Status
Copyright protected
Availability
Status
Open
Reason
Permission or license
Back to the top

Technical

RULTechMD (ID = TECHNICAL1)
ContentModel
Document
CreatingApplication
Version
1.4
ApplicationName
GPL Ghostscript 9.07
DateCreated (point = start); (encoding = w3cdtf); (qualifier = exact)
2018-06-06T12:28:15
DateCreated (point = start); (encoding = w3cdtf); (qualifier = exact)
2018-06-06T12:28:15
Back to the top
Version 8.3.13
Rutgers University Libraries - Copyright ©2020