Staff View
Orchestrating On-Chip Memory Resources for Throughput-Oriented Compilation

Descriptive

Language
LanguageTerm (authority = ISO 639-3:2007); (type = text)
English
Genre (authority = RULIB-FS)
Other
Genre (authority = marcgt)
technical report
PhysicalDescription
InternetMediaType
application/pdf
Extent
10 p.
Note (type = special display note)
Technical report DCS-TR-699
Name (type = corporate); (authority = RutgersOrg-School)
NamePart
School of Arts and Sciences (SAS) (New Brunswick)
Name (type = corporate); (authority = RutgersOrg-Department)
NamePart
Computer Science (New Brunswick)
TypeOfResource
Text
TitleInfo
Title
Orchestrating On-Chip Memory Resources for Throughput-Oriented Compilation
Abstract (type = abstract)
A key factor in GPU performance efficiency is the number of active threads that can run simultaneously on each streaming multiprocessor. The active threads have their states saved in fast on-chip memory and can quickly be scheduled to run if the set of running threads stalls due to memory latency. The greater the number of active threads, the higher the utilization we can obtain from many-core processor pipelines. To achieve optimal utilization, we typically need many more active threads than the number of physical cores. Because on-chip memory resources, including registers and scratch-pad memory, are limited, and because every thread gets an equal partition of those resources, the number of active threads depends on the characteristics of a given program and on the efficiency of resource allocation in back-end compilation. When a large and complicated program requires more registers per thread, program performance may degrade significantly due to the decrease in the total number of active threads. In this paper, we propose a novel resource allocation approach for back-end compilation targeting throughput-oriented GPU processors. This approach leverages on-chip scratch-pad memory to reduce register pressure and increase GPU processor occupancy for maximum throughput. The scratch-pad memory serves as a middle layer between registers and long-latency off-chip memory. On one hand, it reduces register usage per thread. On the other hand, it can serve as a caching layer for variables that need to be staged into registers from global memory. We have formulated the resource allocation problem for optimal utilization and throughput of many-core processors, and proposed efficient models and techniques. We implemented these techniques in a binary optimizer and evaluated it on a set of realistic benchmarks on real GPUs. We demonstrated the effectiveness of our techniques by achieving a speedup of up to 1.65 times over programs compiled by nvcc with the highest optimization flag.
OriginInfo
DateCreated (encoding = w3cdtf); (qualifier = approximate); (keyDate = yes)
2012
Name (type = personal)
NamePart (type = family)
Ames
NamePart (type = given)
Jeff
Affiliation
Computer Science (New Brunswick)
Role
RoleTerm (type = text); (authority = marcrt)
author
Name (type = personal)
NamePart (type = family)
Zhan
NamePart (type = given)
Ying
Affiliation
Computer Science (New Brunswick)
Role
RoleTerm (type = text); (authority = marcrt)
author
Name (type = personal)
NamePart (type = family)
Zhang
NamePart (type = given)
Eddy Z.
Affiliation
Computer Science (New Brunswick)
Role
RoleTerm (type = text); (authority = marcrt)
author
RelatedItem (type = host)
TitleInfo
Title
Computer Science (New Brunswick)
Identifier (type = local)
rucore21032500001
Location
PhysicalLocation (authority = marcorg); (displayLabel = Rutgers, The State University of New Jersey)
NjNbRU
Identifier (type = doi)
doi:10.7282/T3J969WT

Rights

RightsDeclaration (AUTHORITY = rightsstatements.org); (TYPE = IN COPYRIGHT); (ID = http://rightsstatements.org/vocab/InC/1.0/)
This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).
Copyright
Status
Copyright protected
Availability
Status
Open
Reason
Permission or license

Technical

RULTechMD (ID = TECHNICAL1)
ContentModel
Document
CreatingApplication
Version
1.5
ApplicationName
pdfTeX-1.40.12
DateCreated (point = end); (encoding = w3cdtf); (qualifier = exact)
2013-01-23T07:39:27