Staff View
Architectural Support for Address Translation on GPUs

Descriptive

Language
LanguageTerm (authority = ISO 639-3:2007); (type = text)
English
Genre (authority = RULIB-FS)
Other
Genre (authority = marcgt)
technical report
PhysicalDescription
InternetMediaType
application/pdf
Extent
14 p.
Note (type = special display note)
Technical report DCS-TR-703
Name (type = corporate); (authority = RutgersOrg-School)
NamePart
School of Arts and Sciences (SAS) (New Brunswick)
Name (type = corporate); (authority = RutgersOrg-Department)
NamePart
Computer Science (New Brunswick)
TypeOfResource
Text
TitleInfo
Title
Architectural Support for Address Translation on GPUs
Abstract (type = abstract)
The proliferation of heterogeneous compute platforms, of which CPU/GPU is a prevalent example, necessitates a manageable programming model to ensure widespread adoption. A key component of this is a shared unified address space between the heterogeneous units to obtain the programmability benefits of virtual memory. Indeed, processor vendors have already begun embracing heterogeneous systems with unified address spaces (e.g., Intel’s Haswell, AMD’s Berlin processor, and ARM’s Mali and Cortex cores). We are the first to explore GPU Translation Lookaside Buffers (TLBs) and page table walkers for address translation in the context of shared virtual memory for heterogeneous systems. To exploit the programmability benefits of shared virtual memory, it is natural to consider mirroring CPUs and placing TLBs prior (or parallel) to cache accesses,
making caches physically addressed. We show the performance challenges of such an approach and propose modest hardware augmentations to recover much of this lost performance.We then consider the impact of this approach on the design of general purpose GPU performance improvement schemes. We look at: (1) warp scheduling to increase cache hit rates; and (2) dynamic warp formation to mitigate control flow divergence overheads. We show that introducing cache-parallel address translation does pose challenges, but that modest optimizations can buy back much of this lost performance. Overall, this paper explores address translation mechanisms on GPUs. While cache-parallel address translation does introduce non-trivial performance overheads, modestly TLB-aware designs can move overheads into a range deemed acceptable in the CPU world (5-15% of runtime). We presume this initial design leaves room for improvement but hope the larger result, that a little TLB-awareness goes a long way in GPUs, spurs future work in this fruitful area.
Name (type = personal)
NamePart (type = family)
Pichai
NamePart (type = given)
Bharath
Affiliation
Computer Science (New Brunswick)
Role
RoleTerm (type = text); (authority = marcrt)
author
Name (type = personal)
NamePart (type = family)
Hsu
NamePart (type = given)
Lisa
Affiliation
Qualcomm Research
Role
RoleTerm (type = text); (authority = marcrt)
author
Name (type = personal)
NamePart (type = family)
Bhattacharjee
NamePart (type = given)
Abhishek
Affiliation
Computer Science (New Brunswick)
Role
RoleTerm (type = text); (authority = marcrt)
author
OriginInfo
DateCreated (encoding = w3cdtf); (qualifier = exact); (keyDate = yes)
2013-07
RelatedItem (type = host)
TitleInfo
Title
Computer Science (New Brunswick)
Identifier (type = local)
rucore21032500001
Location
PhysicalLocation (authority = marcorg); (displayLabel = Rutgers, The State University of New Jersey)
NjNbRU
Identifier (type = doi)
doi:10.7282/T3C53Q93
Back to the top

Rights

RightsDeclaration (AUTHORITY = rightsstatements.org); (TYPE = IN COPYRIGHT); (ID = http://rightsstatements.org/vocab/InC/1.0/)
This Item is protected by copyright and/or related rights.You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use.For other uses you need to obtain permission from the rights-holder(s).
Copyright
Status
Copyright protected
Availability
Status
Open
Reason
Permission or license
Back to the top

Technical

RULTechMD (ID = TECHNICAL1)
ContentModel
Document
CreatingApplication
Version
1.4
ApplicationName
GPL Ghostscript 9.05
DateCreated (point = start); (encoding = w3cdtf); (qualifier = exact)
2013-07-27T16:49:35
DateCreated (point = start); (encoding = w3cdtf); (qualifier = exact)
2013-07-27T16:49:35
Back to the top
Version 8.3.10
Rutgers University Libraries - Copyright ©2019