Staff View
Using Massive Atomics Operations for Massively Parallel GPU Applications: Inevitable or Indispensable?

Descriptive

Language
LanguageTerm (authority = ISO 639-3:2007); (type = text)
English
Genre (authority = RULIB-FS)
Other
Genre (authority = marcgt)
technical report
PhysicalDescription
InternetMediaType
application/pdf
Extent
10 p.
Note (type = special display note)
Technical report DCS-TR-700
Name (type = corporate); (authority = RutgersOrg-School)
NamePart
School of Arts and Sciences (SAS) (New Brunswick)
Name (type = corporate); (authority = RutgersOrg-Department)
NamePart
Computer Science (New Brunswick)
TypeOfResource
Text
TitleInfo
Title
Using Massive Atomics Operations for Massively Parallel GPU Applications: Inevitable or Indispensable?
Abstract (type = abstract)
Parallelizing a sequential application involves handling concurrent updates to shared memory objects. One important type of parallelism can be exploited when the order of memory updates to the same location does not change the output of the program. This is reduction-type parallelism, and it appears in many important applications such as data mining, numerical analysis, and scientific simulation. On multi-core architectures, these applications are typically implemented by using thread-private data objects to hold partial results and applying a sequential final stage to aggregate them. Porting this type of application to massively parallel GPU processors poses new challenges. One major challenge is work partitioning, which aims to minimize communication between threads running in parallel and to reduce the work in the sequential reduction stage that aggregates all partial results. However, when the number of threads grows to thousands or millions, workload partitioning becomes much more complicated than it is for fewer than ten threads. This may ultimately lead to load imbalance or to extra control code for handling boundary cases in irregular applications. Extra control code may, on the one hand, increase coding complexity and, on the other, cause runtime thread divergence and thus serious performance degradation. In this paper, we propose a novel approach to handling concurrent shared memory objects on GPUs that yields both good performance and good programmability. The approach uses atomic operations extensively. Atomic operations for shared memory updates are known to be expensive and may serialize parallel threads. However, we found that, with appropriate techniques for eliminating atomic collisions, we can achieve performance similar to or even better than that of traditional implementations that do not use atomics.
We implemented these techniques as a library of functions with a simple interface. Programmers can call these functions to update shared memory objects without worrying about the order of the operations or about workload balancing, while still achieving the significant performance gains brought by the massive parallelism of GPUs.
OriginInfo
DateCreated (encoding = w3cdtf); (qualifier = exact); (keyDate = yes)
2013-09
Name (type = personal)
NamePart (type = family)
Egielski
NamePart (type = given)
Ian
Affiliation
Computer Science (New Brunswick)
Role
RoleTerm (type = text); (authority = marcrt)
author
Name (type = personal)
NamePart (type = family)
Zhang
NamePart (type = given)
Chi
Affiliation
Computer Science (New Brunswick)
Role
RoleTerm (type = text); (authority = marcrt)
author
Name (type = personal)
NamePart (type = family)
Zhang
NamePart (type = given)
Eddy Z.
Affiliation
Computer Science (New Brunswick)
Role
RoleTerm (type = text); (authority = marcrt)
author
RelatedItem (type = host)
TitleInfo
Title
Computer Science (New Brunswick)
Identifier (type = local)
rucore21032500001
Location
PhysicalLocation (authority = marcorg); (displayLabel = Rutgers, The State University of New Jersey)
NjNbRU
Identifier (type = doi)
doi:10.7282/T3DJ5K42

Rights

RightsDeclaration (AUTHORITY = rightsstatements.org); (TYPE = IN COPYRIGHT); (ID = http://rightsstatements.org/vocab/InC/1.0/)
This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).
Copyright
Status
Copyright protected
Availability
Status
Open
Reason
Permission or license

Technical

RULTechMD (ID = TECHNICAL1)
ContentModel
Document
CreatingApplication
Version
1.5
ApplicationName
pdfTeX-1.40.14
DateCreated (point = end); (encoding = w3cdtf); (qualifier = exact)
2013-09-16T00:09:40