Publications Involving RADL Members
Note: You may need access permissions from the publisher to see linked documents.
2012
- The Case for GPGPU Spatial Multitasking, HPCA 2012
- Challenges in Heterogeneous Die-Stacked and Off-Chip Memory Systems, SHAW-3 2012
- Characterizing and Evaluating a Key-Value Store Application on Heterogeneous CPU-GPU Systems, ISPASS 2012
- Cost-Effective Power Delivery to Support Per-core Voltage Domains for Power-constrained Processors, DAC 2012
- Staged Memory Scheduling: Achieving High Performance and Scalability in Heterogeneous Systems, ISCA 2012
- Scalable Multi-Precision Simulation of Spiking Neural Networks on GPU with OpenCL, IJCNN 2012
- Efficient Image Re-Ranking Computation on GPUs, ISPA 2012
- Something Old and Something New: P-States Can Borrow Microarchitecture Techniques Too, ISLPED 2012
- Energy-efficient GPU design with reconfigurable in-package graphics memory, ISLPED 2012
- NVMain: An Architectural-Level Main Memory Simulator for Emerging Non-volatile Memories, 2012 IEEE Computer Society Annual Symposium on VLSI
- Can GPGPU Programming Be Liberated from the Data-Parallel Bottleneck?, IEEE Computer Society 2011
2011
- Energy-efficient floating-point arithmetic for digital signal processors, Asilomar 2011
- Truncated-Matrix Multipliers with Coefficient Shifting, Asilomar 2011
- Efficiently enabling conventional block sizes for very large die-stacked DRAM caches, MICRO 2011
- A register-file approach for row buffer caches in Die-Stacked DRAMs, MICRO 2011
- Towards the ideal on-chip fabric for 1-to-many and many-to-1 communication, MICRO 2011
- Thread-aware Dynamic Shared Cache Compression in Multicore Processors, ICCD 2011
- Structure-Constrained Microcode Compression, SBAC-PAD 2011
- Hardware Designs for Binary Integer Decimal-Based Rounding, IEEE Transactions on Computers 2011
- Improving the Throughput of Power-Constrained GPUs through Dynamic Voltage/Frequency and Core Scaling, PACT 2011
- A Decimal Floating-point Fused Multiply-Add Unit with a Novel Decimal Leading-Zero Anticipator, ASAP 2011
- Energy-efficient Floating-point Arithmetic for Software Defined Radio Architectures, ASAP 2011
- Instructions and Hardware Designs for Accelerating SNOW 3G on a Software-defined Radio Platform, Journal of Analog Integrated Circuits and Signals
- CORDIC Instructions for LDPC Decoding on SDR Platforms, Journal of Analog Integrated Circuits and Signals
- Modular High-Throughput and Low-Latency Sorting Units for FPGAs in the Large Hadron Collider, SASP 2011
- Analyzing the Performance and Energy Impact of 3D Memory Integration on Embedded DSPs, SAMOS 2011
- Scratchpad Memory Optimizations for Digital Signal Processing Applications, DATE 2011
- LAR-CC: Large Atomic Regions with Conditional Commits, CGO 2011
- Dimetrodon: Processor-level Preventive Thermal Management via Idle Cycle Injection, DAC 2011
- The gem5 Simulator, Computer Architecture News, May 2011
2010
- Voltage Smoothing: Characterizing and Mitigating Voltage Noise in a Production Processor Using Software-Guided Thread Scheduling, to be published in MICRO 2010
- ASF: AMD64 Extension for Lock-free Data Structures and Transactional Memory, to be published in MICRO 2010
- An x86-64 Core Implemented in 32nm SOI CMOS, published in ISSCC 2010
- Power Gating, ISSCC 2010 Tutorial
- Implementing AMD's Advanced Synchronization Facility in an Out-of-order x86 Core, published in TRANSACT 2010
- Compilation of Thoughts about AMD Advanced Synchronization Facility and First-Generation Hardware Transactional Memory Support, published in TRANSACT 2010
- Evaluation of AMD's Advanced Synchronization Facility within a Complete Transactional Memory Stack, published in EuroSys 2010
2009