Publications
"Special issue on automatic application tuning for HPC architectures",
Scientific Programming, vol. 22, pp. 259–260, 2014.
"SCORPIO: A Scalable Two-Phase Parallel I/O Library With Application to a Large Scale Subsurface Simulator",
IEEE Conference on High Performance Computing (HiPC), Bengaluru, India, 12/2013.
"A script-based autotuning compiler system to generate high-performance CUDA code",
ACM Trans. Archit. Code Optim. (TACO), vol. 9(4), pp. 31:1-31:25, 01/2013.
"Strategies for Energy-Efficient Resource Management of Hybrid Programming Models",
IEEE Trans. Parallel Dist. Syst., vol. 24, issue 1, pp. 144-157, January 2013.
"A script-based autotuning compiler system to generate high-performance CUDA code",
ACM Transactions on Architecture and Code Optimization (TACO), vol. 9, issue 4, January 2013.
"SPAPT: Search Problems in Automatic Performance Tuning",
In Proceedings of the International Conference on Computational Science (ICCS 2012), Procedia Computer Science, no. ANL/MCS-P1872-0411, pp. 1959-1968, 2012.
"Spin-orbit configuration interaction calculations of electronic spectra of RuO2+ and OsO2+ catalytic cores",
Southwest Regional Meeting of the American Chemical Society (SWRMACS 2012), Baton Rouge, LA, 2012.
"Studying the impact of application-level optimizations on the power consumption of multi-core architectures",
Computing Frontiers Conference, 15-17 May 2012, Cagliari, Italy, Association for Computing Machinery, 2012.
"Studying The Impact Of Application-level Optimizations On The Power Consumption Of Multi-Core Architectures",
ACM International Conference on Computing Frontiers 2012 (CF'12), Cagliari, Italy, May 15th - 17th, 2012.
"System-wide Introspection for Accurate Attribution of Performance Bottlenecks",
Workshop on High-performance Infrastructure for Scalable Tools (WHIST), Venice, Italy, 06/2012.
"On Scalability for MPI Runtime Systems",
no. ICL-UT-11-05, Innovative Computing Laboratory, University of Tennessee, May 2011.
"Scheduling Task Parallelism on Multi-Socket Multicore Systems",
International Workshop on Runtime and Operating Systems for Supercomputers, Tucson, AZ, USA, ACM, June 2011.
"A Scalable and Distributed Dynamic Formal Verifier for MPI Programs",
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC '10), IEEE Computer Society, Washington, DC, pp. 1-10, November 2010.
"Self-Healing Network for Scalable Fault-Tolerant Runtime Environments",
Future Generation Computer Systems, vol. 26, no. 3, pp. 479-485, March 2010.
"SoftPower: Fine-Grain Power Estimations Using Performance Counters",
The ACM International Symposium on High Performance Distributed Computing (HPDC), Chicago, IL, ACM, pp. 308-311, 2010.