"Hierarchical parallelization and optimization of high-order stencil computations on multicore clusters", The Journal of Supercomputing, vol. 62, issue 2, pp. 946 - 966, April 2012.
"A script-based autotuning compiler system to generate high-performance CUDA code", ACM Transcations on Architectures and Code Optimization (TACO) , January 2013, vol. 9, issue 4, 2012.
"A programming language interface to describe transformations and code generation", Proceedings of the 23rd international conference on Languages and compilers for parallel computing, Houston, TX, Springer-Verlag, 2011.
"Loop Transformation Recipes for Code Generation and Auto-Tuning", Proceedings of the Workshop on Languages and Compilers for Parallel Computing, oct, 2009.
"A scalable auto-tuning framework for compiler optimization", Proceedings of the International Parallel and Distributed Processing Symposium, apr, 2009.