"Algorithm-Based Checkpoint-Free Fault Tolerance for Parallel Matrix Computations on Volatile Resources", 20th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2006), Rhodes Island, Greece, 2006.
"Algorithm-Based Fault Tolerance for Fail-Stop Failures", IEEE Transactions on Parallel and Distributed Systems, vol. 19, no. 12, 2008.
"Recovery Patterns for Iterative Methods in a Parallel Unstable Environment", SIAM Journal on Scientific Computing, vol. 30, no. 1, pp. 102-116, 2007.