"Automatic fault characterization via abnormality-enhanced classification", Dependable Systems and Networks (DSN-W), 25-28 June 2012, Boston, Massachusetts, USA, IEEE, 2012.
"Soft Error Vulnerability of Iterative Linear Algebra Methods", Proceedings of the 22nd annual International Conference on Supercomputing, New York, NY, pp. 155-164, 2008.
"AutomaDeD: Automata-Based Debugging for Dissimilar Parallel Tasks", 2010 IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), Chicago, IL, pp. 231-240, 2010.
"Enabling Fair Pricing on HPC Systems with Node Sharing", SC'13, Denver, Colorado, November, 2013.
"Algorithm-based Fault Tolerance for Dense Matrix Factorizations, Multiple Failures, and Accuracy", ACM Transactions on Parallel Computing, vol. 1, pp. 10:1-10:28, 02/2015.
"Performance portability of a GPU enabled factorization with the DAGuE framework", IEEE Cluster Workshop on Parallel Programming on Accelerator Clusters (PPAC), Austin, Texas, September, 2011.
"Correlated set coordination in fault tolerant message logging protocols for many-core clusters", Concurrency and Computation: Practice and Experience, vol. 25, issue 4, pp. 572-585, March, 2013.
"From Serial Loops to Parallel Execution on Distributed Systems", Euro-Par International Conference on Parallel Processing, 27-31 August 2012, Rhodes Island, Greece, Springer Berlin Heidelberg, 2012.
"Constructing Resilient Communication Infrastructure for Runtime Environments", Parallel Computing: From Multicores and GPU's to Petascale: IOS Press, pp. 441-451, 2010.
"On Scalability for MPI Runtime Systems", no. ICL-UT-11-05: Innovative Computing Laboratory, University of Tennessee, May, 2011.
"Composing Resilience Techniques: ABFT, Periodic, and Incremental Checkpointing", International Journal of Networking and Computing, vol. 5, pp. 2-15, 01/2015.
"A Checkpoint-on-Failure Protocol for Algorithm-Based Recovery in Standard MPI", Euro-Par International Conference on Parallel Processing, 27-31 August 2012, Rhodes Island, Greece, Springer Berlin Heidelberg, 2012.
"A Checkpoint-on-Failure Protocol for Algorithm-Based Recovery in Standard MPI", 18th International European Conference on Parallel and Distributed Computing (Euro-Par 2012), Christos Kaklamanis, Theodore Papatheodorou and Paul Spirakis eds., Springer-Verlag, Rhodes, Greece, August 27-31, 2012.
"An Evaluation of User-Level Failure Mitigation Support in MPI", Proceedings of Recent Advances in Message Passing Interface - 19th European MPI Users' Group Meeting, EuroMPI, Springer, Vienna, Austria, September 23-26, 2012.