Filters: Author is J. Dongarra
Conference Paper
Chen, Z., and J. Dongarra, "Algorithm-Based Checkpoint-Free Fault Tolerance for Parallel Matrix Computations on Volatile Resources", 20th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2006), Rhodes Island, Greece, 2006.
Bland, W., P. Du, A. Bouteiller, T. Herault, G. Bosilca, and J. Dongarra, "A Checkpoint-on-Failure Protocol for Algorithm-Based Recovery in Standard MPI", 18th International European Conference on Parallel and Distributed Computing (Euro-Par 2012), Christos Kaklamanis, Theodore Papatheodorou and Paul Spirakis eds., Springer-Verlag, Rhodes, Greece, August 27-31, 2012.
Bosilca, G., C. Coti, T. Herault, P. P. Lemarinier, and J. Dongarra, "Constructing Resilient Communication Infrastructure for Runtime Environments", Parallel Computing: From Multicores and GPU's to Petascale: IOS Press, pp. 441-451, 2010.
Ltaief, H., P. Luszczek, and J. Dongarra, "Profiling High Performance Dense Linear Algebra Algorithms on Multicore Architectures for Power and Energy Efficiency", International Conference on Energy-Aware High Performance Computing (EnA-HPC 2011), Hamburg, Germany, September, 2011.
Conference Proceedings
Bland, W., A. Bouteiller, T. Herault, J. Hursey, G. Bosilca, and J. Dongarra, "An Evaluation of User-Level Failure Mitigation Support in MPI", Proceedings of Recent Advances in Message Passing Interface - 19th European MPI Users' Group Meeting, EuroMPI, Springer, Vienna, Austria, September 23-26, 2012.
Journal Article
Chen, Z., and J. Dongarra, "Algorithm-Based Fault Tolerance for Fail-Stop Failures", IEEE Transactions on Parallel and Distributed Systems, vol. 19, no. 12, 2008.
Dongarra, J., G. Bosilca, R. Delmas, and J. Langou, "Algorithmic Based Fault Tolerance Applied to High Performance Computing", Journal of Parallel and Distributed Computing, vol. 69, no. 4, pp. 410-416, April, 2009.
Langou, J., Z. Chen, G. Bosilca, and J. Dongarra, "Recovery Patterns for Iterative Methods in a Parallel Unstable Environment", SIAM Journal on Scientific Computing, vol. 30, no. 1, pp. 102-116, 2007.
Angskun, T., G. Fagg, G. Bosilca, J. Pjesivac-Grbovic, and J. Dongarra, "Self-Healing Network for Scalable Fault-Tolerant Runtime Environments", Future Generation Computer Systems, vol. 26, no. 3, pp. 479-485, March, 2010.
Tech Report
Ma, T., G. Bosilca, A. Bouteiller, B. Goglin, J. Squyres, and J. Dongarra, "Kernel Assisted Collective Intra-node Communication Among Multicore and Manycore CPUs", no. UT-CS-10-663: Innovative Computing Laboratory, University of Tennessee, November, 2010.
Bosilca, G., T. Herault, A. Rezmerita, and J. Dongarra, "On Scalability for MPI Runtime Systems", no. ICL-UT-11-05: Innovative Computing Laboratory, University of Tennessee, May, 2011.