Kurt Ferreira
Title
Cited by
Year
Detection and correction of silent data corruption for large-scale high-performance computing
D Fiala, F Mueller, C Engelmann, R Riesen, K Ferreira, R Brightwell
SC'12: Proceedings of the International Conference on High Performance …, 2012
Cited by: 369; Year: 2012
Memory errors in modern systems: The good, the bad, and the ugly
V Sridharan, N DeBardeleben, S Blanchard, KB Ferreira, J Stearley, ...
ACM SIGARCH Computer Architecture News 43 (1), 297-310, 2015
Cited by: 346; Year: 2015
Evaluating the viability of process replication reliability for exascale systems
K Ferreira, J Stearley, JH Laros III, R Oldfield, K Pedretti, R Brightwell, ...
Proceedings of 2011 International Conference for High Performance Computing …, 2011
Cited by: 323; Year: 2011
Characterizing application sensitivity to OS interference using kernel-level noise injection
KB Ferreira, P Bridges, R Brightwell
SC'08: Proceedings of the 2008 ACM/IEEE Conference on Supercomputing, 1-12, 2008
Cited by: 308; Year: 2008
Combining partial redundancy and checkpointing for HPC
J Elliott, K Kharbas, D Fiala, F Mueller, K Ferreira, C Engelmann
2012 IEEE 32nd International Conference on Distributed Computing Systems …, 2012
Cited by: 202; Year: 2012
Fault-tolerant linear solvers via selective reliability
PG Bridges, KB Ferreira, MA Heroux, M Hoemmen
arXiv preprint arXiv:1206.1390, 2012
Cited by: 96; Year: 2012
Designing and implementing lightweight kernels for capability computing
R Riesen, R Brightwell, PG Bridges, T Hudson, AB Maccabe, PM Widener, ...
Concurrency and Computation: Practice and Experience 21 (6), 793-817, 2009
Cited by: 80; Year: 2009
On the viability of compression for reducing the overheads of checkpoint/restart-based fault tolerance
D Ibtesham, D Arnold, PG Bridges, KB Ferreira, R Brightwell
2012 41st International Conference on Parallel Processing, 148-157, 2012
Cited by: 69; Year: 2012
libhashckpt: hash-based incremental checkpointing using gpu’s
KB Ferreira, R Riesen, R Brightwell, P Bridges, D Arnold
Recent Advances in the Message Passing Interface: 18th European MPI Users …, 2011
Cited by: 66; Year: 2011
The impact of system design parameters on application noise sensitivity
KB Ferreira, PG Bridges, R Brightwell, KT Pedretti
Cluster Computing 16, 117-129, 2013
Cited by: 63; Year: 2013
Cooperative application/OS DRAM fault recovery
PG Bridges, M Hoemmen, KB Ferreira, MA Heroux, P Soltero, ...
Euro-Par 2011: Parallel Processing Workshops: CCPI, CGWS, HeteroPar, HiBB …, 2012
Cited by: 49; Year: 2012
Alleviating scalability issues of checkpointing protocols
R Riesen, K Ferreira, D Da Silva, P Lemarinier, D Arnold, PG Bridges
SC'12: Proceedings of the International Conference on High Performance …, 2012
Cited by: 48; Year: 2012
Energy delay product
JH Laros III, K Pedretti, SM Kelly, W Shu, K Ferreira, J Van Dyke, ...
Energy-Efficient High Performance Computing: Measurement and Tuning, 51-55, 2013
Cited by: 46; Year: 2013
Topics on measuring real power usage on high performance computing platforms
JH Laros, KT Pedretti, SM Kelly, JP Vandyke, KB Ferreira, CT Vaughan, ...
2009 IEEE International Conference on Cluster Computing and Workshops, 1-8, 2009
Cited by: 45; Year: 2009
Evaluating energy savings for checkpoint/restart
B Mills, RE Grant, KB Ferreira, R Riesen
Proceedings of the 1st International Workshop on Energy Efficient …, 2013
Cited by: 44; Year: 2013
Using simulation to evaluate the performance of resilience strategies at scale
S Levy, B Topp, KB Ferreira, D Arnold, T Hoefler, P Widener
High Performance Computing Systems. Performance Modeling, Benchmarking and …, 2014
Cited by: 39; Year: 2014
Transparent redundant computing with MPI
R Brightwell, K Ferreira, R Riesen
Recent Advances in the Message Passing Interface: 17th European MPI Users …, 2010
Cited by: 37; Year: 2010
Increasing fault resiliency in a message-passing environment
K Ferreira, R Riesen, R Oldfield, J Stearley, J Laros, K Pedretti, ...
Sandia National Laboratories, Technical report SAND2009-6753, 2009
Cited by: 37; Year: 2009
On the viability of checkpoint compression for extreme scale fault tolerance
D Ibtesham, D Arnold, KB Ferreira, PG Bridges
Euro-Par 2011: Parallel Processing Workshops: CCPI, CGWS, HeteroPar, HiBB …, 2012
Cited by: 36; Year: 2012
Redundant computing for exascale systems.
JR Stearley, RE Riesen, JH Laros III, KB Ferreira, KTT Pedretti, ...
Sandia National Laboratories (SNL), Albuquerque, NM, and Livermore, CA …, 2010
Cited by: 36; Year: 2010