Research Statement for Fred Bower

Research Overview

My research interests remain based in computer architecture. Specifically, I am interested in studying the microarchitectural impacts of multicore architectures. We are presently in the early years of multicore processor production, with homogeneous and statically heterogeneous designs dominating the landscape. As designers strive for more cores per package, they face fundamental cooling limits for air-cooled applications. For homogeneous designs, this means that the number of watts available per core is diminishing. One way to work around this limitation is to use a heterogeneous design, with differing power (and therefore complexity) budgets per core. Another alternative is to use dynamic voltage or frequency scaling to manage package power, allowing some cores to run at peak performance while others are throttled. This technique yields dynamic heterogeneity among cores. Additionally, it may be advantageous to bin parts per core, based upon the maximum frequency supported by each core (due to process variation) or upon deconfiguration of faulty parts of the core, using lightweight techniques similar to those I have previously proposed. When we arrive at dynamically heterogeneous design points, I believe that an interesting set of new problems will present itself to the research community. This area is where I intend to focus my future efforts.

Current Research

My research prior to my dissertation proposal centered on the development of lightweight hard-fault tolerance techniques for the single-core microprocessor. This work began in 2004 with the development of two hard-fault tolerance techniques for microprocessor array structures, such as the reorder buffer (ROB). One of these techniques was published and presented at the International Conference on Dependable Systems and Networks 2004 (DSN04). A complete overview of this technique and a second, alternative method was published in Volume 2, Number 4 of IEEE Transactions on Dependable and Secure Computing at the end of 2005. For my Research Initiation Project, I developed a design that exploits the redundancy present in an SMT-capable microprocessor, such as the Pentium 4. This design takes existing low-cost backward error recovery (BER) techniques that were developed for transient-fault tolerance and extends them to provide hard-fault tolerance. This work extends the 2004 efforts by protecting a larger portion of the microprocessor core. The structures protected by the current scheme, as published at the 38th International Symposium on Microarchitecture (MICRO-38), encompass logic from instruction issue to retirement. This work was then extended to study the sensitivity of the fault tolerance to various processor designs, as well as to explore the effects of diagnosing singleton resources, such as the floating-point multiplier. The extended studies have been accepted for publication in ACM Transactions on Architecture and Code Optimization (TACO). My work on hard-fault tolerance also led to the development of an architectural vulnerability metric for use in evaluating which structures to harden against hard faults. This work was published as a short paper at ACM SIGMETRICS/Performance 2006 and was presented at the poster session of the conference.

Bibliography

  1. Fred A. Bower, Daniel J. Sorin, and Sule Ozev. “Online Diagnosis of Hard Faults in Microprocessors.” ACM Transactions on Architecture and Code Optimization (TACO), Vol. 4, Issue 2, June 2007.
  2. Fred A. Bower, Derek R. Hower, Mahmut Yilmaz, Daniel J. Sorin, and Sule Ozev. “Applying Architectural Vulnerability Analysis to Hard Faults in the Microprocessor.” ACM SIGMETRICS/Performance 2006, June 2006. Accepted as a two-page publication with poster presentation at the conference.
  3. Fred A. Bower, Sule Ozev, and Daniel J. Sorin. “A Mechanism for Online Diagnosis of Hard Faults in Microprocessors.” 38th International Symposium on Microarchitecture (MICRO-38), November 2005, pages 197-208.
  4. Fred A. Bower, Sule Ozev, and Daniel J. Sorin. “Autonomic Microprocessor Execution via Self-Repairing Arrays.” IEEE Transactions on Dependable and Secure Computing. Volume 2, Number 4, October-December 2005, pages 297-310.
  5. Fred A. Bower, Paul G. Shealy, Sule Ozev, and Daniel J. Sorin. “Tolerating Hard Faults in Microprocessor Array Structures.” International Conference on Dependable Systems and Networks (DSN), June 2004, pages 51-60.
  6. Fred Bower. “System Data Collection and Problem Analysis in the Flight Data Recorder Project.” IBM Conference on Server & Storage Development to Support Autonomic Computing, October 2003.
  7. Fred A. Bower. “An Evaluation of the Security Features of the WebStore Electronic Commerce Suite.” Technical Report, submitted to Bugtraq, April 1999.