Assessment of the Effect of Memory Page Retirement on System RAS Against Hardware Faults
Written by Dong Tang, Peter Carruthers, Zuheir Totari, Michael Shapiro   
Thursday, 01 December 2005

IEEE 11th Pacific Rim International Symposium on Dependable Computing (PRDC 2005), December 2005.
One provision of the Solaris operating system's fault management architecture is automatic memory page retirement (MPR), intended to reduce the negative impact on system reliability, availability, and serviceability (RAS) of permanent memory faults that generate either correctable or uncorrectable errors. The MPR technique allows memory pages suffering from correctable errors to be removed from usage pools without interrupting user applications running on the system. It also allows memory pages suffering from uncorrectable errors to be isolated from use, with limited impact on the affected user processes, to avoid an outage of the entire system. This study applies analytical models, with parameters calibrated by field experience, to quantify the reduction that this operating-system-level self-healing technique can achieve in system interruptions, yearly downtime, and number of services introduced by permanent hardware faults, for typical low-end and midrange server systems. The results show that deploying the MPR capability can significantly improve these three system RAS metrics.
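
To make the kind of quantity being modeled concrete, the following is a minimal sketch of a toy calculation, not the analytical model used in the paper. It assumes that only faults surfacing as uncorrectable errors interrupt the system, and that MPR removes a fixed share of those interruptions. The names `RasInputs` and `yearly_ras`, and every parameter value in the usage lines, are hypothetical placeholders chosen for illustration; none of them are figures from the study.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class RasInputs:
    """Hypothetical inputs for a toy model; none of these are values from the paper."""
    faults_per_year: float   # permanent memory faults per system-year
    ue_fraction: float       # share of faults that surface as uncorrectable errors
    mpr_coverage: float      # share of those faults MPR isolates without a system outage
    hours_per_outage: float  # mean downtime per system interruption, in hours


def yearly_ras(inp: RasInputs, mpr_enabled: bool) -> Tuple[float, float]:
    """Return (system interruptions per year, downtime hours per year)."""
    # Assumption: only uncorrectable-error faults interrupt the whole system.
    ue_faults = inp.faults_per_year * inp.ue_fraction
    # With MPR enabled, the covered share of those faults no longer causes an outage.
    avoided = ue_faults * inp.mpr_coverage if mpr_enabled else 0.0
    interruptions = ue_faults - avoided
    return interruptions, interruptions * inp.hours_per_outage


# Usage with placeholder numbers (illustrative only):
example = RasInputs(faults_per_year=0.1, ue_fraction=0.5,
                    mpr_coverage=0.8, hours_per_outage=2.0)
without_mpr = yearly_ras(example, mpr_enabled=False)
with_mpr = yearly_ras(example, mpr_enabled=True)
print("without MPR:", without_mpr)
print("with MPR:   ", with_mpr)
```

A model of the kind the abstract describes would also account for service actions and for the differences between low-end and midrange configurations; this sketch only shows how a coverage parameter feeds into interruption and downtime estimates.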


