DRAM Errors in Enterprise Storage Systems: Probabilistic Modeling and Mitigations

Bibliographic Details
Published in: ProQuest Dissertations and Theses (2025)
Main author: Mosayebibehrooz, Nika
Published:
ProQuest Dissertations & Theses
Subjects: Computer engineering; Computer science; Information science
Online access: Citation/Abstract
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3234680447
003 UK-CbPIL
020 |a 9798290664705 
035 |a 3234680447 
045 2 |b d20250101  |b d20251231 
084 |a 66569  |2 nlm 
100 1 |a Mosayebibehrooz, Nika 
245 1 |a DRAM Errors in Enterprise Storage Systems: Probabilistic Modeling and Mitigations 
260 |b ProQuest Dissertations & Theses  |c 2025 
513 |a Dissertation/Thesis 
520 3 |a Memory reliability is a critical concern in modern computing systems, where DRAM errors can significantly impact performance and data integrity. Systems employ Error Correction Codes (ECC) as a protection mechanism against memory errors, but these mechanisms are not capable of correcting all errors. Uncorrectable errors at this stage present a significant challenge in DRAM systems as they result in degraded performance and reliability and require costly memory replacements. To address this, newer mitigation mechanisms have been developed. However, existing research on their effectiveness has primarily focused on operating-system-level mechanisms such as page offlining, and studies on hardware-targeted mechanisms, including Post-Package Repair (PPR) and Adaptive Double Device Data Correction (ADDDC), have been very limited. Additionally, while these actions incur performance and resource overhead, the optimal conditions and timing for triggering them have remained unexplored. We aim to fill this gap by modeling error dynamics with spatial information about error locations, moving towards the ability to predict uncorrectable errors and other events which lead to DRAM replacement, and to select the most efficient mitigation action tailored to each unique situation. By leveraging a rich dataset collected from a substantial population of enterprise storage systems, this work provides invaluable insights into the real-world behavior of memory errors and establishes a foundation for optimized application of error mitigation strategies, which results in enhanced reliability and performance in storage systems. 
653 |a Computer engineering 
653 |a Computer science 
653 |a Information science 
773 0 |t ProQuest Dissertations and Theses  |g (2025) 
786 0 |d ProQuest  |t ProQuest Dissertations & Theses Global 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3234680447/abstract/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3234680447/fulltextPDF/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch