Theoretical and Experimental Studies on Strongly Adaptive Filters and Parallel Paging

Bibliographic Details
Published in: ProQuest Dissertations and Theses (2025)
Main Author: Mo, Tianchi
Published:
ProQuest Dissertations & Theses
Subjects:
Online Access: Citation/Abstract
Full Text - PDF
Description
Abstract: This dissertation theoretically and experimentally studies two algorithmic areas: the strongly adaptive filter and parallel paging.

The first focus of this dissertation is bounding the reduction in false positives achieved by strongly adaptive filters when the membership queries are skewed. Recent work has investigated adaptive filters, which change their internal representation in response to queries that yield false positives. These include: (1) strongly adaptive filters, which guarantee a false-positive probability of at most ε for any query regardless of the history of prior queries, i.e., against adaptive adversaries; (2) support-optimal filters, which guarantee an average false-positive probability of at most ε over sufficiently long query sequences when the adversary is oblivious; and (3) other adaptive filters that change their representation and empirically perform better, but do not come with provable guarantees beyond those of static filters.

This dissertation investigates the performance advantages that strongly adaptive filters offer on (non-adversarial) skewed query distributions, which are common in database applications. In our theoretical and experimental results, we model query-distribution skewness with the Zipfian distribution with parameter z. We consider two strongly adaptive filters: the broom filter and the telescoping adaptive filter (TAF). We also consider two adaptive (but not strongly adaptive) filters: the adaptive cuckoo filter (ACF), and a non-adaptive rank-and-select quotient filter augmented with a cache of recent false positives, which we call the cache-augmented filter (CAF).

We prove upper bounds on the false-positive rates of the broom filter, the TAF, and the CAF as a function of the Zipfian parameter z as the length of the query sequence tends to infinity. We provide an implementation of the broom filter based on the (non-adaptive) rank-and-select quotient filter, and we validate the above bounds experimentally on synthetic Zipfian query sequences for the broom filter, the TAF, and the CAF.

Finally, we measure the observed false-positive rate of the broom filter, the TAF, the CAF, and the ACF on highly skewed real-world network-trace data. We find that all adaptive filters achieve false-positive rates one to two orders of magnitude lower than those of non-adaptive filters. We further find that the broom filter and the TAF outperform the CAF only when the ratio of distinct negative queries to positive-set size is high; otherwise, the CAF and the strongly adaptive filters yield similar false-positive rates.
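To illustrate the skewed-workload setting and the cache-augmented filter (CAF) idea described above, here is a minimal Python sketch. The class and function names, the Bloom-style stand-in for the rank-and-select quotient filter, and the workload parameters are illustrative assumptions, not the dissertation's implementation.

```python
# Hypothetical sketch of the cache-augmented filter (CAF) idea: a non-adaptive
# approximate-membership filter plus an LRU cache of recently observed false
# positives.  The dissertation's CAF is built on a rank-and-select quotient
# filter; a simple Bloom-style bit array stands in for it here.
import hashlib
from collections import OrderedDict

import numpy as np


class SimpleBloom:
    """Stand-in non-adaptive filter (NOT the dissertation's quotient filter)."""

    def __init__(self, n_bits, n_hashes=4):
        self.bits = bytearray(n_bits)
        self.n_bits = n_bits
        self.n_hashes = n_hashes

    def _positions(self, key):
        for i in range(self.n_hashes):
            h = hashlib.blake2b(f"{i}:{key}".encode(), digest_size=8).digest()
            yield int.from_bytes(h, "little") % self.n_bits

    def insert(self, key):
        for p in self._positions(key):
            self.bits[p] = 1

    def query(self, key):
        return all(self.bits[p] for p in self._positions(key))


class CacheAugmentedFilter:
    """Suppress repeated false positives with a bounded LRU cache."""

    def __init__(self, base_filter, cache_size):
        self.base = base_filter
        self.cache = OrderedDict()
        self.cache_size = cache_size

    def query(self, key):
        if key in self.cache:          # previously observed false positive
            self.cache.move_to_end(key)
            return False
        return self.base.query(key)

    def report_false_positive(self, key):
        self.cache[key] = None
        self.cache.move_to_end(key)
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)   # evict least-recently used


# Measure the observed false-positive rate on a Zipfian(z) stream of distinct
# negative keys, mimicking the skewed query workloads in the experiments.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    positives = [f"pos{i}" for i in range(10_000)]
    negatives = [f"neg{i}" for i in range(100_000)]

    bloom = SimpleBloom(n_bits=100_000)
    for k in positives:
        bloom.insert(k)
    caf = CacheAugmentedFilter(bloom, cache_size=1_000)

    z = 1.25  # Zipf parameter (numpy's sampler requires z > 1)
    ranks = rng.zipf(z, size=200_000)
    false_positives = queries = 0
    for r in ranks:
        key = negatives[(r - 1) % len(negatives)]
        queries += 1
        if caf.query(key):               # filter said "maybe present"
            false_positives += 1         # ...but the key is a true negative
            caf.report_false_positive(key)
    print(f"observed FPR: {false_positives / queries:.4f}")
```

Because a Zipfian stream repeats a small set of hot negative keys, the cache suppresses most repeated false positives, which is the kind of effect the bounds above quantify as a function of z.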
The second focus of this dissertation is applying machine-learned (ML) advice to the green paging and parallel paging problems. Modern operating systems, such as Linux, do not explicitly partition fast memory (such as RAM) among processes running in parallel, which causes thrashing when sufficiently many processes compete heavily for the fast memory. This phenomenon motivates us to study the parallel paging problem and the green paging problem. In the parallel paging problem, we dynamically partition a fast memory of limited capacity k among p processors with the goal of minimizing their mean completion time. In the green paging problem, we dynamically allocate memory to a single processor with the goal of minimizing its memory impact, i.e., the integral of its memory consumption over time, in order to save energy.

Recent work demonstrates that these two problems are intrinsically linked: a green paging algorithm with a competitive ratio of β on memory impact can be transformed into a parallel paging algorithm with a competitive ratio of O(β) on mean completion time. There is a universal memory-allocation algorithm for green paging, called BLIND, that is Θ(log p)-competitive.

To achieve better performance for both the green and parallel paging problems, we first design a dynamic-programming algorithm that computes the offline optimal compartmentalized box profile for green paging memory allocation in O(n log(pn)) time and O(n) space, where n is the length of the single request sequence. This algorithm can be used to generate datasets for training machine-learning models that advise online fast-memory allocation to a processor for green paging.

Since the ML advice may be imperfect, we propose an ML-advised green paging framework that combines an ML advisor with BLIND to bound the framework's worst-case performance. The framework keeps simulating BLIND in the background and periodically checks whether the memory impact actually consumed by the processor is at most (1 + δ) times the memory impact computed by the background BLIND, where δ = O(1) is a tunable parameter. If so, the ML advisor is given control over memory allocation; otherwise, the framework gives control to BLIND. We then transform our green paging framework into a parallel paging framework.

We implemented our green paging framework using XGBoost advisors. In simulation experiments on traces from the 2nd Cache Replacement Championship, compared with BLIND, the framework reduces memory impact by 25-71% for green paging and reduces mean completion time by 5-58% for parallel paging.
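To make the switching rule of the ML-advised framework described above concrete, here is a minimal Python sketch. The BLIND simulation and the ML advisor are stubbed out as callables, and the function names, time discretization, and default δ are illustrative assumptions rather than the dissertation's implementation.

```python
# Assumed-names sketch of the switching rule in the ML-advised green paging
# framework: let the ML advisor allocate memory while the impact actually
# consumed stays within (1 + delta) times the impact of the background BLIND
# simulation; otherwise fall back to BLIND's allocation.

def run_framework(num_steps, blind_allocation, ml_allocation,
                  delta=0.5, check_every=1024):
    """Return (total memory impact consumed, fraction of steps under ML control).

    blind_allocation(t) -> pages the background BLIND simulation holds at step t
    ml_allocation(t)    -> pages the ML advisor recommends at step t
    Memory impact is modeled as the sum of allocated pages over unit time steps.
    """
    impact_actual = 0.0   # impact of allocations the framework actually used
    impact_blind = 0.0    # impact of the background BLIND simulation
    use_ml = True         # start by trusting the advisor
    ml_steps = 0

    for t in range(num_steps):
        impact_blind += blind_allocation(t)

        alloc = ml_allocation(t) if use_ml else blind_allocation(t)
        impact_actual += alloc
        ml_steps += use_ml

        # Periodic safety check: the advisor keeps control only while the
        # consumed impact is at most (1 + delta) times BLIND's impact.
        if (t + 1) % check_every == 0:
            use_ml = impact_actual <= (1 + delta) * impact_blind

    return impact_actual, ml_steps / num_steps


# Toy usage: BLIND conservatively holds 64 pages; the advisor guesses a
# smaller working set of 40 pages.  The numbers are arbitrary illustrations.
if __name__ == "__main__":
    impact, ml_fraction = run_framework(
        num_steps=100_000,
        blind_allocation=lambda t: 64,
        ml_allocation=lambda t: 40,
    )
    print(f"memory impact: {impact:.0f}, ML-controlled fraction: {ml_fraction:.2f}")
```

The periodic comparison is what bounds the framework's worst-case performance: whenever the advisor overspends relative to BLIND by more than the (1 + δ) factor, control reverts to BLIND.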
ISBN: 9798270250539
Source: ProQuest Dissertations & Theses Global