Designing an In-Memory Metadata Cache to Accelerate Object Storage Operations

Bibliographic Details
Published in: ProQuest Dissertations and Theses (2025)
Main author: Thakkar, Jeet Bharatbhai
Published: ProQuest Dissertations & Theses
Online access: Citation/Abstract
Full Text - PDF
Description
Abstract: The Simple Storage Service (S3) protocol has become the de facto standard for large-scale data storage. With the widespread adoption of cloud services, the S3 protocol, initially developed by Amazon, has been adopted by all major vendors with a common set of base functionalities. S3 file operations are performed through a Representational State Transfer (REST) Application Programming Interface (API). This thesis presents the challenges of copying large amounts of data across S3 clusters (both on-premises and in the cloud) using native tools such as mc (MinIO Client), rclone, and s3cmd, and proposes the design of an in-memory metadata cache to accelerate S3 operations. The metadata cache first builds the current state of the bucket and persists the operations to disk using PostgreSQL, and then uses S3 bucket notifications to build an incremental view of the changes caused by file operations on the bucket. This solution eliminates the need to rescan the entire contents of the source S3 bucket to determine file changes, which is the current standard in replication tools such as rclone. The cache was developed in Go and tested on an 8-core Turing Pi System on Chip (SoC) module, and its performance impact was measured. Performance evaluations show that metadata retrieval time drops to 6 minutes, compared with 4 hours using the standard method of listing objects, making this approach a practical enhancement for on-premises S3 storage solutions.
ISBN:9798265465979
Source: ProQuest Dissertations & Theses Global
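
Illustrative sketch (not taken from the thesis): the abstract describes a cache that is seeded from the current state of a bucket and then kept up to date from bucket notifications instead of repeated full listings. The Go fragment below is one minimal way such an incremental view could look. The types `ObjectMeta`, `Event`, and `MetadataCache` and their methods are hypothetical names chosen for this example, the event is a simplified stand-in for a real notification record, and the PostgreSQL persistence step mentioned in the abstract is deliberately left out.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
	"time"
)

// ObjectMeta holds the per-object metadata tracked by the cache (hypothetical shape).
type ObjectMeta struct {
	Size         int64
	ETag         string
	LastModified time.Time
}

// Event is a simplified, hypothetical stand-in for one bucket notification record.
type Event struct {
	Name string // S3-style event name, e.g. "s3:ObjectCreated:Put"
	Key  string
	Meta ObjectMeta
}

// MetadataCache is an in-memory view of one bucket, keyed by object key.
type MetadataCache struct {
	mu      sync.RWMutex
	objects map[string]ObjectMeta
}

// NewMetadataCache returns an empty cache.
func NewMetadataCache() *MetadataCache {
	return &MetadataCache{objects: make(map[string]ObjectMeta)}
}

// Load seeds the cache from an initial full listing of the bucket.
func (c *MetadataCache) Load(listing map[string]ObjectMeta) {
	c.mu.Lock()
	defer c.mu.Unlock()
	for key, meta := range listing {
		c.objects[key] = meta
	}
}

// Apply folds one notification into the cache.
// It returns false when the event type is not one this sketch handles.
func (c *MetadataCache) Apply(ev Event) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	switch {
	case strings.HasPrefix(ev.Name, "s3:ObjectCreated:"):
		c.objects[ev.Key] = ev.Meta
	case strings.HasPrefix(ev.Name, "s3:ObjectRemoved:"):
		delete(c.objects, ev.Key)
	default:
		return false
	}
	return true
}

// Get returns the cached metadata for key, if present.
func (c *MetadataCache) Get(key string) (ObjectMeta, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	meta, ok := c.objects[key]
	return meta, ok
}

// Len reports how many objects the cache currently tracks.
func (c *MetadataCache) Len() int {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return len(c.objects)
}

func main() {
	cache := NewMetadataCache()
	// Initial state, as if obtained from a single full listing.
	cache.Load(map[string]ObjectMeta{
		"a.txt": {Size: 10, ETag: "e1"},
		"b.txt": {Size: 20, ETag: "e2"},
	})
	// Later changes arrive as events, so no rescan is needed.
	cache.Apply(Event{Name: "s3:ObjectCreated:Put", Key: "c.txt", Meta: ObjectMeta{Size: 30, ETag: "e3"}})
	cache.Apply(Event{Name: "s3:ObjectRemoved:Delete", Key: "a.txt"})
	fmt.Println("objects tracked:", cache.Len())
}
```

In this sketch a replication tool would consult the cache rather than list the bucket again. A real deployment would also need to handle missed or out-of-order notifications, which this example does not attempt.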