In-database query optimization on SQL with ML predicates

Bibliographic Details
Published in:	The VLDB Journal vol. 34, no. 1 (Jan 2025), p. 12
Author:	Guo, Yunyan
Other Authors:	Li, Guoliang; Hu, Ruilin; Wang, Yong
Published:	Springer Nature B.V.
Subjects:	Databases; Machine learning; Regression analysis; Algorithms; Decision trees; Queries; Costs; Optimization techniques; Query languages; Optimization; Inference

Other Information
Abstract:	Extended SQL with machine learning (ML) predicates, commonly referred to as SQL+ML, integrates ML abilities into traditional SQL processing in databases. To process SQL+ML queries, some methods move data from database (DB) systems to ML systems. Such methods are not only costly, because they maintain two copies of the data, but also pose security risks due to data movement. In-database SQL+ML processing addresses these limitations. However, conventional DB optimizers treat ML predicates as UDFs (user-defined functions) and cannot optimize them using the query rewriter and cost models. To boost the efficiency of in-database SQL+ML processing, this paper proposes to generate SQL predicates based on ML predicates and add them to SQL+ML queries, which can prune a significant number of irrelevant tuples and thus improve performance. Optimizing SQL+ML queries presents three challenges: (C1) how to generate valid SQL predicates, (C2) how to select high-quality SQL predicates, and (C3) how to optimize the query using SQL predicates. To address these challenges, we propose Smart, which integrates three novel modules into the database optimizer: (1) inference rewrite: generating tight and valid SQL predicates for logical optimization; (2) progressive inference: selecting high-pruning-power but low-overhead SQL predicates to prune irrelevant tuples; (3) cost-optimal inference: optimizing the cost of the query plan with the selected SQL predicates for physical optimization. We implemented Smart in PostgreSQL and evaluated it on four widely used benchmarks: JOB, TPC-H, SSB, and Flight. Experimental results revealed that Smart performed up to three orders of magnitude faster than the state-of-the-art baselines.
ISSN:	1066-8888 0949-877X
DOI:	10.1007/s00778-024-00888-3
Source:	Advanced Technologies & Aerospace Database
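
Illustrative note: the abstract describes generating SQL predicates from ML predicates so that the database can prune tuples before running inference. The sketch below shows that general idea on a toy decision tree, and it is not the Smart algorithm itself. The classes `Leaf` and `Node`, the function `derive_sql_predicate`, and the column names `delay_minutes` and `distance_km` are all hypothetical. It assumes numeric features and ignores NULL handling.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str  # class predicted for rows reaching this leaf


@dataclass
class Node:
    column: str       # feature column tested at this node (hypothetical name)
    threshold: float  # rows with column <= threshold go to the left child
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Node]


def paths_to_label(tree, target, prefix=()):
    """Collect the condition lists of every root-to-leaf path ending in `target`."""
    if isinstance(tree, Leaf):
        return [list(prefix)] if tree.label == target else []
    left_cond = f"{tree.column} <= {tree.threshold}"
    right_cond = f"{tree.column} > {tree.threshold}"
    return (paths_to_label(tree.left, target, prefix + (left_cond,))
            + paths_to_label(tree.right, target, prefix + (right_cond,)))


def derive_sql_predicate(tree, target):
    """Return a WHERE-clause string satisfied by every row the tree labels `target`."""
    paths = paths_to_label(tree, target)
    if not paths:
        return "FALSE"  # no leaf carries this label, so no row can qualify
    clauses = [" AND ".join(p) if p else "TRUE" for p in paths]
    return " OR ".join(f"({c})" for c in clauses)


# Toy model: a flight is "late" only if delayed by more than 30 minutes on a short route.
toy_tree = Node(
    "delay_minutes", 30.0,
    Leaf("on_time"),
    Node("distance_km", 1000.0, Leaf("late"), Leaf("on_time")),
)

predicate = derive_sql_predicate(toy_tree, "late")
print(predicate)  # (delay_minutes > 30.0 AND distance_km <= 1000.0)
```

Under these assumptions, the derived predicate could be added alongside the ML predicate in the WHERE clause, letting an ordinary index or scan filter discard rows that the model could never label `late`. For a single decision tree the derived condition is exact; for other model types, a derived predicate would generally be a looser necessary condition.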