In-database query optimization on SQL with ML predicates
| Published in: | The VLDB Journal vol. 34, no. 1 (Jan 2025), p. 12 |
|---|---|
| Publication Info: | Springer Nature B.V. |
| Abstract: | Extended SQL with machine learning (ML) predicates, commonly referred to as SQL+ML, integrates ML capabilities into traditional SQL processing in databases. To process SQL+ML queries, some methods move data from database (DB) systems to ML systems. Such methods are costly because they maintain two copies of the data, and the data movement poses security risks. In-database SQL+ML processing addresses these limitations. However, conventional DB optimizers treat ML predicates as user-defined functions (UDFs) and cannot optimize them with the query rewriter or cost models. To boost the efficiency of in-database SQL+ML processing, this paper proposes generating SQL predicates from ML predicates and adding them to SQL+ML queries, which prunes a significant number of irrelevant tuples and thus improves performance. Optimizing SQL+ML queries presents three challenges: (C1) how to generate valid SQL predicates, (C2) how to select high-quality SQL predicates, and (C3) how to optimize the query using SQL predicates. To address these challenges, we propose Smart, which integrates three novel modules into the database optimizer: (1) inference rewrite, which generates tight and valid SQL predicates for logical optimization; (2) progressive inference, which selects SQL predicates with high pruning power but low overhead to prune irrelevant tuples; and (3) cost-optimal inference, which optimizes the cost of the query plan with the selected SQL predicates for physical optimization. We implemented Smart in PostgreSQL and evaluated it on four widely used benchmarks: JOB, TPC-H, SSB, and Flight. Experimental results show that Smart performs up to three orders of magnitude faster than the state-of-the-art baselines. |
| ISSN: | 1066-8888 0949-877X |
| DOI: | 10.1007/s00778-024-00888-3 |
| Source: | Advanced Technologies & Aerospace Database |
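
The core idea in the abstract, deriving ordinary SQL predicates from an ML predicate so that irrelevant tuples are pruned before the model runs, can be illustrated with a small sketch. This is not Smart's algorithm and does not use PostgreSQL: it assumes a hypothetical hand-written decision tree (`TREE`), a hypothetical `person` table, and Python's built-in `sqlite3` module with the model registered as a UDF named `ml_pred`. "Valid" is read here as "no row accepted by the model is filtered out", which is an interpretation of the abstract rather than the paper's formal definition.

```python
import sqlite3

# Hypothetical toy model standing in for an ML predicate.
# Internal node: (feature, threshold, left, right); go left when value <= threshold.
# Leaf: 0 (reject) or 1 (accept).
TREE = ("age", 30, 0, ("income", 50, 0, ("age", 60, 1, 0)))


def predict(tree, row):
    """Walk the tree for one row (a dict of feature values) and return the leaf label."""
    while not isinstance(tree, int):
        feature, threshold, left, right = tree
        tree = left if row[feature] <= threshold else right
    return tree


def positive_paths(tree, conds=()):
    """Yield, for every leaf labelled 1, the conditions on the path leading to it."""
    if isinstance(tree, int):
        if tree == 1:
            yield list(conds)
        return
    feature, threshold, left, right = tree
    yield from positive_paths(left, conds + ((feature, "<=", threshold),))
    yield from positive_paths(right, conds + ((feature, ">", threshold),))


def to_sql(paths):
    """Turn the accepted paths into a disjunction of conjunctions in SQL syntax."""
    if not paths:
        return "1 = 0"  # the model accepts nothing
    clauses = []
    for path in paths:
        conj = " AND ".join(f"{f} {op} {t}" for f, op, t in path) or "1 = 1"
        clauses.append(f"({conj})")
    return " OR ".join(clauses)


def run_demo():
    """Run the query with and without the derived predicate and count UDF calls."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE person (id INTEGER PRIMARY KEY, age INTEGER, income INTEGER)"
    )
    conn.executemany(
        "INSERT INTO person VALUES (?, ?, ?)",
        [(i, 20 + (i * 7) % 60, (i * 13) % 100) for i in range(200)],
    )

    calls = {"n": 0}

    def ml_pred(age, income):
        calls["n"] += 1  # count how often the "model" is invoked
        return predict(TREE, {"age": age, "income": income})

    conn.create_function("ml_pred", 2, ml_pred)

    base = "SELECT id FROM person WHERE ml_pred(age, income) = 1 ORDER BY id"
    derived = to_sql(list(positive_paths(TREE)))
    rewritten = (
        f"SELECT id FROM person WHERE ({derived}) "
        "AND ml_pred(age, income) = 1 ORDER BY id"
    )

    calls["n"] = 0
    base_ids = [r[0] for r in conn.execute(base)]
    base_calls = calls["n"]

    calls["n"] = 0
    pruned_ids = [r[0] for r in conn.execute(rewritten)]
    pruned_calls = calls["n"]

    conn.close()
    return base_ids, pruned_ids, base_calls, pruned_calls


if __name__ == "__main__":
    print(to_sql(list(positive_paths(TREE))))
    base_ids, pruned_ids, base_calls, pruned_calls = run_demo()
    print("same result:", base_ids == pruned_ids)
    print("UDF calls without / with derived predicate:", base_calls, pruned_calls)
```

Because the derived predicate here is read directly off the tree's accepting paths, it cannot drop a row the model would accept, so both queries return the same ids. How many model calls are actually saved depends on whether the engine evaluates the cheap predicate before the UDF, which is a decision the abstract assigns to Smart's selection and cost-based modules and which this sketch does not attempt to reproduce.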