GPT Semantic Cache: Reducing LLM Costs and Latency via Semantic Embedding Caching

Bibliographic details
Published in:	arXiv.org (Dec 9, 2024), p. n/a
First author:	Regmi, Sajal
Additional authors:	Pun, Chetan Phakami
Publisher:	Cornell University Library, arXiv.org
Subject terms:	Semantics; Large language models; Caching; Queries; Storage; Application programming interface; Operating costs; Artificial intelligence; Natural language processing; Response time (computers); Customer services; Speech recognition
Online access:	Citation/Abstract; Full text outside of ProQuest