Stack Trace Deduplication: Faster, More Accurately, and in More Realistic Scenarios

Salvato in:
Dettagli Bibliografici
Pubblicato in:arXiv.org (Dec 19, 2024), p. n/a
Autore principale: Shibaev, Egor
Altri autori: Sushentsev, Denis, Golubev, Yaroslav, Khvorov, Aleksandr
Pubblicazione:
Cornell University Library, arXiv.org
Soggetti:
Accesso online:Citation/Abstract
Full text outside of ProQuest
Tags: Aggiungi Tag
Nessun Tag, puoi essere il primo ad aggiungerne!!

MARC

LEADER 00000nab a2200000uu 4500
001 3147563490
003 UK-CbPIL
022 |a 2331-8422 
035 |a 3147563490 
045 0 |b d20241219 
100 1 |a Shibaev, Egor 
245 1 |a Stack Trace Deduplication: Faster, More Accurately, and in More Realistic Scenarios 
260 |b Cornell University Library, arXiv.org  |c Dec 19, 2024 
513 |a Working Paper 
520 3 |a In large-scale software systems, there are often no fully-fledged bug reports with human-written descriptions when an error occurs. In this case, developers rely on stack traces, i.e., series of function calls that led to the error. Since there can be tens and hundreds of thousands of them describing the same issue from different users, automatic deduplication into categories is necessary to allow for processing. Recent works have proposed powerful deep learning-based approaches for this, but they are evaluated and compared in isolation from real-life workflows, and it is not clear whether they will actually work well at scale. To overcome this gap, this work presents three main contributions: a novel model, an industry-based dataset, and a multi-faceted evaluation. Our model consists of two parts - (1) an embedding model with byte-pair encoding and approximate nearest neighbor search to quickly find the most relevant stack traces to the incoming one, and (2) a reranker that re-ranks the most fitting stack traces, taking into account the repeated frames between them. To complement the existing datasets collected from open-source projects, we share with the community SlowOps - a dataset of stack traces from IntelliJ-based products developed by JetBrains, which has an order of magnitude more stack traces per category. Finally, we carry out an evaluation that strives to be realistic: measuring not only the accuracy of categorization, but also the operation time and the ability to create new categories. The evaluation shows that our model strikes a good balance - it outperforms other models on both open-source datasets and SlowOps, while also being faster on time than most. We release all of our code and data, and hope that our work can pave the way to further practice-oriented research in the area. 
653 |a Datasets 
653 |a Source code 
653 |a Open source software 
653 |a Time measurement 
653 |a Computer programming 
700 1 |a Sushentsev, Denis 
700 1 |a Golubev, Yaroslav 
700 1 |a Khvorov, Aleksandr 
773 0 |t arXiv.org  |g (Dec 19, 2024), p. n/a 
786 0 |d ProQuest  |t Engineering Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3147563490/abstract/embedded/ZKJTFFSVAI7CB62C?source=fedsrch 
856 4 0 |3 Full text outside of ProQuest  |u http://arxiv.org/abs/2412.14802