Distributed Analysis in Production with RDataFrame

Bibliographic Details
Published in: EPJ Web of Conferences vol. 337 (2025)
Main Author: Czurylo, Marta
Other Authors: Padulano, Vincenzo Eduardo; Piparo, Danilo; Andrea Maria Ola Mejicanos
Published: EDP Sciences
Description
Abstract: The ROOT software package provides the data format used in High Energy Physics by the LHC experiments. ROOT offers a data analysis interface called RDataFrame, which has proven to adapt well to the requirements of modern physics analyses. However, as the volume of data collected by the LHC experiments increases, performing an efficient analysis becomes more challenging. One way to ease this challenge is to leverage modern high-performance distributed computing environments, for which RDataFrame provides an easy-to-use interface layer, the distributed RDataFrame. In this paper, we show that the distributed RDataFrame is out of the experimental testing phase and is now ready for production thanks to a stabilized user interface. We delve into recent improvements of the distributed RDataFrame, including memory management, C++ code inclusion, and Pythonizations of the interface that allow running the workflows seamlessly. This includes running the distributed RDataFrame on various Analysis Facilities, which is discussed towards the end of the paper.
ISSN: 2101-6275; 2100-014X
DOI: 10.1051/epjconf/202533701007
Source: Advanced Technologies & Aerospace Database