Understanding the Effects of Increased Transparency on Data Preprocessing Through In-Process Visualizations

Bibliographic Details
Published in: ProQuest Dissertations and Theses (2025)
Main Author: Su, William
Published:
ProQuest Dissertations & Theses
Subjects: Information science; Computer science; Library science
Online Access: Citation/Abstract
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3243228331
003 UK-CbPIL
020 |a 9798291555507 
035 |a 3243228331 
045 2 |b d20250101  |b d20251231 
084 |a 66569  |2 nlm 
100 1 |a Su, William 
245 1 |a Understanding the Effects of Increased Transparency on Data Preprocessing Through In-Process Visualizations 
260 |b ProQuest Dissertations & Theses  |c 2025 
513 |a Dissertation/Thesis 
520 3 |a Most work on evaluating bias in data science workflows focuses on the model. However, the training data fed into the model, and the data preprocessing step that produces it, can also have a significant impact on model results. While there has been work on editing the data during preprocessing to mitigate bias, the impact of conventional data preprocessing operations has been understudied. My dissertation examines how the data preprocessing step can be improved to help analysts better understand its impact and make better-informed data science decisions. I first studied the needs of data scientists when conducting data preprocessing through a small-scale interview study and compared the results with a literature survey of current preprocessing tools. The comparison identified several key gaps between practice and theory. I used the results of the analysis to develop the Preprocess Analyzer (PPA) tool, which is designed to address some of these gaps by integrating into existing data science work environments and providing users with deeper insight into their data. I conducted a user study to evaluate the ability of PPA to aid with data preprocessing. The study found that data scientists gained a better understanding of their data preprocessing workflow when using PPA than when using existing popular tools. Participants generally agreed that PPA included many helpful features, such as the ability to quickly display useful statistics, highlight areas of concern, and integrate into familiar work environments. I believe the results of this dissertation can guide the design of future data preprocessing tools to better meet the needs of the end user. 
653 |a Information science 
653 |a Computer science 
653 |a Library science 
773 0 |t ProQuest Dissertations and Theses  |g (2025) 
786 0 |d ProQuest  |t ProQuest Dissertations & Theses Global 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3243228331/abstract/embedded/Y2VX53961LHR7RE6?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3243228331/fulltextPDF/embedded/Y2VX53961LHR7RE6?source=fedsrch