Understanding the Effects of Increased Transparency on Data Preprocessing Through In-Process Visualizations

Bibliographic Details
Published in: ProQuest Dissertations and Theses (2025)
Main Author: Su, William
Published: ProQuest Dissertations & Theses
Online Access: Citation/Abstract
Full Text - PDF
Description
Abstract: Most work on evaluating bias in data science workflows focuses on the model. However, the training data fed into the model and the data preprocessing step that produces it can also have a significant impact on model results. While there has been work on editing the data during preprocessing to mitigate bias, the impact of conventional data preprocessing operations has been understudied. My dissertation examines how the data preprocessing step can be improved to help analysts better understand its impact and make better-informed data science decisions. I first studied the needs of data scientists when conducting data preprocessing through a small-scale interview study and compared the results with a literature survey of current preprocessing tools. The comparison identified several key gaps between practice and theory. I used the results of the analysis to develop the Preprocess Analyzer (PPA) tool, which is designed to address some of these gaps by integrating into existing data science work environments and providing users with deeper insight into their data. I conducted a user study to evaluate the ability of PPA to aid with data preprocessing. The study found that, compared to existing popular tools, data scientists gained a better understanding of their data preprocessing workflow when using PPA. Participants generally agreed that PPA included many helpful features, such as the ability to quickly display useful statistics, highlighting of areas of concern, and integration into familiar work environments. I believe the results of this dissertation can guide the design of future data preprocessing tools to better meet the needs of the end user.
ISBN:9798291555507
Source: ProQuest Dissertations & Theses Global