The Many Hats of Pixels: Supporting Human Interaction and Hierarchical Understanding in Segmentation

Bibliographic Details
Published in: ProQuest Dissertations and Theses (2025)
Main author: Myers-Dean, Josh
Published: ProQuest Dissertations & Theses
Online access: Citation/Abstract; Full Text - PDF
Description
Abstract: Image segmentation, the task of delineating meaningful regions in visual data, has persisted as a central problem in computer vision. While recent advances using deep learning and transformer-based architectures have improved segmentation accuracy, current systems remain limited in their ability to adapt to diverse user interactions and to represent the hierarchical, context-dependent nature of scenes. In practice, a single pixel may belong to an object, part, or subpart depending on the task or user intent, yet most segmentation models operate at a fixed level of abstraction and rely on rigid input modalities.
This dissertation introduces segmentation methods that are hierarchical, interaction-aware, and user-centric. Motivated by practical research experiences in data annotation, human-computer interaction (HCI), and creative tools, the work addresses two key threads: (1) enabling flexible, multimodal user interaction in hybrid human-machine partnerships, and (2) modeling hierarchical relationships in natural images. The first thread (supporting human-machine partnerships) presents a new dataset and model supporting varied input types (e.g., clicks, scribbles, shapes), enabling more intuitive interactions without requiring explicit user annotations. It also presents a weakly supervised fine-tuning framework for interactive segmentation that improves segmentation consistency across user inputs, reducing cognitive load in creative workflows. The second thread (modeling hierarchical relationships) introduces the first hierarchical semantic segmentation dataset with annotations at object, part, and subpart levels. Building on this dataset, the dissertation proposes the first model that leverages specialized tokens within a large language model to capture “is-part-of” relationships in a single inference pass.
Together, these contributions aim to reframe segmentation as a collaborative, context-aware process that better aligns with human perception and real-world needs.
ISBN: 9798291574058
Source: ProQuest Dissertations & Theses Global