Towards Socially Aware Visual Navigation With Hierarchical Learning

Bibliographic Details
Published in: ProQuest Dissertations and Theses (2025)
Main author: Johnson, Faith
Publication: ProQuest Dissertations & Theses
Subjects: Computer engineering; Robotics; Computer science; Electrical engineering
Online access: Citation/Abstract
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3165174236
003 UK-CbPIL
020 |a 9798304911009 
035 |a 3165174236 
045 2 |b d20250101  |b d20251231 
084 |a 66569  |2 nlm 
100 1 |a Johnson, Faith 
245 1 |a Towards Socially Aware Visual Navigation With Hierarchical Learning 
260 |b ProQuest Dissertations & Theses  |c 2025 
513 |a Dissertation/Thesis 
520 3 |a Reinforcement learning (RL) has made significant strides in recent years by proposing increasingly complex networks that use ever larger amounts of data to solve a wide range of problems, from playing games to autonomous navigation. Continuing along this trajectory is infeasible for those without access to the computing power, data storage, or time required to perpetuate the trend. Additionally, these networks suffer from low sample efficiency and struggle to generalize to out-of-distribution data. This thesis proposes that leveraging the hierarchical structure inherent in many real-world problems, specifically navigation, while efficiently incorporating socially cognizant design into model training and ideation, can provide an alternative to this data- and compute-hungry approach. We start with the hypothesis that networks mirroring the hierarchical structure inherent in many tasks allow for better overall task performance using simpler networks. We take inspiration from the temporal abstraction of human cognitive processes and compare the performance of several flat neural network architectures and hierarchical paradigms on the maze traversal task. The temporally abstracted actions of hierarchical networks, also called subroutines, unfold over multiple time steps and allow agents to reason over complex skills and actions (such as leaving a room or going around a corner) instead of low-level motor commands. We find that learning a policy over these temporally abstracted actions leads to faster training, greater training stability, and higher accuracy than standard RL or supervised learning with LSTMs. Using this insight, we next explore whether a predefined set of subroutines provides better performance for hierarchical networks than a learned set. 
We create a hierarchical framework, comprising a manager network that passes information to a worker network via a goal vector, for autonomous vehicle steering angle prediction from egocentric videos. The manager network learns an embedding space of subroutines from historical vehicle information. This learned subroutine embedding from the manager allows the worker network to predict the next steering angle more accurately than predefined subroutines do, and the framework also improves over state-of-the-art steering angle prediction methods. In the real world, it is uncommon for the full set of subroutines needed to accomplish a task to be known a priori. Moreover, a complete autonomous navigation agent must have a model of pedestrian behavior. In the next set of experiments, we address these concerns by building a network that learns a dictionary of pedestrian social behaviors in a self-supervised manner. We use this dictionary to analyze the relationship between pedestrian behavior and the spaces pedestrians inhabit, as well as the relationships among the subroutines themselves. We also use this behavior embedding network in a hierarchical framework to constrain the state space of a worker network, allowing future pedestrian trajectories to be predicted with a very simple architecture. Finally, we combine our findings into a hierarchical, socially cognizant, visual navigation agent. Instead of formalizing navigation in a traditional reinforcement learning framework, we implicitly learn to mimic optimal human navigation policies from collected demonstrations for the image-goal task in a simulated environment. We build a hierarchical framework with three levels. The first network builds a latent space that acts as a memory module for the navigation agent. 
The second network predicts waypoints in the current observation space indicating which area of the environment to move towards. The third network predicts which action to execute in the environment using a simple classifier. The key to this method's success is that each of these networks operates at a different temporal or spatial scale, allowing them to bootstrap off of each other to incrementally solve a much larger navigation task and achieve state-of-the-art results without the use of RL, graphs, odometry, metric maps, or other computationally complex and memory-intensive methods. 
653 |a Computer engineering 
653 |a Robotics 
653 |a Computer science 
653 |a Electrical engineering 
773 0 |t ProQuest Dissertations and Theses  |g (2025) 
786 0 |d ProQuest  |t ProQuest Dissertations & Theses Global 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3165174236/abstract/embedded/L8HZQI7Z43R0LA5T?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3165174236/fulltextPDF/embedded/L8HZQI7Z43R0LA5T?source=fedsrch