Describir: A NeRF-Based Captioning Framework for Spatially Rich and Context-Aware Image Descriptions