Describe: In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation