Describe: Infrared-Visible Image Fusion Meets Object Detection: Towards Unified Optimization for Multimodal Perception