Describir: Video Token Sparsification for Efficient Multimodal LLMs in Autonomous Driving