Describe: An auditory-visual cooperative perception method for honking vehicle localization