Describe: Gesture recognition method integrating multimodal inter-frame motion and shared attention weights