Describir: From Latent Knowledge Gathering to Side Information Injection in Discrete Sequential Models