Stav dette: From Latent Knowledge Gathering to Side Information Injection in Discrete Sequential Models