Reinforcement learning-enhanced multi-objective optimization for sustainable coal blending in thermal power plants