Describe: Optimal pivot path of the simplex method for linear programming based on reinforcement learning