Describe: High-Performance Matrix Multiplication Using Block Parallelization on CPU and GPU
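A minimal sketch of the block (tiled) decomposition named in the title. It assumes square tiles of side `bs` and uses a Python thread pool as a stand-in for CPU worker threads; `blocked_matmul` and `matmul_tile` are hypothetical names. Each task computes one output tile of C by accumulating partial products over tiles along the shared dimension. Because tasks write disjoint parts of C, they need no locking. Under CPython's GIL this shows the task decomposition rather than a real speedup. A high-performance version would map the same tiles to native threads (for example OpenMP) on the CPU, or to thread blocks that stage tiles in shared memory on the GPU.

```python
from concurrent.futures import ThreadPoolExecutor


def matmul_tile(A, B, C, i0, j0, bs):
    """Accumulate the tile of C = A @ B whose top-left corner is (i0, j0)."""
    n, k_dim, m = len(A), len(B), len(B[0])
    i_end = min(i0 + bs, n)  # clamp tiles at the matrix edge
    j_end = min(j0 + bs, m)
    # Walk the shared dimension one tile at a time so the working set
    # (a bs x bs piece of A, B and C) stays small enough to be cache-friendly.
    for k0 in range(0, k_dim, bs):
        k_end = min(k0 + bs, k_dim)
        for i in range(i0, i_end):
            row_a, row_c = A[i], C[i]
            for k in range(k0, k_end):
                a_ik, row_b = row_a[k], B[k]
                for j in range(j0, j_end):
                    row_c[j] += a_ik * row_b[j]


def blocked_matmul(A, B, bs=32, workers=4):
    """Multiply A (n x k) by B (k x m), one task per output tile."""
    n, k_dim, m = len(A), len(B), len(B[0])
    if any(len(row) != k_dim for row in A):
        raise ValueError("inner dimensions do not match")
    C = [[0.0] * m for _ in range(n)]
    # Output tiles are independent: each one is a unit of parallel work.
    tiles = [(i0, j0) for i0 in range(0, n, bs) for j0 in range(0, m, bs)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(matmul_tile, A, B, C, i0, j0, bs)
                   for i0, j0 in tiles]
        for f in futures:
            f.result()  # re-raise any exception from a worker
    return C


if __name__ == "__main__":
    print(blocked_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]], bs=1))
    # prints [[19.0, 22.0], [43.0, 50.0]]
```

The tile size `bs` is the main tuning knob. On a CPU it is usually chosen so that three tiles fit in cache. On a GPU it typically matches the thread-block dimensions and the shared-memory budget.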