Improving Compilation Flows for RISC-V Machine Learning Custom Instructions

Bibliographic Details
Published in:	PQDT - Global (2025)
Main author:	Sequeira, Guilherme Soares
Published:	ProQuest Dissertations & Theses
Subjects:	Machine learning; Digital libraries; Search engines; Embedded systems; Learning curves; Artificial intelligence; English language; Power; Neural networks; Benchmarks; Smart houses; Software upgrading; Preprints; Workloads; Keywords; Web studies; Computer engineering
Online access:	Citation/Abstract; Full Text - PDF

MARC


LEADER	00000nab a2200000uu 4500
001	3288213201
003	UK-CbPIL
020			\|a 9798265493842
035			\|a 3288213201
045	2		\|b d20250101 \|b d20251231
084			\|a 189128 \|2 nlm
100	1		\|a Sequeira, Guilherme Soares
245	1		\|a Improving Compilation Flows for RISC-V Machine Learning Custom Instructions
260			\|b ProQuest Dissertations & Theses \|c 2025
513			\|a Dissertation/Thesis
520	3		\|a Demand for low-cost, low-power edge devices capable of running Artificial Intelligence (AI) workloads has grown in recent years. This has driven interest in pairing RISC-V, an open-standard, royalty-free ISA built from the ground up with customizability in mind, with specialized hardware that performs the tasks it is designed for with exceptional efficiency, and multiple RISC-V-based IPs have emerged as a result. However, few seem interested in developing compilers alongside their hardware, whether because of the size of the investment, the steep learning curve, or other factors. This thesis proposes an alternative: a source-to-source compilation step right before compilation that automatically inserts custom instructions directly into the source code as inline assembly, built on a much more accessible API and ecosystem. We discuss the details of automatically accelerating vector-vector dot products with a multiply-accumulate (MAC) custom instruction, as well as the static analysis this requires. We find acceleration opportunities in third-party benchmarks. Running our program on an FPGA programmed with a closed-source IP, we achieve a speedup of up to 7.1 times over the original, unoptimized program, matching the performance of manually optimized code.
653			\|a Machine learning
653			\|a Digital libraries
653			\|a Search engines
653			\|a Embedded systems
653			\|a Learning curves
653			\|a Artificial intelligence
653			\|a English language
653			\|a Power
653			\|a Neural networks
653			\|a Benchmarks
653			\|a Smart houses
653			\|a Software upgrading
653			\|a Preprints
653			\|a Workloads
653			\|a Keywords
653			\|a Web studies
653			\|a Computer engineering
773	0		\|t PQDT - Global \|g (2025)
786	0		\|d ProQuest \|t ProQuest Dissertations & Theses Global
856	4	1	\|3 Citation/Abstract \|u https://www.proquest.com/docview/3288213201/abstract/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch
856	4	0	\|3 Full Text - PDF \|u https://www.proquest.com/docview/3288213201/fulltextPDF/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch