Improving Compilation Flows for RISC-V Machine Learning Custom Instructions

Bibliographic Details
Published in: PQDT - Global (2025)
Main Author: Sequeira, Guilherme Soares
Published: ProQuest Dissertations & Theses
Subjects:
Online Access: Citation/Abstract
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3288213201
003 UK-CbPIL
020 |a 9798265493842 
035 |a 3288213201 
045 2 |b d20250101  |b d20251231 
084 |a 189128  |2 nlm 
100 1 |a Sequeira, Guilherme Soares 
245 1 |a Improving Compilation Flows for RISC-V Machine Learning Custom Instructions 
260 |b ProQuest Dissertations & Theses  |c 2025 
513 |a Dissertation/Thesis 
520 3 |a The demand for low-cost, low-power edge devices capable of running Artificial Intelligence (AI) workloads has been increasing in the last few years. This has naturally sparked interest in pairing RISC-V, an open-standard, royalty-free ISA built from the ground up with customizability in mind, with specialized hardware that performs its target tasks with exceptional efficiency, spawning multiple RISC-V-based IPs. However, few seem interested in developing compilers alongside their hardware, whether because of the large investment required, the steep learning curve, or other factors. This thesis proposes an alternative: a source-to-source compilation step right before compilation that automatically inserts custom instructions directly into the source code as inline assembly, using a much more accessible API and ecosystem. We discuss the details of automatically accelerating vector-vector dot products with a multiply-accumulate (MAC) custom instruction, as well as the static analysis this requires. We are able to find acceleration opportunities in third-party benchmarks. When running our program on an FPGA programmed with a closed-source IP, we achieve a speedup of up to 7.1 times compared to the original, unoptimized program, matching the performance of manually optimized code. 
653 |a Machine learning 
653 |a Digital libraries 
653 |a Search engines 
653 |a Embedded systems 
653 |a Learning curves 
653 |a Artificial intelligence 
653 |a English language 
653 |a Power 
653 |a Neural networks 
653 |a Benchmarks 
653 |a Smart houses 
653 |a Software upgrading 
653 |a Preprints 
653 |a Workloads 
653 |a Keywords 
653 |a Web studies 
653 |a Computer engineering 
773 0 |t PQDT - Global  |g (2025) 
786 0 |d ProQuest  |t ProQuest Dissertations & Theses Global 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3288213201/abstract/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3288213201/fulltextPDF/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch