Novel Ways to Bind Software to Hardware Using AI
By Prof. PE Gaillardon, CTO at Rapid Silicon
Artificial intelligence and FPGAs have a synergistic relationship. The ability to customize accelerators in hardware, coupled with an abundance of DSP and memory resources in a flexible, configurable, and fracturable array, makes FPGAs a graceful solution to highly complex arithmetic problems.
Unfortunately, FPGA adoption in AI has been slower than expected. We believe the problem is twofold. First, most AI workload support focuses on fitting the AI algorithm to the FPGA: mapping efforts tweak the algorithm to best fit the available FPGA architectures. Second, mapping an AI algorithm to an FPGA is treated as a hardware-first exercise. Yet AI is a prime example of the importance of software-hardware co-design. A graceful, easy-to-use software stack is far more valuable than an optimized hardware mapping. This is especially true for AI, where applications are designed in high-level frameworks; the compilers, hardware templates, and synthesizers must therefore stay in sync to ensure optimal data movement, avoid starvation, and deliver the performance real-time data processing demands.
At Rapid Silicon, these aren’t problems; they’re opportunities to rethink FPGA fabric architectures and to build software-first user flows. We’re studying AI algorithms to see which types of structures are most efficient, starting with AI-optimized DSPs, BRAMs sized to accommodate weights, in-memory computing elements, and fabric routing that supports systolic-array soft IPs, to name a few. We are researching novel ways to bind software to hardware, bypassing the legacy of HLS and RTL while exploiting high-level abstractions such as templates and SYCL, giving users a simplified flow from Python to gates.
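To make the systolic-array dataflow concrete, here is a minimal behavioral sketch in Python. It is purely illustrative and not Rapid Silicon's IP: it models an output-stationary array computing C = A × B, where each processing element (PE) holds one accumulator and operands stream in skewed by one cycle per row and column so that they meet at the right PE at the right time.

```python
def systolic_matmul(A, B):
    """Behavioral model of an output-stationary systolic array.

    A is n x k, B is k x m; returns the n x m product. Each PE (i, j)
    owns accumulator acc[i][j]; at cycle t it consumes operand pair
    index p = t - i - j, modeling the skewed (wavefront) dataflow.
    """
    n, k = len(A), len(A[0])
    m = len(B[0])
    assert len(B) == k, "inner dimensions must match"
    # One accumulator per PE in the n x m grid.
    acc = [[0] * m for _ in range(n)]
    # n + m + k - 2 cycles: enough for the last skewed operand
    # to reach the bottom-right PE.
    for t in range(n + m + k - 2):
        for i in range(n):
            for j in range(m):
                p = t - i - j  # which A-column / B-row arrives at PE (i, j)
                if 0 <= p < k:
                    acc[i][j] += A[i][p] * B[p][j]
    return acc
```

In a soft IP, each iteration of the outer loop corresponds to one clock cycle, and the inner loops run in parallel across the PE grid; the skew is what lets every multiply-accumulate unit stay busy once the pipeline fills.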
We’re extending this opportunity to the open-source community. Just as the EPFL benchmarks standardized synthesis tool evaluation, we need a single framework to evaluate the performance of AI workloads on FPGA hardware, allowing all vendors to be compared fairly. A generally accepted set of benchmark applications and reference models, pre-trained to known accuracy targets, would be a great start.