Intel® High Level Synthesis Compiler Standard Edition: Best Practices Guide

ID 683259
Date 12/18/2019
Public
Document Table of Contents

3.1.3. Avalon® Memory Mapped Slave Interfaces

Depending on your component, you can sometimes optimize the memory structure of your component by using Avalon® Memory Mapped (MM) slave interfaces.

When you allocate a slave memory, you must define its size. Defining the size puts a limit on how large a value of N that the component can process. In this example, the RAM size is 1024 words. This RAM size means that N can have a maximal size of 1024 words.

The vector addition component example can be coded with an Avalon® MM slave interface as follows:
component void vector_add(
     hls_avalon_slave_memory_argument(1024*sizeof(int)) int* a,
     hls_avalon_slave_memory_argument(1024*sizeof(int)) int* b,
     hls_avalon_slave_memory_argument(1024*sizeof(int)) int* c,
     int N) {
  #pragma unroll 8
  for (int i = 0; i < N; ++i) {
    c[i] = a[i] + b[i];
  }
}
The following diagram shows the Component Viewer report generated when you compile this example.
Figure 3. Component View of vector_add Component with Avalon® MM Slave Interface


Compiling this component with an Intel® Quartus® Prime compilation flow targeting an Intel® Arria® 10 device results in the following QoR metrics:
Table 4.  QoR Metrics Comparison for Avalon® MM Slave Interface1
QoR Metric Pointer Avalon® MM Master Avalon® MM Slave
ALMs 15593.5 643 490.5
DSPs 0 0 0
RAMs 30 0 48
fMAX (MHz)2 298.6 472.37 498.26
Latency (cycles) 24071 142 139
Initiation Interval (II) (cycles) ~508 1 1
1The compilation flow used to calculate the QoR metrics used Intel® Quartus® Prime Pro Edition Version 17.1.
2The fMAX measurement was calculated from a single seed.
The QoR metrics show by changing the ownership of the memory from the system to the component, the number of ALMs used by the component are reduced, as is the component latency. The fMAX of the component is increased as well. The number of RAM blocks used by the component is greater because the memory is implemented in the component and not the system. The total system RAM usage (not shown) should not increase because RAM usage shifted from the system to the FPGA RAM blocks.