Intel® High Level Synthesis Compiler Pro Edition: Best Practices Guide

ID 683152
Date 4/01/2024
Public

4.1.1. Pointer Interfaces

Software developers accustomed to writing code that targets a CPU might first code this algorithm by declaring vectors a, b, and c as pointers to get the data in and out of the component.
Using pointers in this way results in a single Avalon Memory-Mapped (MM) host interface that all three pointer arguments (inputs a and b, and output c) share.

Pointers in a component are implemented as Avalon® Memory-Mapped (Avalon®-MM) host interfaces with default settings. For more details about pointer parameter interfaces, see "Intel HLS Compiler Default Interfaces" in the Intel® High Level Synthesis Compiler Pro Edition Reference Manual.

The vector addition component example with pointer interfaces can be coded as follows:
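#include "HLS/hls.h"  // provides the component keyword

// Each pointer argument becomes an access on the shared Avalon-MM host interface.
component void vector_add(int* a,
                          int* b,
                          int* c,
                          int N) {
  // Request 8 copies of the loop body per iteration.
  #pragma unroll 8
  for (int i = 0; i < N; ++i) {
    c[i] = a[i] + b[i];
  }
}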
The following diagram shows the Function View in the Graph Viewer that is generated when you compile this example. Because the loop is unrolled by a factor of 8, the diagram shows that vector_add.B2 has 8 loads for vector a, 8 loads for vector b, and 8 stores for vector c. In addition, all of the loads and stores are arbitrated on the same memory, resulting in inefficient memory accesses.
Figure 24. Graph Viewer Function View for vector_add Component with Pointer Interfaces
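To make the load and store counts concrete, the following plain C++ sketch hand-unrolls the loop by 8. It is a software model only and does not claim to reproduce the hardware that the compiler generates. The names kUnroll, kAccessesPerUnrolledBody, vector_add_rolled, vector_add_unrolled8, and add_with_unroll8 are hypothetical and are not part of the Intel HLS Compiler API.

```cpp
#include <cassert>
#include <vector>

// Hypothetical constants for illustration only.
constexpr int kUnroll = 8;
// 8 loads from a + 8 loads from b + 8 stores to c per unrolled body.
constexpr int kAccessesPerUnrolledBody = 3 * kUnroll;

// Reference loop, written the same way as the component body.
void vector_add_rolled(const int* a, const int* b, int* c, int N) {
  assert(N >= 0);
  for (int i = 0; i < N; ++i) {
    c[i] = a[i] + b[i];
  }
}

// Hand-unrolled model: each pass of the outer loop performs the work of
// 8 original iterations, so it touches memory 24 times.
void vector_add_unrolled8(const int* a, const int* b, int* c, int N) {
  assert(N >= 0);
  int i = 0;
  for (; i + kUnroll <= N; i += kUnroll) {
    for (int k = 0; k < kUnroll; ++k) {
      c[i + k] = a[i + k] + b[i + k];
    }
  }
  // Leftover iterations when N is not a multiple of 8.
  for (; i < N; ++i) {
    c[i] = a[i] + b[i];
  }
}

// Host-side convenience wrapper (hypothetical helper).
std::vector<int> add_with_unroll8(const std::vector<int>& a,
                                  const std::vector<int>& b) {
  assert(a.size() == b.size());
  std::vector<int> c(a.size());
  vector_add_unrolled8(a.data(), b.data(), c.data(),
                       static_cast<int>(a.size()));
  return c;
}
```

Both functions produce the same result for any non-negative N. The unrolled version only makes it easier to see why one unrolled loop body issues 24 memory accesses, all of which contend for the single shared interface in the pointer-based component.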


The following Loop Analysis report shows that the component has an undesirably high loop initiation interval (II). The II is high because vectors a, b, and c are all accessed through the same Avalon-MM Host interface. The Intel® HLS Compiler Pro Edition uses stallable arbitration logic to schedule these accesses, which results in poor performance and high FPGA area use.

In addition, the compiler cannot assume there are no data dependencies between loop iterations because pointer aliasing might exist. The compiler cannot determine that vectors a, b, and c do not overlap. If data dependencies exist, the Intel® HLS Compiler cannot pipeline the loop iterations effectively.
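The aliasing concern can be illustrated with a small plain C++ sketch that runs on a CPU. It assumes a caller passes c so that it overlaps a by one element, which creates a dependency between consecutive loop iterations. The helpers ranges_overlap, run_aliased, and run_disjoint are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Same loop as the component, written as an ordinary C++ function.
void vector_add(int* a, int* b, int* c, int N) {
  for (int i = 0; i < N; ++i) {
    c[i] = a[i] + b[i];
  }
}

// Hypothetical helper: true if [p, p+n) and [q, q+n) share any element.
// std::less gives a well-defined ordering even for unrelated pointers.
bool ranges_overlap(const int* p, const int* q, int n) {
  std::less<const int*> lt;
  return lt(p, q + n) && lt(q, p + n);
}

// c points one element past a in the same buffer, so the value stored
// by iteration i is the value loaded by iteration i + 1.
std::vector<int> run_aliased(int N) {
  std::vector<int> buf(N + 1, 1);
  std::vector<int> b(N, 1);
  vector_add(buf.data(), b.data(), buf.data() + 1, N);
  return buf;
}

// a, b, and c live in separate buffers, so iterations are independent.
std::vector<int> run_disjoint(int N) {
  std::vector<int> a(N, 1), b(N, 1), c(N, 0);
  vector_add(a.data(), b.data(), c.data(), N);
  return c;
}
```

With N = 4, run_disjoint returns {2, 2, 2, 2}, while run_aliased leaves the buffer as {1, 2, 3, 4, 5} because each iteration consumes the result of the previous one. The component signature alone does not rule out the second case, which is why the compiler must treat the iterations as potentially dependent.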



Compiling the component with a Quartus® Prime compilation flow targeting an Arria® 10 device results in the following QoR metrics: high ALM usage, high latency, high II, and low fMAX, all of which are undesirable properties in a component.
Table 2. QoR Metrics for a Component with a Pointer Interface¹

QoR Metric                          Value
ALMs                                15593.5
DSPs                                0
RAMs                                30
fMAX (MHz)²                         298.6
Latency (cycles)                    24071
Initiation Interval (II) (cycles)   ~508

¹ The compilation flow used to calculate the QoR metrics used Quartus® Prime Pro Edition Version 17.1.
² The fMAX measurement was calculated from a single seed.