Intel® High Level Synthesis Compiler Pro Edition: Best Practices Guide

ID 683152
Date 10/04/2021
Public



10.2. Component Gets Poor Quality of Results

A design can achieve a poor quality of results (QoR) for many reasons, but a bad memory configuration is often an important factor. Review the Function Memory Viewer report in the High Level Design Reports, and look for stallable arbitration nodes and unexpected RAM utilization.

The information in this section describes some common sources of stallable arbitration nodes or excess RAM utilization.

Component Uses More FPGA Resource Than Expected

By default, the Intel® HLS Compiler Pro Edition optimizes your component for the best throughput by trying to maximize the operating frequency (fMAX).

One way to reduce area consumption is to relax the fMAX requirement by setting a target fMAX value with the --clock i++ command option or the hls_scheduler_target_fmax_mhz component attribute. The HLS compiler can often achieve a higher fMAX than you specify, so when you set the target fMAX lower than the value you need, your design might still achieve an acceptable fMAX while consuming less area.

To learn more about the behavior of fMAX target value control see the following tutorial: <quartus_installdir>/hls/examples/tutorials/best_practices/set_component_target_fmax
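As a minimal sketch of the attribute form, the following component requests a relaxed scheduling target. The component name scale_and_offset and the 200 MHz value are hypothetical and chosen only for illustration. The fallback macros at the top are also an assumption: they exist only so that the sketch builds with a plain C++ compiler when the HLS header is not installed.

```cpp
// Use the Intel HLS header when it is available. Otherwise fall back to
// no-op definitions (illustration only) so a plain C++ compiler accepts the code.
#if defined(__has_include)
#  if __has_include("HLS/hls.h")
#    include "HLS/hls.h"
#    define SKETCH_HAS_HLS_HEADER 1
#  endif
#endif
#ifndef SKETCH_HAS_HLS_HEADER
#  define component
#  define hls_scheduler_target_fmax_mhz(mhz)
#endif

// Ask the scheduler to target 200 MHz (hypothetical value) instead of
// maximizing fMAX, which can let the compiler use less area.
hls_scheduler_target_fmax_mhz(200)
component int scale_and_offset(int x, int gain, int offset) {
  return x * gain + offset;
}
```

The --clock command option is the command-line counterpart and applies a target to the whole compilation, for instance something like --clock 200MHz. Check the i++ command reference for the exact forms that your compiler version accepts.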

Loops Do Not Achieve II=1

If you specify a target fMAX, the compiler might conservatively increase II in order to achieve your target fMAX.

If you specify a target fMAX and also require II=1, apply #pragma ii 1 to each loop that requires II=1. For more details, refer to Balancing Target fMAX and Target II.
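A minimal sketch of combining the two controls might look like the following. The component name sum_of_squares and the 300 MHz target are hypothetical, and the fallback macros are an assumption that only lets the sketch build with a plain C++ compiler (which ignores the unknown pragma, possibly with a warning).

```cpp
// Use the Intel HLS header when it is available. Otherwise fall back to
// no-op definitions (illustration only) so a plain C++ compiler accepts the code.
#if defined(__has_include)
#  if __has_include("HLS/hls.h")
#    include "HLS/hls.h"
#    define SKETCH_HAS_HLS_HEADER 1
#  endif
#endif
#ifndef SKETCH_HAS_HLS_HEADER
#  define component
#  define hls_scheduler_target_fmax_mhz(mhz)
#endif

// Hypothetical fMAX target for the whole component.
hls_scheduler_target_fmax_mhz(300)
component int sum_of_squares(int n) {
  int sum = 0;
  // Request II=1 for this loop so the compiler does not trade II
  // for the fMAX target. The pragma must directly precede the loop.
#pragma ii 1
  for (int i = 0; i < n; ++i) {
    sum += i * i;
  }
  return sum;
}
```

Whether the loop actually reaches both the II and the fMAX target depends on the loop body, so confirm the achieved values in the Loop Analysis report.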

Incorrect Bank Bits

If you access parts of an array in parallel (either a single-dimensional or a multidimensional array), you might need to configure the memory bank selection bits.

See Memory Architecture Best Practices for details about how to configure efficient memory systems.
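As a hedged illustration, the sketch below reads one element from every row of a 4x4 array in the same unrolled loop. It assumes that, with a bank width of one int, the word address of table[r][c] is r*4 + c, so address bits 3 and 2 encode the row and selecting them as bank bits places each row in its own bank. The component name sum_column, the array geometry, and the fallback macros are assumptions for illustration; verify the resulting memory system in the Function Memory Viewer.

```cpp
// Use the Intel HLS header when it is available. Otherwise fall back to
// no-op definitions (illustration only) so a plain C++ compiler accepts the code.
#if defined(__has_include)
#  if __has_include("HLS/hls.h")
#    include "HLS/hls.h"
#    define SKETCH_HAS_HLS_HEADER 1
#  endif
#endif
#ifndef SKETCH_HAS_HLS_HEADER
#  define component
#  define hls_bankbits(...)
#  define hls_bankwidth(bytes)
#endif

component int sum_column(int col) {
  // Bank on the row index: bits 3 and 2 of the word address (assumed layout).
  hls_bankbits(3, 2) hls_bankwidth(sizeof(int)) int table[4][4];

  // Fill the table with its own linear index so the result is easy to check.
  for (int r = 0; r < 4; ++r) {
    for (int c = 0; c < 4; ++c) {
      table[r][c] = r * 4 + c;
    }
  }

  int sum = 0;
  // The four reads land in four different rows, so they can proceed in parallel
  // when each row sits in its own bank.
#pragma unroll
  for (int r = 0; r < 4; ++r) {
    sum += table[r][col & 3];
  }
  return sum;
}
```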

Conditional Operator Accessing Two Different Arrays of struct Variables

In some cases, if you try to access different arrays of struct variables with a conditional operator, the Intel® HLS Compiler Pro Edition merges the arrays into the same RAM block. You might then see stallable arbitration in the Function Memory Viewer because the memory system does not have enough load/store sites.

For example, the following code examples show an array of struct variables, a conditional operator that results in stallable arbitration, and a workaround that avoids stallable arbitration.
struct MyStruct {
  float a;
  float b;
};

MyStruct array1[64];
MyStruct array2[64];
The following conditional operator that uses these arrays of struct variables causes stallable arbitration:
MyStruct value = (shouldChooseArray1) ? array1[idx] : array2[idx];
You can avoid the stallable arbitration that the conditional operator causes here by removing the operator and using an explicit if statement instead.
MyStruct value;
if (shouldChooseArray1)
{
    value = array1[idx];
} else
{
    value = array2[idx];
}

Cluster Logic

Your design might consume more RAM blocks than you expect, especially if you store many array variables in large registers.

You can use the hls_use_stall_enable_clusters component attribute to prevent the compiler from inserting stall-free cluster exit FIFOs.
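A minimal sketch of applying the attribute follows. The component name multiply_add is hypothetical, and the fallback macros are an assumption that only lets the sketch build with a plain C++ compiler.

```cpp
// Use the Intel HLS header when it is available. Otherwise fall back to
// no-op definitions (illustration only) so a plain C++ compiler accepts the code.
#if defined(__has_include)
#  if __has_include("HLS/hls.h")
#    include "HLS/hls.h"
#    define SKETCH_HAS_HLS_HEADER 1
#  endif
#endif
#ifndef SKETCH_HAS_HLS_HEADER
#  define component
#  define hls_use_stall_enable_clusters
#endif

// Ask the compiler to use stall-enable clusters for this component, which
// avoids the exit FIFOs that stall-free clusters need.
hls_use_stall_enable_clusters
component int multiply_add(int a, int b, int c) {
  return a * b + c;
}
```

After recompiling, compare the Cluster logic rows in the Area Analysis of System report to see whether the attribute reduced RAM block usage for your design.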

The Area Analysis of System report in the high-level design report (report.html) can help find this issue.

For example, a component might intentionally store three matrices in RAM blocks, yet the report shows that those matrices account for less than half of the RAM blocks that the component consumes.

If you look further down the report, you might see that many RAM blocks are consumed by Cluster logic or State variable. You might also see that some of your array values that you intended to be stored in registers were instead stored in large numbers of RAM blocks.

Notice the number of RAM blocks that are consumed by Cluster Logic and State.

In some cases, you can reduce this RAM block usage with the following techniques:
  • Pipeline loops instead of unrolling them.
  • Store local variables in local RAM blocks (hls_memory memory attribute) instead of large registers (hls_register memory attribute).
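Both techniques can be sketched together as follows: the local array is marked with hls_memory, and the loops are left rolled so that the compiler pipelines them. The component name window_sum, the array size, and the fallback macros are assumptions for illustration only.

```cpp
// Use the Intel HLS header when it is available. Otherwise fall back to
// no-op definitions (illustration only) so a plain C++ compiler accepts the code.
#if defined(__has_include)
#  if __has_include("HLS/hls.h")
#    include "HLS/hls.h"
#    define SKETCH_HAS_HLS_HEADER 1
#  endif
#endif
#ifndef SKETCH_HAS_HLS_HEADER
#  define component
#  define hls_memory
#endif

component int window_sum(int seed) {
  // Keep the array in a local RAM block instead of a large register.
  hls_memory int window[64];

  // No "#pragma unroll" here: the loop stays rolled and is pipelined.
  for (int i = 0; i < 64; ++i) {
    window[i] = seed + i;
  }

  int sum = 0;
  for (int i = 0; i < 64; ++i) {
    sum += window[i];
  }
  return sum;
}
```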

Component with a System of Tasks Hangs or has Poor Throughput

If your component contains a system of tasks, you might need to add launch/collect capacity.

Incorrectly specifying launch/collect capacity can result in hangs or poor throughput.

For details, refer to Balancing Capacity in a System of Tasks.