AN 903: Accelerating Timing Closure in Intel® Quartus® Prime Pro Edition

ID 683664
Date 2/25/2021
Public

1.1.3. Reduce High Fan-Out Nets

High fan-out nets can cause resource congestion, thereby complicating timing closure. In general, the Compiler automatically manages high fan-out nets related to clocks. The Compiler automatically promotes recognized high fan-out nets to the global clock network. The Compiler makes a higher optimization effort during the Place and Route stages, which results in beneficial register duplication.

In the following corner cases, you can further reduce congestion by making manual changes to your design RTL:

Table 2.  High Fan-Out Net Corner Cases
Design Characteristic | Manual RTL Optimization
High fan-out nets that reach many hierarchies or physically far destinations | Specify the duplicate_hierarchy_depth assignment on the last register in a pipeline to manually duplicate high fan-out networks across hierarchies. Specify the duplicate_register assignment to duplicate registers during placement.
Designs with control signals to DSP or M20K memory blocks from combinational logic | Drive the control signal to the DSP or M20K memory from a register.

Register Duplication Across Hierarchies

You can specify the duplicate_hierarchy_depth assignment on the last register in a pipeline to guide the creation of duplicate registers and their fan-outs. The following figures illustrate the impact of this assignment:

set_instance_assignment -name duplicate_hierarchy_depth -to \
     <register_name> <level_number>
Where:
  • register_name—the last register in a chain that fans out to multiple hierarchies.
  • level_number—the number of registers in the chain to duplicate.
Figure 8. Before Register Duplication
Set the duplicate_hierarchy_depth assignment to implement register duplication across hierarchies and create a tree of registers following the last register in the chain. In the following example, you specify the register name (regZ) and the number of registers in the chain to duplicate, represented by M. Red arrows show the potential locations of duplicate registers.
set_instance_assignment -name DUPLICATE_HIERARCHY_DEPTH -to regZ M
Figure 9. Register Duplication = 1
Specifying the following single level of register duplication (M=1) duplicates one register (regZ) down one level of the design hierarchy:
set_instance_assignment -name DUPLICATE_HIERARCHY_DEPTH -to regZ 1
Figure 10. Register Duplication = 3
Specifying three levels of register duplication (M=3) duplicates three registers (regZ, regY, regX) down three, two, and one level of the hierarchy, respectively:
set_instance_assignment -name DUPLICATE_HIERARCHY_DEPTH -to regZ 3

By duplicating and pushing the registers down into the hierarchies, the design retains the same number of cycles to all the destinations, while greatly accelerating performance on these paths.

Register Duplication During Placement

Figure 11 shows a register with high fan-out to a widely spread area of the chip. Duplicating this register 50 times reduces the distance between the register and its destinations, which ultimately results in faster clock performance. Assigning duplicate_register allows the Compiler to use physical proximity to guide the placement of the new registers, each of which feeds a subset of the fan-outs.
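
As a minimal sketch of the assignment described above, the following line requests 50 duplicates of a single high fan-out register. The register name ctrl_reg is hypothetical, and the argument order is assumed to mirror the duplicate_hierarchy_depth form shown earlier in this section. Confirm the exact syntax against the settings reference for your Quartus Prime release before relying on it.

```tcl
# Assumption: ctrl_reg is a placeholder for the high fan-out register in your design.
# The value 50 matches the duplicate count discussed in the paragraph above.
set_instance_assignment -name DUPLICATE_REGISTER -to ctrl_reg 50
```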

Figure 11. Register Duplication During Placement
Note: To broadcast a signal across the chip, use a multistage pipeline. Apply the duplicate_register assignment to each of the registers in the pipeline. This technique creates a tree structure that broadcasts the signal across the chip.
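
The note above might translate into assignments like the following sketch for a three-stage broadcast pipeline. The register names (bcast_stage1 through bcast_stage3) and the per-stage duplicate counts are illustrative assumptions, not values from this application note. Suitable counts depend on the fan-out and floorplan of the actual design.

```tcl
# Hypothetical three-stage pipeline that broadcasts one signal across the chip.
# Each stage carries its own DUPLICATE_REGISTER assignment. The counts grow
# from stage to stage here only to illustrate a widening tree (assumed values).
set_instance_assignment -name DUPLICATE_REGISTER -to bcast_stage1 4
set_instance_assignment -name DUPLICATE_REGISTER -to bcast_stage2 16
set_instance_assignment -name DUPLICATE_REGISTER -to bcast_stage3 64
```

After compilation, the Fitter report section on registers with the duplicate_register setting is the place to check how these requests were applied.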

Viewing Duplication Results

Following design synthesis, view duplication results in the Hierarchical Tree Duplication Summary report in the Synthesis folder of the Compilation Report. The report provides the following:

  • Information on the registers that have the duplicate_hierarchy_depth assignment.
  • Reason for the chain length that you can use as a starting point for further improvements with the assignment.
  • Information about the individual registers in the chain that you can use to better understand the structure of the implemented duplicates.

The Fitter report also includes a section on registers that have the duplicate_register setting.