Article ID: 000080831 Content Type: Troubleshooting Last Reviewed: 08/16/2021

Why are Non-Fatal PCIe* errors logged in Advanced Error Reporting (AER) when using the Intel® FPGA P-Tile/H-Tile Avalon® Streaming and Avalon® Memory Mapped IP for PCI Express*?

Environment

  • Intel® Quartus® Prime Pro Edition
  • Avalon-MM Intel® Stratix® 10 Hard IP for PCI Express
  • Avalon-ST Intel® Stratix® 10 Hard IP for PCI Express

    Critical Issue

    Description

    The P-Tile/H-Tile Avalon® Streaming Intel® FPGA IP for PCI Express* and the P-Tile/H-Tile Avalon® Memory Mapped Intel® FPGA IP for PCI Express* implement the optional Alternative Routing-ID Interpretation (ARI) capability when the Multi-function or Single Root I/O Virtualization (SR-IOV) feature is enabled. The ARI capability includes a Next Function Number field that helps the host BIOS perform enumeration. When ARI is enabled and the number of Physical Functions (PFs) is less than 8 for P-Tile, or less than 4 for H-Tile, the Next Function Number incorrectly shows a value of PF 1.
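
    To illustrate the effect, the following minimal Python sketch decodes the Next Function Number from an ARI Capability Register value (bits 15:8, per the PCI Express Base Specification) and follows the chain the way an enumerator might. The helper names and the dictionary model of the device are hypothetical, and the single-PF configuration in the usage note is an assumed example, not a description of any specific design.

```python
ARI_NEXT_FN_SHIFT = 8      # Next Function Number occupies bits 15:8
ARI_NEXT_FN_MASK = 0xFF


def ari_next_function(ari_cap_reg: int) -> int:
    """Extract the Next Function Number from a 16-bit ARI Capability Register value."""
    return (ari_cap_reg >> ARI_NEXT_FN_SHIFT) & ARI_NEXT_FN_MASK


def walk_ari_chain(ari_cap_by_function: dict) -> list:
    """Follow Next Function Number links starting at function 0.

    ari_cap_by_function (hypothetical model) maps each implemented function
    number to its ARI Capability Register value. Returns (function, exists)
    pairs in the order an enumerator would probe them.
    """
    probed = []
    seen = set()
    fn = 0
    while fn not in seen:
        seen.add(fn)
        exists = fn in ari_cap_by_function
        probed.append((fn, exists))
        if not exists:
            break  # a real device would complete this request with Unsupported Request
        fn = ari_next_function(ari_cap_by_function[fn])
        if fn == 0:
            break  # 0 means there are no higher-numbered functions
    return probed
```

    For an assumed single-PF device whose function 0 reports Next Function Number 1, `walk_ari_chain({0: 0x0100})` probes function 1 even though it does not exist, which is the configuration request that triggers the errors described in this article.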

     

    As a result, the Root Port issues a configuration request to the non-existent PF pointed to by the incorrect Next Function Number, and the following error status bits may be set in the endpoint if AER is enabled:

    • Correctable Error Detected                   (Device Status Register)
    • Unsupported Request Detected            (Device Status Register)
    • Advisory Non-Fatal Error Status           (Correctable Error Status Register)
    • Unsupported Request Error Status      (Uncorrectable Error Status Register)
      • Only set if Advisory Non-Fatal Error Mask bit is set to ‘0’  (Correctable Error Mask Register)

     

    An ERR_COR message is sent to the Root Port if AER reporting is enabled through the following bit settings:

    • Advisory Non-Fatal Error Mask is set to '0'                   (Correctable Error Mask Register)
    • Correctable Error Reporting Enable is set to '1'           (Device Control Register)
    • Unsupported Request Reporting Enable is set to '1'  (Device Control Register)
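
    The three conditions above can be expressed as a single predicate. This is a minimal sketch; the bit positions follow the PCI Express Base Specification (Device Control bits 0 and 3, Correctable Error Mask bit 13), and the function name is hypothetical.

```python
DEVCTL_COR_REPORT_EN = 1 << 0   # Correctable Error Reporting Enable (Device Control)
DEVCTL_UR_REPORT_EN = 1 << 3    # Unsupported Request Reporting Enable (Device Control)
COR_MASK_ADV_NONFATAL = 1 << 13  # Advisory Non-Fatal Error Mask (Correctable Error Mask)


def err_cor_expected(devctl: int, cor_mask: int) -> bool:
    """Return True when the endpoint settings listed above would all be met."""
    advisory_unmasked = (cor_mask & COR_MASK_ADV_NONFATAL) == 0
    cor_reporting = bool(devctl & DEVCTL_COR_REPORT_EN)
    ur_reporting = bool(devctl & DEVCTL_UR_REPORT_EN)
    return advisory_unmasked and cor_reporting and ur_reporting
```

    A caller would pass the raw Device Control and Correctable Error Mask values read from the endpoint's configuration space.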

     

    In the Root Port, the following bit is set if a Completion with Unsupported Request status is received:

    • Received Master Abort (Secondary Status Register)

     

    Also in the Root Port, the following bit is set if ERR_COR is received and AER is enabled:

    • ERR_COR Received (Root Error Status Register)

    Resolution

    For the P-Tile/H-Tile Avalon® Streaming Intel® FPGA IP for PCI Express* and the P-Tile/H-Tile Avalon® Memory Mapped Intel® FPGA IP for PCI Express*, software can ignore these errors each time enumeration is performed. If the following error status bits are set in the endpoint after enumeration, it is safe for software to ignore them:

    • Correctable Error Detected                (Device Status Register)
    • Unsupported Request Detected          (Device Status Register)
    • Advisory Non-Fatal Error Status        (Correctable Error Status Register)
    • Unsupported Request Error Status   (Uncorrectable Error Status Register)
      • Only if Advisory Non-Fatal Error Mask bit (Correctable Error Mask Register) is set to ‘0’

     

    For simplicity, the workaround can be applied in the following order:

    1. When enumeration completes, clear the following error registers (all bits, regardless of which are set) for all PCIe Endpoint Functions:
      1. Device Status Register
      2. Correctable Error Status Register
      3. Uncorrectable Error Status Register
    2. Clear the following error registers (all bits, regardless of which are set) for the PCIe Root Port associated with the PCIe Endpoint Functions above:
      1. Secondary Status Register
      2. Root Error Status Register
    3. Repeat step 1 and step 2 after each PCIe enumeration.
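
    The steps above can be sketched as follows. This is a minimal illustration, not a driver: `Rw1cConfigSpace` is a hypothetical in-memory stand-in for real configuration space access, it treats every bit as write-1-to-clear (a simplification, since some Device Status bits are read-only), and the capability base offsets are assumed to have been discovered already. The register offsets within each capability follow the PCI Express Base Specification.

```python
from dataclasses import dataclass, field

DEVSTA = 0x0A            # Device Status, relative to the PCI Express Capability
AER_UNCOR_STATUS = 0x04  # Uncorrectable Error Status, relative to the AER capability
AER_COR_STATUS = 0x10    # Correctable Error Status, relative to the AER capability
AER_ROOT_STATUS = 0x30   # Root Error Status, relative to the AER capability
SEC_STATUS = 0x1E        # Secondary Status, Type 1 configuration header


@dataclass
class Rw1cConfigSpace:
    """Hypothetical model of one function's config space; all bits are RW1C."""
    regs: dict = field(default_factory=dict)

    def read(self, offset: int) -> int:
        return self.regs.get(offset, 0)

    def write(self, offset: int, value: int) -> None:
        # Writing 1 to a bit clears it; writing 0 leaves it unchanged.
        self.regs[offset] = self.read(offset) & ~value


def clear_all_bits(cfg, offset: int) -> None:
    """Clear every set status bit by writing the value just read back."""
    cfg.write(offset, cfg.read(offset))


def clear_endpoint_errors(cfg, pcie_cap: int, aer_cap: int) -> None:
    """Step 1: clear the three endpoint error registers."""
    clear_all_bits(cfg, pcie_cap + DEVSTA)
    clear_all_bits(cfg, aer_cap + AER_COR_STATUS)
    clear_all_bits(cfg, aer_cap + AER_UNCOR_STATUS)


def clear_root_port_errors(cfg, aer_cap: int) -> None:
    """Step 2: clear the two Root Port error registers."""
    clear_all_bits(cfg, SEC_STATUS)
    clear_all_bits(cfg, aer_cap + AER_ROOT_STATUS)


def apply_workaround(endpoints, root_port, pcie_cap: int, aer_cap: int) -> None:
    """Run step 1 for every endpoint function, then step 2; call after each enumeration."""
    for ep in endpoints:
        clear_endpoint_errors(ep, pcie_cap, aer_cap)
    clear_root_port_errors(root_port, aer_cap)
```

    In real software, `read` and `write` would be replaced by the platform's configuration space accessors, and the Root Port would typically have its own capability offsets rather than sharing `aer_cap` with the endpoints.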

     

    If software polls for errors at runtime, it can check the 'Correctable Error Detected', 'Unsupported Request Detected', 'Advisory Non-Fatal Error Status' and 'Unsupported Request Error Status' bits to differentiate this issue from other reliability errors. If only those 4 bits are set, the errors on the endpoints can be attributed to this P-Tile/H-Tile Avalon® Streaming Intel® FPGA IP for PCI Express* or P-Tile/H-Tile Avalon® Memory Mapped Intel® FPGA IP for PCI Express* issue, and it is appropriate to clear the error status bits listed in step 1 and step 2 above.
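
    One way a polling routine might apply this check is sketched below. The bit positions follow the PCI Express Base Specification; the function name is hypothetical, and the interpretation is an assumption: "only those 4 bits" is read as "no other error status bit is set and at least one of the four is set", with the non-error Device Status bits (bits 4 and above) ignored.

```python
DEVSTA_COR_DETECTED = 1 << 0   # Correctable Error Detected
DEVSTA_UR_DETECTED = 1 << 3    # Unsupported Request Detected
DEVSTA_ERROR_BITS = 0x000F     # bits 3:0 are the error-detected bits
COR_ADV_NONFATAL = 1 << 13     # Advisory Non-Fatal Error Status
UNCOR_UNSUPPORTED_REQ = 1 << 20  # Unsupported Request Error Status


def matches_known_signature(devsta: int, cor_status: int, uncor_status: int) -> bool:
    """Return True when the set error bits are limited to the four listed above."""
    devsta_errors = devsta & DEVSTA_ERROR_BITS
    allowed_devsta = DEVSTA_COR_DETECTED | DEVSTA_UR_DETECTED

    # Any error bit outside the known signature points to a different problem.
    unexpected = (
        (devsta_errors & ~allowed_devsta)
        | (cor_status & ~COR_ADV_NONFATAL)
        | (uncor_status & ~UNCOR_UNSUPPORTED_REQ)
    )
    # Require at least one of the four bits so an all-clear device is not matched.
    expected = (
        (devsta_errors & allowed_devsta)
        | (cor_status & COR_ADV_NONFATAL)
        | (uncor_status & UNCOR_UNSUPPORTED_REQ)
    )
    return unexpected == 0 and expected != 0
```

    A polling loop could call this with the three register values read from each endpoint function and clear the registers only when it returns True, escalating any other combination as a genuine error.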

     

    For P-Tile, user logic can use the Configuration Intercept Interface (CII) to correctly advertise the ARI next function number when a Configuration Read is issued by the Root Port.

    Related Products

    This article applies to 1 product

    Intel® Stratix® 10 FPGAs and SoC FPGAs