Intel® Data Center Diagnostic Tool for Intel® Xeon® Processors

000058107

09/15/2021

Introduction

The Intel® Data Center Diagnostic Tool is diagnostic software that you can run on your data center platforms to:

  • Verify the functionality of all cores within an Intel® Xeon® Processor.
  • Support a regular system maintenance program.

High reliability and availability in the data center require the right tools and a commitment to maintenance. Intel believes it is an industry best practice to use maintenance tools such as this one for both initial deployment and periodic testing to help ensure the best system experience.

System requirements

The Intel Data Center Diagnostic Tool is a Linux* application that can be installed and run on many current Linux distributions. There is no Windows* version of this tool.

For best coverage, run the application directly on the server's host operating system. It can also run inside a container or virtual machine, but some functionality may be disabled there.
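To check which kind of environment you are in before running the tool, one option is the systemd-detect-virt utility, which prints "none" when it detects no virtualization. The sketch below is an illustration only and is not part of the tool: the function name coverage_hint is hypothetical, and it assumes systemd-detect-virt is available on the system.

```shell
# Hypothetical helper: turn the output of systemd-detect-virt into a hint.
# systemd-detect-virt prints "none" when no virtualization is detected.
coverage_hint() {
    case "$1" in
        none) echo "bare metal: best coverage" ;;
        "")   echo "unknown: could not detect environment" ;;
        *)    echo "virtualized ($1): some functionality may be disabled" ;;
    esac
}

# Example use on a live system:
#   coverage_hint "$(systemd-detect-virt 2>/dev/null)"
```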

Supported processors:

  • 3rd Generation Intel® Xeon® Scalable Processors (formerly Ice Lake and Cooper Lake)
  • 2nd Generation Intel® Xeon® Scalable Processors (formerly Cascade Lake)
  • 1st Generation Intel® Xeon® Scalable Processors (formerly Skylake)
  • Intel® Xeon® Processor E5 v4 Family (formerly Broadwell)
  • Intel® Xeon® Processor E7 v4 Family (formerly Broadwell)
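As a rough pre-check, the CPU family and model numbers reported in /proc/cpuinfo can be compared with the microarchitectures listed above. The following sketch is not part of the tool. The function name guess_xeon_generation is hypothetical, and the model numbers (79, 85, 106) are assumptions taken from the Linux kernel's Intel CPU model table. Some non-Xeon processors share these model numbers, so a match does not guarantee support; the tool itself makes the final decision.

```shell
# Hypothetical helper: map CPU family/model numbers to the microarchitectures
# named in the supported-processor list. The model numbers are assumptions
# taken from the Linux kernel's Intel model table, not from this article.
guess_xeon_generation() {
    if [ "$1" != "6" ]; then
        echo "unknown"
        return 1
    fi
    case "$2" in
        79)  echo "Broadwell (Xeon E5/E7 v4)" ;;
        85)  echo "Skylake, Cascade Lake, or Cooper Lake" ;;
        106) echo "Ice Lake" ;;
        *)   echo "unknown"; return 1 ;;
    esac
}

# Example use on a live system (reads the first CPU entry only):
#   family=$(awk -F': *' '/^cpu family/ {print $2; exit}' /proc/cpuinfo)
#   model=$(awk -F': *' '/^model[ \t]*:/ {print $2; exit}' /proc/cpuinfo)
#   guess_xeon_generation "$family" "$model"
```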

Installation

Notes
  • Additional details are available in the /usr/share/doc/dcdiag/README.rst file included in the installation.
  • We recommend using the steps in the sections below to link to the repository, which ensures that you get the latest version of the Intel® Data Center Diagnostic Tool. However, if you require a downloadable binary, use an RPM file or DEB file.

 

Debian*/Ubuntu*

To install the Intel® Data Center Diagnostic Tool software packages on Debian*-based distributions, add the Intel software package repository and install the appropriate packages.

Before copying and pasting these commands into your console, you may want to run sudo ls and enter your password first, so that the sudo password prompt does not consume the pasted commands.

Set up the key to verify the package signatures

curl https://repositories.intel.com/dcdt/dcdiag.pub | sudo apt-key add -

Set up the repository

sudo apt-add-repository 'deb https://repositories.intel.com/dcdt/debian stable main'

Install the package

sudo apt-get update
sudo apt-get install dcdiag

Fedora*/CentOS*/RHEL*

To install the Intel Data Center Diagnostic Tool software packages on a Fedora-based distribution, add the Intel software package repository and install the package.

The first time you install, YUM or DNF will prompt you to accept the signing key. Verify that the fingerprint is as follows, and then accept it:
Userid: "CN=Release Key"
Fingerprint: 6226 CA48 AAB6 0900 2093 C7C4 0A04 4B42 CF00 5B79

Before copying and pasting these commands into your console, you may want to run sudo ls and enter your password first, so that the sudo password prompt does not consume the pasted commands.

Install the repository file

sudo yum install https://repositories.intel.com/dcdt/dcdiag-repo.rpm

Install the package

sudo yum install dcdiag

openSUSE*/SUSE Linux Enterprise*

Install the repository file

sudo zypper ar https://repositories.intel.com/dcdt/dcdiag.repo

Install the package

sudo zypper install dcdiag

You will be warned that repomd.xml is not signed. Answer yes to continue. You will then be given another chance to verify the package signature. Verify that the fingerprint is as follows, and then accept it:

Repository: dcdiag
Key Name: CN=Release Key
Key Fingerprint: 6226CA48 AAB60900 2093C7C4 0A044B42 CF005B79
Key Created: Tue 24 Nov 2020 01:47:38 PM PST
Key Expires: Sat 25 Nov 2023 01:47:38 PM PST
Rpm Name: gpg-pubkey-cf005b79-5fbd7f7a
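The package managers display the same fingerprint with different grouping (ten groups of four hex digits in one prompt, five groups of eight in the other), which makes comparing by eye error-prone. A minimal sketch of a whitespace-insensitive comparison follows. The function names normalize_fpr and fpr_matches are hypothetical, and the expected value is copied from this article.

```shell
# Expected fingerprint, copied from this article.
EXPECTED_FPR="6226 CA48 AAB6 0900 2093 C7C4 0A04 4B42 CF00 5B79"

# Strip all whitespace and upper-case the hex digits so that differently
# grouped fingerprints compare equal.
normalize_fpr() {
    printf '%s' "$1" | tr -d '[:space:]' | tr 'a-f' 'A-F'
}

# Return 0 if the candidate fingerprint matches the expected one.
fpr_matches() {
    [ "$(normalize_fpr "$1")" = "$(normalize_fpr "$EXPECTED_FPR")" ]
}

# Example use: paste the fingerprint shown by your package manager.
#   fpr_matches "6226CA48 AAB60900 2093C7C4 0A044B42 CF005B79" && echo "match"
```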

 

How to test the Intel Xeon Processor

Once installed, the Intel Data Center Diagnostic Tool is automatically enabled for background execution. You can verify that this is successful with the following command:

# systemctl status dcdiag
● dcdiag.service - Intel® Data Center Diagnostic Tool
   Loaded: loaded (/usr/lib/systemd/system/dcdiag.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2021-02-19 11:24:17 MST; 4 days ago
     Docs: file:///usr/share/doc/dcdiag/README.rst
 Main PID: 8777 (dcdiag)
   CGroup: /system.slice/dcdiag.service
           └─8777 /usr/bin/dcdiag --service

If the tool detects any errors, it logs them to the system log. You can also ask the tool whether the background scan detected any errors by using the --query argument.

# dcdiag --query
Intel® Data Center Diagnostic Tool Version 506
Test completed successfully. No issues detected.
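For routine monitoring, the two checks above can be combined into one script step. The sketch below is an assumption-based illustration: the function name background_scan_ok is hypothetical, and it decides success by matching the "No issues detected" text because this article does not document the exit codes of dcdiag. Run it as root, as in the examples above.

```shell
# Hypothetical helper: return 0 only if the dcdiag service is active and the
# query output contains the "No issues detected" message. Returns 2 if the
# service is not active, and 1 if the message is missing.
background_scan_ok() {
    systemctl is-active --quiet dcdiag || return 2
    dcdiag --query | grep -q "No issues detected"
}

# Example use:
#   if background_scan_ok; then echo "OK"; else echo "Needs attention"; fi
```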

You can also run the tool manually in the foreground by executing the following at a Linux command prompt:

# dcdiag

The manual test runs for about 45 minutes and has high CPU utilization.

When the diagnostic completes, the system returns one of the following messages:

  • Test completed successfully. No issues detected.
     
  • Test completed successfully. One or more machine check errors occurred. Please check the system logs.
     
  • This processor is not supported by this version of the tool.

    Check the system's processor model and version. This message appears if the Intel Data Center Diagnostic Tool does not detect a production version of the supported processors. Engineering samples are not supported by this tool.

    Find help in identifying the processor.
     
  • Test completed. Results are inconclusive due to an outdated version of microcode.

    The latest version of the microcode addresses known issues. Please update. Microcode updates are usually delivered by your Linux distribution vendor alongside security fixes and other firmware updates for various components. If your system does not have these updates enabled, we recommend that you enable them. The microcode is automatically loaded by the Linux kernel on every boot and can be reloaded at runtime with the following command as root:

    echo 1 > /sys/devices/system/cpu/microcode/reload
     
  • Test completed. Results are inconclusive due to the system exceeding temperature limits.

    This can happen when the system does not provide enough cooling for the CPU to operate within its required temperature limits. We recommend that you check that the system's cooling is operating correctly. Possible causes include faulty fans, incorrect airflow, or other environmental issues.
     
  • Test completed. Results are inconclusive, one or more machine check errors occurred.

    Check system logs.
     
  • Test failed. Contact your system manufacturer or processor vendor for support.

    If the test fails, check whether your server node's processors are still under warranty:

    • If you have a Boxed Intel® Xeon® Processor still under 3-year warranty, contact Intel Customer Support for assistance.
    • If you have a tray processor, contact your system or processor vendor or place of purchase to check if the processor is still under warranty.
      Note: Tray processors are sold directly to system manufacturers or Intel authorized distributors. Intel does not provide direct warranty to end users for tray processors unless they came preinstalled in Intel® Data Center Blocks (Intel® DCB) server systems. Except for Intel DCB systems, the tray processor’s warranty is from the vendor or place of purchase of the processor or the system if the processor was pre-installed. Intel recommends purchasing from Intel Authorized Distributors, Intel Approved Suppliers, and resellers of Intel® products.
    • Be aware that Intel does not have an out-of-warranty replacement program.
       
  • Test failed.

    Test completed, and an error was detected on the physical processor containing /sys/devices/system/cpu/cpuXX.

    Contact your system manufacturer or processor vendor for support.

  • Test failed.

    Test is unable to determine which physical processor caused the failure.

    Contact your system manufacturer or processor vendor for support.
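When the tool runs from a script, the result messages listed above can be mapped to a short status for alerting. The following is a minimal sketch, assuming the message text matches this article. The function name classify_result and the status strings are hypothetical, and the exact wording may differ between tool versions.

```shell
# Hypothetical helper: map a dcdiag result message to a coarse status.
# Patterns are copied from the messages in this article. Order matters
# because several messages share the same prefix or mention machine checks.
classify_result() {
    case "$1" in
        *"No issues detected"*)            echo "pass" ;;
        *"not supported"*)                 echo "unsupported" ;;
        *"outdated version of microcode"*) echo "inconclusive: update microcode" ;;
        *"temperature limits"*)            echo "inconclusive: check cooling" ;;
        *"inconclusive"*"machine check"*)  echo "inconclusive: check system logs" ;;
        *"machine check errors"*)          echo "pass with machine checks: check system logs" ;;
        *"Test failed"*)                   echo "fail: contact vendor" ;;
        *)                                 echo "unknown" ;;
    esac
}

# Example use:
#   classify_result "$(dcdiag --query | tail -n 1)"
```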
     

Version history

Date            Version    Description
July 7, 2021    540        Initial version