Embedded Design Handbook

ID 683689
Date 8/28/2023
Public

3.4.4.5. Arithmetic Byte Reordering

Altering your system to conform to Avalon®-MM byte ordering changes the arithmetic byte ordering of multibyte values as seen by software. For example, an Avalon®-MM compliant big endian processor core, such as an ARM BE-8 processor, accesses memory using the bus mapping shown in the table below.

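Table 5. ARM BE-8 Write Data Mapping

| Access Size (Bits) | Offset (Bytes) | Value | Byte Enable (Bits 3:0) | Write Data (Bits 31:24) | Write Data (Bits 23:16) | Write Data (Bits 15:8) | Write Data (Bits 7:0) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 8 | 0 | 0x0A | 0001 | - | - | - | 0x0A |
| 8 | 1 | 0x0A | 0010 | - | - | 0x0A | - |
| 8 | 2 | 0x0A | 0100 | - | 0x0A | - | - |
| 8 | 3 | 0x0A | 1000 | 0x0A | - | - | - |
| 16 | 0 | 0x0A0B | 0011 | - | - | 0x0B | 0x0A |
| 16 | 2 | 0x0A0B | 1100 | 0x0B | 0x0A | - | - |
| 32 | 0 | 0x0A0B0C0D | 1111 | 0x0D | 0x0C | 0x0B | 0x0A |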

The big endian ARM BE-8 mapping in the table above matches the little endian Nios® II processor mapping for all single byte accesses. If you ensure that your processor is Avalon®-MM compliant, you can share individual bytes of data between big and little endian processors and peripherals without any reordering.
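
The mapping in the table above can be reproduced with a small software model. The following C sketch is illustrative only: the `bus_write_t` type and the `be8_write` function are hypothetical names, not part of any vendor API, and the model assumes a 32-bit data bus with naturally aligned accesses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one write on a 32-bit Avalon-MM data bus. */
typedef struct {
    uint8_t  byte_enable; /* bits 3:0, one bit per byte lane */
    uint32_t write_data;  /* byte lane 0 is bits 7:0 */
} bus_write_t;

/*
 * Model how a BE-8 master drives the bus for a value of size_bytes
 * (1, 2, or 4) written at a byte offset (0 to 3) within a 32-bit word.
 * This is a sketch of the table above, not a hardware description.
 */
bus_write_t be8_write(unsigned size_bytes, unsigned offset, uint32_t value)
{
    bus_write_t w = {0, 0};

    /* Assumption: only naturally aligned 8-, 16-, and 32-bit accesses. */
    assert(size_bytes == 1 || size_bytes == 2 || size_bytes == 4);
    assert(offset % size_bytes == 0 && offset + size_bytes <= 4);

    for (unsigned i = 0; i < size_bytes; i++) {
        /* Big endian: the byte at the lowest address is the most significant. */
        uint8_t byte = (uint8_t)(value >> (8 * (size_bytes - 1 - i)));
        /* Avalon-MM: byte address N always travels on byte lane N. */
        unsigned lane = offset + i;

        w.byte_enable |= (uint8_t)(1u << lane);
        w.write_data  |= (uint32_t)byte << (8 * lane);
    }
    return w;
}
```

For instance, `be8_write(2, 0, 0x0A0B)` yields a byte enable of 0011 with 0x0B on bits 15:8 and 0x0A on bits 7:0, which corresponds to the first 16-bit row of the table.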

However, making the processor data master Avalon®-MM compliant only ensures that single byte accesses map to the same physical byte lanes of a slave port. For multibyte accesses, a BE-8 processor and a little endian processor access the same byte lanes, but they interpret the value differently: the BE-8 processor treats the byte at the lowest offset as the most significant byte, while the little endian processor treats it as the least significant byte. This mismatch matters only when the arithmetic byte ordering of the processor differs from that of other peripherals and processors in your system.

To correct the mismatch, you must perform arithmetic byte reordering in software for multibyte accesses. How the processor interprets the data depends on its own arithmetic byte ordering and on the arithmetic byte ordering of the other processors and peripherals in the system.
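
As a rough illustration of the mismatch, the sketch below assembles the same four bytes, taken in byte offset order, first the way a big endian processor would and then the way a little endian processor would, and shows that a 32-bit byte swap reconciles the two. The function names `load32_be`, `load32_le`, and `swap32` are hypothetical helpers introduced for this example.

```c
#include <stdint.h>

/* Interpret four bytes in offset order as a big endian 32-bit value:
 * the byte at offset 0 is the most significant. */
uint32_t load32_be(const uint8_t b[4])
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

/* Interpret the same four bytes as a little endian 32-bit value:
 * the byte at offset 0 is the least significant. */
uint32_t load32_le(const uint8_t b[4])
{
    return ((uint32_t)b[3] << 24) | ((uint32_t)b[2] << 16) |
           ((uint32_t)b[1] << 8)  |  (uint32_t)b[0];
}

/* Arithmetic byte reordering in software: reverse the four bytes. */
uint32_t swap32(uint32_t v)
{
    return (v << 24) |
           ((v << 8) & 0x00FF0000u) |
           ((v >> 8) & 0x0000FF00u) |
           (v >> 24);
}
```

With the bytes 0x0A, 0x0B, 0x0C, 0x0D at offsets 0 through 3, `load32_be` returns 0x0A0B0C0D and `load32_le` returns 0x0D0C0B0A. Applying `swap32` to either result produces the other, which is the reordering that software must perform when the two sides disagree.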

For example, consider a 32-bit ARM BE-8 processor core that reads from a 16-bit little endian timer peripheral by performing a 16-bit read access. The ARM processor treats byte offset 0 as the most significant byte of any word. The timer treats byte offset 0 as the least significant byte of the 16-bit value. When the processor reads a value from the timer, the bytes of the value, as seen by software, are swapped. The figure below shows the swapping. A timer counter value of 0x0800 (2,048 clock ticks) is interpreted by the processor as 0x0008 (8 clock ticks) because the arithmetic byte ordering of the processor does not match the arithmetic byte ordering of the timer component.

Figure 20. ARM BE-8 Processor Accessing a Little Endian Peripheral

For the values to be interpreted accurately, the processor must either read each byte lane individually and then combine the two byte reads into a single 16-bit value in software, or read the single 16-bit value and swap the bytes in software.
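
Both options can be sketched in C. This is a minimal sketch under stated assumptions, not a driver for any particular timer: `timer_read_bytewise` and `swap16` are hypothetical names, and the timer register is modeled as two bytes reachable through a pointer, with the least significant byte at byte offset 0 as described above.

```c
#include <stdint.h>

/*
 * Option 1: read each byte lane individually and combine the results.
 * timer_base is a hypothetical pointer to the 16-bit little endian
 * counter register; offset 0 holds the least significant byte.
 */
uint16_t timer_read_bytewise(const volatile uint8_t *timer_base)
{
    uint8_t lo = timer_base[0]; /* least significant byte of the count */
    uint8_t hi = timer_base[1]; /* most significant byte of the count */

    return (uint16_t)(((uint16_t)hi << 8) | lo);
}

/*
 * Option 2: perform one 16-bit read, then swap the two bytes of the
 * raw value that the big endian processor received.
 */
uint16_t swap16(uint16_t raw)
{
    return (uint16_t)((raw << 8) | (raw >> 8));
}
```

Using the values from the example, a raw 16-bit read that software sees as 0x0008 becomes 0x0800 after `swap16`, and `timer_read_bytewise` returns 0x0800 when offset 0 holds 0x00 and offset 1 holds 0x08. One design consideration: the two byte reads in option 1 are separate bus accesses, so a running counter could change between them unless the peripheral provides a snapshot mechanism; the single read followed by a swap does not have that concern.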

The same issue occurs when you apply a bus-level renaming wrapper to an ARM BE-32 or PowerPC core. Both processor cores treat byte offset 0 as the most significant byte of any value. As a result, you must handle any mismatch between arithmetic byte ordering of data used by the processor and peripherals in your system.

On the other hand, if the timer in the figure above were to treat the most significant byte of the 16-bit value as byte 0 (big endian ordering), the data would arrive at the processor master in the same arithmetic byte ordering used by the processor. If the processor and the component internally implement the same arithmetic byte ordering, no software swapping of bytes is necessary for multibyte accesses.

The figure below shows how the value 0x0800 of a big endian timer is read by the processor. The value is retained without the need to perform any byte swapping in software after the read completes.

Figure 21. ARM BE-8 Processor Accessing a BE-8 Peripheral