PWR034: avoid strided array access to improve performance

Issue

Strided array access may impact performance.

Actions

Consider using techniques like loop fusion, loop interchange, loop tiling or changing the data layout to avoid non-sequential access in hot loops.

Relevance

Accessing an array using a non-unit stride is less efficient than accessing consecutive positions because the latter improves the locality of reference. Note that C/C++ column-wise matrix access is an example non-unit stride, where the stride is the column width.

Code example

C

The following code shows a loop with a strided access to array a with stride 2. Avoiding it would require changing the data layout of the program:

void example(float *a, unsigned size) {
  for (unsigned i = 0; i < size; i += 2) {
    a[i] = 0.0f;
  }
}

Another code with strided accesses is show below. In this case, both variables a and b have a stride LEN:

for (int i = 0; i < LEN; ++i) {
  for (int j = 1; j < LEN; j++) {
    a[j * LEN + i] = a[j * LEN + i - 1] + b[j * LEN + i];
  }
}

Note that by using loop interchange, the loop order changes from ij to ji. The resulting code shown below has sequential accesses (i.e., stride 1) for variables ij and b in the scope of the innermost loop. As a result, a simple code change solves the issue in this scenario, without requiring disruptive changes in data layout:

for (int j = 1; j < LEN; ++j) {
  for (int i = 0; i < LEN; i++) {
    a[j * LEN + i] = a[j * LEN + i - 1] + b[j * LEN + i];
  }
}

Fortran

The following code shows a loop with a strided access to array a with stride 2. Avoiding it would require changing the data layout of the program:

subroutine example(a)
  real, intent(out) :: a(:)
  integer :: i

  do i = 1, size(a, 1), 2
    a(i) = 0.0
  end do
end subroutine example

Another code with strided accesses is show below. In this case, both variables a and b have dimensions (LEN, LEN), and thus are implicitly accessed with a stride of LEN elements as Fortran uses column-major order:

do i = 1, size(a, 1)
  do j = 2, size(a, 2)
    a(i, j) = a(i, j - 1) + b(i, j)
  end do
end do

Note that by using loop interchange, the loop order changes from ij to ji. The resulting code shown below has sequential accesses (i.e., stride 1) in the scope of the innermost loop. As a result, a simple code change solves the issue in this scenario, without requiring disruptive changes in data layout:

do j = 2, size(a, 2)
  do i = 1, size(a, 1)
    a(i, j) = a(i, j - 1) + b(i, j)
  end do
end do

PWR034 examples

References

Loop interchange
Memory access pattern (strided access pattern)

Issue​

Actions​

Relevance​

Code example​

C​

Fortran​

Related resources​

References​