Skip to main content

PWR036: Avoid indirect array access to improve performance

Issue

Indirect array access may impact performance.

Actions

Consider using techniques like loop fusion, loop interchange, loop tiling or changing the data layout to avoid non-consecutive accesses in hot loops.

Relevance

Accessing an array indirectly (e.g. through another array containing the positions to be accessed) is generally less efficient than accessing consecutive positions because the latter improves locality of reference.

Code example

C

Consider the example code below to illustrate the presence of indirect access patterns. The elements of array a are accessed in an indirect manner through the array b. Thus, the code exhibits random accesses that cannot be predicted before the actual execution of the code:

void example(float *a, unsigned *b, unsigned size) {
for (unsigned i = 0; i < size; ++i) {
a[b[i]] = 0.0f;
}
}

Next, consider another example code where memory access patterns can be optimized to improve locality of reference. The elements of the array a are accessed indirectly through the array index. Consequently, the program accesses random elements of the array a, which leads to a low performance due to a poor usage of the memory subsystem:

for (int i = 0; i < LEN_1D; ++i) {
for (int j = 1; j < LEN_1D; j++) {
c[i] += a[index[j]];
}
}

The alternative implementation shown below takes advantage of loop interchange to improve locality of reference. Now, the loop over j becomes the outer loop, and the loop over i becomes the inner loop. As a result, the access to a[index[j]] is repeated across the iterations of the inner loop since the value of j doesn't change, resulting in accesses to a constant memory location. This leads to a better usage of the memory subsystem, and thus, to a performance improvement:

for (int j = 1; j < LEN_1D; j++) {
for (int i = 0; i < LEN_1D; ++i) {
c[i] += a[index[j]];
}
}

Fortran

Consider the example code below to illustrate the presence of indirect access patterns. The elements of array a are accessed in an indirect manner through the array b. Thus, the code exhibits random accesses that cannot be predicted before the actual execution of the code:

subroutine example()
implicit none
integer, intent(out) :: a
integer, intent(in) :: b
integer :: i

do i = 1, size(a, 1)
a(b(i)) = 0
end do
end subroutine example

Next, consider another example code where memory access patterns can be optimized to improve locality of reference. The elements of the array a are accessed indirectly through the array index. Consequently, the program accesses random elements of the array a, which leads to a low performance due to a poor usage of the memory subsystem:

do i = 1, size(c, 1)
do j = 2, size(index, 1)
c(i) = c(i) + a(index(j))
end do
end do

The alternative implementation shown below takes advantage of loop interchange to improve locality of reference. Now, the loop over j becomes the outer loop, and the loop over i becomes the inner loop. As a result, the access to a(index(j)) is repeated across the iterations of the inner loop since the value of j doesn't change, resulting in accesses to a constant memory location. This leads to a better usage of the memory subsystem, and thus, to a performance improvement:

do j = 2, size(index, 1)
do i = 1, size(c, 1)
c(i) = c(i) + a(index(j))
end do
end do

References