PWD003: Missing array range in data copy to the GPU

Issue

Copying an array whose size is not known to the compiler to the GPU requires explicitly specifying the range of array elements to be copied.

Actions

Specify the array range to be copied to device memory.

Relevance

When offloading to the GPU, copying array data from host memory to GPU memory requires knowing which range of the array must be copied. If the compiler knows the array size (e.g., for C static arrays or Fortran assumed-shape arrays) and the whole array must be copied, specifying the range is optional in both OpenMP and OpenACC. However, for arrays whose size is not known to the compiler, specifying the range is compulsory. Some compilers do not enforce this, which leads to undefined behavior. For instance, for C dynamic arrays, the scalar value of the pointer might be copied instead of the pointed-to data; for Fortran assumed-size arrays, an invalid memory access might occur, or data from the wrong memory locations might be copied.
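
To make the distinction concrete, the following minimal sketch contrasts the two cases in C with OpenMP. The names `N`, `sum_fixed` and `sum_dynamic` are hypothetical and used only for illustration. In `sum_fixed` the arrays have a size known at compile time, so they can be mapped whole without a range; in `sum_dynamic` the compiler only sees pointers, so each range is spelled out.

```c
#define N 4 /* hypothetical compile-time array size */

/* Fixed-size arrays: the compiler knows each one holds N elements,
   so the whole array can be mapped without an explicit range. */
int sum_fixed(void) {
  int a[N] = {1, 2, 3, 4};
  int b[N] = {10, 20, 30, 40};
  int sum[N];

  #pragma omp target map(to: a, b) map(from: sum)
  #pragma omp parallel for
  for (int i = 0; i < N; i++) {
    sum[i] = a[i] + b[i];
  }

  /* Accumulate on the host so the result is easy to check. */
  int total = 0;
  for (int i = 0; i < N; i++) {
    total += sum[i];
  }
  return total;
}

/* Pointers: the compiler only sees `int *`, so the number of elements
   to transfer must be given explicitly in every map clause. */
int sum_dynamic(const int *a, const int *b, int *sum, int size) {
  #pragma omp target map(to: a[0:size], b[0:size]) map(from: sum[0:size])
  #pragma omp parallel for
  for (int i = 0; i < size; i++) {
    sum[i] = a[i] + b[i];
  }

  int total = 0;
  for (int i = 0; i < size; i++) {
    total += sum[i];
  }
  return total;
}
```

If the code is built without OpenMP support, or no offloading device is available, the pragmas are ignored or the region falls back to the host, and both functions still compute the same result.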

Code example

C

In the following OpenMP code, the pointers themselves are copied to the offloading target device instead of the dynamic array data they point to:

void foo(int *a, int *b, int *sum, int size) {
  #pragma omp target map(to: a, b) map(from: sum)
  #pragma omp parallel for
  for (int i = 0; i < size; i++) {
    sum[i] = a[i] + b[i];
  }
}

In this case, it suffices to specify the array bounds in the OpenMP map clauses:

void foo(int *a, int *b, int *sum, int size) {
  #pragma omp target map(to: a[0:size], b[0:size]) map(from: sum[0:size])
  #pragma omp parallel for
  for (int i = 0; i < size; i++) {
    sum[i] = a[i] + b[i];
  }
}

The same problem appears in the analogous OpenACC code:

void foo(int *a, int *b, int *sum, int size) {
  #pragma acc data copyin(a, b) copyout(sum)
  #pragma acc parallel loop
  for (int i = 0; i < size; i++) {
    sum[i] = a[i] + b[i];
  }
}

And again, specifying the array bounds fixes the problem:

void foo(int *a, int *b, int *sum, int size) {
  #pragma acc data copyin(a[0:size], b[0:size]) copyout(sum[0:size])
  #pragma acc parallel loop
  for (int i = 0; i < size; i++) {
    sum[i] = a[i] + b[i];
  }
}
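
The range does not have to span the whole array. As a minimal sketch, assuming only a window of the data is needed on the device, the hypothetical function `add_window` below (its name and its `start` and `count` parameters are illustrative, not part of the original example) maps just `count` elements starting at index `start`. Note that in C an array section is written as `[lower bound : length]`, whereas a Fortran section such as `a(1:size)` is written as `(lower bound : upper bound)`.

```c
/* Computes sum[i] = a[i] + b[i] only for i in [start, start + count).
   Elements of sum outside that window are neither transferred nor modified. */
void add_window(const int *a, const int *b, int *sum, int start, int count) {
  /* C array sections take a lower bound and a LENGTH, not an upper bound. */
  #pragma omp target map(to: a[start:count], b[start:count]) \
                     map(from: sum[start:count])
  #pragma omp parallel for
  for (int i = start; i < start + count; i++) {
    sum[i] = a[i] + b[i];
  }
}
```

Inside the target region the arrays are still indexed with their original indices; only the elements within the mapped window may be accessed.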

Fortran

The following OpenMP code copies multiple assumed-size arrays to the offloading target:

subroutine foo(a, b, sum, size)
  implicit none
  integer, dimension(*), intent(in) :: a, b
  integer, dimension(*), intent(out) :: sum
  integer, intent(in) :: size
  integer :: i

  !$omp target map(to: a, b) map(from: sum)
  !$omp parallel do default(none) shared(a, b, sum, size)
  do i = 1, size
    sum(i) = a(i) + b(i)
  end do
  !$omp end parallel do
  !$omp end target
end subroutine foo

Since the compiler does not know the array bounds, the code might not work as expected (e.g., accessing invalid memory or copying data from the wrong locations), or the compiler might reject it outright:

$ gfortran --version
GNU Fortran (Debian 12.2.0-14) 12.2.0
$ gfortran foo.f90 -fopenmp
foo.f90:8:23:

    8 |   !$omp target map(to: a, b) map(from: sum)
      |                       1
Error: Assumed size array ‘a’ in MAP clause at (1)
foo.f90:8:25:

    8 |   !$omp target map(to: a, b) map(from: sum)
      |                         1
Error: Assumed size array ‘b’ in MAP clause at (1)
foo.f90:8:39:

    8 |   !$omp target map(to: a, b) map(from: sum)
      |                                       1
Error: Assumed size array ‘sum’ in MAP clause at (1)

Specifying the array bounds is as simple as updating the OpenMP map clauses as follows:

subroutine foo(a, b, sum, size)
  implicit none
  integer, dimension(*), intent(in) :: a, b
  integer, dimension(*), intent(out) :: sum
  integer, intent(in) :: size
  integer :: i

  !$omp target map(to: a(1:size), b(1:size)) map(from: sum(1:size))
  !$omp parallel do default(none) shared(a, b, sum, size)
  do i = 1, size
    sum(i) = a(i) + b(i)
  end do
  !$omp end parallel do
  !$omp end target
end subroutine foo

Note

Another option is to use assumed-shape arrays instead, which automatically provide the compiler with all the necessary bounds information.

See the PWR070 entry for more details on them.

References