PWR013: Avoid copying unused variables to or from the GPU
Issue
Copying unused variables to or from the GPU causes unnecessary data movement between the CPU and the GPU, which degrades performance.
Actions
Remove the unused variables from the data mapping clauses.
Relevance
One of the key challenges when offloading work to the GPU is minimizing the data transfers between CPU memory and GPU memory, because these transfers can greatly affect performance. Thus, only the strictly required data should be copied to or from GPU memory.
Code example
C
In the following example, the array B is copied to the GPU even though it is never used:
void example(double *A, double *B, double *C) {
  #pragma omp target teams distribute parallel for schedule(auto) shared(A, B) \
      map(to: A[0:100], B[0:100]) map(tofrom: C[0:100])
  for (int i = 0; i < 100; i++) {
    C[i] += A[i];
  }
}
This can be easily corrected by removing references to B from all the clauses:
void example(double *A, double *B, double *C) {
  #pragma omp target teams distribute parallel for schedule(auto) shared(A) \
      map(to: A[0:100]) map(tofrom: C[0:100])
  for (int i = 0; i < 100; i++) {
    C[i] += A[i];
  }
}
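As a rough sanity check, the two C versions above can be run side by side to confirm that dropping B from the clauses does not change the result. The sketch below is illustrative only: the names example_with_unused_map, example_fixed, check_equivalence and bytes_saved_per_call are hypothetical and not part of the original example, the shared and schedule clauses are omitted for brevity, and the byte count assumes the runtime transfers exactly the mapped array section. If the code is built without OpenMP support, or no device is available, the loops simply run on the host.

```c
#include <assert.h>
#include <stddef.h>

#define N 100

/* Original version: B is mapped to the device but never read. */
void example_with_unused_map(double *A, double *B, double *C) {
  #pragma omp target teams distribute parallel for \
      map(to: A[0:N], B[0:N]) map(tofrom: C[0:N])
  for (int i = 0; i < N; i++) {
    C[i] += A[i];
  }
}

/* Corrected version: B no longer appears in any clause. */
void example_fixed(double *A, double *B, double *C) {
  (void)B; /* kept in the signature only to match the original interface */
  #pragma omp target teams distribute parallel for \
      map(to: A[0:N]) map(tofrom: C[0:N])
  for (int i = 0; i < N; i++) {
    C[i] += A[i];
  }
}

/* Hypothetical helper: bytes of B that no longer need a host-to-device
   copy on each call, assuming only the mapped section is transferred. */
size_t bytes_saved_per_call(void) {
  return N * sizeof(double);
}

/* Hypothetical helper: runs both versions on identical inputs and
   asserts that they produce the same C. */
void check_equivalence(void) {
  double A[N], B[N], C1[N], C2[N];
  for (int i = 0; i < N; i++) {
    A[i] = (double)i;
    B[i] = -1.0; /* arbitrary values; B is never read by either version */
    C1[i] = 2.0 * i;
    C2[i] = 2.0 * i;
  }
  example_with_unused_map(A, B, C1);
  example_fixed(A, B, C2);
  for (int i = 0; i < N; i++) {
    assert(C1[i] == C2[i]);
  }
}
```

Calling check_equivalence() from a test driver exercises both versions. The saving for 100 doubles is small, but it is paid on every call, and it grows linearly with the size of the unused array.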
Fortran
In the following example, the array B is copied to the GPU even though it is never used:
subroutine example(A, B, C)
  implicit none
  integer, intent(in) :: A(:), B(:)
  integer, intent(inout) :: C(:)
  integer :: i
  !$omp target teams distribute parallel do schedule(auto) default(none) &
  !$omp& shared(A, B, C) private(i) map(to: A, B) map(tofrom: C)
  do i = 1, size(C, 1)
    C(i) = C(i) + A(i)
  end do
  !$omp end target teams distribute parallel do
end subroutine example
This can be easily corrected by removing references to B from all the clauses:
subroutine example(A, B, C)
  implicit none
  integer, intent(in) :: A(:), B(:)
  integer, intent(inout) :: C(:)
  integer :: i
  !$omp target teams distribute parallel do schedule(auto) default(none) &
  !$omp& shared(A, C) private(i) map(to: A) map(tofrom: C)
  do i = 1, size(C, 1)
    C(i) = C(i) + A(i)
  end do
  !$omp end target teams distribute parallel do
end subroutine example