Skip to main content

PWR079: Avoid undefined behavior due to uninitialized variables

Issue

Accessing an uninitialized variable can lead to undefined behavior due to its indeterminate value.

Actions

Always initialize variables before using them, helping ensure deterministic behavior and prevent bugs in the code.

Relevance

Some programming languages automatically initialize variables to default values, but Fortran, C, and C++ often do not. Consequently, variables left uninitialized contain unpredictable data until they are explicitly set by the programmer:

  • Fortran: Reading any variable that has not been explicitly initialized results in undefined behavior.

  • C: Automatic (i.e., declared within functions) variables are not initialized by default. However, static, thread_local, and file-scope variables are zero-initialized (e.g., scalar values are set to 0).

  • C++: Similar to C, with additional rules for default initialization of class-like objects and array elements in certain contexts.

Since uninitialized variables may contain any arbitrary values, reading and using them can lead to undefined behavior, potentially causing incorrect results, crashes, or other unintended outcomes. Compilers are not required to warn about these issues and, even if they do, they typically still allow the code to compile and run.

Some compilers may appear to "help" by zero-initializing variables under certain conditions (e.g., specific compilation flags). While this can make technically incorrect code run as originally intended, relying on such incidental behavior creates a false sense of security and masks underlying logical errors. Ultimately, this can vanish under different compilation settings and can vary between compilers.

Lastly, while some compilers provide options to automatically initialize certain data types (e.g., gfortran's -finit-integer=<value> or gcc's -ftrivial-auto-var-init=<value>), these features reduce portability to other development environments and hide problems in the code rather than addressing them.

Code examples

C

Consider the following code, which aims to sum the elements of an array:

// example_array.c
#include <stdio.h>

__attribute__((pure)) double sum_array(double *array, size_t size) {
double sum;

for (size_t i = 0; i < size; ++i) {
sum += array[i];
}

return sum;
}

int main() {
double array[] = {0.24, 0.33, 0.17, 0.89, 0.05};
printf("Sum is: %f\n", sum_array(array, 5));

return 0;
}

Note how sum, an automatic variable, is never explicitly initialized. Although it might seem like it "should" logically start at 0, the C standard does not guarantee this. Thus, the initial value of sum is indeterminate, leading to different outcomes depending on the compiler and its settings:

  • For instance, gcc -O2 appears to start sum at 0, allowing the program to work as intended:
$ gcc --version
gcc (Debian 14.2.0-8) 14.2.0
$ gcc -O2 example_array.c -o example_array_gcc
$ ./example_array_gcc
Sum is: 1.680000
  • However, with clang -O2, sum appears to contain arbitrary data, leading to incorrect results:
$ clang --version
Debian clang version 19.1.5 (1)
$ clang -O2 example_array.c -o example_array_clang
$ ./example_array_clang
Sum is: nan

The solution is straightforward, always initialize variables before using them:

double sum = 0.0;

This principle applies to all variable types, including other elemental types like int, struct elements, and pointers.

Pointers are particularly important in C, as they are commonly used to represent n-dimensional arrays. However, uninitialized pointers can easily lead to invalid memory accesses and program crashes. Let's consider another example code with pointers, where a computational function ensures the received pointers are valid before accessing their contents:

// example_matrix.c
#include <stdio.h>

void perform_computation(double *matrix_a, double *matrix_b) {
if (matrix_a == NULL || matrix_b == NULL) {
printf("A matrix is NULL; skipping computation\n");
return;
}

printf("Performing computation...\n");
}

int main() {
double *matrix_a, *matrix_b;
perform_computation(matrix_a, matrix_b);

return 0;
}

Note how matrix_a and matrix_b are never initialized. Although this example may seem trivial, similar scenarios can occur in large, complex codebases where pointers traverse multiple functions and conditional logic, making such issues hard to diagnose and correct.

Since the contents of the pointers are indeterminate, we can obtain different results depending on the compiler and settings:

  • gcc -O2 appears to set the pointers to NULL, preventing the computation:
$ gcc -O2 example_matrix.c -o example_matrix_gcc
$ ./example_matrix_gcc
A matrix is NULL; skipping computation
  • However, with clang -O2, the pointers seem to hold arbitrary values, allowing the computation to proceed and likely crash due to invalid memory accesses later on:
$ clang -O2 example_matrix.c -o example_matrix_clang
$ ./example_matrix_clang
Performing computation...

To help prevent these types of issues, it's a good practice to initialize pointers to NULL by default if they aren't assigned at their declaration:

double *matrix_a = NULL, *matrix_b = NULL;

Fortran

Consider the following code, which aims to sum the elements of an array:

! example_array.f90
program main
use iso_fortran_env, only: real32
implicit none

real(kind=real32) :: array(5)
array = [0.24, 0.33, 0.17, 0.89, 0.05]

print *, "Sum is:", sum_array(array)

contains

pure real(kind=real32) function sum_array(array)
implicit none
real(kind=real32), intent(in) :: array(:)
real(kind=real32) :: sum
integer :: i

do i = 1, size(array, 1)
sum = sum + array(i)
end do

sum_array = sum
end function sum_array

end program main

Note how sum is never explicitly initialized. Although it might seem like it "should" logically start at 0, the Fortran standard does not guarantee this. Thus, the initial value of sum is indeterminate, leading to different outcomes depending on the compiler and its settings:

  • For instance, gfortran -O2 appears to start sum at 0, allowing the program to work as intended:
$ gfortran --version
GNU Fortran (Debian 14.2.0-8) 14.2.0
$ gfortran -O2 example_array.f90 -o example_array_gfortran
$ ./example_array_gfortran
Sum is: 1.67999995
  • However, with flang -O2, sum appears to contain arbitrary data, leading to incorrect results:
$ flang-new --version
Debian flang-new version 19.1.5 (1)
$ flang-new -O2 example_array.f90 -o example_array_flang
$ ./example_array_flang
Sum is: NaN

The solution is straightforward, always initialize variables before using them:

real(kind=real32) :: sum

sum = 0

This principle applies to all variable types, including other elemental types like integer, derived types, and arrays.

Arrays are a critical part of simulation codes. For managing dynamic, n-dimensional arrays in Fortran, both pointer and allocatable variables are available. The latter, introduced in Fortran 2003, are generally safer and more robust. Unlike pointer, variables with the allocatable attribute automatically free their memory and are always set by default to the unallocated state.

For example, the following code technically leads to undefined behavior because a pointer is used without explicit initialization:

program main
implicit none
integer, pointer :: array(:)

if (.not. associated(array)) then
print *, "Undefined behavior"
end if
end program main

In contrast, an allocatable array can always be safely checked using the allocated function, even when not explicitly initialized:

program main
implicit none
integer, allocatable :: array(:)

if (.not. allocated(array)) then
print *, "Defined behavior"
end if
end program main

While these examples may seem trivial, these types of issues can arise in large, complex codebases where variables traverse multiple procedures and are subject to intricate conditional logic.

If, for any reason, you still need to use pointer variables, it's a good practice to nullify them at their declaration for additional safety:

integer, pointer :: array(:) => NULL()

References