Understanding VLA

Scattered notes from learning about the implementation of VLA.

These views do not in any way represent those of NVIDIA or any other organization or institution that I am professionally associated with. These views are entirely my own.

What is VLA?

Variable-length arrays are dynamic, stack-allocated arrays. The compiler needs to increase the stack size in the current stack frame to allocate enough space for the array. Assuming negative stack-growth like on x86, the compiler will decrease the stack pointer sufficiently to store the array.

This is almost identical to alloca. Both alloca and VLAs are essentially primitives to modify the stack pointer.

Eg:

  // Subtracts N from current stack pointer returns sp 
  int *curr_sp = alloca(N * sizeof(int));

  // equivilant to
  int curr_sp[N];

One key difference between the two:

The memory alloca() returns is valid as long as the current function persists. The lifetime of the memory occupied by a VLA is valid as long as the VLA’s identifier remains in scope. You can alloca memory in a loop for example and use the memory outside the loop, a VLA would be gone because the identifier goes out of scope when the loop terminates.

Memory Layout

Because the stack grows down on most platforms, the stack pointer after an alloca or VLA allocation but arrays are addressed sequentially upwards, the address of the first element of a VLA array (or the pointer returned by alloca) will be the value of the stack pointer after it’s modified.

Element 0 of the array or alloca-allocated memory is therefore immediately above the stack pointer after allocation, and is addressed by increasing sequentially until the end of the array. Accessing past the array will then run into previously declared stack variables.

When the function returns, the stack space will be available for subsequent function calls to use automatically, so there is no need to explicitly free memory allocated by VLA/alloca.

Examples

GCC docs:

These arrays are declared like any other automatic arrays, but with a length that is not a constant expression. The storage is allocated at the point of declaration and deallocated when the block scope containing the declaration exits.

// ./vla <size>
int main(int argc, char** argv) {
  int len = atoi(argv[1]);
  int array[len];
  for (int i=0; i<len; i++)
    array[i] = i;
}

Declaring the array decrements the stack pointer enough to provide memory for the array:

#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>

#define SAVESTACK(X) asm( "mov %%rsp, %0" : "=rm" ( X ));

int main(int argc, char** argv) {
  int len = atoi(getenv("LEN"));
  int idx = atoi(getenv("IDX"));
  register uint64_t sp0, sp1;

  SAVESTACK(sp0);

  int vla[len];

  SAVESTACK(sp1);

  for (int i=0; i < len; i++)
    vla[i] = i;

  printf("&vla[0]: %ld\nbefore: %ld\nafter: %ld\ndiff: %ld\n", (uint64_t)&vla[0], sp0, sp1, sp0-sp1);
  return vla[idx];
}

$ uname -a
Linux carbon 5.15.0-71-generic #78-Ubuntu SMP Tue Apr 18 09:00:29 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
$ gcc inspect-stack-vla.c && LEN=10 IDX=4 ./a.out
&vla[0]: 140737151458112
before: 140737151458160
after: 140737151458112
diff: 48

Notice that the address stored in the stack pointer after declaring the VLA array is the same as the address of the first element of the VLA array as depicted in the diagram above.

`alloca`

Instead of declaring a VLA array, we can create a pointer to memory allocated by alloca to produce the same effect:

#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include <alloca.h>

#define SAVESTACK(X) asm( "mov %%rsp, %0" : "=rm" ( X ));

int main(int argc, char** argv) {
  int len = atoi(getenv("LEN"));
  int idx = atoi(getenv("IDX"));
  register uint64_t sp0, sp1;

  SAVESTACK(sp0);

  // int vla[len];
  int* vla = alloca(len * sizeof(int));

  SAVESTACK(sp1);

  for (int i=0; i < len; i++)
    vla[i] = i;

  printf("&vla[0]: %ld\nbefore: %ld\nafter: %ld\ndiff: %ld\n", (uint64_t)&vla[0], sp0, sp1, sp0-sp1);
  return vla[idx];
}

$ gcc inspect-stack-alloca.c && LEN=10 IDX=4 ./a.out
&vla[0]: 140728646054592
before: 140728646054640
after: 140728646054592
diff: 48

Compare the GCC docs for alloca with that of variable length arrays and notice the similarities:

The function alloca supports a kind of half-dynamic allocation in which blocks are allocated dynamically but freed automatically.

Allocating a block with alloca is an explicit action; you can allocate as many blocks as you wish, and compute the size at run time. But all the blocks are freed when you exit the function that alloca was called from, just as if they were automatic variables declared in that function. There is no way to free the space explicitly.

Why Might This Be a Bad Idea?

The dynamic nature of VLAs means the offset of stack variables declared after the VLA into the stack frame of the function is also dynamic - which means the function will need extra instructions to calculate the address of these variables whenever they are referenced in the body of the function.

This may be a worthwhile tradeoff, but know that use of VLAs means your code may need a few extra instructions every time you use stack variables.

Conclusion & Links

These views do not in any way represent those of NVIDIA or any other organization or institution that I am professionally associated with. These views are entirely my own.

Written on Jun 1st, 2023