About Aligned Memory Allocation

Origin

I suffered a big loss on this thing. As for what the loss was, I won’t go into details. In short, when I first saw this thing, I didn’t understand it thoroughly, so I’m taking this opportunity to pay off my debt. I want to use a super plain way to make people understand quickly, so next time I encounter this issue, I can quickly recall it.

What is Aligned Memory Allocation?

Imagine a situation where the program runs to a point where I need some extra memory to store new data. In C language, we will definitely use malloc() to ask the system for extra memory space.

For example, I suddenly need 32 bytes of space, so I use malloc(), and then the system looks around and finally finds 32 bytes of space in the memory and returns the starting memory address. This starting position might be very random, like 0x08004321, but it doesn’t matter. Anyway, I now have 32 bytes of space, and I can do whatever I want with it.

However, in some cases (I’ll talk about which cases later), we would hope that the starting position of the memory we get is “aligned”. So what is alignment?

For example, if you use some Hex file preview tools, you will see that the starting position on the left is neatly arranged at 32 bytes intervals, like this:

1
2
3
4
5
6
7
8
9
General Random Allocation (Unaligned):
0x08004321  |  Data...
            ^ Starting position is random

32-byte Alignment (Aligned):
0x08000000  |  Data...
0x08000020  |  Data...  <-- 0x20 is 32
0x08000040  |  Data...  <-- 0x40 is 64
            ^ Starting positions are all divisible by 32

So-called Aligned Memory Allocation means that when I ask for new memory space, I want to get a memory space whose starting position can be divisible by 32 (or other numbers).

OK, so up to this point, you should have some concept of what Aligned Memory Allocation is. That’s half the battle. Because most people hearing about this for the first time probably find it hard to imagine what it is, or know what it is but don’t know what it’s used for.

So, what is the use of this?

Why is Alignment Needed?

The need for Alignment is mostly due to hardware design constraints, for example:

Vector instruction sets (such as AVX), when executing such instruction sets, will require the target memory position to be aligned. If it is not aligned, it has to be split into two steps to handle, or an error will occur directly. Why not design the instruction set to read memory starting from any position? I think it’s mostly a cost consideration. If it can be solved on the driver side, why waste expensive hardware space?

Others like CUDA, when fetching data via PCIe on GPU, using DMA will also require memory to be aligned. Or when GPU instructions fetch data from its own memory, it will also require alignment. In addition, when using DMA on single chips like STM32, the memory position of the buffer will also be required to be aligned so that it can be correctly written or read by DMA.

At this time, someone will definitely jump out and say: “I usually use CUDA or use DMA on STM32 and haven’t encountered this problem!”

That’s because the CUDA driver or HAL library etc. will handle it for you, so of course you don’t feel it. Conversely, if you are writing a driver or doing bare metal development today, you must be careful about memory alignment, otherwise it might not even move.

So how to achieve Alignment?

Having said so much, how can we achieve Alignment during memory allocation?

Imagine that our goal today is to get a new 64 bytes memory, and require 32-byte alignment. If the starting position given by malloc today happens to be 0x08000040, which is the starting position of an alignment block, then we hit the jackpot, but in reality, we are certainly not that lucky. The starting position it gives you might be 0x08000041 (offset by 1 byte), or it might be 0x0800005F (offset by 31 bytes).

But it doesn’t matter. At this time, we will think that as long as the memory malloc’d is long enough, finding the starting point of an Alignment Block from it and starting to store data from there, that also counts as alignment.

So the question returns to, how long memory should we ask for? Observing the previous example, if doing 32 bytes alignment, the offset amount is at most 0 ~ 31 bytes. So the space we want to malloc() must be at least size + alignment - 1.

Wait, something seems missing? If so, how can we free those spaces before the alignment starting address? We need to record the starting address originally malloc’d, so we need one more position to store this pointer. This required space is sizeof(void*).

In summary, the length we really want to malloc() is: size + alignment - 1 + sizeof(void*)

Memory Structure Diagram:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
Actual malloc'd block (Original)
|
v
+-------------+---------------------+-------------------------+
| padding ... | Original Pointer    | User Pointer (Aligned)  |
|             | (Stored)            |                         |
+-------------+---------------------+-------------------------+
              ^                     ^
              |                     |
              This slot stores      Fits 32-byte alignment
              original address      We return this address

Then we just need to find the alignment starting address inside, and store the “original starting address” in the slot before the alignment start. This way, when freeing memory, just look one slot backward from the alignment address, and we can find the memory starting point we should free.

The principle concept is just like this. So let’s see how to implement it.

C Implementation

The following takes 32-byte alignment as an example. You can change 32 to any positive integer alignment you want (usually a power of 2).

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
#include <stdlib.h>
#include <stdint.h> // for uintptr_t

void* aligned_malloc(size_t size) {
    // 1. Calculate the total space needed:
    //    size + adjustment space of (alignment - 1) + space to store original pointer
    void *original = malloc(size + 32 - 1 + sizeof(void*));
    
    if (!original) return NULL;

    // 2. Calculate the alignment starting position
    //    First reserve a pointer space for us to store the address
    uintptr_t raw = (uintptr_t)original + sizeof(void*);
    
    //    Use Bitwise operation for alignment
    //    Principle: (address + mask) & ~mask
    //    Note: Here use ~(32 - 1) i.e. ~31 to mask off the last five bits to achieve 32 alignment
    uintptr_t aligned = (raw + (32 - 1)) & ~(32 - 1);
    
    //    Actually can also use (void*)(original - (original % 32)); to calculate
    //    Just that using bit manipulation feels more elegant

    // 3. Store the original malloc() starting address in the slot before the alignment address
    ((void**)aligned)[-1] = original;

    // 4. Finally return the alignment start
    return (void*)aligned;
}

void aligned_free(void *ptr) {
    if (!ptr) return;
    
    // When freeing, find the original malloc address stored one slot before the alignment address
    void *original = ((void**)ptr)[-1];
    
    // Then just free it
    free(original);
}

Conclusion

Anyway, that’s it. Aligned Memory Allocation is important and has some implementation details. Taking this opportunity, I have figured it out. Even if I forget it in the future, I can quickly come back to review it.