In this article, we will look at the code generated by a C++ compiler to understand what's happening under the hood.
Here is the program we will work with:
int main() {
    int x = 5;
    x = x + 1;
    int y = x - 1;
    y = x + 7;
}
This program does nothing interesting, but we can learn a lot by looking at the generated assembly.
I'll use godbolt's Compiler Explorer to quickly get the generated assembly for x86_64.
What I got from x86_64 gcc 14.1 with no optimization enabled is:
main:
        push rbp
        mov rbp, rsp
        mov DWORD PTR [rbp-4], 5
        add DWORD PTR [rbp-4], 1
        mov eax, DWORD PTR [rbp-4]
        sub eax, 1
        mov DWORD PTR [rbp-8], eax
        mov eax, DWORD PTR [rbp-4]
        add eax, 7
        mov DWORD PTR [rbp-8], eax
        mov eax, 0
        pop rbp
        ret
x86_64 Linux Assembly Basics
I'll quickly walk through the assembly instructions used here:
- push A: Pushes A onto the stack.
- mov D, S: Copies the content at S to D.
- add A, B: Adds the contents of A and B and stores the result in A.
- sub A, B: Subtracts B from A and stores the result in A.
- pop B: Pops an element from the stack and stores it in B.
- ret: Returns from the function. (We'll look into what exactly this means later.)
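To make these descriptions concrete, here is a minimal C++ sketch that models each instruction as a small function. It is a toy model under my own assumptions, not how the CPU is implemented: operands are plain 64-bit integers, the stack is a std::vector, and the toy namespace and its function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

namespace toy {

// Operand order follows the listing above: destination first, source second.
void mov(std::int64_t& d, std::int64_t s) { d = s; }    // D = S
void add(std::int64_t& a, std::int64_t b) { a += b; }   // A = A + B
void sub(std::int64_t& a, std::int64_t b) { a -= b; }   // A = A - B

// The stack is modelled as a vector whose back() is the top of the stack.
void push(std::vector<std::int64_t>& stack, std::int64_t a) {
    stack.push_back(a);
}

void pop(std::vector<std::int64_t>& stack, std::int64_t& b) {
    assert(!stack.empty());  // popping an empty stack is a bug in this model
    b = stack.back();
    stack.pop_back();
}

}  // namespace toy
```

For example, starting from a variable holding 6, calling toy::sub(reg, 1) leaves 5 in reg, which is the same step we will see the compiler take with eax.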
If we ignore all the nitty-gritty here and look at the fourth line of the generated assembly, i.e.,
mov DWORD PTR [rbp-4], 5
We don't know what DWORD PTR [rbp-4] is yet, but with the knowledge we have gained so far, we can be pretty sure that it copies the value 5 somewhere (we'll see exactly where later). Let's call that place [rbp-4] for now.
It is exactly what the following C++ code is expected to do:
int x = 5; // Write the value 5 to some memory location and name it x.
Let's move on to the next assembly instruction:
add DWORD PTR [rbp-4], 1
We again ignore DWORD PTR [rbp-4] for now, but we can be pretty sure it adds the value 1 to the content at [rbp-4] (we don't know exactly what that is yet) and stores the result back at [rbp-4]. Previously we stored the value 5 at [rbp-4], and now we have added 1 to it, so it must hold the value 6 now. This is exactly what the following C++ code is expected to do:
x = x + 1; // add 1 to the value of x and store the result in x itself.
The next assembly instruction we look at is:
mov eax, DWORD PTR [rbp-4]
This is again a move instruction. We still don't know what DWORD PTR [rbp-4] means, and we also don't know what eax means, but we can be pretty sure it copies the value at [rbp-4] to eax. If so, eax now holds the value 6. We haven't written a C++ line that corresponds directly to this instruction; the compiler generated it as an intermediate step.
The next instruction is:
sub eax, 1
We don't yet know what eax is, but we can be sure that 1 is subtracted from the value at eax, which is 6, so eax now holds 5.
To summarize the previous assembly instructions:
- [rbp-4] holds the value our variable x is expected to hold, i.e., 6.
- The value at [rbp-4] is copied to eax.
- 1 is subtracted from the value at eax.
So we can conclude that eax now holds the value x-1.
It looks like the compiler is preparing for the following C++ code:
int y = x - 1;
It has already computed the value of x-1 and stored it in eax. It now just needs to copy this value from eax to some location, which it does using:
mov DWORD PTR [rbp-8], eax
We don't know exactly what DWORD PTR [rbp-8] means, but here we are copying the value at eax, which holds x-1, to some location [rbp-8].
The upcoming instructions are similar to the previous ones:
mov eax, DWORD PTR [rbp-4]
add eax, 7
mov DWORD PTR [rbp-8], eax
This simply achieves what is expected from the following C++ code:
y = x + 7;
To summarize:
- [rbp-4] holds the value x is expected to store.
- The value at [rbp-4], i.e., the value of x, is copied to eax.
- 7 is added to the value at eax, so eax now holds the value of x + 7.
- The value at eax (x + 7) is copied to some location [rbp-8].
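The whole sequence so far can be mirrored in C++ with one statement per assembly instruction. This is only a sketch to check our reading of the assembly: slot_x, slot_y and eax are ordinary variables I made up to stand in for [rbp-4], [rbp-8] and the eax register.

```cpp
#include <cassert>

// Mirrors the unoptimized assembly, one statement per instruction.
// slot_x ~ [rbp-4], slot_y ~ [rbp-8], eax ~ the eax register (all hypothetical names).
int trace_main_body() {
    int slot_x = 0, slot_y = 0, eax = 0;

    slot_x = 5;        // mov DWORD PTR [rbp-4], 5    (int x = 5;)
    slot_x += 1;       // add DWORD PTR [rbp-4], 1    (x = x + 1;)

    eax = slot_x;      // mov eax, DWORD PTR [rbp-4]
    eax -= 1;          // sub eax, 1
    slot_y = eax;      // mov DWORD PTR [rbp-8], eax  (int y = x - 1;)
    assert(slot_y == 5);

    eax = slot_x;      // mov eax, DWORD PTR [rbp-4]
    eax += 7;          // add eax, 7
    slot_y = eax;      // mov DWORD PTR [rbp-8], eax  (y = x + 7;)

    return slot_y;     // final value of y
}
```

Calling trace_main_body() returns 13, the value y holds at the end of main.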
Too much abstraction so far. Let's dive somewhat deeper now.
Storage Organization
Every program runs in its own address space, which can be divided in the following way (lower addresses at the top):
------------------------
| Code |
------------------------
| Static |
------------------------
| Heap |
------------------------
| Free Memory |
------------------------
| Stack |
------------------------
Code
The generated code is kept in this region. Its size is fixed.
Static
Data that exists for the whole run of the program, such as global variables and constants, is placed in the region called Static.
Heap
The heap is dynamic in nature. It is used to manage data that may outlive the call to the procedure (function) that created it. In the diagram above, the heap grows downwards, i.e., towards higher memory addresses.
Stack
The stack is also dynamic in nature. It is used to store data structures called activation records (also known as stack frames or call frames) that are generated during function calls. In the diagram above, the stack grows upwards, i.e., towards lower memory addresses. This concept is necessary when we start talking about rbp.
Whenever a procedure is called, space for its local variables is set aside on top of the stack. When the procedure terminates, that space is released again. This should be enough to justify the statement that the stack is dynamic.
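If you want to see these regions on your own machine, a small sketch like the following collects one address from each of them. The names (global_value, RegionSample, sample_regions, print_regions) are made up for this example, and the actual numbers depend on the operating system, the compiler and address randomization, so treat the output as an illustration of the layout rather than a guarantee.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <memory>

int global_value = 42;  // lives in the Static region

// One address from each region, stored as plain integers.
struct RegionSample {
    std::uintptr_t code;
    std::uintptr_t static_data;
    std::uintptr_t heap;
    std::uintptr_t stack;
};

RegionSample sample_regions() {
    int local_value = 7;                          // lives in the current stack frame
    auto heap_value = std::make_unique<int>(13);  // lives on the heap

    RegionSample s{};
    s.code = reinterpret_cast<std::uintptr_t>(&sample_regions);
    s.static_data = reinterpret_cast<std::uintptr_t>(&global_value);
    s.heap = reinterpret_cast<std::uintptr_t>(heap_value.get());
    s.stack = reinterpret_cast<std::uintptr_t>(&local_value);

    // Distinct live objects always have distinct addresses.
    assert(s.heap != s.stack && s.static_data != s.stack);
    return s;
}

void print_regions(const RegionSample& s) {
    std::printf("code:   %#llx\n", static_cast<unsigned long long>(s.code));
    std::printf("static: %#llx\n", static_cast<unsigned long long>(s.static_data));
    std::printf("heap:   %#llx\n", static_cast<unsigned long long>(s.heap));
    std::printf("stack:  %#llx\n", static_cast<unsigned long long>(s.stack));
}
```

On a typical x86_64 Linux system the stack address tends to be far larger than the other three, but that is an observation about common setups, not something the C++ language promises.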
CPU Registers
Registers are memory units within the CPU. Since they are within the CPU (processor), they are very quick to access. The properties of the registers are processor dependent. We will only look at the few registers used by the compiler in the generated code above.
- eax: 32-bit general purpose register (the lower half of the 64-bit rax register).
- rsp: Stack Pointer Register. It stores the address of the current top of the stack.
- rbp: Base Pointer Register. Used as a reference pointer during a function call; locals are addressed relative to it, as in [rbp-4].
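On x86_64, eax is not a separate register but the lower 32 bits of the 64-bit rax register, and writing to eax clears the upper 32 bits of rax. The following is a toy model of that relationship; the Rax struct and its member names are my own invention for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of the 64-bit rax register; eax is its low 32 bits.
struct Rax {
    std::uint64_t bits = 0;

    // Reading eax only looks at the low 32 bits.
    std::uint32_t eax() const { return static_cast<std::uint32_t>(bits); }

    // Writing eax zero-extends: the upper 32 bits of rax become 0.
    void set_eax(std::uint32_t value) { bits = value; }
};

// Small self-check of the model.
void demo_rax() {
    Rax r;
    r.bits = 0xFFFFFFFF00000006ULL;  // upper half set, low half holds 6
    assert(r.eax() == 6u);
    r.set_eax(5);                    // like "mov eax, 5"
    assert(r.bits == 5u);            // upper half was cleared
}
```

This is why the compiler can use eax for our int values: an int is 32 bits wide here, so the lower half of rax is all it needs.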
Now we have all the basics required to look into the entire generated assembly.
Revisiting the generated assembly
I'll start by assuming that the main function has already been called.
When the first instruction is executed
push rbp
the stack looks like the following:
| |
-----------------------
| rbp | <- rsp
-----------------------
| |
-----------------------
| |
Fig: The current stack status.
The next instruction:
mov rbp, rsp
copies the value of rsp into rbp. This means rbp now holds the address of the current top of the stack.
The next instruction:
mov DWORD PTR [rbp-4], 5
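puts the value 5 into the memory location that is 4 bytes below the address currently stored in rbp. DWORD PTR says that the operand is a 4-byte (double word) value, which matches the size of an int here.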
| |
----------------------
| 5 |
----------------------
| rbp | <-rsp
----------------------
| |
----------------------
| |
Fig: The current stack status after first move.
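The first three instructions can be replayed on a tiny simulated machine. This is a sketch under my own assumptions: memory is a 64-byte array, the starting values of rsp and rbp are arbitrary, and ToyMachine and its members are hypothetical names, not anything the compiler or CPU provides.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

struct ToyMachine {
    std::vector<std::uint8_t> memory = std::vector<std::uint8_t>(64, 0);
    std::uint64_t rsp = 48;      // arbitrary starting top of stack
    std::uint64_t rbp = 0xABCD;  // arbitrary value left behind by the caller

    // push: move rsp 8 bytes towards lower addresses, then store the value there.
    void push(std::uint64_t value) {
        assert(rsp >= 8);
        rsp -= 8;
        std::memcpy(&memory[rsp], &value, sizeof value);
    }

    // Store a 4-byte value, like "mov DWORD PTR [address], value".
    void store32(std::uint64_t address, std::int32_t value) {
        assert(address + sizeof value <= memory.size());
        std::memcpy(&memory[address], &value, sizeof value);
    }

    // Load a 4-byte value, like "mov eax, DWORD PTR [address]".
    std::int32_t load32(std::uint64_t address) const {
        assert(address + sizeof(std::int32_t) <= memory.size());
        std::int32_t value = 0;
        std::memcpy(&value, &memory[address], sizeof value);
        return value;
    }
};

ToyMachine run_first_three_instructions() {
    ToyMachine m;
    m.push(m.rbp);            // push rbp
    m.rbp = m.rsp;            // mov rbp, rsp
    m.store32(m.rbp - 4, 5);  // mov DWORD PTR [rbp-4], 5
    return m;
}
```

After these steps rsp and rbp hold the same address, and the 5 sits 4 bytes below it. Note that rsp never moves past the 5: in this listing the compiler addresses x and y relative to rbp without adjusting rsp for them.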
The next instruction then adds the value 1 to the content at [rbp-4]:
add DWORD PTR [rbp-4], 1
The stack now looks like:
| |
----------------------
| 6 |
----------------------
| rbp | <-rsp
----------------------
| |
Fig: The current stack status after the first add operation.
The next instruction copies the value at [rbp-4] into the eax register:
mov eax, DWORD PTR [rbp-4]
| |
---------------------- -----
| 6 | |eax|
---------------------- -----
| rbp | <-rsp | 6 |
---------------------- -----
| |
Fig: The current stack status and the value at eax register.
The next instruction:
sub eax, 1
subtracts 1 from the value in eax and stores the result back into eax.
| |
---------------------- -----
| 6 | |eax|
---------------------- -----
| rbp | <-rsp | 5 |
---------------------- -----
| |
Fig: The current stack status and the value at eax register.
The next instruction:
mov DWORD PTR [rbp-8], eax
copies the value in eax into the memory location 8 bytes below the address in rbp.
| |
----------------------
| 5 |
---------------------- -----
| 6 | |eax|
---------------------- -----
| rbp | <-rsp | 5 |
---------------------- -----
| |
Fig: The current stack status and the value at eax register.
The next instruction:
mov eax, DWORD PTR [rbp-4]
copies the value at the location 4 bytes below the address stored in the rbp register into eax.
| |
----------------------
| 5 |
---------------------- -----
| 6 | |eax|
---------------------- -----
| rbp | <-rsp | 6 |
---------------------- -----
| |
Fig: The current stack status and the value at eax register.
The next instruction:
add eax, 7
adds the value 7 to the value stored in the eax register.
| |
----------------------
| 5 |
---------------------- -------
| 6 | | eax |
---------------------- -------
| rbp | <-rsp | 13 |
---------------------- -------
| |
Fig: The current stack status and the value at eax register.
Similarly the next instruction:
mov DWORD PTR [rbp-8], eax
copies the value in eax into [rbp-8].
| |
----------------------
| 13 |
---------------------- -------
| 6 | | eax |
---------------------- -------
| rbp | <-rsp | 13 |
---------------------- -------
| |
Fig: The current stack status and the value at eax register.
The next instruction:
mov eax, 0
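sets the eax register to 0. This is the return value of main: eax carries a function's integer return value, and main returns 0 when it has no explicit return statement.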
| |
----------------------
| 13 |
---------------------- -------
| 6 | | eax |
---------------------- -------
| rbp | <-rsp | 0 |
---------------------- -------
| |
Fig: The current stack status and the value at eax register.
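The next instruction:
pop rbp
pops the value from the top of the stack and stores it back into rbp, restoring the value rbp had before main started. The pop operation moves rsp downwards in the diagram, i.e., towards higher addresses.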
| |
----------------------
| 13 |
---------------------- -------
| 6 | | eax |
---------------------- -------
| rbp | | 0 |
---------------------- -------
| | <-rsp
----------------------
| |
Fig: The current stack status and the value at eax register.
The last instruction:
ret
returns from the function. It pops the return address that the caller's call instruction left on the stack and continues execution at the instruction right after the function call.
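The interplay between the function call, push rbp, pop rbp and ret can be sketched with another toy model. Everything here is an assumption made for illustration: ToyCpu, its members, and the addresses 0x1005, 0x2000, 0x6FF0 and 0x7000 are invented, and a real call also involves details this model leaves out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct ToyCpu {
    std::vector<std::uint64_t> stack;  // back() is the top of the stack
    std::uint64_t rip = 0;             // address of the next instruction
    std::uint64_t rbp = 0;

    // call: remember where to continue afterwards, then jump to the function.
    void call(std::uint64_t target, std::uint64_t return_address) {
        stack.push_back(return_address);
        rip = target;
    }

    // ret: pop the saved return address and continue from there.
    void ret() {
        assert(!stack.empty());
        rip = stack.back();
        stack.pop_back();
    }
};

std::uint64_t call_and_return() {
    ToyCpu cpu;
    cpu.rbp = 0x7000;              // made-up base pointer of the caller
    cpu.call(0x2000, 0x1005);      // caller invokes the function at 0x2000

    cpu.stack.push_back(cpu.rbp);  // push rbp
    cpu.rbp = 0x6FF0;              // mov rbp, rsp (made-up stack address)
    // ... the body of the function would run here ...
    cpu.rbp = cpu.stack.back();    // pop rbp
    cpu.stack.pop_back();
    assert(cpu.rbp == 0x7000);     // the caller's base pointer is restored

    cpu.ret();                     // ret
    return cpu.rip;                // where execution continues
}
```

In this model call_and_return() yields 0x1005, the address right after the call, and the stack is empty again, just as the caller left it.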
That's it for this article. We'll look into more compiler topics in the next article.
I write articles and courses for Programiz.pro. Have a look at the Master C++ Course
Happy Learning!