Introduction
In this post, I will try to write down the steps of C program execution on x86. I used to believe that all C programs start execution at main, or at least that was my understanding from various books and courses, until my best friend, the gdb debugger, showed me the symbol _start. That made me curious, and I kept digging until I got to the bottom of it. Below are the notes I took along the way.
Execution Steps
- The linker injects _start, which is called as part of loading the program.
  - It is written in assembly language.
  - It is typically placed at the beginning of the .text section and is recorded as the entry point in the ELF header, so it runs before any other code in the program.
  - It sets up some registers and arguments and calls __start, which in glibc is called __libc_start_main.
- __libc_start_main is written in C:
  - Function prototype:
    int __libc_start_main(int (*main) (int, char **, char **), int argc, char **argv, int (*init) (int, char **, char **), void (*fini) (void), void (*rtld_fini) (void), void *stack_end)
  - It defines the environ global variable using ps_strings: environ = ps_strings->ps_envstr
    - Below are some details about the ps_strings structure:
      /*
       * The following structure is found at the top of the user stack of each
       * user process. The ps program uses it to locate argv and environment
       * strings. Programs that wish ps to display other information may modify
       * it; normally ps_argvstr points to argv[0], and ps_nargvstr is the same
       * as the program's argc. The fields ps_envstr and ps_nenvstr are the
       * equivalent for the environment.
       */
      struct ps_strings {
          char **ps_argvstr;  /* first of 0 or more argument strings */
          int    ps_nargvstr; /* the number of argument strings */
          char **ps_envstr;   /* first of 0 or more environment strings */
          int    ps_nenvstr;  /* the number of environment strings */
      };
    - envp is typically defined as char **envp = &argv[argc + 1] in __libc_init_first
  - It also registers cleanup and exit handlers.
  - It defines init and fini, which provide the function prologue and epilogue, i.e. what happens when a function is called and when it returns. They also align the stack to a multiple of 16 bytes, which is more efficient and cache friendly. They are written in assembly language.
  - It sets %rbp to zero because main would be the outermost frame.
  - Finally it calls: exit(main(ps_strings->ps_nargvstr, ps_strings->ps_argvstr, environ));
- After the NULL that terminates envp, there is the ELF auxiliary vector, which the loader uses to pass information to the process, such as the user id and the page size.
- Therefore, __libc_start_main in general does the following:
  - Set up argv and envp
  - Initialize the thread local storage by calling __pthread_initialize_minimal (which only calls __libc_setup_tls). __libc_setup_tls will initialize the Thread Control Block and the Dynamic Thread Vector.
  - Set up the thread stack guard
  - Register the destructor of the dynamic linker (i.e. the rtld_fini argument passed to __libc_start_main), if there is any, by calling __cxa_atexit
  - Initialize Glibc itself by calling __libc_init_first
  - Register __libc_csu_fini (i.e. the fini argument passed to __libc_start_main) using __cxa_atexit
  - Call __libc_csu_init (i.e. the init argument passed to __libc_start_main). __libc_csu_init executes the following, in this order:
    - Function pointers in the .preinit_array section
    - Functions marked as __attribute__ ((constructor)), via _init
    - Function pointers in the .init_array section
  - Set up data structures needed for thread unwinding/cancellation
  - Call main of the user’s program
  - Call exit. If the program exits normally, the exit function (Glibc source file stdlib/exit.c) runs the following, in this order:
    - In reverse order, functions registered via atexit or on_exit
    - Function pointers in the .fini_array section, via __libc_csu_fini
    - Functions marked as __attribute__ ((destructor)), via __libc_csu_fini (which calls _fini after the previous step)
    - stdio cleanup functions
- The .fini_array section must also contain function pointers, and their prototype is like that of a destructor, i.e. taking no arguments and returning void.
Conclusion
So starting a program calls execve, which starts the loader, which at some point passes control to _start, which calls __libc_start_main, which calls __libc_csu_init (which calls _init) and then main.