
Introduction
In this post, I write down the steps a C program goes through when it executes on x86. I used to believe that every C program starts executing at main, or at least that was my understanding from various books and courses, until my best friend, the gdb debugger, showed me the symbol _start. That made me curious, and I kept digging until I got to the bottom of it. Below are the notes I took along the way.
Execution Steps
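- The linker injects `_start` (it comes from the C runtime startup object, e.g. crt1.o), and control is passed to it once loading finishes.
  - It is written in assembly language
  - It is the ELF entry point and is typically placed at the beginning of the `.text` section, so it runs before any other code in the program itself (in a dynamically linked executable, the dynamic loader runs first)
  - It sets up some registers and arguments and calls the C-level startup routine, which glibc names `__libc_start_main`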
- `__libc_start_main` is written in C:
  - Function prototype:

        int __libc_start_main(int (*main)(int, char **, char **),
                              int argc, char **argv,
                              int (*init)(int, char **, char **),
                              void (*fini)(void),
                              void (*rtld_fini)(void),
                              void *stack_end);

  - It defines the `environ` global variable. On BSD systems this is done using `ps_strings`: `environ = ps_strings->ps_envstr`
    - Below are some details about the `ps_strings` structure:

          /*
           * The following structure is found at the top of the user stack of each
           * user process. The ps program uses it to locate argv and environment
           * strings. Programs that wish ps to display other information may modify
           * it; normally ps_argvstr points to argv[0], and ps_nargvstr is the same
           * as the program's argc. The fields ps_envstr and ps_nenvstr are the
           * equivalent for the environment.
           */
          struct ps_strings {
              char **ps_argvstr;  /* first of 0 or more argument strings */
              int    ps_nargvstr; /* the number of argument strings */
              char **ps_envstr;   /* first of 0 or more environment strings */
              int    ps_nenvstr;  /* the number of environment strings */
          };

    - In glibc it is typically defined as `char **envp = &argv[argc + 1]` in `__libc_init_first`
  - It also registers cleanup and exit handlers
  - It receives `init` & `fini`, the routines that run before `main` starts and after it finishes. The `_init` and `_fini` functions they lead to get their function prologue and epilogue (what happens on entering a function and on returning from it) from the assembly-language startup files crti.o and crtn.o. The startup code also aligns the stack to a multiple of 16 bytes, which the x86-64 calling convention expects and which keeps aligned memory accesses efficient.
  - `%rbp` is set to zero by the startup code to mark the outermost frame, because `main` would be the outermost frame
  - Finally it calls: `exit(main(ps_strings->ps_nargvstr, ps_strings->ps_argvstr, environ));` (in glibc the arguments are simply argc, argv and environ)
  - After the NULL that terminates `envp`, there is the ELF auxiliary vector, which the loader uses to provide information to the process, such as the user id and the page size.
- Therefore, `__libc_start_main` in general does the following:
  - Set up argv and envp
  - Initialize the thread local storage by calling `__pthread_initialize_minimal` (which only calls `__libc_setup_tls`). `__libc_setup_tls` will initialize the Thread Control Block and the Dynamic Thread Vector.
  - Set up the thread stack guard
  - Register the destructor of the dynamic linker (i.e. the `rtld_fini` argument passed to `__libc_start_main`), if there is any, by calling `__cxa_atexit`
  - Initialize Glibc itself by calling `__libc_init_first`
  - Register `__libc_csu_fini` (i.e. the `fini` argument passed to `__libc_start_main`) using `__cxa_atexit`
  - Call `__libc_csu_init` (i.e. the `init` argument passed to `__libc_start_main`). `__libc_csu_init` executes the following, in order:
    - Function pointers in the `.preinit_array` section
    - Functions marked as `__attribute__ ((constructor))`, via `_init`
    - Function pointers in the `.init_array` section
  - Set up data structures needed for thread unwinding/cancellation
  - Call `main` of the user's program.
  - Call `exit`. If the program exits normally, the `exit` function (Glibc source file stdlib/exit.c) runs the following, in order:
    - In reverse order, functions registered via `atexit` or `on_exit`
    - Function pointers in the `.fini_array` section, via `__libc_csu_fini`
    - Functions marked as `__attribute__ ((destructor))`, via `__libc_csu_fini` (which calls `_fini` after the `.fini_array` functions)
    - stdio cleanup functions
  - The `.fini_array` section must also contain function pointers, and the prototype is like the destructor, i.e. taking no arguments and returning void.
Conclusion
So starting a program calls `execve`, which starts the loader, which at some point passes control to `_start`, which calls `__libc_start_main`, which calls `__libc_csu_init`, which calls `_init`.