Saturday 25 March 2017

What happens when a program is executed in Linux?

1. The shell forks, and the child execs the new program. The shell waits for the child to complete using wait4(), unless the command runs in the background. A daemon is the usual special case: after exec it forks again, the parent exits (so the shell's wait returns) and the child carries on as a session leader.
2. fork() creates a clone of the shell process itself. The exec call then overwrites this address space. Exec starts by reading the first few pages of the executable from disk. This read goes through the file system, the page cache, the block layer and the device driver (the disk driver, if the backing store is a disk).
3. The kernel must be compiled with a handler for each binary format it should support. Handlers present in the Linux kernel include ELF, a.out and script. The header of the executable says what kind of executable it is. If the kernel has the matching binary format handler, the data that was read is handed over to that handler.
4. For example, if the handler is the ELF handler, the header gives the location of the text in the file. The text is mapped into memory (mmap) rather than read in full, so only the first few pages come from disk at this point. For a dynamically linked executable the kernel also maps the dynamic linker (ld.so), which then maps the directly linked libraries from user space; these are the open() and mmap() calls in the trace below. The process is now ready to start. The rest of the text is read through demand paging when it is needed.
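
Steps 1 and 2 can be sketched from user space. This is a minimal sketch of what a shell does for a foreground command, not the code of any real shell; the helper name run_foreground is made up for illustration, and the exit code 127 for a failed exec is only the common shell convention.

```c
#define _DEFAULT_SOURCE              /* for wait4() on glibc */
#include <stdlib.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper: run argv[0] the way a shell runs a foreground
 * command. Returns the child's exit status, or -1 on error. */
int run_foreground(char *const argv[])
{
    pid_t pid = fork();              /* child starts as a copy of the caller */
    if (pid < 0)
        return -1;

    if (pid == 0) {
        execve(argv[0], argv, environ);  /* replaces the child's address space */
        _exit(127);                      /* reached only if execve failed */
    }

    int status;
    struct rusage usage;
    if (wait4(pid, &status, 0, &usage) < 0)  /* block until the child exits */
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_foreground with the vector {"/bin/sh", "-c", "exit 7", NULL} should return 7, because the parent only sees the status after the exec'd program has exited.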


examples:

strace ./a.out
execve("./a.out", ["./a.out"], [/* 45 vars */]) = 0
brk(0)                                  = 0x1056000
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ff4ce4a9000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=88130, ...}) = 0
mmap(NULL, 88130, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7ff4ce493000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0P \2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1840928, ...}) = 0
mmap(NULL, 3949248, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7ff4cdec4000
mprotect(0x7ff4ce07e000, 2097152, PROT_NONE) = 0
mmap(0x7ff4ce27e000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1ba000) = 0x7ff4ce27e000
mmap(0x7ff4ce284000, 17088, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7ff4ce284000
close(3)                                = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ff4ce492000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ff4ce490000
arch_prctl(ARCH_SET_FS, 0x7ff4ce490740) = 0
mprotect(0x7ff4ce27e000, 16384, PROT_READ) = 0
mprotect(0x600000, 4096, PROT_READ)     = 0
mprotect(0x7ff4ce4ab000, 4096, PROT_READ) = 0
munmap(0x7ff4ce493000, 88130)           = 0
exit_group(4195565)                     = ?
+++ exited with 237 +++ 
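
The open / fstat / mmap / close sequence that the dynamic linker performs on libc.so.6 in the trace can be reproduced in a few lines. This is a sketch under simplifying assumptions: the helper name peek_mapped_file is hypothetical, it maps the whole file read-only in one piece, and it only copies out the first four bytes (the same "\177ELF" magic that the read() call in the trace shows).

```c
#define _DEFAULT_SOURCE              /* for O_CLOEXEC on strict compilers */
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map `path` read-only and copy out its first four
 * bytes. Returns 0 on success, -1 on error. */
int peek_mapped_file(const char *path, unsigned char out[4])
{
    int fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size < 4) {
        close(fd);
        return -1;
    }

    /* mmap only sets up the mapping; pages are brought in when touched. */
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                       /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return -1;

    memcpy(out, p, 4);               /* first access faults in the first page */
    munmap(p, (size_t)st.st_size);
    return 0;
}
```

On a typical Linux system, calling it on /bin/sh should fill the buffer with the bytes 0x7f 'E' 'L' 'F'.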


Implementation of execve

•      The entry point of the system call is the architecture-dependent sys_execve function. This function quickly delegates its work to the system-independent do_execve routine.
•      int do_execve(char *filename, char __user *__user *argv, char __user *__user *envp, struct pt_regs *regs)
–     filename: the name of the executable file.
–     argv: the arguments of the program (e.g. ls -l /usr/bin/).
–     envp: the environment of the program.
•      The notation is slightly clumsy because argv and envp are arrays of pointers, and both the pointer to the array itself as well as all pointers in the array are located in the userspace portion of the virtual address space.
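
Seen from user space, argv and envp are simply NULL-terminated arrays of pointers to strings. The sketch below walks such an array and counts its entries; the helper name count_strings is made up for illustration. The kernel has to do a comparable walk, but it must fetch every pointer from user space with the appropriate access checks, which this user-space sketch does not model.

```c
#include <stddef.h>

/* Hypothetical helper: count the entries of a NULL-terminated pointer
 * array such as argv or envp. */
size_t count_strings(char *const vec[])
{
    size_t n = 0;
    while (vec != NULL && vec[n] != NULL)
        n++;
    return n;
}
```

For the command line ls -l /usr/bin/ the array is {"ls", "-l", "/usr/bin/", NULL}, so the count is 3.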
Code flow of do_execve:
•      After the executable file has been opened, bprm_mm_init handles several administrative tasks:
–     mm_alloc generates a new instance of mm_struct to manage the process address space.
–     init_new_context is an architecture-specific function that initializes the instance.
–     __bprm_mm_init sets up an initial stack.
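•      prepare_binprm is used to supply a number of parent process values (above all, the effective UID and GID). It also reads the first bytes of the file into a buffer so that the format handlers can inspect the header.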
•      search_binary_handler is used at the end of do_execve to find a suitable binary format for the particular file.
•      Binary format handler performs the following actions:
–     It releases all resources used by the old process.
–     It maps the application into virtual address space.
–     The instruction pointer of the process and some other architecture-specific registers are set so that execution starts at the program's entry point, and from there reaches the main function, when the scheduler next selects the process.
Interpreting Binary Formats
<linux/binfmts.h>
struct linux_binfmt {
        struct linux_binfmt *next;
        struct module *module;
        int (*load_binary)(struct linux_binprm *, struct pt_regs *regs);
        int (*load_shlib)(struct file *);
        int (*core_dump)(long signr, struct pt_regs *regs, struct file *file);
        unsigned long min_coredump; /* minimal dump size */
};
1.       load_binary to load normal programs.
2.       load_shlib to load a shared library, that is, a dynamic library.
3.       core_dump to write a core dump if there is a program error.
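
The idea of a chain of format handlers, each with a load_binary callback, can be illustrated with a small user-space model. This is only a sketch of the dispatch pattern: struct fmt_handler, load_elf, load_script and find_handler are simplified stand-ins invented here, not kernel code. The one behaviour borrowed from the kernel is that a handler answers -ENOEXEC when the file is not in its format, so that the next handler can be tried.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for struct linux_binfmt: one callback per format. */
struct fmt_handler {
    const char *name;
    /* Returns 0 if the header was accepted, -ENOEXEC if it is not ours. */
    int (*load_binary)(const unsigned char *buf, size_t len);
};

static int load_elf(const unsigned char *buf, size_t len)
{
    /* ELF files start with 0x7f 'E' 'L' 'F'. */
    return (len >= 4 && memcmp(buf, "\177ELF", 4) == 0) ? 0 : -ENOEXEC;
}

static int load_script(const unsigned char *buf, size_t len)
{
    /* Scripts start with an interpreter line such as #!/bin/sh. */
    return (len >= 2 && buf[0] == '#' && buf[1] == '!') ? 0 : -ENOEXEC;
}

static const struct fmt_handler handlers[] = {
    { "elf",    load_elf },
    { "script", load_script },
};

/* Try each registered handler in turn; return the name of the first one
 * that accepts the header, or NULL if none does. */
const char *find_handler(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < sizeof(handlers) / sizeof(handlers[0]); i++)
        if (handlers[i].load_binary(buf, len) == 0)
            return handlers[i].name;
    return NULL;
}
```

Keeping the format test inside each callback means a new format can be added by appending one table entry, which is roughly why the kernel can offer binary formats as loadable modules.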
Exiting Processes
•      Processes must terminate with the exit system call.
•      The entry point for this call is the sys_exit function that requires an error code as its parameter in order to exit the process.
•      Its implementation is not particularly interesting because it immediately delegates its work to do_exit.
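
The strace output earlier ends with exit_group(4195565) followed by "exited with 237". The parent only receives the low 8 bits of the value passed to exit: 4195565 is 0x4004ED, and 0xED is 237. The sketch below reproduces this with a child that exits with the same value; the helper name observed_exit_status is made up for illustration.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: exit a child with `code` and return the status
 * the parent observes, or -1 on error. */
int observed_exit_status(int code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);                 /* child: terminate immediately */

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;

    /* WEXITSTATUS yields only the low 8 bits of the child's exit value. */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling observed_exit_status(4195565) returns 237, matching the last line of the trace.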

call flow:
do_execve()                                   // ./fs/exec.c
  |
do_execve_common(filename, argv, envp, regs)
  |
  ---> file = open_exec(filename);
  ---> sched_exec();
  |
bprm_mm_init(bprm)
  |
  ---> bprm->mm = mm = mm_alloc();
  |       |
  |       ---> allocate_mm()
  |       ---> mm_init(mm, current)
  |               |
  |               ---> mm_alloc_pgd(mm)
  |                       |
  |                       ---> pgd_alloc()
  |
  ---> init_new_context(current, mm);         // arch specific --> arch/arm/mm/context.c
  |
  ---> __bprm_mm_init(bprm);
  |       |
  |       ---> insert_vm_struct(mm, vma)
  |
prepare_binprm(bprm);
  |
search_binary_handler(bprm, regs);            // ./fs/exec.c
