Process Segments and VMA
Process Segments:
1. Introduction
Traditionally, a Unix process is divided into segments. The standard segments are the code segment, the data segment, the BSS (block started by symbol), and the stack segment. The code segment contains the binary code of the program that is running as the process (a "process" is a program in execution).
The data segment contains the initialized global variables and data structures.
The BSS segment contains the uninitialized global data structures, and finally, the stack segment contains the local variables, return addresses, etc. for the particular process.
Under Linux, a process can execute in two modes: user mode and kernel mode. A process usually executes in user mode, but can switch to kernel mode by making system calls. When a process makes a system call, the kernel takes control and performs the requested service on behalf of the process. The process is said to be running in kernel mode during this time. When a process is running in user mode, it is said to be "in userland", and when it is running in kernel mode, it is said to be "in kernel space". We will first look at how the process segments are dealt with in userland, and then at the bookkeeping on process segments done in kernel space.
2. Userland's view of the segments
The code segment consists of the code: the actual executable program. The code of all the functions we write in the program resides in this segment, so the addresses of the functions give us an idea of where the code segment is. If we have a function foo() and let x be the address of foo (x = &foo;), we know that x will point within the code segment.
The data segment consists of the initialized global variables of a program. The operating system needs to know what values are used to initialize the global variables, so these initial values are stored in the executable file and loaded into the data segment. To get an address inside the data segment, we declare an initialized global variable and then print its address. This address must be inside the data segment.
The BSS consists of the uninitialized global variables of a process. To get an address which occurs inside the BSS, we declare an uninitialized global variable, then print its address.
The automatic variables (or local variables) will be allocated on the stack, so printing out the addresses of local variables will provide us with the addresses within the stack segment.
3. A C program
Let's have a look at the following C program:
 1 #include <stdio.h>
 2 #include <stdlib.h>
 3 #include <sys/types.h>
 4 #include <unistd.h>
 5
 6 int our_init_data = 30;
 7 int our_noinit_data;
 8
 9 void our_prints(void)
10 {
11     int our_local_data = 1;
12     printf("\nPid of the process is = %d", (int)getpid());
13     printf("\nAddresses which fall into:");
14     printf("\n 1) Data segment = %p",
15            (void *)&our_init_data);
16     printf("\n 2) BSS segment = %p",
17            (void *)&our_noinit_data);
18     printf("\n 3) Code segment = %p",
19            (void *)&our_prints);
20     printf("\n 4) Stack segment = %p\n",
21            (void *)&our_local_data);
22
23     while(1);   /* spin forever so the process stays alive for inspection */
24 }
25
26 int main(void)
27 {
28     our_prints();
29     return 0;
30 }
We can see that lines 6 and 7 declare two global variables. One is initialized and one is uninitialized. Per the previous discussion, the initialized variable will fall into the data segment and the uninitialized variable will fall into the BSS segment. Lines 14-17 print the addresses of the variables.
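We also know that the address of the function our_prints will fall into the code segment, so if we print the address of this function, we will get a value that falls into the code segment. This is done in lines 18-19. Finally, lines 20-21 print the address of a local variable; since this is an automatic variable, its address will be within the stack segment.
Compiling and running the program gives output like the following (the exact addresses vary between systems, and some of them between runs). The program then spins in its while(1) loop, so it has to be stopped with Ctrl-C.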
venkat@pari-ubuntu:test$ ./a.out
Pid of the process is = 3441
Addresses which fall into:
1) Data segment = 0x601028
2) BSS segment = 0x601040
3) Code segment = 0x400564
4) Stack segment = 0x7fff0193afec
4. Execution of a userland program
When we execute a userland program such as the one above, the shell calls fork() to create a child process, and the child calls exec() to run the new program. The exec() code inside the kernel figures out what format the binary is in (ELF, a.out, etc.) and calls the corresponding handler for that format. For example, when an ELF format file is loaded, the function load_elf_binary() from fs/binfmt_elf.c takes care of initializing the kernel data structures for the particular process. Details of this portion of loading will not be dealt with here, as that in itself is a topic for another article :-) The point here is that the code which loads the executable fills in the kernel data structures that describe the process's memory.
5. Memory-related data structures in the kernel
In the Linux kernel, every process has an associated struct task_struct. The definition of this struct is in the header file include/linux/sched.h. The following snippet is from the 2.6.10 Linux kernel source code (only the needed fields and a few nearby fields are shown):
struct task_struct {
volatile long state; /* -1 unrunnable, 0 runnable, >0 stopped */
atomic_t usage;
...
...
...
struct mm_struct *mm, *active_mm;
...
...
...
pid_t pid;
...
...
...
char comm[16];
...
...
};
Three members of the data structure are relevant to us:
- pid contains the process ID of the process.
- comm holds the name of the process.
- mm points to the process's struct mm_struct, which is the key to all memory management activities related to the process.
struct mm_struct is defined in include/linux/sched.h as:
struct mm_struct {
struct vm_area_struct * mmap; /* list of VMAs */
struct rb_root mm_rb;
struct vm_area_struct * mmap_cache; /* last find_vma result */
...
...
...
unsigned long start_code, end_code, start_data, end_data;
unsigned long start_brk, brk, start_stack;
...
...
...
};