Article 1: ===http://www.unixwiz.net/techtips/win32-callconv-asm.html===
Intel x86 Function-call Conventions - Assembly View
Article 2: ====http://haoxiai.net/bianchengyuyan/cyuyan/11726.html======
Article 3: ===http://book.51cto.com/art/200801/63515.htm====
There is some difference from the diagram in Chapter 1. If the original is correct, the difference lies in which frame contains the pushed ebp (the previous frame or the new one), or in whether push ebp changes esp at all (refer to the colored text in Chapter 1).
The local variables start from -8(%ebp) instead of -4(%ebp); I think the description in Article 1 is correct, at least as far as the stack frame structure goes:
Intel x86 Function-call Conventions - Assembly View
One of the "big picture" issues in looking at compiled C code is the function-calling conventions. These are the rules by which a calling function and a called function agree on how parameters and return values are passed between them, and on how the stack is used by the function itself. The layout of the stack constitutes the "stack frame", and knowing how this works can go a long way to decoding how something works.
In C and modern CPU design conventions, the stack frame is a chunk of memory, allocated from the stack, at run-time, each time a function is called, to store its automatic variables. Hence nested or recursive calls to the same function, each successively obtain their own separate frames.
Physically, a function's stack frame is the area between the addresses contained in esp, the stack pointer, and ebp, the frame pointer (base pointer in Intel terminology). Thus, if a function pushes more values onto the stack, it is effectively growing its frame.
This is a very low-level view: the picture as seen from the C/C++ programmer is illustrated elsewhere:
• Unixwiz.net Tech Tip: Intel x86 Function-call Conventions - C Programmer's View
For the sake of discussion, we're using the terms that the Microsoft Visual C compiler uses to describe these conventions, even though other platforms may use other terms.
__cdecl (pronounced so that it rhymes with "heckle")
- This convention is the most common because it supports semantics required by the C language. The C language supports variadic functions (variable argument lists, à la printf), and this means that the caller must clean up the stack after the function call: the called function has no way to know how to do this. It's not terribly optimal, but the C language semantics demand it.
__stdcall
- Often loosely equated with __pascal (both have the called function clean up, although __pascal pushes its parameters left to right), this requires that each function take a fixed number of parameters, and this means that the called function can do argument cleanup in one place rather than have this be scattered throughout the program in every place that calls it. The Win32 API primarily uses __stdcall.
It's important to note that these are merely conventions, and any collection of cooperating code can agree on nearly anything. There are other conventions (passing parameters in registers, for instance) that behave differently, and of course the optimizer can make mincemeat of any clear picture as well.
Our focus here is to provide an overview, and not an authoritative definition for these conventions.
Register use in the stack frame
In both __cdecl and __stdcall conventions, the same set of three registers is involved in the function-call frame:
%ESP - Stack Pointer
- This 32-bit register is implicitly manipulated by several CPU instructions (PUSH, POP, CALL, and RET among others). It always points to the last element used on the stack (not the first free element): this means that the PUSH and POP operations would be specified in pseudo-C as:
*--ESP = value; // push
value = *ESP++; // pop
- The "Top of the stack" is an occupied location, not a free one, and is at the lowest memory address.
%EBP - Base Pointer
- This 32-bit register is used to reference all the function parameters and local variables in the current stack frame. Unlike the %esp register, the base pointer is manipulated only explicitly. This is sometimes called the "Frame Pointer".
%EIP - Instruction Pointer
- This holds the address of the next CPU instruction to be executed, and it's saved onto the stack as part of the CALL instruction. As well, any of the "jump" instructions modify the %EIP directly.
Assembler notation
Virtually everybody in the Intel assembler world uses the Intel notation, but the GNU C compiler uses what they call the "AT&T syntax" for backwards compatibility. This seems to us to be a really dumb idea, but it's a fact of life.
There are minor notational differences between the two notations, but by far the most annoying is that the AT&T syntax reverses the source and destination operands. To move the immediate value 4 into the EAX register:
mov $4, %eax // AT&T notation
mov eax, 4 // Intel notation
More recent GNU compilers can generate the Intel syntax (GCC's -masm=intel option), and the GNU assembler accepts it through its .intel_syntax directive. In any case, we'll use the Intel notation exclusively.
There are other minor differences that are not of much concern to the reverse engineer.
Calling a __cdecl function
The best way to understand the stack organization is to see each step in calling a function with the __cdecl conventions. These steps are taken automatically by the compiler. Not all of them are used in every case (sometimes there are no parameters, sometimes no local variables, sometimes no saved registers), but this shows the overall mechanism employed.
Push parameters onto the stack, from right to left
- Parameters are pushed onto the stack, one at a time, from right to left. Whether the parameters are evaluated from right to left is a different matter, and in any case this is unspecified by the language and code should never rely on this. The calling code must keep track of how many bytes of parameters have been pushed onto the stack so it can clean it up later.
Call the function
- Here, the processor pushes the contents of the %EIP (instruction pointer) onto the stack, and it points to the first byte after the CALL instruction. After this finishes, the caller has lost control, and the callee is in charge. This step does not change the %ebp register.
Save and update the %ebp
- Now that we're in the new function, we need a new local stack frame pointed to by %ebp, so this is done by saving the current %ebp (which belongs to the previous function's frame) and making it point to the top of the stack.
push ebp
mov ebp, esp
- Once %ebp has been changed, it can now refer directly to the function's arguments as 8(%ebp), 12(%ebp). Note that 0(%ebp) is the old base pointer and 4(%ebp) is the old instruction pointer.
Save CPU registers used for temporaries
- If this function will use any CPU registers, it has to save the old values first lest it walk on data used by the calling functions. Each register to be used is pushed onto the stack one at a time, and the compiler must remember what it did so it can unwind it later.
Allocate local variables
- The function may choose to use local stack-based variables, and they are allocated here simply by decrementing the stack pointer by the amount of space required. This is always done in four-byte chunks.
- Now, the local variables are located on the stack between the %ebp and %esp registers, and though it would be possible to refer to them as offsets from either one, by convention the %ebp register is used. This means that -4(%ebp) refers to the first local variable.
Perform the function's purpose
- At this point, the stack frame is set up correctly. All the parameters and locals are offsets from the %ebp register:
16(%ebp) - third function parameter
12(%ebp) - second function parameter
8(%ebp) - first function parameter
4(%ebp) - old %EIP (the function's "return address")
0(%ebp) - old %EBP (previous function's base pointer)
-4(%ebp) - first local variable
-8(%ebp) - second local variable
-12(%ebp) - third local variable
- The function is free to use any of the registers that had been saved onto the stack upon entry, but it must not change the stack pointer or all Hell will break loose upon function return.
Release local storage
- When the function allocates local, temporary space, it does so by decrementing the stack pointer by the amount of space needed, and this process must be reversed to reclaim that space. It's usually done by adding to the stack pointer the same amount which was subtracted previously, though a series of POP instructions could achieve the same thing.
Restore saved registers
- Each register saved onto the stack upon entry must be restored from the stack, in reverse order. If the "save" and "restore" phases don't match exactly, catastrophic stack corruption will occur.
Restore the old base pointer
- The first thing this function did upon entry was save the caller's %ebp base pointer, and by restoring it now (popping the top item from the stack), we effectively discard the entire local stack frame and put the caller's frame back in play.
Return from the function
- This is the last step of the called function, and the RET instruction pops the old %EIP from the stack and jumps to that location. This gives control back to the calling function. Only the stack pointer and instruction pointer are modified by a subroutine return.
Clean up pushed parameters
- In the __cdecl convention, the caller must clean up the parameters pushed onto the stack, and this is done either by popping the stack into don't-care registers (for a few parameters) or by adding the parameter-block size to the stack pointer directly.
__cdecl -vs- __stdcall
The __stdcall convention is mainly used by the Windows API, and it's a bit more compact than __cdecl. The main difference is that any given function has a hard-coded set of parameters, and this cannot vary from call to call like it can in C (no "variadic functions").
Because the size of the parameter block is fixed, the burden of cleaning these parameters off the stack can be shifted to the called function, instead of being done by the calling function as in __cdecl. There are several effects of this:
- the code is a tiny bit smaller, because the parameter-cleanup code is found once — in the called function itself — rather than in every place the function is called. These may be only a few bytes per call, but for commonly-used functions it can add up. This presumably means that the code may be a tiny bit faster as well.
- calling the function with the wrong number of parameters is catastrophic - the stack will be badly misaligned, and general havoc will surely ensue.
As an offshoot of the second point, Microsoft Visual C takes special care of functions that are __stdcall. Since the number of parameters is known at compile time, the compiler encodes the parameter byte count in the symbol name itself, and this means that calling the function wrong leads to a link error.
For instance, the function int foo(int a, int b) would generate — at the assembler level — the symbol "_foo@8", where "8" is the number of bytes expected. This means that not only will a call with 1 or 3 parameters not resolve (due to the size mismatch), but neither will a call expecting the __cdecl parameters (which looks for _foo). It's a clever mechanism that avoids a lot of problems.
Variations and Notes
The x86 architecture provides a number of built-in mechanisms for assisting with frame management, but they don't seem to be commonly used by C compilers. Of particular interest is the ENTER instruction, which handles most of the function-prolog code.
ENTER 10,0          PUSH ebp
                    MOV ebp, esp
                    SUB esp, 10
We're pretty sure these are functionally equivalent, but our 80386 processor reference suggests that the ENTER version is more compact (6 bytes -vs- 9) but slower (15 clocks -vs- 6). The newer processors are probably harder to pin down, but somebody has probably figured out that ENTER is slower. Sigh.
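For every tireless programmer the stack is burned deep into the mind, sometimes to the point of mutation. The stack can be used to pass function parameters, to store local variables and return information, and to save register values so that they can be restored later. On the x86 platform (also called IA32), applications rely on the stack to support function (also called procedure) calls, and values are stored in last-in, first-out (LIFO) order.

1. Stack frame layout
Before looking at function calls in detail, a few stack terms need to be pinned down: full and empty stacks, ascending and descending stacks. In a full stack the stack pointer points to the last data item written, whereas in an empty stack it points to the first free slot. A descending stack grows downward in memory (it starts at the end of the application's address space and grows toward lower addresses), whereas an ascending stack grows upward. RISC machines traditionally use a full descending (FD) stack. A compiler that follows the IA32 conventions usually sets the stack pointer at the end of the application's address space and then uses a full descending stack. The stack region that holds a function's local variables, parameters, return address and other temporaries is called the stack frame, as shown in Figure 1.

Figure 1: Stack frame structure

The design of a stack frame layout has to take into account the features of the instruction set architecture and of the programming language being compiled. Computer manufacturers, however, often prescribe a "standard" frame layout for their architecture so that it can be adopted by the compilers of all programming languages. This layout may not be the most convenient one for a particular language or compiler, but with a "standard" layout, functions written in different languages can call one another.

When P calls Q, Q's parameters are placed in P's frame. In addition, when P calls Q, the address of the next instruction in P is pushed onto the stack and forms the end of P's frame (see Figure 1); this return address is where execution should continue when the program returns from Q. Q's frame starts at the saved frame pointer, followed by the other saved register values. Q also uses its frame for local variables that cannot be kept in registers. If a function returns an integer or a pointer, the register %eax is normally used to hold the return value. The stack pointer can move while the program runs, so most information is accessed relative to the frame pointer (%ebp).

2. Register usage conventions
Suppose function P(...) calls function Q(a1, ..., an). We call P the caller and Q the callee. If saving and restoring a register is the caller's responsibility, we call it a caller-save register; if it is the callee's responsibility, we call it a callee-save register. The set of program registers is the one resource shared by all functions. Although only one function is active at any given moment, we must make sure that when one function calls another, the callee does not overwrite a register value that the caller will use later. For this reason every platform defines a standard that all functions must follow, including library functions. In most computer architectures, however, the notions of caller-save and callee-save registers are not implemented in hardware; they are a convention laid down in the machine's reference manual.

On ARM, for example, all function calls must follow the ARM Procedure Call Standard (APCS). The standard provides a compact mechanism for writing code so that functions can be mixed with functions written in other languages; those other functions may be compiled from C or Pascal, or written in assembly language. Likewise, IA32 adopts a uniform register usage convention. By convention, registers %eax, %edx and %ecx are caller-save: when P (the caller) calls Q (the callee), Q may overwrite them without destroying any data P needs. Registers %ebx, %esi, %edi and %ebp are callee-save, which means Q must save their values on the stack before overwriting them and restore them before returning.

3. Parameter passing conventions
Before about 1960, parameters were not passed on the stack but through a statically allocated block of storage, which prevented the use of recursive functions. From the 1970s on, most calling conventions passed function parameters on the stack, which causes some unnecessary memory accesses, since accessing a register is much faster than accessing memory. Studies of real programs show that few functions take more than 4 parameters, and very few take as many as 6. Modern parameter-passing conventions therefore specify that the first k parameters of a function (typically k = 4 or k = 6) are passed in registers and the remaining ones in memory. On ARM, the APCS states explicitly:
1) The first 4 integer arguments (or fewer) are loaded into registers R0 - R3.
2) The first 4 floating-point arguments (or fewer) are loaded into registers f0 - f3.
3) Any further arguments are stored in memory, in the words immediately above the location the stack pointer points to on function entry. In other words, the remaining arguments are pushed onto the top of the stack.
On IA32, however, parameters are not passed entirely in registers but through the stack frame. Depending on the calling convention, the way the parameters are placed in the frame differs slightly, as the following table shows: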
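Calling convention | Parameter order on the stack | Stack cleaned up by |
_cdecl | first parameter at the lower address | caller |
_stdcall | first parameter at the lower address | callee |
_fastcall | compiler-defined | callee |
_pascal | first parameter at the higher address | callee |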
File: arch/i386/kernel/entry.S
    ALIGN
common_interrupt:
    SAVE_ALL
    movl %esp,%eax
    call do_IRQ
    jmp ret_from_intr

File: arch/i386/kernel/irq.c
fastcall unsigned int do_IRQ(struct pt_regs *regs)
{
    /* high bits used in ret_from_ code */
    /* get the interrupt vector number */
    int irq = regs->orig_eax & 0xff;
#ifdef CONFIG_4KSTACKS
    union irq_ctx *curctx, *irqctx;
    u32 *isp;
#endif
    ……
}

File: arch/i386/kernel/entry.S
#define SAVE_ALL \
    cld; \
    pushl %es; \
    pushl %ds; \
    pushl %eax; \
    pushl %ebp; \
    pushl %edi; \
    pushl %esi; \
    pushl %edx; \
    pushl %ecx; \
    pushl %ebx; \
    movl $(__USER_DS), %edx; \
    movl %edx, %ds; \
    movl %edx, %es;

File: include/asm-i386/ptrace.h
struct pt_regs {
    long ebx;
    long ecx;
    long edx;
    long esi;
    long edi;
    long ebp;
    long eax;
    int xds;
    int xes;
    long orig_eax;
    long eip;
    int xcs;
    long eflags;
    long esp;
    int xss;
};

File: arch/i386/kernel/irq.c
fastcall unsigned int do_IRQ(struct pt_regs *regs)
{
    ……
#ifdef CONFIG_4KSTACKS
    ……
    asm volatile(
        " xchgl %%ebx,%%esp \n"
        " call __do_IRQ \n"
        " movl %%ebx,%%esp \n"
        : "=a" (arg1), "=d" (arg2), "=b" (ebx)
        : "0" (irq), "1" (regs), "2" (isp)
        : "memory", "cc", "ecx"
    );
    ……
#endif
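#include<stdio.h>
low_to_up(char in);   /* no return type given, so it defaults to int */

void main()
{
    printf("%c\n",low_to_up('d'));
}

low_to_up(char in)
{
    char ch;
    if(in>='a' && in<='z')
        ch=in-'a'+'A';   /* no return statement on this path */
    else
        return(ch);      /* ch has not been assigned on this path */
}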
1: #include<stdio.h>
2: low_to_up(char in);
3:
4: void main()
5: {
00401020 push ebp
00401021 mov ebp,esp
00401023 sub esp,40h
00401026 push ebx
00401027 push esi
00401028 push edi
00401029 lea edi,[ebp-40h]
0040102C mov ecx,10h
00401031 mov eax,0CCCCCCCCh
00401036 rep stos dword ptr [edi]
6: printf("%c\n",low_to_up('d'));
00401038 push 64h                        # ASCII code of 'd' (mark 1)
0040103A call @ILT+5(low_to_up) (0040100a)

00401083 sub esp,44h
00401086 push ebx
00401087 push esi
00401088 push edi
00401089 lea edi,[ebp-44h]
0040108C mov ecx,11h
00401091 mov eax,0CCCCCCCCh
00401096 rep stos dword ptr [edi]
11: char ch;
12: if(in>='a' && in<='z')
00401098 movsx eax,byte ptr [ebp+8]      # (mark 2)
0040109C cmp eax,61h
0040109F jl low_to_up+36h (004010b6)
004010A1 movsx ecx,byte ptr [ebp+8]
004010A5 cmp ecx,7Ah
004010A8 jg low_to_up+36h (004010b6)
13: ch=in-'a'+'A';
004010AA movsx edx,byte ptr [ebp+8]      # (mark 3)
004010AE sub edx,20h
004010B1 mov byte ptr [ebp-4],dl
14: else
004010B4 jmp low_to_up+3Ah (004010ba)
15: return(ch);
004010B6 movsx eax,byte ptr [ebp-4]
16: }
004010BA pop edi                         # restore the register values and return (mark 7)
004010BB pop esi
004010BC pop ebx
004010BD mov esp,ebp
004010BF pop ebp
004010C0 ret
    .file "csdn.c"
    .text

    subl $8, %esp
    movl 8(%ebp), %eax          # (mark 2)
    movb %al, -1(%ebp)
    cmpb $96, -1(%ebp)
    jle .L2
    cmpb $122, -1(%ebp)
    jg .L2
    movzbl -1(%ebp), %eax
    subb $32, %al

    movl -8(%ebp), %eax         # (mark 4)
    leave
    ret
    .size low_to_up, .-low_to_up
    .section .rodata
.LC0:
    .string "%c\n"
    .text
.globl main
    .type main, @function
main:
    pushl %ebp
    movl %esp, %ebp
    subl $8, %esp
    andl $-16, %esp
    movl $0, %eax
    subl %eax, %esp
    movl $100, (%esp)           # push the value of 'd' onto the stack, then call low_to_up() (mark 1)
    call low_to_up
    movl %eax, 4(%esp)          # (mark 6)
    movl $.LC0, (%esp)
    call printf
    movl $0, %eax
    leave
    ret
    .size main, .-main
    .section .note.GNU-stack,"",@progbits
    .ident "GCC: (GNU) 3.3.5 (Debian 1:3.3.5-13)"

low_to_up:
    pushl %ebp
    movl %esp, %ebp
    subl $8, %esp
    movl 8(%ebp), %eax
    movb %al, -1(%ebp)
    cmpb $96, -1(%ebp)
    jle .L2
    cmpb $122, -1(%ebp)
    jg .L2
    movzbl -1(%ebp), %eax
    subb $32, %al
    movb %al, -2(%ebp)
    jmp .L3
.L2:
    movsbl -2(%ebp),%eax
    movl %eax, -8(%ebp)
    jmp .L1
.L3:
    movsbl -2(%ebp),%eax
    movl %eax, -8(%ebp)
.L1:
    movl -8(%ebp), %eax
    leave
    ret