Overview
Table of contents
- 1. Introduction
- 2. Wrappers
- 2.1. Assembly syscalls
- 2.2. Macro syscalls
- 2.3. Bespoke syscalls
- 3. Assembly syscalls in detail
- 3.1. syscalls.list
- 3.2. assembly syscall wrappers
- 3.3. syscall-template.S
- 3.3.1. PSEUDO
- 3.3.1.1. ENTRY
- 3.3.1.2. DO_CALL
- 3.3.1.3. SYSCALL_ERROR_LABEL
- 3.3.2. PSEUDO_END
- 3.3.2.1. END
- 4. Macro syscalls in detail
- 4.1. clock_gettime
- 4.2. __clock_gettime64
- 4.2.1. INTERNAL_SYSCALL_CALL
- 4.2.2. INTERNAL_SYSCALL
- 5. Using C files with the same name
- 6. System calls in the kernel
- 6.1. syscall.tbl
- 6.2. syscalls header files
- 6.3. SYSCALL_DEFINE macros
- 7. References
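Note: the code discussed in this article comes from the glibc master branch (later than version 2.33).
1. Introduction
A system call is one of a set of function interfaces, each with a predefined purpose, that the operating system kernel provides for applications to call. A system call passes the application's request to the kernel; the kernel runs the corresponding function to do the work and returns the result to the application.
Applications run in user mode, where many operations are restricted, whereas system calls execute in kernel mode. How, then, does a user-mode application run kernel-mode code? Operating systems generally switch from user mode to kernel mode through an interrupt or a comparable trap mechanism (on x86_64 Linux this is the dedicated syscall instruction, as the disassembly in section 2.1 shows).
Interrupts are either hardware interrupts or software interrupts. A software interrupt is usually an instruction with which a program can deliberately trigger a given interrupt. An interrupt generally has two attributes: an interrupt number and an interrupt handler. Different interrupts have different numbers, and each number maps to one handler. Interrupt numbers are limited, so the kernel does not dedicate one interrupt to each system call. Instead, every system call has a system call number: before triggering the interrupt, the caller places that number in a fixed register, and the handler reads the register to decide which system call's code to run.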
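2. Wrappers
Wiki page: https://sourceware.org/glibc/wiki/SyscallWrappers
glibc uses three kinds of wrappers for the kernel's system calls: assembly, macro, and bespoke.
2.1. Assembly syscalls
Simple kernel syscalls in glibc are translated from a list of names into assembly wrappers, which are then compiled.
Disassembling the socket syscall in the build directory shows the syscall-template.S wrapper: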
maminjie@fedora ~/w/g/build> objdump -ldr socket/socket.o
socket/socket.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <__socket>:
__socket():
/mnt/hgfs/projects/linux/glibc/socket/../sysdeps/unix/syscall-template.S:120
0: b8 29 00 00 00 mov $0x29,%eax
5: 0f 05 syscall
7: 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax
d: 0f 83 00 00 00 00 jae 13 <__socket+0x13>
f: R_X86_64_PLT32 __syscall_error-0x4
/mnt/hgfs/projects/linux/glibc/socket/../sysdeps/unix/syscall-template.S:122
13: c3 retq
maminjie@fedora ~/w/g/build>
The list of syscalls that use wrappers is kept in the syscalls.list files:
maminjie@fedora /m/h/p/l/glibc (master)> find . -name syscalls.list
./sysdeps/unix/bsd/syscalls.list
./sysdeps/unix/syscalls.list
./sysdeps/unix/sysv/linux/alpha/syscalls.list
./sysdeps/unix/sysv/linux/arc/syscalls.list
./sysdeps/unix/sysv/linux/arm/syscalls.list
./sysdeps/unix/sysv/linux/csky/syscalls.list
./sysdeps/unix/sysv/linux/generic/syscalls.list
./sysdeps/unix/sysv/linux/generic/wordsize-32/syscalls.list
./sysdeps/unix/sysv/linux/hppa/syscalls.list
./sysdeps/unix/sysv/linux/i386/syscalls.list
./sysdeps/unix/sysv/linux/ia64/syscalls.list
./sysdeps/unix/sysv/linux/m68k/syscalls.list
./sysdeps/unix/sysv/linux/microblaze/syscalls.list
./sysdeps/unix/sysv/linux/mips/mips32/syscalls.list
./sysdeps/unix/sysv/linux/mips/mips64/n32/syscalls.list
./sysdeps/unix/sysv/linux/mips/mips64/n64/syscalls.list
./sysdeps/unix/sysv/linux/mips/syscalls.list
./sysdeps/unix/sysv/linux/powerpc/powerpc32/syscalls.list
./sysdeps/unix/sysv/linux/s390/s390-32/syscalls.list
./sysdeps/unix/sysv/linux/sh/syscalls.list
./sysdeps/unix/sysv/linux/sparc/sparc32/syscalls.list
./sysdeps/unix/sysv/linux/sparc/sparc64/syscalls.list
./sysdeps/unix/sysv/linux/syscalls.list
./sysdeps/unix/sysv/linux/wordsize-64/syscalls.list
./sysdeps/unix/sysv/linux/x86_64/syscalls.list
./sysdeps/unix/sysv/linux/x86_64/x32/syscalls.list
The ordering of the sysdeps directories determines which syscalls.list files apply. On x86_64, for example, the following files are used:
./sysdeps/unix/sysv/linux/syscalls.list
./sysdeps/unix/sysv/linux/wordsize-64/syscalls.list
./sysdeps/unix/sysv/linux/x86_64/syscalls.list
The makefile rules that process the syscall wrappers are in sysdeps/unix/Makefile, for example:
...
ifndef avoid-generated
$(common-objpfx)sysd-syscalls: $(..)sysdeps/unix/make-syscalls.sh \
		$(wildcard $(+sysdep_dirs:%=%/syscalls.list)) \
		$(wildcard $(+sysdep_dirs:%=%/arch-syscall.h)) \
		$(common-objpfx)libc-modules.stmp
	for dir in $(+sysdep_dirs); do \
	  test -f $$dir/syscalls.list && \
	  { sysdirs='$(sysdirs)' \
	    asm_CPP='$(COMPILE.S) -E -x assembler-with-cpp' \
	    $(SHELL) $(dir $<)$(notdir $<) $$dir || exit 1; }; \
	  test $$dir = $(..)sysdeps/unix && break; \
	done > $@T
	mv -f $@T $@
endif
...
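The syscalls.list files are processed by a script named sysdeps/unix/make-syscalls.sh, whose comments describe the syscalls.list format.
The script generates assembly files from a template named syscall-template.S, which uses machine-specific macros to build the wrapper for each syscall. A machine can override syscall-template.S with its own copy, because the template is also selected according to the sysdeps directory order.
Finally, the macros for each machine are provided by the sysdep.h headers: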
maminjie@fedora /m/h/p/l/glibc (master)> find . -name sysdep.h
./sysdeps/aarch64/sysdep.h
./sysdeps/arc/sysdep.h
./sysdeps/arm/sysdep.h
./sysdeps/csky/sysdep.h
./sysdeps/generic/sysdep.h
./sysdeps/hppa/sysdep.h
./sysdeps/i386/sysdep.h
./sysdeps/ia64/sysdep.h
./sysdeps/m68k/coldfire/sysdep.h
./sysdeps/m68k/m680x0/sysdep.h
./sysdeps/m68k/sysdep.h
./sysdeps/mach/i386/sysdep.h
./sysdeps/mach/sysdep.h
./sysdeps/microblaze/sysdep.h
./sysdeps/nios2/sysdep.h
./sysdeps/powerpc/powerpc32/sysdep.h
./sysdeps/powerpc/powerpc64/sysdep.h
./sysdeps/powerpc/sysdep.h
./sysdeps/s390/s390-32/sysdep.h
./sysdeps/s390/s390-64/sysdep.h
./sysdeps/sh/sysdep.h
./sysdeps/sparc/sysdep.h
./sysdeps/unix/arm/sysdep.h
./sysdeps/unix/i386/sysdep.h
./sysdeps/unix/mips/mips32/sysdep.h
./sysdeps/unix/mips/mips64/sysdep.h
./sysdeps/unix/mips/sysdep.h
./sysdeps/unix/powerpc/sysdep.h
./sysdeps/unix/sh/sysdep.h
./sysdeps/unix/sysdep.h
./sysdeps/unix/sysv/linux/aarch64/sysdep.h
./sysdeps/unix/sysv/linux/alpha/sysdep.h
./sysdeps/unix/sysv/linux/arc/sysdep.h
./sysdeps/unix/sysv/linux/arm/sysdep.h
./sysdeps/unix/sysv/linux/csky/sysdep.h
./sysdeps/unix/sysv/linux/generic/sysdep.h
./sysdeps/unix/sysv/linux/hppa/sysdep.h
./sysdeps/unix/sysv/linux/i386/sysdep.h
./sysdeps/unix/sysv/linux/ia64/sysdep.h
./sysdeps/unix/sysv/linux/m68k/coldfire/sysdep.h
./sysdeps/unix/sysv/linux/m68k/m680x0/sysdep.h
./sysdeps/unix/sysv/linux/m68k/sysdep.h
./sysdeps/unix/sysv/linux/microblaze/sysdep.h
./sysdeps/unix/sysv/linux/mips/mips32/sysdep.h
./sysdeps/unix/sysv/linux/mips/mips64/sysdep.h
./sysdeps/unix/sysv/linux/mips/sysdep.h
./sysdeps/unix/sysv/linux/nios2/sysdep.h
./sysdeps/unix/sysv/linux/powerpc/powerpc64/sysdep.h
./sysdeps/unix/sysv/linux/powerpc/sysdep.h
./sysdeps/unix/sysv/linux/riscv/sysdep.h
./sysdeps/unix/sysv/linux/s390/s390-32/sysdep.h
./sysdeps/unix/sysv/linux/s390/s390-64/sysdep.h
./sysdeps/unix/sysv/linux/s390/sysdep.h
./sysdeps/unix/sysv/linux/sh/sh4/sysdep.h
./sysdeps/unix/sysv/linux/sh/sysdep.h
./sysdeps/unix/sysv/linux/sparc/sparc32/sysdep.h
./sysdeps/unix/sysv/linux/sparc/sparc64/sysdep.h
./sysdeps/unix/sysv/linux/sparc/sysdep.h
./sysdeps/unix/sysv/linux/sysdep.h
./sysdeps/unix/sysv/linux/x86_64/sysdep.h
./sysdeps/unix/sysv/linux/x86_64/x32/sysdep.h
./sysdeps/unix/x86_64/sysdep.h
./sysdeps/x86/sysdep.h
./sysdeps/x86_64/sysdep.h
./sysdeps/x86_64/x32/sysdep.h
maminjie@fedora /m/h/p/l/glibc (master)>
Together, all of these pieces produce the compilation of a wrapper, along these lines:
…
(echo '#define SYSCALL_NAME socket';
echo '#define SYSCALL_NARGS 3';
echo '#define SYSCALL_SYMBOL __socket';
echo '#define SYSCALL_CANCELLABLE 0';
echo '#define SYSCALL_NOERRNO 0';
echo '#define SYSCALL_ERRVAL 0';
echo '#include <syscall-template.S>';
echo 'weak_alias (__socket, socket)';
echo 'hidden_weak (socket)';
) | /opt/cross/x86_64-linux-gnu/bin/x86_64-glibc-linux-gnu-gcc -c -I…/include -I/home/azanella/Projects/glibc/build/x86_64-linux-gnu/socket -I/home/azanella/Projects/glibc/build/x86_64-linux-gnu -I…/sysdeps/unix/sysv/linux/x86_64/64 -I…/sysdeps/unix/sysv/linux/x86_64 -I…/sysdeps/unix/sysv/linux/x86 -I…/sysdeps/x86/nptl -I…/sysdeps/unix/sysv/linux/wordsize-64 -I…/sysdeps/x86_64/nptl -I…/sysdeps/unix/sysv/linux/include -I…/sysdeps/unix/sysv/linux -I…/sysdeps/nptl -I…/sysdeps/pthread -I…/sysdeps/gnu -I…/sysdeps/unix/inet -I…/sysdeps/unix/sysv -I…/sysdeps/unix/x86_64 -I…/sysdeps/unix -I…/sysdeps/posix -I…/sysdeps/x86_64/64 -I…/sysdeps/x86_64/fpu/multiarch -I…/sysdeps/x86_64/fpu -I…/sysdeps/x86/fpu/include -I…/sysdeps/x86/fpu -I…/sysdeps/x86_64/multiarch -I…/sysdeps/x86_64 -I…/sysdeps/x86 -I…/sysdeps/ieee754/float128 -I…/sysdeps/ieee754/ldbl-96/include -I…/sysdeps/ieee754/ldbl-96 -I…/sysdeps/ieee754/dbl-64/wordsize-64 -I…/sysdeps/ieee754/dbl-64 -I…/sysdeps/ieee754/flt-32 -I…/sysdeps/wordsize-64 -I…/sysdeps/ieee754 -I…/sysdeps/generic -I… -I…/libio -I. -D_LIBC_REENTRANT -include /home/azanella/Projects/glibc/build/x86_64-linux-gnu/libc-modules.h -DMODULE_NAME=libc -include …/include/libc-symbols.h -DPIC -DSHARED -DTOP_NAMESPACE=glibc -DASSEMBLER -g -Werror=undef -Wa,–noexecstack -o /home/azanella/Projects/glibc/build/x86_64-linux-gnu/socket/socket.os -x assembler-with-cpp - -MD -MP -MF /home/azanella/Projects/glibc/build/x86_64-linux-gnu/socket/socket.os.dt -MT /home/azanella/Projects/glibc/build/x86_64-linux-gnu/socket/socket.os
…
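Note the use of -x assembler-with-cpp: these wrappers are therefore expected to contain assembly only.
Note: glibc 2.26 and earlier defined cancellable syscalls through an auxiliary header, sysdep-cancel.h, whose macros contained the required steps (calling the __{libc,pthread,librt}_{enable,disable}_asynccancel functions in both non-PIC and PIC modes). glibc 2.27 and later need only the default sysdep.h assembly macros, and all cancellable syscalls are implemented in C files with the SYSCALL_CANCEL macro.
2.2. Macro syscalls
Macro syscalls are handled by *.c files that are considerably more complex than the simple wrappers.
Some syscalls may need to shuffle the kernel's results into user-space structures, so glibc needs a way to make inline syscalls from C code.
This is handled by macros defined in the sysdep.h files.
The macros are named INTERNAL_* and INLINE_* and come in several variants for use by the source code.
For example, the macros can be seen in use in the implementation of the wait functions (sysdeps/unix/sysv/linux/wait4.c):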
...
pid_t
__wait4_time64 (pid_t pid, int *stat_loc, int options, struct __rusage64 *usage)
{
#ifdef __NR_wait4
# if __KERNEL_OLD_TIMEVAL_MATCHES_TIMEVAL64
return SYSCALL_CANCEL (wait4, pid, stat_loc, options, usage);
# else
pid_t ret;
struct __rusage32 usage32;
ret = SYSCALL_CANCEL (wait4, pid, stat_loc, options,
usage != NULL ? &usage32 : NULL);
if (ret > 0 && usage != NULL)
rusage32_to_rusage64 (&usage32, usage);
return ret;
# endif
#elif defined (__ASSUME_WAITID_PID0_P_PGID)
idtype_t idtype = P_PID;
...
__wait4_time64 invokes the SYSCALL_CANCEL macro, which is defined in sysdeps/unix/sysdep.h as follows:
#define SYSCALL_CANCEL(...) \
  ({ \
    long int sc_ret; \
    if (NO_SYSCALL_CANCEL_CHECKING) \
      sc_ret = INLINE_SYSCALL_CALL (__VA_ARGS__); \
    else \
      { \
        int sc_cancel_oldtype = LIBC_CANCEL_ASYNC (); \
        sc_ret = INLINE_SYSCALL_CALL (__VA_ARGS__); \
        LIBC_CANCEL_RESET (sc_cancel_oldtype); \
      } \
    sc_ret; \
  })
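LIBC_CANCEL_ASYNC calls __{libc,pthread,librt}_enable_asynccancel to atomically enable asynchronous cancellation mode before the syscall. LIBC_CANCEL_RESET, in turn, calls __{libc,pthread,librt}_disable_asynccancel to atomically disable asynchronous cancellation mode and to act on a pending cancellation as required.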
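2.3. Bespoke syscalls
The British term "bespoke" means made to order according to the buyer's requirements. glibc issues syscalls in a few places that use neither the standard assembly wrappers nor the C macros.
The best examples are the fork and vfork implementations, which on Linux require specific calling conventions that depend on the architecture. For x86_64, for example (sysdeps/unix/sysv/linux/x86_64/vfork.S):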
/* Clone the calling process, but without copying the whole address space.
The calling process is suspended until the new process exits or is
replaced by a call to `execve'. Return -1 for errors, 0 to the new process,
and the process ID of the new process to the old process. */
ENTRY (__vfork)
/* Pop the return PC value into RDI. We need a register that
is preserved by the syscall and that we're allowed to destroy. */
popq %rdi
cfi_adjust_cfa_offset(-8)
cfi_register(%rip, %rdi)
/* Stuff the syscall number in RAX and enter into the kernel. */
movl $SYS_ify (vfork), %eax
syscall
/* Push back the return PC. */
pushq %rdi
cfi_adjust_cfa_offset(8)
cmpl $-4095, %eax
jae SYSCALL_ERROR_LABEL /* Branch forward if it failed. */
#if SHSTK_ENABLED
/* Check if shadow stack is in use. */
xorl %esi, %esi
rdsspq %rsi
testq %rsi, %rsi
/* Normal return if shadow stack isn't in use. */
je L(no_shstk)
testl %eax, %eax
/* In parent, normal return. */
jnz L(no_shstk)
/* NB: In child, jump back to caller via indirect branch without
popping shadow stack which is shared with parent. Keep shadow
stack mismatched so that child returns in the vfork-calling
function will trigger SIGSEGV. */
popq %rdi
cfi_adjust_cfa_offset(-8)
jmp *%rdi
L(no_shstk):
#endif
/* Normal return. */
ret
PSEUDO_END (__vfork)
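It uses a sysdep.h macro for the error return path (SYSCALL_ERROR_LABEL), but specific ABI and semantic constraints require a hand-written assembly implementation.
In truth, most of the bespoke cases should probably be cleaned up to use the macros.
3. Assembly syscalls in detail
From the "Wrappers -> Assembly syscalls" section we know that assembly syscalls are built from three pieces: make-syscalls.sh, syscall-template.S, and syscalls.list. make-syscalls.sh is a shell script that reads syscalls.list and parses each line of it. syscall-template.S is the template for the syscall wrapper and contains the wrapper code.
3.1. syscalls.list
Take sysdeps/unix/syscalls.list as an example to understand the contents of a syscalls.list file: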
# File name Caller Syscall name Args Strong name Weak names
accept - accept Ci:iBN __libc_accept accept
access - access i:si __access access
acct - acct i:S acct
adjtime - adjtime i:pp __adjtime adjtime
bind - bind i:ipi __bind bind
chdir - chdir i:s __chdir chdir
...
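A syscalls.list file consists of many lines, each describing one syscall. Every line has six columns:
- File name: the name of the object file generated for the syscall
- Caller: the caller
- Syscall name: the name of the syscall
- Args: the types and number of the arguments, and the type of the return value
  The part before the colon (:) gives the return type; the part after it gives the argument types, one letter per argument.
  Syscall signature prefixes:
  E: errno and return value are not set by the call
  V: errno is not set, but errno or zero (success) is returned from the call
  C: not described in the script's current comments (older glibc versions used it to mark a cancellable syscall)
  Syscall signature key letters:
  a: unchecked address (e.g., 1st arg to mmap)
  b: non-NULL buffer (e.g., 2nd arg to read; return value from mmap)
  B: optionally-NULL buffer (e.g., 4th arg to getsockopt)
  f: buffer of 2 ints (e.g., 4th arg to socketpair)
  F: 3rd arg to fcntl
  i: scalar (any signedness and size: int, long, long long, enum, and so on)
  I: 3rd arg to ioctl
  n: scalar buffer length (e.g., 3rd arg to read)
  N: pointer to value/return scalar buffer length (e.g., 6th arg to recvfrom)
  p: non-NULL pointer to typed object (e.g., any non-void* arg)
  P: optionally-NULL pointer to typed object (e.g., 3rd arg to sigaction)
  s: non-NULL string (e.g., 1st arg to open)
  S: optionally-NULL string (e.g., 1st arg to acct)
  U: unsigned long int (32-bit types are zero-extended to 64-bit types)
  v: vararg scalar (e.g., optional 3rd arg to open)
  V: byte-per-page vector (3rd arg to mincore)
  W: wait status, optionally-NULL pointer to int (e.g., 2nd arg to wait4)
  (Note: these descriptions come from the comments in sysdeps/unix/make-syscalls.sh.)
- Strong name: the name of the function that implements the syscall
- Weak names: aliases of that function name; the function can be called through any of them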
3.2. assembly syscall wrappers
Look again at the rule in sysdeps/unix/Makefile:
...
ifndef avoid-generated
$(common-objpfx)sysd-syscalls: $(..)sysdeps/unix/make-syscalls.sh \
		$(wildcard $(+sysdep_dirs:%=%/syscalls.list)) \
		$(wildcard $(+sysdep_dirs:%=%/arch-syscall.h)) \
		$(common-objpfx)libc-modules.stmp
	for dir in $(+sysdep_dirs); do \
	  test -f $$dir/syscalls.list && \
	  { sysdirs='$(sysdirs)' \
	    asm_CPP='$(COMPILE.S) -E -x assembler-with-cpp' \
	    $(SHELL) $(dir $<)$(notdir $<) $$dir || exit 1; }; \
	  test $$dir = $(..)sysdeps/unix && break; \
	done > $@T
	mv -f $@T $@
endif
...
At build time this rule expands to the following:
touch /home/maminjie/work/glibc/tmp-build/libc-modules.stmp
for dir in /home/maminjie/work/glibc/tmp-build sysdeps/unix/sysv/linux/x86_64/64 sysdeps/unix/sysv/linux/x86_64 sysdeps/unix/sysv/linux/x86 sysdeps/x86/nptl sysdeps/unix/sysv/linux/wordsize-64 sysdeps/x86_64/nptl sysdeps/unix/sysv/linux sysdeps/nptl sysdeps/pthread sysdeps/gnu sysdeps/unix/inet sysdeps/unix/sysv sysdeps/unix/x86_64 sysdeps/unix sysdeps/posix sysdeps/x86_64/64 sysdeps/x86_64/fpu/multiarch sysdeps/x86_64/fpu sysdeps/x86/fpu sysdeps/x86_64/multiarch sysdeps/x86_64 sysdeps/x86 sysdeps/ieee754/float128 sysdeps/ieee754/ldbl-96 sysdeps/ieee754/dbl-64 sysdeps/ieee754/flt-32 sysdeps/wordsize-64 sysdeps/ieee754 sysdeps/generic; do
test -f $dir/syscalls.list &&
{ sysdirs='sysdeps/unix/sysv/linux/x86_64/64 sysdeps/unix/sysv/linux/x86_64 sysdeps/unix/sysv/linux/x86 sysdeps/x86/nptl sysdeps/unix/sysv/linux/wordsize-64 sysdeps/x86_64/nptl sysdeps/unix/sysv/linux sysdeps/nptl sysdeps/pthread sysdeps/gnu sysdeps/unix/inet sysdeps/unix/sysv sysdeps/unix/x86_64 sysdeps/unix sysdeps/posix sysdeps/x86_64/64 sysdeps/x86_64/fpu/multiarch sysdeps/x86_64/fpu sysdeps/x86/fpu sysdeps/x86_64/multiarch sysdeps/x86_64 sysdeps/x86 sysdeps/ieee754/float128 sysdeps/ieee754/ldbl-96 sysdeps/ieee754/dbl-64 sysdeps/ieee754/flt-32 sysdeps/wordsize-64 sysdeps/ieee754 sysdeps/generic'
asm_CPP='gcc -c -Iinclude -I/home/maminjie/work/glibc/tmp-build -Isysdeps/unix/sysv/linux/x86_64/64 -Isysdeps/unix/sysv/linux/x86_64 -Isysdeps/unix/sysv/linux/x86/include -Isysdeps/unix/sysv/linux/x86 -Isysdeps/x86/nptl -Isysdeps/unix/sysv/linux/wordsize-64 -Isysdeps/x86_64/nptl -Isysdeps/unix/sysv/linux/include -Isysdeps/unix/sysv/linux -Isysdeps/nptl -Isysdeps/pthread -Isysdeps/gnu -Isysdeps/unix/inet -Isysdeps/unix/sysv -Isysdeps/unix/x86_64 -Isysdeps/unix -Isysdeps/posix -Isysdeps/x86_64/64 -Isysdeps/x86_64/fpu/multiarch -Isysdeps/x86_64/fpu -Isysdeps/x86/fpu -Isysdeps/x86_64/multiarch -Isysdeps/x86_64 -Isysdeps/x86/include -Isysdeps/x86 -Isysdeps/ieee754/float128 -Isysdeps/ieee754/ldbl-96/include -Isysdeps/ieee754/ldbl-96 -Isysdeps/ieee754/dbl-64 -Isysdeps/ieee754/flt-32 -Isysdeps/wordsize-64 -Isysdeps/ieee754 -Isysdeps/generic -Ilibio -I. -D_LIBC_REENTRANT -include /home/maminjie/work/glibc/tmp-build/libc-modules.h -DMODULE_NAME=libc -include include/libc-symbols.h -DTOP_NAMESPACE=glibc -DASSEMBLER -g -Werror=undef -Wa,--noexecstack -E -x assembler-with-cpp'
/bin/sh sysdeps/unix/make-syscalls.sh $dir || exit 1; };
test $dir = sysdeps/unix && break;
done > /home/maminjie/work/glibc/tmp-build/sysd-syscallsT
mv -f /home/maminjie/work/glibc/tmp-build/sysd-syscallsT /home/maminjie/work/glibc/tmp-build/sysd-syscalls
sysdeps/unix/make-syscalls.sh walks the syscalls.list files in the relevant directories, and the combined output is written to the sysd-syscalls file.
The contents of sysd-syscalls look like this:
#### DIRECTORY = sysdeps/unix/sysv/linux/x86_64
#### SYSDIRS = sysdeps/unix/sysv/linux/x86_64/64
#### CALL=arch_prctl NUMBER=158 ARGS=i:ii SOURCE=-
ifeq (,$(filter arch_prctl,$(unix-syscalls)))
unix-syscalls += arch_prctl
unix-extra-syscalls += arch_prctl
$(foreach p,$(sysd-rules-targets),$(foreach o,$(object-suffixes),$(objpfx)$(patsubst %,$p,arch_prctl)$o)):
$(..)sysdeps/unix/make-syscalls.sh
$(make-target-directory)
(echo '#define SYSCALL_NAME arch_prctl';
echo '#define SYSCALL_NARGS 2';
echo '#define SYSCALL_ULONG_ARG_1 0';
echo '#define SYSCALL_ULONG_ARG_2 0';
echo '#define SYSCALL_SYMBOL __arch_prctl';
echo '#define SYSCALL_NOERRNO 0';
echo '#define SYSCALL_ERRVAL 0';
echo '#include <syscall-template.S>';
echo 'weak_alias (__arch_prctl, arch_prctl)';
echo 'hidden_weak (arch_prctl)';
) | $(compile-syscall) $(foreach p,$(patsubst %arch_prctl,%,$(basename $(@F))),$($(p)CPPFLAGS))
endif
...
#### DIRECTORY = sysdeps/unix/sysv/linux/wordsize-64
#### SYSDIRS = sysdeps/unix/sysv/linux/x86_64/64 sysdeps/unix/sysv/linux/x86_64 sysdeps/unix/sysv/linux/x86 sysdeps/x86/nptl
#### CALL=sendfile NUMBER=40 ARGS=i:iipi SOURCE=-
ifeq (,$(filter sendfile,$(unix-syscalls)))
unix-syscalls += sendfile
$(foreach p,$(sysd-rules-targets),$(foreach o,$(object-suffixes),$(objpfx)$(patsubst %,$p,sendfile)$o)):
$(..)sysdeps/unix/make-syscalls.sh
$(make-target-directory)
(echo '#define SYSCALL_NAME sendfile';
echo '#define SYSCALL_NARGS 4';
echo '#define SYSCALL_ULONG_ARG_1 0';
echo '#define SYSCALL_ULONG_ARG_2 0';
echo '#define SYSCALL_SYMBOL sendfile';
echo '#define SYSCALL_NOERRNO 0';
echo '#define SYSCALL_ERRVAL 0';
echo '#include <syscall-template.S>';
echo 'weak_alias (sendfile, sendfile64)';
echo 'hidden_weak (sendfile64)';
) | $(compile-syscall) $(foreach p,$(patsubst %sendfile,%,$(basename $(@F))),$($(p)CPPFLAGS))
endif
...
#### DIRECTORY = sysdeps/unix/sysv/linux
#### SYSDIRS = sysdeps/unix/sysv/linux/x86_64/64 sysdeps/unix/sysv/linux/x86_64 sysdeps/unix/sysv/linux/x86 sysdeps/x86/nptl sysdeps/unix/sysv/linux/wordsize-64 sysdeps/x86_64/nptl
#### CALL=alarm NUMBER=37 ARGS=i:i SOURCE=-
ifeq (,$(filter alarm,$(unix-syscalls)))
unix-syscalls += alarm
$(foreach p,$(sysd-rules-targets),$(foreach o,$(object-suffixes),$(objpfx)$(patsubst %,$p,alarm)$o)):
$(..)sysdeps/unix/make-syscalls.sh
$(make-target-directory)
(echo '#define SYSCALL_NAME alarm';
echo '#define SYSCALL_NARGS 1';
echo '#define SYSCALL_ULONG_ARG_1 0';
echo '#define SYSCALL_ULONG_ARG_2 0';
echo '#define SYSCALL_SYMBOL alarm';
echo '#define SYSCALL_NOERRNO 0';
echo '#define SYSCALL_ERRVAL 0';
echo '#include <syscall-template.S>';
) | $(compile-syscall) $(foreach p,$(patsubst %alarm,%,$(basename $(@F))),$($(p)CPPFLAGS))
endif
...
Taking chdir as an example:
#### CALL=chdir NUMBER=80 ARGS=i:s SOURCE=-
ifeq (,$(filter chdir,$(unix-syscalls)))
unix-syscalls += chdir
$(foreach p,$(sysd-rules-targets),$(foreach o,$(object-suffixes),$(objpfx)$(patsubst %,$p,chdir)$o)):
$(..)sysdeps/unix/make-syscalls.sh
$(make-target-directory)
(echo '#define SYSCALL_NAME chdir';
echo '#define SYSCALL_NARGS 1';
echo '#define SYSCALL_ULONG_ARG_1 0';
echo '#define SYSCALL_ULONG_ARG_2 0';
echo '#define SYSCALL_SYMBOL __chdir';
echo '#define SYSCALL_NOERRNO 0';
echo '#define SYSCALL_ERRVAL 0';
echo '#include <syscall-template.S>';
echo 'weak_alias (__chdir, chdir)';
echo 'hidden_weak (chdir)';
) | $(compile-syscall) $(foreach p,$(patsubst %chdir,%,$(basename $(@F))),$($(p)CPPFLAGS))
endif
In effect, the assembly syscall wrapper for chdir is reduced to the following temporary source, which is then compiled:
#define SYSCALL_NAME chdir
#define SYSCALL_NARGS 1
#define SYSCALL_ULONG_ARG_1 0
#define SYSCALL_ULONG_ARG_2 0
#define SYSCALL_SYMBOL __chdir
#define SYSCALL_NOERRNO 0
#define SYSCALL_ERRVAL 0
#include <syscall-template.S>
weak_alias (__chdir, chdir)
hidden_weak (chdir)
The chdir syscall is defined in sysdeps/unix/syscalls.list as follows:
# File name Caller Syscall name Args Strong name Weak names
...
chdir - chdir i:s __chdir chdir
...
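The object for each syscall is built by a rule in sysd-syscalls, generated by make-syscalls.sh, which does #include <syscall-template.S> after defining several macros:
- SYSCALL_NAME: the syscall name. Taken from the Syscall name column.
- SYSCALL_NARGS: the number of arguments the call takes. Derived from the Args column.
- SYSCALL_ULONG_ARG_1: the position of the first unsigned long int argument the call takes. 0 means there are no unsigned long int arguments. Derived from the Args column.
- SYSCALL_ULONG_ARG_2: the position of the second unsigned long int argument the call takes. 0 means there is at most one unsigned long int argument. Derived from the Args column.
- SYSCALL_SYMBOL: the primary symbol name. Taken from the Strong name column.
- SYSCALL_NOERRNO: 1 to define a no-errno version, that is, a call that never returns an error. Set from the Args column.
- SYSCALL_ERRVAL: 1 to define an error-value version, which returns the error number directly instead of returning -1 and storing the error number in errno. Set from the Args column.
(Note: these descriptions follow the comments in sysdeps/unix/syscall-template.S.)
weak_alias (__chdir, chdir): defines an alias for the __chdir function, so that calling chdir calls __chdir. The name chdir comes from the Weak names column.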
3.3. syscall-template.S
#include <sysdep.h>
/* This indirection is needed so that SYMBOL gets macro-expanded. */
#define syscall_hidden_def(SYMBOL) hidden_def (SYMBOL)
/* If PSEUDOS_HAVE_ULONG_INDICES is defined, PSEUDO and T_PSEUDO macros
have 2 extra arguments for unsigned long int arguments:
Extra argument 1: Position of the first unsigned long int argument.
Extra argument 2: Position of the second unsigned long int argument.
*/
#ifndef PSEUDOS_HAVE_ULONG_INDICES
# undef SYSCALL_ULONG_ARG_1
# define SYSCALL_ULONG_ARG_1 0
#endif
#if SYSCALL_ULONG_ARG_1
# define T_PSEUDO(SYMBOL, NAME, N, U1, U2) \
  PSEUDO (SYMBOL, NAME, N, U1, U2)
# define T_PSEUDO_NOERRNO(SYMBOL, NAME, N, U1, U2) \
  PSEUDO_NOERRNO (SYMBOL, NAME, N, U1, U2)
# define T_PSEUDO_ERRVAL(SYMBOL, NAME, N, U1, U2) \
  PSEUDO_ERRVAL (SYMBOL, NAME, N, U1, U2)
#else
# define T_PSEUDO(SYMBOL, NAME, N) \
  PSEUDO (SYMBOL, NAME, N)
# define T_PSEUDO_NOERRNO(SYMBOL, NAME, N) \
  PSEUDO_NOERRNO (SYMBOL, NAME, N)
# define T_PSEUDO_ERRVAL(SYMBOL, NAME, N) \
  PSEUDO_ERRVAL (SYMBOL, NAME, N)
#endif
#define T_PSEUDO_END(SYMBOL) PSEUDO_END (SYMBOL)
#define T_PSEUDO_END_NOERRNO(SYMBOL) PSEUDO_END_NOERRNO (SYMBOL)
#define T_PSEUDO_END_ERRVAL(SYMBOL) PSEUDO_END_ERRVAL (SYMBOL)
#if SYSCALL_NOERRNO
/* This kind of system call stub never returns an error.
We return the return value register to the caller unexamined. */
# if SYSCALL_ULONG_ARG_1
T_PSEUDO_NOERRNO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS,
SYSCALL_ULONG_ARG_1, SYSCALL_ULONG_ARG_2)
# else
T_PSEUDO_NOERRNO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
# endif
ret_NOERRNO
T_PSEUDO_END_NOERRNO (SYSCALL_SYMBOL)
#elif SYSCALL_ERRVAL
/* This kind of system call stub returns the errno code as its return
value, or zero for success. We may massage the kernel's return value
to meet that ABI, but we never set errno here. */
# if SYSCALL_ULONG_ARG_1
T_PSEUDO_ERRVAL (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS,
SYSCALL_ULONG_ARG_1, SYSCALL_ULONG_ARG_2)
# else
T_PSEUDO_ERRVAL (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
# endif
ret_ERRVAL
T_PSEUDO_END_ERRVAL (SYSCALL_SYMBOL)
#else
/* This is a "normal" system call stub: if there is an error,
it returns -1 and sets errno. */
# if SYSCALL_ULONG_ARG_1
T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS,
SYSCALL_ULONG_ARG_1, SYSCALL_ULONG_ARG_2)
# else
T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
# endif
ret
T_PSEUDO_END (SYSCALL_SYMBOL)
#endif
syscall_hidden_def (SYSCALL_SYMBOL)
On x86_64, the sysdep.h included here is sysdeps/unix/sysv/linux/x86_64/sysdep.h.
The include chain of sysdep.h is as follows:
sysdeps/unix/sysv/linux/x86_64/sysdep.h
-> sysdeps/unix/sysv/linux/sysdep.h
-> sysdeps/unix/x86_64/sysdep.h
->sysdeps/unix/sysdep.h
-> sysdeps/generic/sysdep.h
-> sys/syscall.h
-> sysdeps/x86_64/sysdep.h
-> sysdeps/x86/sysdep.h
-> sysdeps/generic/sysdep.h
For the chdir syscall both SYSCALL_NOERRNO and SYSCALL_ERRVAL are defined as 0, so the following branch is used:
# if SYSCALL_ULONG_ARG_1
T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS,
SYSCALL_ULONG_ARG_1, SYSCALL_ULONG_ARG_2)
# else
T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
# endif
ret
T_PSEUDO_END (SYSCALL_SYMBOL)
Because SYSCALL_ULONG_ARG_1 is defined as 0, the code that is finally expanded is T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS), and T_PSEUDO in turn invokes the PSEUDO macro:
# define T_PSEUDO(SYMBOL, NAME, N) \
  PSEUDO (SYMBOL, NAME, N)
3.3.1. PSEUDO
The PSEUDO macro is defined in sysdeps/unix/sysv/linux/x86_64/sysdep.h as follows:
#ifdef __ASSEMBLER__
...
# undef PSEUDO
...
# define PSEUDO(name, syscall_name, args) \
  .text; \
  ENTRY (name) \
    DO_CALL (syscall_name, args, 0, 0); \
    cmpq $-4095, %rax; \
    jae SYSCALL_ERROR_LABEL
...
The PSEUDO macro takes effect only when __ASSEMBLER__ is defined.
3.3.1.1. ENTRY
The ENTRY macro is defined in sysdeps/x86/sysdep.h as follows:
#define ALIGNARG(log2) 1<<log2
...
/* Define an entry point visible from C. */
#define ENTRY(name) \
  .globl C_SYMBOL_NAME(name); \
  .type C_SYMBOL_NAME(name),@function; \
  .align ALIGNARG(4); \
  C_LABEL(name) \
  cfi_startproc; \
  _CET_ENDBR; \
  CALL_MCOUNT
The other macros are scattered across several headers; they are collected here:
// include/libc-symbols.h
#ifndef C_SYMBOL_NAME
# define C_SYMBOL_NAME(name) name
#endif
// sysdeps/generic/sysdep.h
#ifndef C_LABEL
/* Define a macro we can use to construct the asm name for a C symbol. */
# define C_LABEL(name) name##:
#endif
// sysdeps/generic/sysdep.h
# define cfi_startproc .cfi_startproc
// sysdeps/x86_64/sysdep.h
#define CALL_MCOUNT /* Do nothing. */
ENTRY (name) defines the label for the function and declares that symbol as global.
3.3.1.2. DO_CALL
The DO_CALL macro is defined in sysdeps/unix/sysv/linux/x86_64/sysdep.h as follows:
/* For Linux we can use the system call table in the header file
/usr/include/asm/unistd.h
of the kernel. But these symbols do not follow the SYS_* syntax
so we have to redefine the `SYS_ify' macro here. */
#undef SYS_ify
#define SYS_ify(syscall_name) __NR_##syscall_name
...
/* The Linux/x86-64 kernel expects the system call parameters in
registers according to the following table:
syscall number rax
arg 1 rdi
arg 2 rsi
arg 3 rdx
arg 4 r10
arg 5 r8
arg 6 r9
The Linux kernel uses and destroys internally these registers:
return address from
syscall rcx
eflags from syscall r11
Normal function call, including calls to the system call stub
functions in the libc, get the first six parameters passed in
registers and the seventh parameter and later on the stack. The
register use is as follows:
system call number in the DO_CALL macro
arg 1 rdi
arg 2 rsi
arg 3 rdx
arg 4 rcx
arg 5 r8
arg 6 r9
We have to take care that the stack is aligned to 16 bytes. When
called the stack is not aligned since the return address has just
been pushed.
Syscalls of more than 6 arguments are not supported. */
# undef DO_CALL
# define DO_CALL(syscall_name, args, ulong_arg_1, ulong_arg_2) \
    DOARGS_##args \
    ZERO_EXTEND_##ulong_arg_1 \
    ZERO_EXTEND_##ulong_arg_2 \
    movl $SYS_ify (syscall_name), %eax; \
    syscall;
# define DOARGS_0 /* nothing */
# define DOARGS_1 /* nothing */
# define DOARGS_2 /* nothing */
# define DOARGS_3 /* nothing */
# define DOARGS_4 movq %rcx, %r10;
# define DOARGS_5 DOARGS_4
# define DOARGS_6 DOARGS_5
# define ZERO_EXTEND_0 /* nothing */
# define ZERO_EXTEND_1 /* nothing */
# define ZERO_EXTEND_2 /* nothing */
# define ZERO_EXTEND_3 /* nothing */
# define ZERO_EXTEND_4 /* nothing */
# define ZERO_EXTEND_5 /* nothing */
# define ZERO_EXTEND_6 /* nothing */
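The Linux/x86-64 kernel expects the fourth syscall argument in register r10, whereas an ordinary function call passes its fourth argument in rcx (which, as the comment above notes, the kernel destroys because syscall stores the return address there). A move is therefore needed: movq %rcx, %r10.
movl $SYS_ify (syscall_name), %eax loads the syscall number (__NR_syscall_name) into eax.
Finally, the syscall instruction performs the system call.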
3.3.1.3. SYSCALL_ERROR_LABEL
cmpq $-4095, %rax;
jae SYSCALL_ERROR_LABEL
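After the syscall instruction the kernel's return value is in rax. cmpq $-4095, %rax followed by jae is an unsigned comparison: the branch is taken when rax, read as an unsigned 64-bit value, is at least 0xfffffffffffff001, that is, when the signed value lies between -4095 and -1. Linux reports a failure by returning the negated error number, and error numbers do not exceed 4095, so any value in that range denotes an error and execution jumps to SYSCALL_ERROR_LABEL.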
// sysdeps/unix/sysv/linux/x86_64/sysdep.h
# undef SYSCALL_ERROR_LABEL
# ifdef PIC
# undef SYSCALL_ERROR_LABEL
# define SYSCALL_ERROR_LABEL 0f
# else
# undef SYSCALL_ERROR_LABEL
# define SYSCALL_ERROR_LABEL syscall_error
# endif
// sysdeps/x86/sysdep.h
#define syscall_error __syscall_error
// sysdeps/unix/x86_64/sysdep.S
__syscall_error:
#if defined (EWOULDBLOCK_sys) && EWOULDBLOCK_sys != EAGAIN
/* We translate the system's EWOULDBLOCK error into EAGAIN.
The GNU C library always defines EWOULDBLOCK==EAGAIN.
EWOULDBLOCK_sys is the original number. */
cmp $EWOULDBLOCK_sys, %RAX_LP /* Is it the old EWOULDBLOCK? */
jne notb /* Branch if not. */
movl $EAGAIN, %eax /* Yes; translate it to EAGAIN. */
notb:
#endif
#ifdef PIC
movq C_SYMBOL_NAME(errno@GOTTPOFF)(%rip), %rcx
movl %eax, %fs:0(%rcx)
#else
movl %eax, %fs:C_SYMBOL_NAME(errno@TPOFF)
#endif
or $-1, %RAX_LP
ret
3.3.2. PSEUDO_END
T_PSEUDO_END invokes the PSEUDO_END macro, which is defined in sysdeps/unix/sysv/linux/x86_64/sysdep.h as follows:
# undef PSEUDO_END
# define PSEUDO_END(name) \
  SYSCALL_ERROR_HANDLER \
  END (name)
...
...
# ifndef PIC
# define SYSCALL_ERROR_HANDLER /* Nothing here; code in sysdep.S is used. */
# else
# define SYSCALL_ERROR_HANDLER \
0: \
  SYSCALL_SET_ERRNO; \
  or $-1, %RAX_LP; \
  ret;
# endif /* PIC */
3.3.2.1. END
The END macro is defined in sysdeps/x86/sysdep.h as follows:
#define ASM_SIZE_DIRECTIVE(name) .size name,.-name;
...
#undef END
#define END(name) \
  cfi_endproc; \
  ASM_SIZE_DIRECTIVE(name)
The other macros are scattered across several headers; they are collected here:
// sysdeps/generic/sysdep.h
# define cfi_endproc .cfi_endproc
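PSEUDO_END closes the whole assembly routine.
That completes the analysis of the chdir syscall on x86_64. Readers can examine chdir on other platforms, and other syscalls, in the same way.
4. Macro syscalls in detail
From the "Wrappers -> Macro syscalls" section we know that macro syscalls are handled by *.c files. This section walks through the clock_gettime syscall on x86_64.
4.1. clock_gettime
clock_gettime is declared in time/time.h as follows: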
#ifdef __USE_POSIX199309
# ifndef __USE_TIME_BITS64
/* Pause execution for a number of nanoseconds.
This function is a cancellation point and therefore not marked with
__THROW. */
extern int nanosleep (const struct timespec *__requested_time,
struct timespec *__remaining);
/* Get resolution of clock CLOCK_ID. */
extern int clock_getres (clockid_t __clock_id, struct timespec *__res) __THROW;
/* Get current value of clock CLOCK_ID and store it in TP. */
extern int clock_gettime (clockid_t __clock_id, struct timespec *__tp) __THROW;
/* Set clock CLOCK_ID to value TP. */
extern int clock_settime (clockid_t __clock_id, const struct timespec *__tp)
__THROW;
# else
# ifdef __REDIRECT
extern int __REDIRECT (nanosleep, (const struct timespec *__requested_time,
struct timespec *__remaining),
__nanosleep64);
extern int __REDIRECT_NTH (clock_getres, (clockid_t __clock_id,
struct timespec *__res),
__clock_getres64);
extern int __REDIRECT_NTH (clock_gettime, (clockid_t __clock_id, struct
timespec *__tp), __clock_gettime64);
extern int __REDIRECT_NTH (clock_settime, (clockid_t __clock_id, const struct
timespec *__tp), __clock_settime64);
# else
# define nanosleep __nanosleep64
# define clock_getres __clock_getres64
# define clock_gettime __clock_gettime64
# define clock_settime __clock_settime64
# endif
# endif
1) If the macro __USE_TIME_BITS64 is defined, the 64-bit interfaces are used. # define clock_gettime __clock_gettime64 is easy to understand. The redirection macro __REDIRECT is defined in misc/sys/cdefs.h as follows:
# define __REDIRECT(name, proto, alias) name proto __asm__ (__ASMNAME (#alias))
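This declares name but binds it to the assembler-level symbol alias, so calls to name are emitted as references to alias. The effect is similar to the #define, except that it works at the symbol level rather than in the preprocessor.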
2) If __USE_TIME_BITS64 is not defined, where is clock_gettime implemented?
a) In time/clock_gettime.c
// time/clock_gettime.c
#include <errno.h>
#include <time.h>
#include <shlib-compat.h>
/* Get current value of CLOCK and store it in TP. */
int
__clock_gettime (clockid_t clock_id, struct timespec *tp)
{
__set_errno (ENOSYS);
return -1;
}
libc_hidden_def (__clock_gettime)
versioned_symbol (libc, __clock_gettime, clock_gettime, GLIBC_2_17);
/* clock_gettime moved to libc in version 2.17;
old binaries may expect the symbol version it had in librt. */
#if SHLIB_COMPAT (libc, GLIBC_2_2, GLIBC_2_17)
compat_symbol (libc, __clock_gettime, clock_gettime, GLIBC_2_2);
#endif
stub_warning (clock_gettime)
b) In sysdeps/unix/sysv/linux/clock_gettime.c
// sysdeps/unix/sysv/linux/clock_gettime.c
int
__clock_gettime64 (clockid_t clock_id, struct __timespec64 *tp)
{
int r;
#ifndef __NR_clock_gettime64
# define __NR_clock_gettime64 __NR_clock_gettime
#endif
#ifdef HAVE_CLOCK_GETTIME64_VSYSCALL
int (*vdso_time64) (clockid_t clock_id, struct __timespec64 *tp)
= GLRO(dl_vdso_clock_gettime64);
if (vdso_time64 != NULL)
{
r = INTERNAL_VSYSCALL_CALL (vdso_time64, 2, clock_id, tp);
if (r == 0)
return 0;
return INLINE_SYSCALL_ERROR_RETURN_VALUE (-r);
}
#endif
#ifdef HAVE_CLOCK_GETTIME_VSYSCALL
int (*vdso_time) (clockid_t clock_id, struct timespec *tp)
= GLRO(dl_vdso_clock_gettime);
if (vdso_time != NULL)
{
struct timespec tp32;
r = INTERNAL_VSYSCALL_CALL (vdso_time, 2, clock_id, &tp32);
if (r == 0 && tp32.tv_sec > 0)
{
*tp = valid_timespec_to_timespec64 (tp32);
return 0;
}
else if (r != 0)
return INLINE_SYSCALL_ERROR_RETURN_VALUE (-r);
/* Fallback to syscall if the 32-bit time_t vDSO returns overflows. */
}
#endif
r = INTERNAL_SYSCALL_CALL (clock_gettime64, clock_id, tp);
if (r == 0)
return 0;
if (r != -ENOSYS)
return INLINE_SYSCALL_ERROR_RETURN_VALUE (-r);
#ifndef __ASSUME_TIME64_SYSCALLS
/* Fallback code that uses 32-bit support. */
struct timespec tp32;
r = INTERNAL_SYSCALL_CALL (clock_gettime, clock_id, &tp32);
if (r == 0)
{
*tp = valid_timespec_to_timespec64 (tp32);
return 0;
}
#endif
return INLINE_SYSCALL_ERROR_RETURN_VALUE (-r);
}
#if __TIMESIZE != 64
libc_hidden_def (__clock_gettime64)
int
__clock_gettime (clockid_t clock_id, struct timespec *tp)
{
int ret;
struct __timespec64 tp64;
ret = __clock_gettime64 (clock_id, &tp64);
if (ret == 0)
{
if (! in_time_t_range (tp64.tv_sec))
{
__set_errno (EOVERFLOW);
return -1;
}
*tp = valid_timespec64_to_timespec (tp64);
}
return ret;
}
#endif
libc_hidden_def (__clock_gettime)
versioned_symbol (libc, __clock_gettime, clock_gettime, GLIBC_2_17);
/* clock_gettime moved to libc in version 2.17;
old binaries may expect the symbol version it had in librt. */
#if SHLIB_COMPAT (libc, GLIBC_2_2, GLIBC_2_17)
strong_alias (__clock_gettime, __clock_gettime_2);
compat_symbol (libc, __clock_gettime_2, clock_gettime, GLIBC_2_2);
#endif
As shown above, two clock_gettime.c files (time/clock_gettime.c and sysdeps/unix/sysv/linux/clock_gettime.c) define clock_gettime, and both make it an alias of __clock_gettime. In brief:
versioned_symbol is defined as follows:
// include/shlib-compat.h
#ifdef SHARED
...
# define versioned_symbol(lib, local, symbol, version) \
  versioned_symbol_1 (lib, local, symbol, version)
# define versioned_symbol_1(lib, local, symbol, version) \
  versioned_symbol_2 (local, symbol, VERSION_##lib##_##version)
# define versioned_symbol_2(local, symbol, name) \
  default_symbol_version (local, symbol, name)
...
#else
...
# define versioned_symbol(lib, local, symbol, version) \
  weak_alias (local, symbol)
...
// include/libc-symbols.h
#ifdef SHARED
...
# define default_symbol_version(real, name, version) \
  _default_symbol_version(real, name, version)
/* See <libc-symver.h>. */
# ifdef __ASSEMBLER__
# define _default_symbol_version(real, name, version) \
  _set_symbol_version (real, name@@version)
# else
# define _default_symbol_version(real, name, version) \
  _set_symbol_version (real, #name "@@" #version)
# endif
...
#else /* !SHARED */
...
# define default_symbol_version(real, name, version) \
  strong_alias(real, name)
#endif
Both versioned_symbol and compat_symbol in effect make clock_gettime an alias of __clock_gettime, so calling clock_gettime is equivalent to calling __clock_gettime.
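The aliasing itself can be tried in isolation. The sketch below assumes GCC or Clang on an ELF target and uses hypothetical names (demo_internal_gettime, demo_public_gettime). It leaves out symbol versioning and shows only what the non-SHARED weak_alias branch amounts to.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical internal implementation, playing the role of
   __clock_gettime. */
int demo_internal_gettime(int clock_id, long *out)
{
    assert(out != NULL);
    *out = 1000L + clock_id;
    return 0;
}

/* Public name defined as a weak alias of the internal one: both names
   refer to the same code, and no second function body is emitted. */
extern __typeof__(demo_internal_gettime) demo_public_gettime
    __attribute__((weak, alias("demo_internal_gettime")));
```

A caller that uses demo_public_gettime therefore executes demo_internal_gettime, just as a caller of clock_gettime executes __clock_gettime.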
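1) The __clock_gettime in time/clock_gettime.c is a stub with no real implementation: it only sets errno to ENOSYS and returns -1.
2) The __clock_gettime in sysdeps/unix/sysv/linux/clock_gettime.c is defined only when __TIMESIZE != 64, and it calls __clock_gettime64. So where is __clock_gettime defined when __TIMESIZE == 64?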
#if __TIMESIZE == 64
# define __clock_nanosleep_time64 __clock_nanosleep
# define __clock_gettime64 __clock_gettime
# define __timespec_get64 __timespec_get
# define __timespec_getres64 __timespec_getres
#else
extern int __clock_nanosleep_time64 (clockid_t clock_id,
int flags, const struct __timespec64 *req,
struct __timespec64 *rem);
libc_hidden_proto (__clock_nanosleep_time64)
extern int __clock_gettime64 (clockid_t clock_id, struct __timespec64 *tp);
libc_hidden_proto (__clock_gettime64)
extern int __timespec_get64 (struct __timespec64 *ts, int base);
libc_hidden_proto (__timespec_get64)
extern int __timespec_getres64 (struct __timespec64 *ts, int base);
libc_hidden_proto (__timespec_getres64)
#endif
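If __TIMESIZE == 64, the macro __clock_gettime64 is defined as __clock_gettime. The __clock_gettime64 in sysdeps/unix/sysv/linux/clock_gettime.c is therefore replaced by __clock_gettime during preprocessing, and that function becomes the implementation of __clock_gettime, as follows: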
// sysdeps/unix/sysv/linux/clock_gettime.c
int
__clock_gettime64 (clockid_t clock_id, struct __timespec64 *tp)
{
int r;
#ifndef __NR_clock_gettime64
# define __NR_clock_gettime64 __NR_clock_gettime
#endif
...
}
#if __TIMESIZE != 64
...
int
__clock_gettime (clockid_t clock_id, struct timespec *tp)
{
int ret;
struct __timespec64 tp64;
ret = __clock_gettime64 (clock_id, &tp64);
...
}
#endif
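The renaming can be seen in a reduced form. This is a sketch under assumptions: DEMO_TIMESIZE, demo_now and demo_now64 are hypothetical stand-ins for __TIMESIZE, __clock_gettime and __clock_gettime64.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_TIMESIZE 64   /* hypothetical stand-in for __TIMESIZE */

#if DEMO_TIMESIZE == 64
/* Same trick as "# define __clock_gettime64 __clock_gettime". */
# define demo_now64 demo_now
#endif

/* Written once under the 64-bit name. With DEMO_TIMESIZE == 64 the
   preprocessor turns this into the definition of demo_now. */
int demo_now64(long long *sec)
{
    assert(sec != NULL);
    *sec = 42;
    return 0;
}

#if DEMO_TIMESIZE != 64
/* Only needed when the two time types differ: a narrowing wrapper. */
int demo_now(long *sec)
{
    long long sec64;
    int ret = demo_now64(&sec64);
    if (ret == 0)
        *sec = (long) sec64;
    return ret;
}
#endif
```

With DEMO_TIMESIZE set to 64 the object file contains a single function, demo_now, which matches what the symbol table and disassembly of clock_gettime.o show for __clock_gettime.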
Which of the two clock_gettime.c files is actually used? It is hard to tell directly from the code, but the build products can be inspected to find out:
maminjie@fedora ~/w/g/tmp-build> find -name clock_gettime.o
./time/clock_gettime.o
maminjie@fedora ~/w/g/tmp-build>
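Judging by the path of the .o file, it looks as if time/clock_gettime.c was used, but this needs further confirmation. There are several ways to check:
1) Check the build log
If the build output was saved to a file, that log can be checked.
The log shows that sysdeps/unix/sysv/linux/clock_gettime.c was used.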
2) Inspect with readelf
2.1) readelf -s xxx shows the symbol table
maminjie@fedora ~/w/g/tmp-build> readelf -s ./time/clock_gettime.o
Symbol table '.symtab' contains 21 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS clock_gettime.c
2: 0000000000000000 0 SECTION LOCAL DEFAULT 1
3: 0000000000000000 0 SECTION LOCAL DEFAULT 3
4: 0000000000000000 0 SECTION LOCAL DEFAULT 4
5: 0000000000000000 0 SECTION LOCAL DEFAULT 5
6: 0000000000000000 0 SECTION LOCAL DEFAULT 7
7: 0000000000000000 0 SECTION LOCAL DEFAULT 8
8: 0000000000000000 0 SECTION LOCAL DEFAULT 9
9: 0000000000000000 0 SECTION LOCAL DEFAULT 11
10: 0000000000000000 0 SECTION LOCAL DEFAULT 12
11: 0000000000000000 0 SECTION LOCAL DEFAULT 14
12: 0000000000000000 0 SECTION LOCAL DEFAULT 15
13: 0000000000000000 0 SECTION LOCAL DEFAULT 17
14: 0000000000000000 0 SECTION LOCAL DEFAULT 18
15: 0000000000000000 0 SECTION LOCAL DEFAULT 16
16: 0000000000000000 123 FUNC GLOBAL HIDDEN 1 __clock_gettime
17: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND _dl_vdso_clock_g[...]
18: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND _GLOBAL_OFFSET_TABLE_
19: 0000000000000000 0 TLS GLOBAL DEFAULT UND __libc_errno
20: 0000000000000000 123 FUNC WEAK DEFAULT 1 clock_gettime
maminjie@fedora ~/w/g/tmp-build>
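The symbol table does not say directly which file was used, although the undefined _dl_vdso_clock_g[...] symbol already hints that this is not the ENOSYS stub.
2.2) readelf --debug-dump=info xxx shows the debug information (requires a build with debug information)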
maminjie@fedora ~/w/g/tmp-build> readelf --debug-dump=info ./time/clock_gettime.o | grep -A10 DW_TAG_compile_unit
<0><c>: Abbrev Number: 21 (DW_TAG_compile_unit)
<d> DW_AT_producer : (indirect string, offset: 0x8): GNU C11 11.1.1 20210428 (Red Hat 11.1.1-1) -mtune=generic -march=x86-64 -g -O2 -std=gnu11 -fgnu89-inline -fmerge-all-constants -frounding-math -fno-stack-protector -fno-common -fmath-errno -ftls-model=initial-exec
<11> DW_AT_language : 29 (C11)
<12> DW_AT_name : (indirect line string, offset: 0x0): ../sysdeps/unix/sysv/linux/clock_gettime.c
<16> DW_AT_comp_dir : (indirect line string, offset: 0x2b): /mnt/hgfs/projects/linux/glibc/time
<1a> DW_AT_low_pc : 0x0
<22> DW_AT_high_pc : 0x7b
<2a> DW_AT_stmt_list : 0x0
<1><2e>: Abbrev Number: 3 (DW_TAG_base_type)
<2f> DW_AT_byte_size : 1
<30> DW_AT_encoding : 8 (unsigned char)
maminjie@fedora ~/w/g/tmp-build>
The .debug_info section also shows that sysdeps/unix/sysv/linux/clock_gettime.c was used.
3) objdump -S xxx shows the disassembly interleaved with the source (requires a build with debug information)
maminjie@fedora ~/w/g/tmp-build> objdump -S ./time/clock_gettime.o
./time/clock_gettime.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <__clock_gettime>:
#ifndef __NR_clock_gettime64
# define __NR_clock_gettime64 __NR_clock_gettime
#endif
#ifdef HAVE_CLOCK_GETTIME64_VSYSCALL
int (*vdso_time64) (clockid_t clock_id, struct __timespec64 *tp)
0: 48 8b 05 00 00 00 00 mov 0x0(%rip),%rax # 7 <__clock_gettime+0x7>
= GLRO(dl_vdso_clock_gettime64);
if (vdso_time64 != NULL)
7: 48 85 c0 test %rax,%rax
a: 74 14 je 20 <__clock_gettime+0x20>
{
c: 48 83 ec 08 sub $0x8,%rsp
{
r = INTERNAL_VSYSCALL_CALL (vdso_time64, 2, clock_id, tp);
10: ff d0 callq *%rax
if (r == 0)
12: 85 c0 test %eax,%eax
14: 75 52 jne 68 <__clock_gettime+0x68>
return 0;
16: 31 c0 xor %eax,%eax
return 0;
}
#endif
return INLINE_SYSCALL_ERROR_RETURN_VALUE (-r);
}
18: 48 83 c4 08 add $0x8,%rsp
1c: c3 retq
1d: 0f 1f 00 nopl (%rax)
r = INTERNAL_SYSCALL_CALL (clock_gettime64, clock_id, tp);
20: b8 e4 00 00 00 mov $0xe4,%eax
25: 0f 05 syscall
if (r == 0)
27: 85 c0 test %eax,%eax
29: 74 1d je 48 <__clock_gettime+0x48>
if (r != -ENOSYS)
2b: 83 f8 da cmp $0xffffffda,%eax
2e: 74 20 je 50 <__clock_gettime+0x50>
return INLINE_SYSCALL_ERROR_RETURN_VALUE (-r);
30: 48 8b 15 00 00 00 00 mov 0x0(%rip),%rdx # 37 <__clock_gettime+0x37>
37: f7 d8 neg %eax
39: 64 89 02 mov %eax,%fs:(%rdx)
3c: b8 ff ff ff ff mov $0xffffffff,%eax
41: c3 retq
42: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
return 0;
48: 31 c0 xor %eax,%eax
}
4a: c3 retq
4b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
return INLINE_SYSCALL_ERROR_RETURN_VALUE (-r);
50: 48 8b 05 00 00 00 00 mov 0x0(%rip),%rax # 57 <__clock_gettime+0x57>
57: 64 c7 00 26 00 00 00 movl $0x26,%fs:(%rax)
5e: b8 ff ff ff ff mov $0xffffffff,%eax
63: c3 retq
64: 0f 1f 40 00 nopl 0x0(%rax)
return INLINE_SYSCALL_ERROR_RETURN_VALUE (-r);
68: 48 8b 15 00 00 00 00 mov 0x0(%rip),%rdx # 6f <__clock_gettime+0x6f>
6f: f7 d8 neg %eax
71: 64 89 02 mov %eax,%fs:(%rdx)
74: b8 ff ff ff ff mov $0xffffffff,%eax
79: eb 9d jmp 18 <__clock_gettime+0x18>
The disassembly also shows that sysdeps/unix/sysv/linux/clock_gettime.c was used, and that __clock_gettime64 in the source was indeed replaced by __clock_gettime. From this we can infer that __TIMESIZE == 64.
4.2. __clock_gettime64
The analysis of clock_gettime above shows that every path ends up in __clock_gettime64, defined in sysdeps/unix/sysv/linux/clock_gettime.c. Let us go straight to __clock_gettime64, which is defined as follows:
// sysdeps/unix/sysv/linux/clock_gettime.c
int
__clock_gettime64 (clockid_t clock_id, struct __timespec64 *tp)
{
int r;
#ifndef __NR_clock_gettime64
# define __NR_clock_gettime64 __NR_clock_gettime
#endif
#ifdef HAVE_CLOCK_GETTIME64_VSYSCALL
int (*vdso_time64) (clockid_t clock_id, struct __timespec64 *tp)
= GLRO(dl_vdso_clock_gettime64);
if (vdso_time64 != NULL)
{
r = INTERNAL_VSYSCALL_CALL (vdso_time64, 2, clock_id, tp);
if (r == 0)
return 0;
return INLINE_SYSCALL_ERROR_RETURN_VALUE (-r);
}
#endif
#ifdef HAVE_CLOCK_GETTIME_VSYSCALL
int (*vdso_time) (clockid_t clock_id, struct timespec *tp)
= GLRO(dl_vdso_clock_gettime);
if (vdso_time != NULL)
{
struct timespec tp32;
r = INTERNAL_VSYSCALL_CALL (vdso_time, 2, clock_id, &tp32);
if (r == 0 && tp32.tv_sec > 0)
{
*tp = valid_timespec_to_timespec64 (tp32);
return 0;
}
else if (r != 0)
return INLINE_SYSCALL_ERROR_RETURN_VALUE (-r);
/* Fallback to syscall if the 32-bit time_t vDSO returns overflows. */
}
#endif
r = INTERNAL_SYSCALL_CALL (clock_gettime64, clock_id, tp);
if (r == 0)
return 0;
if (r != -ENOSYS)
return INLINE_SYSCALL_ERROR_RETURN_VALUE (-r);
#ifndef __ASSUME_TIME64_SYSCALLS
/* Fallback code that uses 32-bit support. */
struct timespec tp32;
r = INTERNAL_SYSCALL_CALL (clock_gettime, clock_id, &tp32);
if (r == 0)
{
*tp = valid_timespec_to_timespec64 (tp32);
return 0;
}
#endif
return INLINE_SYSCALL_ERROR_RETURN_VALUE (-r);
}
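Four call paths are distinguished here: the 64-bit and 32-bit VSYSCALL (vDSO) paths, and the 64-bit and 32-bit SYSCALL paths. Next we focus on INTERNAL_SYSCALL_CALL (clock_gettime64, clock_id, tp); readers can study the other paths on their own.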
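The fallback order of the two SYSCALL paths can be sketched without a kernel. The code below is a simulation under stated assumptions: demo_sys_gettime64 and demo_sys_gettime32 are hypothetical stand-ins for the two system calls, and the 64-bit one always reports -ENOSYS to mimic an old kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct demo_ts32 { int32_t tv_sec; int32_t tv_nsec; };
struct demo_ts64 { int64_t tv_sec; int64_t tv_nsec; };

/* Hypothetical kernel entry points: return 0 or a negative errno. */
static int demo_sys_gettime64(struct demo_ts64 *tp)
{
    (void) tp;
    return -ENOSYS;              /* pretend the kernel lacks the call */
}

static int demo_sys_gettime32(struct demo_ts32 *tp)
{
    tp->tv_sec = 1600000000;
    tp->tv_nsec = 5;
    return 0;
}

/* Same control flow as the tail of __clock_gettime64: try the 64-bit
   call, fall back to the 32-bit call only on ENOSYS, widen the result. */
int demo_clock_gettime64(struct demo_ts64 *tp)
{
    assert(tp != NULL);
    int r = demo_sys_gettime64(tp);
    if (r == 0)
        return 0;
    if (r != -ENOSYS) {
        errno = -r;
        return -1;
    }
    struct demo_ts32 tp32;
    r = demo_sys_gettime32(&tp32);
    if (r == 0) {
        tp->tv_sec = tp32.tv_sec;
        tp->tv_nsec = tp32.tv_nsec;
        return 0;
    }
    errno = -r;
    return -1;
}
```

Any error other than ENOSYS from the 64-bit call is reported immediately. Only a missing system call triggers the 32-bit fallback.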
4.2.1. INTERNAL_SYSCALL_CALL
The function-like macro INTERNAL_SYSCALL_CALL is defined in sysdeps/unix/sysdep.h as follows:
...
#define __SYSCALL_CONCAT_X(a,b) a##b
#define __SYSCALL_CONCAT(a,b) __SYSCALL_CONCAT_X (a, b)
#define __INTERNAL_SYSCALL0(name) \
  INTERNAL_SYSCALL (name, 0)
#define __INTERNAL_SYSCALL1(name, a1) \
  INTERNAL_SYSCALL (name, 1, a1)
#define __INTERNAL_SYSCALL2(name, a1, a2) \
  INTERNAL_SYSCALL (name, 2, a1, a2)
#define __INTERNAL_SYSCALL3(name, a1, a2, a3) \
  INTERNAL_SYSCALL (name, 3, a1, a2, a3)
#define __INTERNAL_SYSCALL4(name, a1, a2, a3, a4) \
  INTERNAL_SYSCALL (name, 4, a1, a2, a3, a4)
#define __INTERNAL_SYSCALL5(name, a1, a2, a3, a4, a5) \
  INTERNAL_SYSCALL (name, 5, a1, a2, a3, a4, a5)
#define __INTERNAL_SYSCALL6(name, a1, a2, a3, a4, a5, a6) \
  INTERNAL_SYSCALL (name, 6, a1, a2, a3, a4, a5, a6)
#define __INTERNAL_SYSCALL7(name, a1, a2, a3, a4, a5, a6, a7) \
  INTERNAL_SYSCALL (name, 7, a1, a2, a3, a4, a5, a6, a7)
#define __INTERNAL_SYSCALL_NARGS_X(a,b,c,d,e,f,g,h,n,...) n
#define __INTERNAL_SYSCALL_NARGS(...) \
  __INTERNAL_SYSCALL_NARGS_X (__VA_ARGS__,7,6,5,4,3,2,1,0,)
#define __INTERNAL_SYSCALL_DISP(b,...) \
  __SYSCALL_CONCAT (b,__INTERNAL_SYSCALL_NARGS(__VA_ARGS__))(__VA_ARGS__)
/* Issue a syscall defined by syscall number plus any other argument required.
   It is similar to INTERNAL_SYSCALL macro, but without the need to pass the
   expected argument number as second parameter. */
#define INTERNAL_SYSCALL_CALL(...) \
  __INTERNAL_SYSCALL_DISP (__INTERNAL_SYSCALL, __VA_ARGS__)
...
The final call form of INTERNAL_SYSCALL_CALL can be derived step by step:
#define INTERNAL_SYSCALL_CALL(...)
__INTERNAL_SYSCALL_DISP (__INTERNAL_SYSCALL, __VA_ARGS__)
#define INTERNAL_SYSCALL_CALL(...)
__SYSCALL_CONCAT (__INTERNAL_SYSCALL,__INTERNAL_SYSCALL_NARGS(__VA_ARGS__))(__VA_ARGS__)
#define INTERNAL_SYSCALL_CALL(...)
__SYSCALL_CONCAT (__INTERNAL_SYSCALL,__INTERNAL_SYSCALL_NARGS_X (__VA_ARGS__,7,6,5,4,3,2,1,0,))(__VA_ARGS__)
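//
// The first argument of INTERNAL_SYSCALL_CALL is the system call name and the
// remaining ones are the system call's arguments, so __VA_ARGS__ always holds
// at least one argument.
//
// __INTERNAL_SYSCALL_NARGS_X (__VA_ARGS__,7,6,5,4,3,2,1,0,) evaluates to its
// ninth argument, so its value depends on how many arguments __VA_ARGS__ holds:
// syscall name + 0 arguments: the value is 0
// syscall name + 1 argument: the value is 1
// syscall name + 2 arguments: the value is 2
// ...
// By this pattern, the value is exactly the number of system call arguments.
#define INTERNAL_SYSCALL_CALL(...)
__SYSCALL_CONCAT (__INTERNAL_SYSCALL, n)(__VA_ARGS__)
#define INTERNAL_SYSCALL_CALL(...)
__INTERNAL_SYSCALLn(__VA_ARGS__) // n is the number of system call arguments
#define INTERNAL_SYSCALL_CALL(...)
INTERNAL_SYSCALL(syscall name, number of syscall arguments, syscall arguments)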
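The counting trick in this derivation can be tested on its own. The sketch below copies the structure of the glibc macros under hypothetical DEMO_ names and is limited to two arguments. demo_call is an assumed placeholder that only encodes what it received.

```c
#include <assert.h>

#define DEMO_CONCAT_X(a, b) a##b
#define DEMO_CONCAT(a, b)   DEMO_CONCAT_X(a, b)

/* The ninth argument is the result. Every extra argument in front
   shifts the 7,6,...,0 list one place to the right. */
#define DEMO_NARGS_X(a, b, c, d, e, f, g, h, n, ...) n
#define DEMO_NARGS(...) \
    DEMO_NARGS_X(__VA_ARGS__, 7, 6, 5, 4, 3, 2, 1, 0, )

/* One worker macro per argument count. */
#define DEMO_CALL0(name)         demo_call(name, 0, 0L, 0L)
#define DEMO_CALL1(name, a1)     demo_call(name, 1, (long) (a1), 0L)
#define DEMO_CALL2(name, a1, a2) demo_call(name, 2, (long) (a1), (long) (a2))

/* Entry point: paste the counted number onto the DEMO_CALL prefix. */
#define DEMO_SYSCALL(...) \
    DEMO_CONCAT(DEMO_CALL, DEMO_NARGS(__VA_ARGS__))(__VA_ARGS__)

/* Placeholder for the real call: encodes name, count and arguments. */
long demo_call(int name, int nargs, long a1, long a2)
{
    assert(nargs >= 0 && nargs <= 2);
    return name * 1000L + nargs * 100L + a1 + a2;
}
```

Here DEMO_SYSCALL(3, 4, 5) expands to DEMO_CALL2(3, 4, 5) and then to demo_call(3, 2, 4, 5), mirroring the way INTERNAL_SYSCALL_CALL (clock_gettime64, clock_id, tp) becomes INTERNAL_SYSCALL (clock_gettime64, 2, clock_id, tp).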
4.2.2. INTERNAL_SYSCALL
The function-like macro INTERNAL_SYSCALL is defined in sysdeps/unix/sysv/linux/x86_64/sysdep.h as follows:
#undef SYS_ify
#define SYS_ify(syscall_name) __NR_##syscall_name
#ifdef __ASSEMBLER__
...
#else /* !__ASSEMBLER__ */
/* Registers clobbered by syscall. */
# define REGISTERS_CLOBBERED_BY_SYSCALL "cc", "r11", "cx"
/* NB: This also works when X is an array. For an array X, type of
(X) - (X) is ptrdiff_t, which is signed, since size of ptrdiff_t
== size of pointer, cast is a NOP. */
#define TYPEFY1(X) __typeof__ ((X) - (X))
/* Explicit cast the argument. */
#define ARGIFY(X) ((TYPEFY1 (X)) (X))
/* Create a variable 'name' based on type of variable 'X' to avoid
explicit types. */
#define TYPEFY(X, name) __typeof__ (ARGIFY (X)) name
#undef INTERNAL_SYSCALL
#define INTERNAL_SYSCALL(name, nr, args...)
internal_syscall##nr (SYS_ify (name), args)
...
#undef internal_syscall0
#define internal_syscall0(number, dummy...)
({
unsigned long int resultvar;
asm volatile (
"syscall\n\t"
: "=a" (resultvar)
: "0" (number)
: "memory", REGISTERS_CLOBBERED_BY_SYSCALL);
(long int) resultvar;
})
#undef internal_syscall1
#define internal_syscall1(number, arg1)
({
unsigned long int resultvar;
TYPEFY (arg1, __arg1) = ARGIFY (arg1);
register TYPEFY (arg1, _a1) asm ("rdi") = __arg1;
asm volatile (
"syscall\n\t"
: "=a" (resultvar)
: "0" (number), "r" (_a1)
: "memory", REGISTERS_CLOBBERED_BY_SYSCALL);
(long int) resultvar;
})
#undef internal_syscall2
#define internal_syscall2(number, arg1, arg2)
({
unsigned long int resultvar;
TYPEFY (arg2, __arg2) = ARGIFY (arg2);
TYPEFY (arg1, __arg1) = ARGIFY (arg1);
register TYPEFY (arg2, _a2) asm ("rsi") = __arg2;
register TYPEFY (arg1, _a1) asm ("rdi") = __arg1;
asm volatile (
"syscall\n\t"
: "=a" (resultvar)
: "0" (number), "r" (_a1), "r" (_a2)
: "memory", REGISTERS_CLOBBERED_BY_SYSCALL);
(long int) resultvar;
})
#undef internal_syscall3
#define internal_syscall3(number, arg1, arg2, arg3)
({
unsigned long int resultvar;
TYPEFY (arg3, __arg3) = ARGIFY (arg3);
TYPEFY (arg2, __arg2) = ARGIFY (arg2);
TYPEFY (arg1, __arg1) = ARGIFY (arg1);
register TYPEFY (arg3, _a3) asm ("rdx") = __arg3;
register TYPEFY (arg2, _a2) asm ("rsi") = __arg2;
register TYPEFY (arg1, _a1) asm ("rdi") = __arg1;
asm volatile (
"syscall\n\t"
: "=a" (resultvar)
: "0" (number), "r" (_a1), "r" (_a2), "r" (_a3)
: "memory", REGISTERS_CLOBBERED_BY_SYSCALL);
(long int) resultvar;
})
#undef internal_syscall4
#define internal_syscall4(number, arg1, arg2, arg3, arg4)
({
unsigned long int resultvar;
TYPEFY (arg4, __arg4) = ARGIFY (arg4);
TYPEFY (arg3, __arg3) = ARGIFY (arg3);
TYPEFY (arg2, __arg2) = ARGIFY (arg2);
TYPEFY (arg1, __arg1) = ARGIFY (arg1);
register TYPEFY (arg4, _a4) asm ("r10") = __arg4;
register TYPEFY (arg3, _a3) asm ("rdx") = __arg3;
register TYPEFY (arg2, _a2) asm ("rsi") = __arg2;
register TYPEFY (arg1, _a1) asm ("rdi") = __arg1;
asm volatile (
"syscall\n\t"
: "=a" (resultvar)
: "0" (number), "r" (_a1), "r" (_a2), "r" (_a3), "r" (_a4)
: "memory", REGISTERS_CLOBBERED_BY_SYSCALL);
(long int) resultvar;
})
#undef internal_syscall5
#define internal_syscall5(number, arg1, arg2, arg3, arg4, arg5)
({
unsigned long int resultvar;
TYPEFY (arg5, __arg5) = ARGIFY (arg5);
TYPEFY (arg4, __arg4) = ARGIFY (arg4);
TYPEFY (arg3, __arg3) = ARGIFY (arg3);
TYPEFY (arg2, __arg2) = ARGIFY (arg2);
TYPEFY (arg1, __arg1) = ARGIFY (arg1);
register TYPEFY (arg5, _a5) asm ("r8") = __arg5;
register TYPEFY (arg4, _a4) asm ("r10") = __arg4;
register TYPEFY (arg3, _a3) asm ("rdx") = __arg3;
register TYPEFY (arg2, _a2) asm ("rsi") = __arg2;
register TYPEFY (arg1, _a1) asm ("rdi") = __arg1;
asm volatile (
"syscall\n\t"
: "=a" (resultvar)
: "0" (number), "r" (_a1), "r" (_a2), "r" (_a3), "r" (_a4),
"r" (_a5)
: "memory", REGISTERS_CLOBBERED_BY_SYSCALL);
(long int) resultvar;
})
#undef internal_syscall6
#define internal_syscall6(number, arg1, arg2, arg3, arg4, arg5, arg6)
({
unsigned long int resultvar;
TYPEFY (arg6, __arg6) = ARGIFY (arg6);
TYPEFY (arg5, __arg5) = ARGIFY (arg5);
TYPEFY (arg4, __arg4) = ARGIFY (arg4);
TYPEFY (arg3, __arg3) = ARGIFY (arg3);
TYPEFY (arg2, __arg2) = ARGIFY (arg2);
TYPEFY (arg1, __arg1) = ARGIFY (arg1);
register TYPEFY (arg6, _a6) asm ("r9") = __arg6;
register TYPEFY (arg5, _a5) asm ("r8") = __arg5;
register TYPEFY (arg4, _a4) asm ("r10") = __arg4;
register TYPEFY (arg3, _a3) asm ("rdx") = __arg3;
register TYPEFY (arg2, _a2) asm ("rsi") = __arg2;
register TYPEFY (arg1, _a1) asm ("rdi") = __arg1;
asm volatile (
"syscall\n\t"
: "=a" (resultvar)
: "0" (number), "r" (_a1), "r" (_a2), "r" (_a3), "r" (_a4),
"r" (_a5), "r" (_a6)
: "memory", REGISTERS_CLOBBERED_BY_SYSCALL);
(long int) resultvar;
})
...
#endif /* __ASSEMBLER__ */
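INTERNAL_SYSCALL(name, nr, args...) is finally replaced by internal_syscall{0,1,2,3,4,5,6}(__NR_name, args), where __NR_name is the system call number. Each internal_syscallX issues the system call with the syscall instruction.
The familiar __ASSEMBLER__ appears again. The previous chapter, on assembly system calls, explained that the assembly form is used when this macro is defined. The code here shows the converse: when it is not defined, that is, when compiling C code, the system call is issued through these macros.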
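What internal_syscall0 boils down to can be tried from user space. The function below is a sketch rather than glibc's code: demo_syscall0 is a hypothetical name, the clobber list is written as "rcx", "r11" by assumption, and on anything other than x86_64 Linux it simply falls back to the libc syscall() function.

```c
#define _GNU_SOURCE          /* for the syscall() declaration */
#include <assert.h>
#include <sys/syscall.h>     /* SYS_getpid */
#include <unistd.h>          /* syscall(), getpid() */

/* Issue a system call that takes no arguments and return its raw result. */
long demo_syscall0(long number)
{
    assert(number >= 0);
#if defined(__linux__) && defined(__x86_64__)
    unsigned long resultvar;
    /* rax carries the system call number in and the result out.
       The syscall instruction itself overwrites rcx and r11. */
    __asm__ volatile ("syscall\n\t"
                      : "=a" (resultvar)
                      : "0" (number)
                      : "memory", "cc", "r11", "rcx");
    return (long) resultvar;
#else
    /* Other targets: use the generic libc wrapper instead. */
    return syscall(number);
#endif
}
```

Calling demo_syscall0(SYS_getpid) should return the same value as getpid(). Arguments would be added the way internal_syscall1 to internal_syscall6 do it, by binding them to rdi, rsi, rdx, r10, r8 and r9.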
5. Choosing among same-named C files
The previous section showed that there are several clock_gettime.c files:
maminjie@fedora /m/h/p/l/glibc (master)> find -name clock_gettime.c
./sysdeps/mach/clock_gettime.c
./sysdeps/unix/sysv/linux/clock_gettime.c
./time/clock_gettime.c
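We confirmed through the build log or the debug information in the binary that sysdeps/unix/sysv/linux/clock_gettime.c is the one used. Do we have to check the build log every time for other same-named C files, or is there a pattern?
Conjecture
glibc has many same-named C files (implementation files) for system calls. The author found no direct evidence for which one is ultimately used and only ventures a guess at the order of preference: architecture-specific first, then Linux, then generic, and finally glibc's own implementation (often an empty stub).
A few examples, checked against the build log and the sources, support this (on x86_64):
1) time.c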
maminjie@fedora /m/h/p/l/glibc (master)> find -name time.c
./sysdeps/unix/sysv/linux/powerpc/time.c
./sysdeps/unix/sysv/linux/time.c
./sysdeps/unix/sysv/linux/x86/time.c
./time/time.c
time.c uses: sysdeps/unix/sysv/linux/x86/time.c
2) times.c
maminjie@fedora /m/h/p/l/glibc (master)> find -name times.c
./posix/times.c
./sysdeps/mach/hurd/times.c
./sysdeps/unix/sysv/linux/times.c
./sysdeps/unix/sysv/linux/x86_64/x32/times.c
times.c uses: sysdeps/unix/sysv/linux/times.c
3) clock.c
maminjie@fedora /m/h/p/l/glibc (master)> find -name clock.c
./sysdeps/mach/hurd/clock.c
./sysdeps/posix/clock.c
./sysdeps/unix/sysv/linux/clock.c
./time/clock.c
clock.c uses: sysdeps/unix/sysv/linux/clock.c
4) unwind-resume.c
maminjie@fedora /m/h/p/l/glibc (master)> find -name unwind-resume.c
./sysdeps/arm/unwind-resume.c
./sysdeps/generic/unwind-resume.c
./sysdeps/ia64/unwind-resume.c
unwind-resume.c uses: sysdeps/generic/unwind-resume.c
5) wait.c
maminjie@fedora /m/h/p/l/glibc (master)> find -name wait.c
./posix/wait.c
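The conjectured rule amounts to a first-match search over an ordered list of directories. The sketch below only encodes that guess: the order in demo_search_order is an assumption chosen to agree with the examples above and is not read from glibc's build system, and demo_pick is a hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed search order for an x86_64 Linux build, most specific first. */
static const char *const demo_search_order[] = {
    "sysdeps/unix/sysv/linux/x86_64",
    "sysdeps/unix/sysv/linux/x86",
    "sysdeps/unix/sysv/linux",
    "sysdeps/posix",
    "sysdeps/generic",
    "time",                      /* the subdirectory's own stub */
};

/* Given the paths printed by find, return the first one whose directory
   comes earliest in the search order, or NULL if none matches. */
const char *demo_pick(const char *file,
                      const char *const *paths, size_t npaths)
{
    char candidate[256];
    size_t ndirs = sizeof demo_search_order / sizeof demo_search_order[0];

    assert(file != NULL && paths != NULL);
    for (size_t i = 0; i < ndirs; i++) {
        snprintf(candidate, sizeof candidate, "%s/%s",
                 demo_search_order[i], file);
        for (size_t j = 0; j < npaths; j++)
            if (strcmp(candidate, paths[j]) == 0)
                return paths[j];
    }
    return NULL;
}
```

Fed with the four time.c paths listed above, demo_pick selects sysdeps/unix/sysv/linux/x86/time.c, which agrees with the build log. A different architecture would need a different order.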
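6. System calls in the kernel
System calls in glibc are ultimately implemented by the Linux kernel. How do we look up a particular system call in the kernel?
This section again uses clock_gettime as the example. As shown above, glibc's clock_gettime ends up in __clock_gettime64, which issues the clock_gettime64 system call. Note that in the x86_64 build above __NR_clock_gettime64 falls back to __NR_clock_gettime, which is why the disassembly loads 0xe4 (228). The 32-bit table, where clock_gettime64 has its own entry, is used below to see how the kernel defines it.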
6.1. syscall.tbl
Like the syscalls.list files used for assembly system calls in glibc, the Linux kernel has system call table files. Their format is as follows:
arch/x86/entry/syscalls/syscall_32.tbl
#
# 32-bit system call numbers and entry vectors
#
# The format is:
# <number> <abi> <name> <entry point> <compat entry point>
#
# The __ia32_sys and __ia32_compat_sys stubs are created on-the-fly for
# sys_*() system calls and compat_sys_*() compat system calls if
# IA32_EMULATION is defined, and expect struct pt_regs *regs as their only
# parameter.
#
# The abi is always "i386" for this file.
#
0 i386 restart_syscall sys_restart_syscall
1 i386 exit sys_exit
2 i386 fork sys_fork
3 i386 read sys_read
4 i386 write sys_write
5 i386 open sys_open compat_sys_open
6 i386 close sys_close
7 i386 waitpid sys_waitpid
8 i386 creat sys_creat
9 i386 link sys_link
10 i386 unlink sys_unlink
11 i386 execve sys_execve compat_sys_execve
...
402 i386 msgctl sys_msgctl compat_sys_msgctl
403 i386 clock_gettime64 sys_clock_gettime
404 i386 clock_settime64 sys_clock_settime
405 i386 clock_adjtime64 sys_clock_adjtime
...
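Column 1: system call number
Column 2: ABI type
Column 3: system call name
Column 4: system call entry point (where it is finally implemented)
Column 5: compat system call entry point
clock_gettime64 has system call number 403 and entry point sys_clock_gettime.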
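The column layout is simple enough to parse with sscanf. This is a small sketch with hypothetical names (demo_tbl_entry, demo_parse_tbl_line). It assumes whitespace-separated columns as in the excerpt above and rejects comment lines and lines without an entry point.

```c
#include <assert.h>
#include <stdio.h>

struct demo_tbl_entry {
    int  number;
    char abi[16];
    char name[64];
    char entry[64];
    char compat[64];    /* empty string when the fifth column is absent */
};

/* Parse one table line. Returns 0 on success, -1 if fewer than four
   columns could be read (for example a "#" comment line). */
int demo_parse_tbl_line(const char *line, struct demo_tbl_entry *e)
{
    assert(line != NULL && e != NULL);
    e->compat[0] = '\0';
    int n = sscanf(line, "%d %15s %63s %63s %63s",
                   &e->number, e->abi, e->name, e->entry, e->compat);
    return n >= 4 ? 0 : -1;
}
```

Parsing the line "403 i386 clock_gettime64 sys_clock_gettime" yields number 403, entry sys_clock_gettime and an empty compat field.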
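6.2. syscalls header files
The Linux kernel automatically generates the corresponding header files from syscall.tbl.
__SYSCALL_I386 is defined as follows:
The first __SYSCALL_I386 definition is used to declare the system call functions, and the second one is used to initialize the array.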
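Defining one macro twice over the same list is the usual X-macro pattern. The sketch below illustrates it with hypothetical names and is not the kernel's actual code: DEMO_SYSCALL_LIST stands in for the generated header, and empty table slots return -1 here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical generated list: one DEMO_SYSCALL(nr, sym) per entry. */
#define DEMO_SYSCALL_LIST              \
    DEMO_SYSCALL(0, demo_sys_restart)  \
    DEMO_SYSCALL(1, demo_sys_exit)     \
    DEMO_SYSCALL(3, demo_sys_read)

/* First definition: declare every handler. */
#define DEMO_SYSCALL(nr, sym) long sym(void);
DEMO_SYSCALL_LIST
#undef DEMO_SYSCALL

/* Second definition: fill the table using designated initializers. */
#define DEMO_SYSCALL(nr, sym) [nr] = sym,
long (*const demo_call_table[4])(void) = {
    DEMO_SYSCALL_LIST
};
#undef DEMO_SYSCALL

long demo_sys_restart(void) { return 100; }
long demo_sys_exit(void)    { return 101; }
long demo_sys_read(void)    { return 103; }

/* Look up a handler by number and call it. Empty slots return -1. */
long demo_dispatch(size_t nr)
{
    assert(nr < sizeof demo_call_table / sizeof demo_call_table[0]);
    return demo_call_table[nr] ? demo_call_table[nr]() : -1;
}
```

The list is written once, so the declarations and the table cannot drift apart. That is the benefit of generating both from a single table file.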
6.3. The SYSCALL_DEFINE macro
In the Linux kernel, sys_clock_gettime is wrapped by the macro SYSCALL_DEFINE2(clock_gettime, ...). SYSCALL_DEFINE2 is defined as follows:
include/linux/syscalls.h
#ifdef CONFIG_ARCH_HAS_SYSCALL_WRAPPER
/*
* It may be useful for an architecture to override the definitions of the
* SYSCALL_DEFINE0() and __SYSCALL_DEFINEx() macros, in particular to use a
* different calling convention for syscalls. To allow for that, the prototypes
* for the sys_*() functions below will *not* be included if
* CONFIG_ARCH_HAS_SYSCALL_WRAPPER is enabled.
*/
#include <asm/syscall_wrapper.h>
#endif /* CONFIG_ARCH_HAS_SYSCALL_WRAPPER */
...
#else
#define SYSCALL_METADATA(sname, nb, ...)
static inline int is_syscall_trace_event(struct trace_event_call *tp_event)
{
return 0;
}
#endif
#ifndef SYSCALL_DEFINE0
#define SYSCALL_DEFINE0(sname) \
  SYSCALL_METADATA(_##sname, 0); \
  asmlinkage long sys_##sname(void); \
  ALLOW_ERROR_INJECTION(sys_##sname, ERRNO); \
  asmlinkage long sys_##sname(void)
#endif /* SYSCALL_DEFINE0 */
#define SYSCALL_DEFINE1(name, ...) SYSCALL_DEFINEx(1, _##name, __VA_ARGS__)
#define SYSCALL_DEFINE2(name, ...) SYSCALL_DEFINEx(2, _##name, __VA_ARGS__)
#define SYSCALL_DEFINE3(name, ...) SYSCALL_DEFINEx(3, _##name, __VA_ARGS__)
#define SYSCALL_DEFINE4(name, ...) SYSCALL_DEFINEx(4, _##name, __VA_ARGS__)
#define SYSCALL_DEFINE5(name, ...) SYSCALL_DEFINEx(5, _##name, __VA_ARGS__)
#define SYSCALL_DEFINE6(name, ...) SYSCALL_DEFINEx(6, _##name, __VA_ARGS__)
#define SYSCALL_DEFINE_MAXARGS 6
#define SYSCALL_DEFINEx(x, sname, ...) \
  SYSCALL_METADATA(sname, x, __VA_ARGS__) \
  __SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
#define __PROTECT(...) asmlinkage_protect(__VA_ARGS__)
/*
 * The asmlinkage stub is aliased to a function named __se_sys_*() which
 * sign-extends 32-bit ints to longs whenever needed. The actual work is
 * done within __do_sys_*().
 */
#ifndef __SYSCALL_DEFINEx
#define __SYSCALL_DEFINEx(x, name, ...) \
  __diag_push(); \
  __diag_ignore(GCC, 8, "-Wattribute-alias", \
                "Type aliasing is used to sanitize syscall arguments"); \
  asmlinkage long sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)) \
    __attribute__((alias(__stringify(__se_sys##name)))); \
  ALLOW_ERROR_INJECTION(sys##name, ERRNO); \
  static inline long __do_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)); \
  asmlinkage long __se_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__)); \
  asmlinkage long __se_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__)) \
  { \
    long ret = __do_sys##name(__MAP(x,__SC_CAST,__VA_ARGS__)); \
    __MAP(x,__SC_TEST,__VA_ARGS__); \
    __PROTECT(x, ret,__MAP(x,__SC_ARGS,__VA_ARGS__)); \
    return ret; \
  } \
  __diag_pop(); \
  static inline long __do_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__))
#endif /* __SYSCALL_DEFINEx */
...
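SYSCALL_DEFINE2 hides the prefix of the clock_gettime name. If the macro CONFIG_ARCH_HAS_SYSCALL_WRAPPER is defined, SYSCALL_DEFINE2 uses the __SYSCALL_DEFINEx from asm/syscall_wrapper.h. Otherwise it uses the __SYSCALL_DEFINEx in this file and expands to sys_clock_gettime with no additional prefix.
The header arch/x86/include/asm/syscall_wrapper.h reads as follows:
It adds different markers in front of sys_*, such as the __ia32_sys_* prefix, consistent with the definitions in syscalls.h.
This is only a brief reading without a rigorous derivation, but the basic pattern should be as described.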
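The name construction in the non-wrapper branch can be imitated with a much smaller macro. The following is a simplified sketch with hypothetical names (DEMO_SYSCALL_DEFINE2, demo_sys_add): it keeps the pairing of argument type and argument name and the split into an outer entry point taking long arguments and an inner typed function, and drops metadata, aliasing and error injection.

```c
#include <assert.h>

/* Simplified two-argument form. The body written after the macro
   becomes demo_do_sys_<name>; demo_sys_<name> is the entry point that
   receives every argument as long and casts it back to its real type. */
#define DEMO_SYSCALL_DEFINE2(name, t1, a1, t2, a2)          \
    static inline long demo_do_sys_##name(t1 a1, t2 a2);    \
    long demo_sys_##name(long a1, long a2)                  \
    {                                                       \
        return demo_do_sys_##name((t1) a1, (t2) a2);        \
    }                                                       \
    static inline long demo_do_sys_##name(t1 a1, t2 a2)

/* Usage: reads like a function definition, as in the kernel sources. */
DEMO_SYSCALL_DEFINE2(add, int, x, unsigned int, y)
{
    assert(x >= 0);
    return (long) x + (long) y;
}
```

After preprocessing, the translation unit contains demo_sys_add(long, long) as the public entry point and demo_do_sys_add(int, unsigned int) holding the body, a reduced analogue of sys_clock_gettime and __do_sys_clock_gettime.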