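1 Introduction
GNU gprof is the GNU profiler. It helps locate performance bottlenecks in programs on Linux: it records how many times each function is called and how much processor time each function consumes, and it can display a "call graph" showing which functions call which. This information is very useful when improving an application's performance.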
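2 How it works
A program must be compiled and linked with the -pg option (both the compile step and the link step need it). When a program is built with -pg, gcc does three things:
(1) At program start-up (before main), it inserts a call to monstartup, which initializes profiling: it allocates memory for the collected data and installs a clock signal handler that periodically samples the program counter;
(2) At the entry of every function, it inserts a call to _mcount, which records call-graph information: which caller called this function and how many times. _mcount does not time the call; function times are estimated from the program-counter samples taken by the clock handler;
(3) At program exit, it arranges (by registering with atexit()) for _mcleanup() to be called, which writes the profiling data to gmon.out.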
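To make step (2) concrete, here is a simplified simulation in C of the bookkeeping an mcount-style hook performs. It is a sketch under stated assumptions, not glibc's implementation: real _mcount identifies caller and callee by code addresses, while this version uses function-name strings, and record_arc, calls_on_arc and calls_to are hypothetical names. The point it shows is that each call only increments a counter on a caller-to-callee arc; no time is measured here.

```c
#include <string.h>

#define MAX_ARCS 64

/* One caller -> callee edge of the call graph and how often it was taken. */
struct arc {
    const char *caller;
    const char *callee;
    long count;
};

static struct arc arcs[MAX_ARCS];
static int n_arcs = 0;

/* Simulates what an mcount-style hook does on function entry: find the
   (caller, callee) arc and increment it, creating it on first use.
   The real hook works with return addresses, not names. */
void record_arc(const char *caller, const char *callee) {
    for (int i = 0; i < n_arcs; i++) {
        if (strcmp(arcs[i].caller, caller) == 0 &&
            strcmp(arcs[i].callee, callee) == 0) {
            arcs[i].count++;
            return;
        }
    }
    /* If the table is full, the new arc is silently dropped. */
    if (n_arcs < MAX_ARCS) {
        arcs[n_arcs].caller = caller;
        arcs[n_arcs].callee = callee;
        arcs[n_arcs].count = 1;
        n_arcs++;
    }
}

/* Calls into `callee` made by one particular caller. */
long calls_on_arc(const char *caller, const char *callee) {
    for (int i = 0; i < n_arcs; i++)
        if (strcmp(arcs[i].caller, caller) == 0 &&
            strcmp(arcs[i].callee, callee) == 0)
            return arcs[i].count;
    return 0;
}

/* Total calls into `callee` from all callers (the "calls" column). */
long calls_to(const char *callee) {
    long total = 0;
    for (int i = 0; i < n_arcs; i++)
        if (strcmp(arcs[i].callee, callee) == 0)
            total += arcs[i].count;
    return total;
}
```

With the arcs main to foo (twice), bar to foo and main to bar recorded, calls_to("foo") returns 3, which is what the calls column would show for foo, while calls_on_arc("main", "foo") returns 2.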
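3 Workflow
(1) Compile and link with the -pg option.
(2) Run the compiled binary.
(3) When the program exits normally, a gmon.out file is written to its current working directory. An existing gmon.out is overwritten.
(4) Analyze gmon.out with gprof.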
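As a hypothetical target for these steps, the C file below has one deliberately quadratic function and one closed-form function that compute the same value, so the flat profile should clearly point at the slow one. The names slow_sum, fast_sum and run_workload are illustrative only; build without optimization, since at higher -O levels the compiler may simplify the loop.

```c
/* demo.c: a tiny workload for trying out gprof (illustrative only). */

/* Quadratic on purpose: should dominate the flat profile. */
long slow_sum(long n) {
    long total = 0;
    for (long i = 0; i < n; i++)
        for (long j = 0; j <= i; j++)
            total += 1;
    return total;  /* equals n * (n + 1) / 2 */
}

/* Closed form of the same sum: should barely register. */
long fast_sum(long n) {
    return n * (n + 1) / 2;
}

/* Runs both versions `rounds` times; the result is 0 because they agree. */
long run_workload(long n, int rounds) {
    long acc = 0;
    for (int r = 0; r < rounds; r++)
        acc += slow_sum(n) - fast_sum(n);
    return acc;
}
```

To profile it, add a main such as int main(void) { return (int)run_workload(20000, 3); } and then run:
# gcc -pg -o demo demo.c
# ./demo
# gprof demo gmon.out >report.txt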
4 Analyzing gprof output
Once gmon.out exists, the gprof tool from GNU binutils can analyze it and convert the data into an easy-to-read report.
Typical usage:
# gprof Binary-file gmon.out >report.txt
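Here Binary-file is the program that was run (it can also be a library the program called), gmon.out is the file produced above, and report.txt is the resulting report. gprof offers many options for controlling what the report contains.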
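4.1 Flat profile
Open the report file in a text editor:
The first part of the report is a simple list (the flat profile) of how each function was called, as shown in the figure above. It is sorted by time in descending order; functions with equal time are sorted by call count in descending order. The fields mean:
% time: percentage of the program's total running time spent in this function
cumulative seconds: running total: the time of this function plus the time of all functions listed above it
self seconds: time spent in the function itself, summed over all of its calls; this is the primary sort key of the list
calls: number of times the function was called; blank if the function was not profiled (not compiled with -pg)
self Ts/call: average time spent in the function itself per call
total Ts/call: average time spent in the function and its descendants per call
name: function name
Below the list, gprof also prints a detailed description of these fields: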
% time: the percentage of the total running time of the program used by this function.
cumulative seconds: a running sum of the number of seconds accounted for by this function and those listed above it.
self seconds: the number of seconds accounted for by this function alone. This is the major sort for this listing.
calls: the number of times this function was invoked, if this function is profiled, else blank.
self ms/call: the average number of milliseconds spent in this function per call, if this function is profiled, else blank.
total ms/call: the average number of milliseconds spent in this function and its descendents per call, if this function is profiled, else blank.
name: the name of the function. This is the minor sort for this listing. The index shows the location of the function in the gprof listing. If the index is in parenthesis it shows where it would appear in the gprof listing if it were to be printed.
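The flat-profile columns follow from two raw inputs: the number of clock samples that landed in each function and the call count recorded by _mcount. The sketch below recomputes % time, cumulative seconds, self seconds and self ms/call from those inputs. It assumes a sampling period of 0.01 s (the value gprof typically prints on Linux as "Each sample counts as 0.01 seconds."; check the header of your own report) and rows already sorted by sample count; struct flat_row and fill_flat_profile are hypothetical names. total ms/call is left out because it also needs the time propagated from children through the call graph.

```c
#include <stddef.h>

/* One flat-profile row; samples and calls are inputs, the rest outputs. */
struct flat_row {
    const char *name;
    long samples;        /* clock samples attributed to this function */
    long calls;          /* call count from the mcount arcs */
    double self_s;       /* self seconds */
    double pct;          /* % time */
    double cum_s;        /* cumulative seconds */
    double self_ms_call; /* self ms/call (0 if never called) */
};

/* Rows must already be sorted by samples, descending. */
void fill_flat_profile(struct flat_row *rows, size_t n, double period_s) {
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += rows[i].samples;

    double cum = 0.0;
    for (size_t i = 0; i < n; i++) {
        rows[i].self_s = rows[i].samples * period_s;
        rows[i].pct = total > 0 ? 100.0 * rows[i].samples / total : 0.0;
        cum += rows[i].self_s;           /* running sum down the list */
        rows[i].cum_s = cum;
        rows[i].self_ms_call = rows[i].calls > 0
            ? 1000.0 * rows[i].self_s / rows[i].calls
            : 0.0;
    }
}
```

For example, a function with 30 samples and 10 calls listed above one with 10 samples and 5 calls gets 75% and 0.30 self seconds (30 ms/call), and the second row shows 25%, cumulative 0.40 s and 20 ms/call.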
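4.2 Call graph
The second part of the report is the call graph, which shows the time spent in each function and its descendants. Entries are sorted by time in descending order and are indexed, so the overall call relationships are easy to trace through the index numbers. After the call graph, gprof prints an explanation of each element, which is handy: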
This table describes the call tree of the program, and was sorted by the total amount of time spent in each function and its children.

Each entry in this table consists of several lines. The line with the index number at the left hand margin lists the current function. The lines above it list the functions that called this function, and the lines below it list the functions this one called. This line lists:
index: A unique number given to each element of the table. Index numbers are sorted numerically. The index number is printed next to every function name so it is easier to look up where the function is in the table.
% time: This is the percentage of the `total' time that was spent in this function and its children. Note that due to different viewpoints, functions excluded by options, etc, these numbers will NOT add up to 100%.
self: This is the total amount of time spent in this function.
children: This is the total amount of time propagated into this function by its children.
called: This is the number of times the function was called. If the function called itself recursively, the number only includes non-recursive calls, and is followed by a `+' and the number of recursive calls.
name: The name of the current function. The index number is printed after it. If the function is a member of a cycle, the cycle number is printed between the function's name and the index number.

For the function's parents, the fields have the following meanings:
self: This is the amount of time that was propagated directly from the function into this parent.
children: This is the amount of time that was propagated from the function's children into this parent.
called: This is the number of times this parent called the function `/' the total number of times the function was called. Recursive calls to the function are not included in the number after the `/'.
name: This is the name of the parent. The parent's index number is printed after it. If the parent is a member of a cycle, the cycle number is printed between the name and the index number.

If the parents of the function cannot be determined, the word `<spontaneous>' is printed in the `name' field, and all the other fields are blank.

For the function's children, the fields have the following meanings:
self: This is the amount of time that was propagated directly from the child into the function.
children: This is the amount of time that was propagated from the child's children to the function.
called: This is the number of times the function called this child `/' the total number of times the child was called. Recursive calls by the child are not listed in the number after the `/'.
name: This is the name of the child. The child's index number is printed after it. If the child is a member of a cycle, the cycle number is printed between the name and the index number.

If there are any cycles (circles) in the call graph, there is an entry for the cycle-as-a-whole. This entry shows who called the cycle (as parents) and the members of the cycle (as children.) The `+' recursive calls entry shows the number of function calls that were internal to the cycle, and the calls entry for each member shows, for that member, how many times it was called from other members of the cycle.
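The self and children values on a parent line come from gprof splitting a function's time among its callers in proportion to the number of (non-recursive) calls each caller made. This assumes every call to the function costs the same, which the gprof manual itself notes can be inaccurate. A minimal sketch of that arithmetic, with struct propagated and share_to_parent as hypothetical names:

```c
/* Time that one parent line is charged for a given function. */
struct propagated {
    double self;      /* share of the function's own time */
    double children;  /* share of the time from the function's children */
};

/* Split the function's time by the fraction of its calls made by this parent.
   total_calls excludes recursive calls, as in the "called" field. */
struct propagated share_to_parent(double fn_self, double fn_children,
                                  long calls_from_parent, long total_calls) {
    struct propagated p = {0.0, 0.0};
    if (total_calls <= 0)
        return p;
    double frac = (double)calls_from_parent / (double)total_calls;
    p.self = fn_self * frac;
    p.children = fn_children * frac;
    return p;
}
```

For a function with 0.4 s of self time and 0.2 s of children time that was called 4 times, 3 of them from one parent, that parent's line shows self 0.3 and children 0.15, and its called field reads 3/4.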
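5 Visualizing with dot
A text report is good enough for small programs, but for large programs it is too cluttered, especially when the focus is on call relationships: jumping back and forth through the text is tiresome.
Converting the text report into an image requires Python, dot, and the gprof2dot.py script.
dot is a tool from Graphviz. On CentOS it can be installed with:
# yum install graphviz
After installing, run: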
# python gprof2dot.py report.txt | dot -Tpng -o ast.png
Here report.txt is the text report produced by gprof above. This creates a file named ast.png in the current directory; open it to view the call graph.
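6 Limitations
6.1 Shared library support
Profiling support is added by the compiler, so to get profile data from a shared library, that library must itself be compiled with -pg.
To profile system functions (for example those in libc), link with -lc_p instead of -lc, so that the program links against libc_p.so or libc_p.a (this only works if the system provides a profiling build of the C library). Only then can the execution time of the underlying C library functions be measured.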
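6.2 User time vs. kernel time
gprof only measures the user time an application consumes; it cannot measure time spent in kernel space, so it is of no help in analyzing kernel-mode work. It is therefore a poor fit for programs that spend a large share of their time in system calls.
In addition, times are obtained by sampling, so their precision is limited. A function that runs only briefly may never be sampled, and its time is then reported as zero; this is why so many entries show 0.00.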
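Why short functions often show 0.00 can be seen with a small model: samples are taken at fixed clock ticks, so a function is charged time only if a tick lands while it is running. The sketch assumes a 10 ms sampling period (100 Hz) and integer microsecond timestamps; ceil_div and ticks_in_interval are hypothetical helpers, and the model ignores that real runs are interleaved with other code.

```c
/* Ceiling division for non-negative a and positive b. */
static long ceil_div(long a, long b) {
    return (a + b - 1) / b;
}

/* Number of sampling ticks at 0, period_us, 2*period_us, ... that fall
   inside the run interval [start_us, start_us + dur_us). Each tick
   charges the running function one full sampling period. */
long ticks_in_interval(long start_us, long dur_us, long period_us) {
    return ceil_div(start_us + dur_us, period_us) - ceil_div(start_us, period_us);
}
```

With a 10 ms period, a 4 ms run starting at 1 ms gets no tick and is reported as 0.00, while the same 4 ms run starting at 8 ms crosses the 10 ms tick and is charged a full 0.01 s. Only over many runs do these errors average out, and then only statistically.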
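6.3 Multithreading
gprof supports multithreaded programs poorly, because it relies on the ITIMER_PROF signal, and only the main thread handles this signal. http://sam.zoy.org/writings/programming/gprof.html describes a workaround that inserts a hook, but when I tested it with asterisk the results were still poor: the profiles of the worker threads were consistently wrong.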
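The idea behind that kind of hook can be sketched in C: capture the creating thread's ITIMER_PROF setting with getitimer, then re-arm it with setitimer at the start of each new thread before running the real start routine. This is an explicit wrapper rather than the article's code (the article interposes pthread_create itself from a preloaded shared library); prof_pthread_create, prof_trampoline and demo_increment are hypothetical names. Whether it helps depends on the platform's threading implementation, and as noted above the author got poor results with asterisk.

```c
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/time.h>

/* Handed to the new thread: the real start routine, its argument,
   and the creating thread's profiling timer. */
struct prof_start {
    void *(*start_routine)(void *);
    void *arg;
    struct itimerval itimer;
};

/* Runs in the new thread: re-arm ITIMER_PROF with the creator's
   settings, then hand over to the user's start routine. */
static void *prof_trampoline(void *p) {
    struct prof_start s = *(struct prof_start *)p;
    free(p);
    setitimer(ITIMER_PROF, &s.itimer, NULL);
    return s.start_routine(s.arg);
}

/* Use in place of pthread_create (hypothetical wrapper). */
int prof_pthread_create(pthread_t *thread, const pthread_attr_t *attr,
                        void *(*start_routine)(void *), void *arg) {
    struct prof_start *p = malloc(sizeof *p);
    if (p == NULL)
        return EAGAIN;
    p->start_routine = start_routine;
    p->arg = arg;
    getitimer(ITIMER_PROF, &p->itimer);  /* capture the creator's timer */
    int rc = pthread_create(thread, attr, prof_trampoline, p);
    if (rc != 0)
        free(p);  /* the trampoline never ran, so we still own p */
    return rc;
}

/* Tiny start routine used only to demonstrate the wrapper. */
void *demo_increment(void *arg) {
    return (void *)((intptr_t)arg + 1);
}
```

Compile with -pthread (and -pg when profiling). Joining a thread started as prof_pthread_create(&t, NULL, demo_increment, (void *)(intptr_t)41) yields 42, exactly as with pthread_create.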
6.4 Other
gmon.out is generated only when the process exits, which makes gprof somewhat inconvenient to use.
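For long-running programs this matters in one more way: gmon.out is written by _mcleanup from the atexit handlers, and those do not run when the process is killed by an unhandled signal such as SIGINT (Ctrl-C) or SIGTERM. A common workaround, sketched here in C, is to turn those signals into a flag that the main loop polls, so the program can return from main normally; install_stop_handlers and should_stop are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L  /* expose sigaction under strict -std modes */
#include <signal.h>
#include <string.h>

/* Set from the signal handler, polled by the program's main loop. */
static volatile sig_atomic_t stop_requested = 0;

/* Async-signal-safe: only records that a stop was requested. */
static void on_stop(int sig) {
    (void)sig;
    stop_requested = 1;
}

/* Route SIGINT and SIGTERM to on_stop instead of the default action
   (immediate termination, which skips atexit handlers such as _mcleanup). */
int install_stop_handlers(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_stop;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGINT, &sa, NULL) != 0)
        return -1;
    if (sigaction(SIGTERM, &sa, NULL) != 0)
        return -1;
    return 0;
}

/* Non-zero once SIGINT or SIGTERM has arrived. */
int should_stop(void) {
    return stop_requested != 0;
}
```

In the program, call install_stop_handlers() at start-up, run the work loop while should_stop() returns 0, and then return from main, so that gmon.out is written even when the program is stopped with Ctrl-C or kill.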