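SystemTap is a Linux performance diagnosis tool that can inspect the runtime details of both kernel functions and user-space applications. I recently ran into a function call with unusually high latency. Walking through a large code base line by line to find such a problem is expensive. SystemTap can instead measure how long any piece of code in a process takes, which makes performance problems much easier to locate. This article shows how to use SystemTap on arm64 to track down function-level performance problems in a program.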
apt install systemtap-sdt-dev libdw-dev
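Note that the SystemTap version shipped with the system is too old, so we need to fetch it from the upstream SystemTap project. The latest release at the time of writing is systemtap-5.3.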
git clone git://sourceware.org/git/systemtap.git
If sourceware is slow, you can use the Tsinghua (TUNA) mirror instead:
wget https://mirrors.tuna.tsinghua.edu.cn/sourceware/systemtap/releases/systemtap-5.3.tar.gz
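The latest code targets kernels up to 6.15-rc, while the kernel we actually run is 5.10, so a few places need minor adaptation.
The arm64 5.x kernel we use places some VFS exports in a dedicated symbol namespace. Without extra changes, building the SystemTap module on arm64 fails with a namespace error for the following two functions: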
kernel_read filp_open
The error messages are as follows:
ERROR: modpost: module stap_ce5dbcb79b603543094c4f68fb16a1a8_943 uses symbol kernel_read from namespace VFS_internal_I_am_really_a_filesystem_and_am_NOT_a_driver, but does not import it.
ERROR: modpost: module stap_ce5dbcb79b603543094c4f68fb16a1a8_943 uses symbol filp_open from namespace VFS_internal_I_am_really_a_filesystem_and_am_NOT_a_driver, but does not import it.
To fix this, we declare the namespace import in SystemTap's runtime code, as follows:
# vim runtime/transport/symbols.c
MODULE_IMPORT_NS(VFS_internal_I_am_really_a_filesystem_and_am_NOT_a_driver);
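Before patching, it can help to confirm which namespace the kernel build actually assigns to a symbol. The following is a minimal sketch, not part of SystemTap: the helper name `symbol_namespace` is hypothetical, and it assumes the tab-separated Module.symvers layout used by 5.10-era kernels, where the namespace is the fifth column.

```shell
# symbol_namespace SYMBOL [SYMVERS]: print the export namespace of SYMBOL.
# Assumption: Module.symvers columns are
#   CRC <tab> symbol <tab> module <tab> export-type <tab> namespace
# Symbols exported without a namespace are reported as "(none)".
symbol_namespace() {
    awk -F '\t' -v sym="$1" \
        '$2 == sym { print ($5 == "" ? "(none)" : $5) }' \
        "${2:-Module.symvers}"
}
```

Run from the kernel build tree, `symbol_namespace kernel_read` should print the namespace named in the modpost error if the kernel restricts that symbol.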
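6.x kernels provide timer_delete_sync, but we are still on 5.10, which uses del_timer_sync. The SystemTap code therefore has to be adjusted for the older kernel. The relevant code is in runtime/transport/relay_compat.h.
The unmodified code is: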
#ifdef STAPCONF_DEL_TIMER_SYNC
#define STP_TIMER_DELETE_SYNC(a) del_timer_sync(a)
#else
#define STP_TIMER_DELETE_SYNC(a) timer_delete_sync(a)
#endif
The macro STAPCONF_DEL_TIMER_SYNC decides which function is used. To keep the change as small as possible, we modify it as follows:
#ifndef STAPCONF_DEL_TIMER_SYNC
#define STP_TIMER_DELETE_SYNC(a) del_timer_sync(a)
#else
#define STP_TIMER_DELETE_SYNC(a) timer_delete_sync(a)
#endif
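This simply inverts the condition (#ifdef becomes #ifndef), so that del_timer_sync is selected when STAPCONF_DEL_TIMER_SYNC is not defined. It is a quick workaround for this 5.10 build rather than a general fix.
The build steps are the usual ones: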
./configure
make all -j8
make install -j8
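The SystemTap code base is not very large, so it can be built directly on the target machine, and make install then installs it in place.
If the installation succeeded, we see the following: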
# stap --version
Systemtap translator/driver (version 5.3/0.176, non-git sources)
Copyright (C) 2005-2025 Red Hat, Inc. and others
This is free software; see the source for copying conditions.
tested kernel versions: 3.10 ... 6.15-rc
enabled features: BPF LIBSQLITE3 LIBXML2 NLS JSON_C
To use SystemTap, the kernel must be built with its debugging and tracing features enabled. The required options are:
CONFIG_DEBUG_INFO=y
CONFIG_KPROBES=y
CONFIG_UPROBES=y
CONFIG_RELAY=y
CONFIG_DEBUG_FS=y
CONFIG_MODULES=y
CONFIG_TRACEPOINTS=y
CONFIG_FUNCTION_TRACER=y
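The option list above can be checked mechanically against a kernel config file. This is a small sketch under stated assumptions: the function name `check_kconfig` is made up for illustration, and the path of the config file depends on your system.

```shell
# check_kconfig CONFIG_FILE: print every required option that is not set to "y".
# Prints nothing when all options listed in this article are enabled.
check_kconfig() {
    for opt in CONFIG_DEBUG_INFO CONFIG_KPROBES CONFIG_UPROBES CONFIG_RELAY \
               CONFIG_DEBUG_FS CONFIG_MODULES CONFIG_TRACEPOINTS \
               CONFIG_FUNCTION_TRACER; do
        grep -q "^${opt}=y\$" "$1" || echo "missing: $opt"
    done
}
```

In the kernel source tree this could be run as `check_kconfig .config`. On a running system the config may also be available as /boot/config-$(uname -r), depending on how the kernel was packaged.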
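SystemTap works by building a kernel module (ko) from the script and inserting it into the running system to collect detailed information, so the kernel headers must be present on the machine.
The headers can be staged manually as follows: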
make headers_install INSTALL_HDR_PATH=/tmp/kernel-header/
# firmware_install was removed from mainline in 4.14; skip it if your tree has no such target
make firmware_install INSTALL_MOD_PATH=/tmp/kernel-header/
make modules_install INSTALL_MOD_PATH=/tmp/kernel-header/
cp --parents `find -type f -name "Makefile*" -o -name "Kconfig*"` /tmp/kernel-header/
cp Module.symvers /tmp/kernel-header/
cp System.map /tmp/kernel-header/
cp -rf scripts/ /tmp/kernel-header/ # arm bin
cp -rf include/ /tmp/kernel-header/
cp -rf --parents arch/arm64/include /tmp/kernel-header
cp -rf --parents arch/arm/include /tmp/kernel-header
cp .config /tmp/kernel-header/
tar cvzf /tmp/kernel-header.tar.gz /tmp/kernel-header
Next, on the target machine, extract /tmp/kernel-header.tar.gz into the directory /usr/src/linux-headers-$(uname -r).
Then create the link to the headers:
mkdir -p /lib/modules/$(uname -r)/
ln -sf /usr/src/linux-headers-$(uname -r) /lib/modules/$(uname -r)/build
At this point, kernel modules (ko files) can be compiled on the machine.
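A quick sanity check of the staged tree can save a failed Pass 4 later. The sketch below only verifies that a few of the files copied in the manual steps above are reachable through the build link. The function name `check_build_dir` is hypothetical, and the file list is a minimal assumption, not a complete definition of a usable build tree.

```shell
# check_build_dir [DIR]: verify that DIR contains the files copied during
# the manual staging (Makefile, .config, Module.symvers).
# Defaults to the build link of the running kernel; returns non-zero if
# anything is missing.
check_build_dir() {
    dir="${1:-/lib/modules/$(uname -r)/build}"
    status=0
    for f in Makefile .config Module.symvers; do
        if [ ! -e "$dir/$f" ]; then
            echo "missing: $dir/$f"
            status=1
        fi
    done
    return $status
}
```

Calling `check_build_dir` without arguments checks /lib/modules/$(uname -r)/build and stays silent when the three files are found.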
If the manual staging above is inconvenient, you can instead build the headers deb package from the kernel tree yourself:
make bindeb-pkg -j256
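Note that the command above assumes the kernel has already been built and only does the packaging. If the kernel has not been built yet, it is better to start from scratch: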
make deb-pkg -j256
This produces the following packages, which we install:
dpkg -i linux-headers-5.10.198_5.10.198-69_arm64.deb \
        linux-image-5.10.198_5.10.198-69_arm64.deb \
        linux-image-5.10.198-dbg_5.10.198-69_arm64.deb \
        linux-libc-dev_5.10.198-69_arm64.deb
To resolve the application's symbols for debugging, we also need to install the debug symbol package of the application in question:
# dpkg -i kylin-nm-dbgsym_3.20.1.7_arm64.ddeb
stap can now list the functions of the program:
# stap -l 'process("/usr/bin/kylin-nm").function("*")'
process("/usr/bin/kylin-nm").function("onShowControlCenter@frontend/tab-pages/lanpage.cpp:723")
process("/usr/bin/kylin-nm").function("onSwithGsettingsChanged@frontend/tab-pages/lanpage.cpp:173")
process("/usr/bin/kylin-nm").function("onUpdateConnection@frontend/tab-pages/lanpage.cpp:1150")
process("/usr/bin/kylin-nm").function("onWiredEnabledChanged@frontend/tab-pages/lanpage.cpp:1233")
......
With that, we are ready to start debugging.
Once the kernel headers are in place, we can write a stap script for the task at hand. The script is as follows:
# cat kylin.stp
global entry_times

// Function entry: remember the start time, keyed by process id.
probe process("/usr/bin/kylin-nm").function("LanPage::onWiredEnabledChanged")
{
    entry_time = gettimeofday_us()
    entry_times[pid()] = entry_time
}

// Function return: compute and print the elapsed time in microseconds.
probe process("/usr/bin/kylin-nm").function("LanPage::onWiredEnabledChanged").return
{
    if (entry_times[pid()] != 0) {
        exit_time = gettimeofday_us()
        elapsed = exit_time - entry_times[pid()]
        printf("[PID %d] [%s]: Took %ld us \n", pid(), "LanPage::onWiredEnabledChanged", elapsed)
        delete entry_times[pid()]
    }
}
As the script shows, the goal here is to measure how long the function LanPage::onWiredEnabledChanged in /usr/bin/kylin-nm takes.
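When a function is called many times, per-call lines can be hard to read. One possible variation, shown here as a sketch rather than the script used in this article, feeds the elapsed times into a SystemTap statistical aggregate and prints a summary when the session ends. The file name kylin_stats.stp and the variable names are made up for illustration, and keying by tid() instead of pid() is an assumption meant to keep threads of the same process from overwriting each other's start time.

```shell
# Write a hypothetical variant of kylin.stp that aggregates the timings.
cat > kylin_stats.stp <<'EOF'
global entry_us, durations

probe process("/usr/bin/kylin-nm").function("LanPage::onWiredEnabledChanged")
{
    # Key by thread id so calls from different threads do not collide.
    entry_us[tid()] = gettimeofday_us()
}

probe process("/usr/bin/kylin-nm").function("LanPage::onWiredEnabledChanged").return
{
    t = entry_us[tid()]
    if (t != 0) {
        durations <<< gettimeofday_us() - t   # add one sample to the aggregate
        delete entry_us[tid()]
    }
}

# Runs when the session ends, for example on Ctrl-C.
probe end
{
    if (@count(durations) > 0) {
        printf("calls=%d min=%d avg=%d max=%d (us)\n",
               @count(durations), @min(durations),
               @avg(durations), @max(durations))
        print(@hist_log(durations))
    }
}
EOF
```

The generated file is run the same way as kylin.stp. The summary and the logarithmic histogram appear once stap is interrupted.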
We run it as follows:
# stap -v ./kylin.stp
Pass 1: parsed user script and 467 library scripts using 590948virt/103640res/5892shr/130664data kb, in 530usr/200sys/247real ms.
Pass 2: analyzed script: 2 probes, 3 functions, 1 embed, 1 global using 598732virt/112916res/7120shr/138448data kb, in 330usr/10sys/339real ms.
Pass 3: translated to C into "/tmp/stapbjEzxz/stap_44d2d295e641497a8b3e84b98ae25516_2499_src.c" using 598732virt/113108res/7312shr/138448data kb, in 10usr/240sys/252real ms.
Pass 4: compiled C into "stap_44d2d295e641497a8b3e84b98ae25516_2499.ko" in 43320usr/8190sys/12830real ms.
Pass 5: starting run.
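The output contains the log line [PID 89425] [LanPage::onWiredEnabledChanged]: Took 174 us, which means the handler behind the button took 174 us to run.
Next we click to turn the network off and then click to turn it on again. After repeating this 10 times, the log looks like this: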
[PID 89425] [LanPage::onWiredEnabledChanged]: Took 279 us
[PID 89425] [LanPage::onWiredEnabledChanged]: Took 502 us
[PID 89425] [LanPage::onWiredEnabledChanged]: Took 632 us
[PID 89425] [LanPage::onWiredEnabledChanged]: Took 617 us
[PID 89425] [LanPage::onWiredEnabledChanged]: Took 616 us
[PID 89425] [LanPage::onWiredEnabledChanged]: Took 656 us
[PID 89425] [LanPage::onWiredEnabledChanged]: Took 146 us
[PID 89425] [LanPage::onWiredEnabledChanged]: Took 280 us
[PID 89425] [LanPage::onWiredEnabledChanged]: Took 640 us
[PID 89425] [LanPage::onWiredEnabledChanged]: Took 576 us
[PID 89425] [LanPage::onWiredEnabledChanged]: Took 501 us
[PID 89425] [LanPage::onWiredEnabledChanged]: Took 642 us
[PID 89425] [LanPage::onWiredEnabledChanged]: Took 647 us
[PID 89425] [LanPage::onWiredEnabledChanged]: Took 670 us
[PID 89425] [LanPage::onWiredEnabledChanged]: Took 289 us
[PID 89425] [LanPage::onWiredEnabledChanged]: Took 603 us
[PID 89425] [LanPage::onWiredEnabledChanged]: Took 619 us
[PID 89425] [LanPage::onWiredEnabledChanged]: Took 453 us
[PID 89425] [LanPage::onWiredEnabledChanged]: Took 594 us
[PID 89425] [LanPage::onWiredEnabledChanged]: Took 276 us
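A log like this is easier to judge once it is reduced to a few numbers. As a rough sketch, the following helper reads "Took N us" lines on standard input and prints the count, minimum, average and maximum. The name `summarize_latency` is hypothetical, and it assumes the log has one entry per line in exactly the format printed by kylin.stp.

```shell
# summarize_latency: read stap log lines on stdin and print
# count/min/avg/max of the "Took N us" values (in microseconds).
summarize_latency() {
    awk '/Took [0-9]+ us/ {
             # The value is the field right after the word "Took".
             for (i = 1; i < NF; i++) if ($i == "Took") v = $(i + 1) + 0
             if (n == 0 || v < min) min = v
             if (n == 0 || v > max) max = v
             sum += v; n++
         }
         END {
             if (n) printf "count=%d min=%d avg=%.1f max=%d\n", n, min, sum / n, max
         }'
}
```

For example, `stap ./kylin.stp | tee kylin.log` followed by `summarize_latency < kylin.log`. For the 20 samples listed above this yields count=20 min=146 avg=511.9 max=670.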
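With this approach, we can monitor the execution time of any function.
To analyze a different program, install that program's debug symbols, find the function of interest, and adapt kylin.stp accordingly.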
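This article demonstrated how to use SystemTap on arm64 to measure the time spent in an arbitrary function. SystemTap can also be used to investigate the kernel and other kinds of problems, which are not covered here. If you are interested, explore it yourself and get involved in open source.
Note that if SystemTap does not run on your kernel, you may need to analyze the compatibility problem based on the errors it reports. To get systemtap-5.3 running on Linux 5.10, this article patched two compatibility issues. The required fixes usually stem from changes between kernel versions and are not too difficult.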