This is the fifth post in the Linux command gleanings series. It covers the thread and memory profiling tools commonly used on Linux, as well as more advanced tools such as the perf performance analyzer.

Resource monitoring commands such as vmstat and top can only show overall resource usage and how it is distributed across processes and threads; they cannot tell you why a process or thread is using so much of a resource.
In that case you can dig deeper with profiling tools such as jstack, jmap, pstack, gcore and perf.
jstack
jstack is the thread profiling tool for the JVM. It prints the call stack of every thread at the moment it runs, so you can see what each thread in the JVM is doing, as follows:
$ jstack 3051
2021-11-07 21:55:06
Full thread dump OpenJDK 64-Bit Server VM (11.0.12+7 mixed mode):
Threads class SMR info:
_java_thread_list=0x00007f3380001f00, length=10, elements={
0x00007f33cc027800, 0x00007f33cc22f800, 0x00007f33cc233800, 0x00007f33cc24b800,
0x00007f33cc24d800, 0x00007f33cc24f800, 0x00007f33cc251800, 0x00007f33cc253800,
0x00007f33cc303000, 0x00007f3380001000
}
"main" 1 prio=5 os_prio=0 cpu=2188.06ms elapsed=1240974.04s tid=0x00007f33cc027800 nid=0xbec runnable [0x00007f33d1b68000]
java.lang.Thread.State: RUNNABLE
at java.io.FileInputStream.readBytes(java.base@11.0.12/Native Method)
at java.io.FileInputStream.read(java.base@11.0.12/FileInputStream.java:279)
at java.io.BufferedInputStream.read1(java.base@11.0.12/BufferedInputStream.java:290)
at java.io.BufferedInputStream.read(java.base@11.0.12/BufferedInputStream.java:351)
- locked <0x00000007423a5ba8> (a java.io.BufferedInputStream)
at sun.nio.cs.StreamDecoder.readBytes(java.base@11.0.12/StreamDecoder.java:284)
at sun.nio.cs.StreamDecoder.implRead(java.base@11.0.12/StreamDecoder.java:326)
at sun.nio.cs.StreamDecoder.read(java.base@11.0.12/StreamDecoder.java:178)
- locked <0x0000000745ad1cf0> (a java.io.InputStreamReader)
at java.io.InputStreamReader.read(java.base@11.0.12/InputStreamReader.java:181)
at java.io.Reader.read(java.base@11.0.12/Reader.java:189)
at java.util.Scanner.readInput(java.base@11.0.12/Scanner.java:882)
at java.util.Scanner.findWithinHorizon(java.base@11.0.12/Scanner.java:1796)
at java.util.Scanner.nextLine(java.base@11.0.12/Scanner.java:1649)
at com.example.clientserver.ClientServerApplication.getDemo(ClientServerApplication.java:57)
at com.example.clientserver.ClientServerApplication.run(ClientServerApplication.java:40)
at org.springframework.boot.SpringApplication.callRunner(SpringApplication.java:804)
at org.springframework.boot.SpringApplication.callRunners(SpringApplication.java:788)
at org.springframework.boot.SpringApplication.run(SpringApplication.java:333)
at org.springframework.boot.SpringApplication.run(SpringApplication.java:1309)
at org.springframework.boot.SpringApplication.run(SpringApplication.java:1298)
at com.example.clientserver.ClientServerApplication.main(ClientServerApplication.java:27)
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(java.base@11.0.12/Native Method)
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(java.base@11.0.12/NativeMethodAccessorImpl.java:62)
at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@11.0.12/DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(java.base@11.0.12/Method.java:566)
at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:49)
at org.springframework.boot.loader.Launcher.launch(Launcher.java:107)
at org.springframework.boot.loader.Launcher.launch(Launcher.java:58)
at org.springframework.boot.loader.JarLauncher.main(JarLauncher.java:88)
"Reference Handler" 2 daemon prio=10 os_prio=0 cpu=2.76ms elapsed=1240973.97s tid=0x00007f33cc22f800 nid=0xbf3 waiting on condition [0x00007f33a820a000]
java.lang.Thread.State: RUNNABLE
at java.lang.ref.Reference.waitForReferencePendingList(java.base@11.0.12/Native Method)
at java.lang.ref.Reference.processPendingReferences(java.base@11.0.12/Reference.java:241)
at java.lang.ref.Reference$ReferenceHandler.run(java.base@11.0.12/Reference.java:213)
Example: finding code that burns CPU
If your java process keeps showing high CPU usage, you can locate the offending code as follows:
1. Find the thread id (pid) that is consuming the CPU
# -H shows threads; 312 is the pid of the java process
$ top -H -p 312
2. Convert the thread id to hexadecimal
# 314 is the pid of the high-CPU thread obtained from top
$ printf "%x" 314
13a
3. Dump all thread stacks of the process and extract the stack of the high-CPU thread
$ jstack 312 | awk -v RS= '/0x13a/'
With this method it is easy to find terribly slow code such as huge loops or infinite loops.
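The three steps can also be chained together for a quick check. Below is a hedged sketch, not part of the original write-up: it assumes a procps ps that supports --sort and reuses the example pid 312 from above; note that ps reports per-thread CPU averaged over the thread's lifetime, so for short spikes top -H is more reliable.
# Hedged sketch: find the busiest thread of java pid 312 and print its stack
$ pid=312
$ tid=$(ps -L -o lwp= --sort=-pcpu -p $pid | head -n1 | tr -d ' ')
$ jstack $pid | awk -v RS= -v nid="nid=0x$(printf '%x' $tid)" '$0 ~ nid'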
Example: counting threads by name
$ jstack 10235|grep -oP '"[^"]+"'|sed -E 's/[0-9]+/n/g'|sort|uniq -c|sort -nr
8 "GC Thread#n"
2 "Gn Conc#n"
2 "Cn CompilerThreadn"
1 "main"
1 "VM Thread"
1 "VM Periodic Task Thread"
1 "Sweeper thread"
1 "Signal Dispatcher"
1 "Service Thread"
1 "Reference Handler"
1 "Gn Young RemSet Sampling"
1 "Gn Refine#n"
1 "Gn Main Marker"
1 "Finalizer"
1 "Common-Cleaner"
1 "Attach Listener"
As shown above, counting JVM threads with uniq and friends is a quick way to check whether the number of threads in your business thread pools is normal.
jmap
jmap is the memory profiling tool for the JVM, used to analyze or export JVM heap data, as follows:
# Show the live-object histogram; 3051 is the pid of the java process
$ jmap -histo:live 3051 | head -n20
num instances bytes class name (module)
-------------------------------------------------------
1: 19462 1184576 [B (java.base@11.0.12)
2: 3955 469920 java.lang.Class (java.base@11.0.12)
3: 18032 432768 java.lang.String (java.base@11.0.12)
4: 11672 373504 java.util.concurrent.ConcurrentHashMap$Node (java.base@11.0.12)
5: 3131 198592 [Ljava.lang.Object; (java.base@11.0.12)
6: 5708 182656 java.util.HashMap$Node (java.base@11.0.12)
7: 1606 155728 [I (java.base@11.0.12)
8: 160 133376 [Ljava.util.concurrent.ConcurrentHashMap$Node; (java.base@11.0.12)
9: 1041 106328 [Ljava.util.HashMap$Node; (java.base@11.0.12)
10: 6216 99456 java.lang.Object (java.base@11.0.12)
11: 1477 70896 sun.util.locale.LocaleObjectCache$CacheEntry (java.base@11.0.12)
12: 1403 56120 java.util.LinkedHashMap$Entry (java.base@11.0.12)
13: 1322 52880 java.lang.ref.SoftReference (java.base@11.0.12)
14: 583 51304 java.lang.reflect.Method (java.base@11.0.12)
15: 999 47952 java.lang.invoke.MemberName (java.base@11.0.12)
16: 29 42624 [C (java.base@11.0.12)
17: 743 41608 java.util.LinkedHashMap (java.base@11.0.12)
18: 877 35080 java.lang.invoke.MethodType (java.base@11.0.12)
You can also dump the entire heap to a file directly, as follows:
$ jmap -dump:format=b,file=heap.hprof 3051
Heap dump file created
$ ls *.hprof
heap.hprof
堆轉(zhuǎn)儲文件是二進(jìn)制文件,沒法直接查看,一般是配合mat(Memory Analysis Tool)等堆可視化工具來進(jìn)行分析,如下:
mat打開hprof文件后,會看下如下一個概要界面。
點擊Histogram可以按類維度查詢內(nèi)存占用大小
點擊Dominator Tree可以看到各對象總大小(Retained Heap,包含引用的子對象),以及所占內(nèi)存比例,可以看到一個ArrayList對象占用99.31%,看來是個bug,找到創(chuàng)建ArrayList的代碼,修改即可。
可以看到,使用jmap+mat很容易分析內(nèi)存泄露bug,但需要注意的是,jmap執(zhí)行時會讓jvm暫停,對于高并發(fā)的服務(wù),最好先將問題節(jié)點摘除流量后再操作。
You can also add the JVM options -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=./dump/ so that a heap dump is written to the dump directory automatically when an OutOfMemoryError occurs.
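For reference, here is a minimal sketch of a launch command with these options (the jar name app.jar is hypothetical, and the dump directory must exist):
# Hypothetical service started with automatic heap dumps on OOM
$ mkdir -p ./dump
$ java -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=./dump/ -jar app.jar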
MAT download: http://www.eclipse.org/mat/
pstack
pstack is the thread profiling tool for native C/C++ programs, similar to jstack but for native code. Usage:
$ pstack $pid
For example, MySQL is written in C/C++, so when mysqld hangs you can inspect its thread stacks with pstack.
In the example below, an UPDATE statement got stuck; let's figure out why.
1. Use processlist to find the thread id of the problematic SQL
mysql> select t.thread_id,t.thread_os_id,pl.* from information_schema.processlist pl
join performance_schema.threads t on pl.id = t.processlist_id;
+-----------+--------------+----+------+---------------------+------+---------+------+-----------+---------------------------------------------------------------------------------------------------------->
| thread_id | thread_os_id | ID | USER | HOST | DB | COMMAND | TIME | STATE | INFO >
+-----------+--------------+----+------+---------------------+------+---------+------+-----------+---------------------------------------------------------------------------------------------------------->
| 32 | 4850 | 7 | root | 10.142.72.126:34934 | NULL | Query | 0 | executing | select t.thread_id,t.thread_os_id,pl.* from information_schema.processlist pl join performance_schema.thr>
| 33 | 5771 | 8 | root | 10.142.112.35:54795 | NULL | Sleep | 488 | | NULL >
| 34 | 4849 | 9 | root | 10.142.112.35:54796 | demo | Sleep | 442 | | NULL >
| 35 | 5185 | 10 | root | 10.142.112.35:54797 | demo | Query | 403 | updating | update order set status=1 where oid=1 >
| 37 | 28904 | 12 | root | 10.142.72.126:34936 | demo | Sleep | 411 | | NULL >
+-----------+--------------+----+------+---------------------+------+---------+------+-----------+---------------------------------------------------------------------------------------------------------->
5 rows in set (0.00 sec)
As shown, the thread_os_id of the thread handling the UPDATE is 5185.
2. Use pstack to capture the thread stacks
# 3287 is the pid of the mysqld process
$ pstack 3287 > mysqld_stack.log
3. Find the stuck thread in the stack dump
$ cat mysqld_stack.log | awk -v RS= '/LWP 5185/'
Thread 33 (Thread 0x7fe3c05ea700 (LWP 5185)):
0 futex_wait_cancelable (private=, expected=0, futex_word=0x7fe358025474) at ../sysdeps/nptl/futex-internal.h:183
1 __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7fe358025420, cond=0x7fe358025448) at pthread_cond_wait.c:508
2 __pthread_cond_wait (cond=0x7fe358025448, mutex=0x7fe358025420) at pthread_cond_wait.c:638
3 0x000055c1b85b55f8 in os_event::wait (this=0x7fe358025408) at /home/work/softwares/mysql-5.7.30/storage/innobase/os/os0event.cc:179
4 0x000055c1b85b4a24 in os_event::wait_low (this=0x7fe358025408, reset_sig_count=6) at /home/work/softwares/mysql-5.7.30/storage/innobase/os/os0event.cc:366
5 0x000055c1b85b51bc in os_event_wait_low (event=0x7fe358025408, reset_sig_count=0) at /home/work/softwares/mysql-5.7.30/storage/innobase/os/os0event.cc:611
6 0x000055c1b8579097 in lock_wait_suspend_thread (thr=0x7fe35800ab60) at /home/work/softwares/mysql-5.7.30/storage/innobase/lock/lock0wait.cc:315
7 0x000055c1b863f860 in row_mysql_handle_errors (new_err=0x7fe3c05e6cf4, trx=0x7fe41c3c18d0, thr=0x7fe35800ab60, savept=0x0) at /home/work/softwares/mysql-5.7.30/storage/innobase/row/row0mysql.cc:783
8 0x000055c1b868514b in row_search_mvcc (buf=0x7fe35801eab0 "\377", mode=PAGE_CUR_GE, prebuilt=0x7fe35800a3a0, match_mode=1, direction=0) at /home/work/softwares/mysql-5.7.30/storage/innobase/row/row0sel.cc:6170
9 0x000055c1b84d83e1 in ha_innobase::index_read (this=0x7fe35801e7c0, buf=0x7fe35801eab0 "\377", key_ptr=0x7fe35800c400 "\001", key_len=8, find_flag=HA_READ_KEY_EXACT) at /home/work/softwares/mysql-5.7.30/storage/innobase/handler/ha_innodb.cc:8755
10 0x000055c1b7a7c4fc in handler::index_read_map (this=0x7fe35801e7c0, buf=0x7fe35801eab0 "\377", key=0x7fe35800c400 "\001", keypart_map=1, find_flag=HA_READ_KEY_EXACT) at /home/work/softwares/mysql-5.7.30/sql/handler.h:2818
11 0x000055c1b7a6ca36 in handler::ha_index_read_map (this=0x7fe35801e7c0, buf=0x7fe35801eab0 "\377", key=0x7fe35800c400 "\001", keypart_map=1, find_flag=HA_READ_KEY_EXACT) at /home/work/softwares/mysql-5.7.30/sql/handler.cc:3046
12 0x000055c1b7a7774a in handler::read_range_first (this=0x7fe35801e7c0, start_key=0x7fe35801e8a8, end_key=0x7fe35801e8c8, eq_range_arg=true, sorted=true) at /home/work/softwares/mysql-5.7.30/sql/handler.cc:7411
13 0x000055c1b7a7556e in handler::multi_range_read_next (this=0x7fe35801e7c0, range_info=0x7fe3c05e7af8) at /home/work/softwares/mysql-5.7.30/sql/handler.cc:6476
14 0x000055c1b7a764c7 in DsMrr_impl::dsmrr_next (this=0x7fe35801ea20, range_info=0x7fe3c05e7af8) at /home/work/softwares/mysql-5.7.30/sql/handler.cc:6868
15 0x000055c1b84ebb84 in ha_innobase::multi_range_read_next (this=0x7fe35801e7c0, range_info=0x7fe3c05e7af8) at /home/work/softwares/mysql-5.7.30/storage/innobase/handler/ha_innodb.cc:20574
16 0x000055c1b830debe in QUICK_RANGE_SELECT::get_next (this=0x7fe358026ed0) at /home/work/softwares/mysql-5.7.30/sql/opt_range.cc:11247
17 0x000055c1b801a01c in rr_quick (info=0x7fe3c05e7d90) at /home/work/softwares/mysql-5.7.30/sql/records.cc:405
18 0x000055c1b81cbb1d in mysql_update (thd=0x7fe358000e10, fields=..., values=..., limit=18446744073709551615, handle_duplicates=DUP_ERROR, found_return=0x7fe3c05e8338, updated_return=0x7fe3c05e8340) at /home/work/softwares/mysql-5.7.30/sql/sql_update.cc:819
19 0x000055c1b81d262d in Sql_cmd_update::try_single_table_update (this=0x7fe3580090e0, thd=0x7fe358000e10, switch_to_multitable=0x7fe3c05e83d7) at /home/work/softwares/mysql-5.7.30/sql/sql_update.cc:2906
20 0x000055c1b81d2b9a in Sql_cmd_update::execute (this=0x7fe3580090e0, thd=0x7fe358000e10) at /home/work/softwares/mysql-5.7.30/sql/sql_update.cc:3037
21 0x000055c1b8108f4e in mysql_execute_command (thd=0x7fe358000e10, first_level=true) at /home/work/softwares/mysql-5.7.30/sql/sql_parse.cc:3616
22 0x000055c1b810ed9b in mysql_parse (thd=0x7fe358000e10, parser_state=0x7fe3c05e9530) at /home/work/softwares/mysql-5.7.30/sql/sql_parse.cc:5584
23 0x000055c1b8103afe in dispatch_command (thd=0x7fe358000e10, com_data=0x7fe3c05e9de0, command=COM_QUERY) at /home/work/softwares/mysql-5.7.30/sql/sql_parse.cc:1491
24 0x000055c1b81028b2 in do_command (thd=0x7fe358000e10) at /home/work/softwares/mysql-5.7.30/sql/sql_parse.cc:1032
25 0x000055c1b824a4bc in handle_connection (arg=0x55c1bdca11f0) at /home/work/softwares/mysql-5.7.30/sql/conn_handler/connection_handler_per_thread.cc:313
26 0x000055c1b8955119 in pfs_spawn_thread (arg=0x55c1bdb3fc20) at /home/work/softwares/mysql-5.7.30/storage/perfschema/pfs.cc:2197
27 0x00007fe4253f8609 in start_thread (arg=) at pthread_create.c:477
28 0x00007fe424fd3103 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
From the call stack it is not hard to tell that ha_innobase::index_read is the function that reads index data; while calling row_search_mvcc it ends up in lock_wait_suspend_thread, so it looks as if the SQL statement is blocked waiting for some lock.
Next, let's fetch the lock-wait information with the following statement:
mysql> SELECT r.trx_mysql_thread_id waiting_thread,r.trx_query waiting_query,
-> concat(timestampdiff(SECOND,r.trx_wait_started,CURRENT_TIMESTAMP()),'s') AS duration,
-> b.trx_mysql_thread_id blocking_thread,t.processlist_command state,b.trx_query blocking_current_query,e.sql_text blocking_last_query
-> FROM information_schema.innodb_lock_waits w
-> JOIN information_schema.innodb_trx b ON b.trx_id = w.blocking_trx_id
-> JOIN information_schema.innodb_trx r ON r.trx_id = w.requesting_trx_id
-> JOIN performance_schema.threads t on t.processlist_id = b.trx_mysql_thread_id
-> JOIN performance_schema.events_statements_current e USING(thread_id);
+----------------+------------------------------------------+----------+-----------------+-------+------------------------+----------------------------------------------------+
| waiting_thread | waiting_query | duration | blocking_thread | state | blocking_current_query | blocking_last_query |
+----------------+------------------------------------------+----------+-----------------+-------+------------------------+----------------------------------------------------+
| 10 | update order set status=1 where oid=1 | 48s | 12 | Sleep | NULL | select * from order where oid=1 for update |
+----------------+------------------------------------------+----------+-----------------+-------+------------------------+----------------------------------------------------+
1 row in set, 1 warning (0.00 sec)
Ah, so a SELECT ... FOR UPDATE statement had locked this row, which is what blocked our UPDATE.
Note: the pstack package on Ubuntu has been unmaintained for years and is basically unusable; if you want to use it on Ubuntu, copy the pstack command over from a CentOS machine.
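As an alternative when pstack is unusable, gdb itself can print all thread stacks. A hedged sketch, assuming gdb is installed and you have permission to attach to the process:
# Roughly what pstack does: attach, print every thread's backtrace, then detach
$ sudo gdb -p 3287 -batch -ex 'thread apply all bt' > mysqld_stack.log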
gcore
gcore is a memory dumping tool for native programs, similar to jmap; it dumps a process's entire memory space to a file, as follows:
# 10235 is the pid of the process
$ gcore -o coredump 10235
$ ll coredump*
-rw-r--r-- 1 work work 3.7G 2021-11-07 23:05:46 coredump.10235
轉(zhuǎn)儲出來的dump文件,可以直接使用gdb進(jìn)行調(diào)試,如下:
$ gdb java coredump.10235
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from java...
(No debugging symbols found in java)
[New LWP 10235]
...
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `java'.
#0 __pthread_clockjoin_ex (threadid=139794527459072, thread_return=0x7ffd83e54538, clockid=, abstime=, block=) at pthread_join_common.c:145
145 pthread_join_common.c: No such file or directory.
[Current thread is 1 (Thread 0x7f2474987040 (LWP 10235))]
(gdb)
In addition, you can configure the kernel parameters below so that a coredump file is generated automatically when a program makes an out-of-bounds memory access or hits a segmentation fault.
# Enable coredumps
$ ulimit -c unlimited
# Make the setting persistent
$ echo "ulimit -c unlimited" >> /etc/profile
# Configure the coredump naming pattern
$ vi /etc/sysctl.conf
kernel.core_pattern=%e.core.%s_%t
# Meaning of the core_pattern placeholders
%p pid
%u uid
%g gid
%s signal number
%t UNIX time of dump
%h hostname
%e executable filename
# Check the current core_pattern setting
$ cat /proc/sys/kernel/core_pattern
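One small follow-up, assuming the pattern was added to /etc/sysctl.conf as above: the setting only takes effect after sysctl reloads it (or after a reboot).
# Reload /etc/sysctl.conf so the new core_pattern takes effect
$ sysctl -p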
Sometimes jmap fails to run against a java process; in that case you can use gcore instead to generate a coredump, and then convert it with jmap into an hprof file that MAT can analyze.
$ jmap -dump:format=b,file=heap.hprof `which java` coredump.10235
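A hedged note: on JDK 9 and later the core-file mode of jmap moved to the jhsdb front end, so if the command above is rejected, something along these lines may work instead (check jhsdb jmap --help for the exact options on your JDK):
$ jhsdb jmap --binaryheap --dumpfile heap.hprof --exe $(which java) --core coredump.10235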
Thread stack analysis
Generally speaking, jstack plus top can only pinpoint extremely bad performance problems such as infinite loops. The CPU is so fast that for code which is merely somewhat slow, a single jstack run rarely catches the problem code in a thread stack.
Slightly slow code may only cost 1ms more CPU time than good code, but that 1ms is still far quicker than our hands: by the time we run jstack, the thread has long since moved on to innocent code.
Since running jstack by hand doesn't work, let a script do it and run jstack many times in quick succession. Although the problem code runs too fast for a human to catch, it is still slower than the normal code, so when we capture many stacks in rapid succession, the problem code will show up more often than the normal code does.
# Run jstack every 0.2s and save the thread stacks to jstacklog.log
$ while sleep 0.2;do \
pid=$(pgrep -n java); \
[[ $pid ]] && jstack $pid; \
done > jstacklog.log
$ wc -l jstacklog.log
291121 jstacklog.log
Having captured this many stacks, how do we analyze them? We can process them with Linux text commands, grouping identical stacks and counting them, as follows:
$ cat jstacklog.log \
|sed -E -e 's/0x[0-9a-z]+/0x00/g' -e '/^"/ s/[0-9]+/n/g' \
|awk -v RS="" 'gsub(/\n/,"\\n",$0)||1' \
|sort|uniq -c|sort -nr \
|sed 's/$/\n/;s/\\n/\n/g' \
|less
The stack that shows up most often is very likely slow or blocked code. In my capture, the stack of com.example.demo.web.controller.TestController.select was caught 2053 times, because that is the endpoint I kept load-testing, so it appears the most.
Flame graphs
As you can see, analyzing thread stacks with text commands is not very intuitive. Fortunately, the performance guru Brendan Gregg invented the flame graph and built a set of tools to generate them.
Tool download: https://github.com/brendangregg/FlameGraph
To turn a batch of jstack-captured thread stacks into a flame graph:
$ cat jstacklog.log \
| ./FlameGraph/stackcollapse-jstack.pl --no-include-tname \
| ./FlameGraph/flamegraph.pl --cp > jstacklog.svg
The generated flame graph is an SVG file that you open in a browser. In a flame graph the colors have no particular meaning; identical stacks are merged together, so the wider a stack is, the more often it was captured while the program was running, and that is where optimization effort should be focused.
perf
perf is the performance analysis tool officially maintained as part of Linux. It can observe many very fine-grained, hard-core metrics such as IPC and CPU cache hit rates, as follows:
# Install perf on Ubuntu; the package name depends on the kernel version, and running the perf command without it installed prints an install hint
sudo apt install linux-tools-5.4.0-74-generic linux-cloud-tools-5.4.0-74-generic
# CPU context switches, CPU migrations, IPC, branch prediction
sudo perf stat -a sleep 5
# CPU IPC and cache hit rate
sudo perf stat -e cycles,instructions,cache-references,cache-misses,bus-cycles -a sleep 10
# L1 data cache hit rate
sudo perf stat -e L1-dcache-loads,L1-dcache-load-misses,L1-dcache-stores -a sleep 10
# TLB (page table cache) hit rate
sudo perf stat -e dTLB-loads,dTLB-load-misses,dTLB-prefetch-misses -a sleep 10
# Last-level cache (LLC) hit rate
sudo perf stat -e LLC-loads,LLC-load-misses,LLC-stores,LLC-prefetches -a sleep 10
# Count system calls by type for the specified PID, until Ctrl-C:
sudo perf stat -e 'syscalls:sys_enter_*' -p PID -I1000 2>&1 | awk '$2 != 0'
# Count system calls by type for the entire system, for 5 seconds:
sudo perf stat -e 'syscalls:sys_enter_*' -a sleep 5 2>&1| awk '$1 != 0'
# Count syscalls per-second system-wide:
sudo perf stat -e raw_syscalls:sys_enter -I 1000 -a
Of course, perf can also capture thread stacks and generate flame graphs, as follows:
# Sample thread stacks for 60s; 1892 is the pid of the mysqld process
$ sudo perf record -F 99 -p 1892 -g sleep 60
[ perf record: Woken up 5 times to write data ]
[ perf record: Captured and wrote 1.509 MB perf.data (6938 samples) ]
# Generate the flame graph
$ sudo perf script \
| ./FlameGraph/stackcollapse-perf.pl \
| ./FlameGraph/flamegraph.pl > mysqld_flamegraph.svg
The command above produces a flame graph of MySQL generated with perf. Compared with jstack, perf samples stacks with higher precision and lower overhead, and it can even capture Linux kernel call stacks!
Of course, a tool this powerful also demands supreme privileges such as root.
Note: perf captures a thread's call stack by following the stack registers on the CPU, so it can only sample threads that are currently running on a CPU. This makes perf a natural fit for on-CPU analysis, i.e. diagnosing high-CPU problems.
Note: in some cases the MySQL stacks captured by perf contain mangled names such as _Z16dispatch_commandP3THDPK8COM_DATA19enum_server_command, which are very hard to read. In that case, run sudo perf script | c++filt first to demangle the C++ function names, and then feed the output to the stackcollapse-perf.pl script. The effect of c++filt looks like this:
# Mangled function names
$ objdump -tT `which mysqld` | grep dispatch_command
00000000014efdf3 g F .text 000000000000252e _Z16dispatch_commandP3THDPK8COM_DATA19enum_server_command
00000000014efdf3 g DF .text 000000000000252e Base _Z16dispatch_commandP3THDPK8COM_DATA19enum_server_command
# Function names after demangling with c++filt
$ objdump -tT `which mysqld` | grep dispatch_command | c++filt
00000000014efdf3 g F .text 000000000000252e dispatch_command(THD*, COM_DATA const*, enum_server_command)
00000000014efdf3 g DF .text 000000000000252e Base dispatch_command(THD*, COM_DATA const*, enum_server_command)
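Putting it together, here is a hedged sketch of the full pipeline with c++filt inserted, assuming perf.data was recorded as in the earlier perf record example:
$ sudo perf script \
    | c++filt \
    | ./FlameGraph/stackcollapse-perf.pl \
    | ./FlameGraph/flamegraph.pl > mysqld_flamegraph.svg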
Off-CPU flame graphs
While running, a thread is either executing on a CPU, or it is blocked by a lock or an I/O operation and leaves the CPU to sleep; once the lock is released or the I/O completes, the thread is woken up and becomes runnable again.
When a thread is running on a CPU we call it on-CPU; when it is blocked and off the CPU we call it off-CPU.
A thread leaving the CPU and later being woken up is exactly what we call a thread context switch: if the thread gives up the CPU voluntarily because it is waiting for I/O or a lock, that is a voluntary context switch; if it is forced off because its time slice ran out, that is an involuntary context switch.
If we record the time just before a thread goes to sleep, and then, when it is woken up, record its stack together with how long it was blocked, we can feed those stacks to the FlameGraph tools and draw an off-CPU flame graph; the wide parts of that graph are the places where threads block the most. This is where the bcc tools come in.
bcc
bcc is a toolkit built on the Linux eBPF mechanism. It contains a large number of low-level analysis tools, covering things like file cache hit rate, TCP retransmissions, MySQL slow queries and so on.
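A few examples of what is in the toolset, as a hedged sketch: these are the names as packaged in Ubuntu's bpfcc-tools, which adds a -bpfcc suffix, so names and options may differ on other distributions or bcc versions.
# Page cache hit/miss statistics, refreshed every second
$ sudo cachestat-bpfcc 1
# Trace TCP retransmissions as they happen
$ sudo tcpretrans-bpfcc
# Trace MySQL queries slower than 100 ms
$ sudo dbslower-bpfcc mysql -m 100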
It can also do off-CPU analysis and generate off-CPU flame graphs, as follows:
# Install the bcc toolset on Ubuntu
$ sudo apt install bpfcc-tools
# Start a root bash shell
$ sudo bash
# Generate JVM symbol map files
$ ./FlameGraph/jmaps
# Capture off-CPU stacks for 30s; 83101 is the pid of the mysqld process
$ offcputime-bpfcc -df -p 83101 30 > offcpu_stack.out
# Generate the off-CPU flame graph
$ cat offcpu_stack.out \
| sed 's/;::/:::/g' \
| ./FlameGraph/flamegraph.pl --color=io --title="Off-CPU Time Flame Graph" --countname=us > offcpu_stack.svg
In the MySQL off-CPU flame graph I generated this way, most of the off-CPU time turns out to be caused by locks. Also note that off-CPU flame graphs differ slightly from on-CPU ones: in an on-CPU graph the width represents how many times a stack was sampled, while in an off-CPU graph the width represents how long a stack was blocked.
While analyzing the graph, I noticed that JOIN::make_join_plan() ended up calling btr_estimate_n_rows_in_range_low. JOIN::make_join_plan() is clearly meant to build the SQL execution plan, so why does it appear to be reading data from disk? After searching for btr_estimate_n_rows_in_range_low online, I learned that MySQL has a mechanism whereby, when a query uses a non-unique index, it actually reads some index data to help judge whether that index should be used, so that inaccurate index statistics do not lead to a poor execution plan.
Looking at the table and the SQL, I found that the index in question had been turned into a non-unique index earlier while I was experimenting with partitioned tables, even though it could have been unique. After changing it back to a unique index and re-running the load test, QPS rose from about 2300 to about 2700.
This also shows that even for middleware whose source code you have never read (such as MySQL), you can still generate flame graphs with perf or bcc and analyze them. If you don't know what a function does, just search its name on Baidu or Google and skim other people's source-code write-ups to get the idea.
bcc project: https://github.com/iovisor/bcc