Looking at a Program's Memory Accesses with Valgrind and GDB

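Sometimes you want to observe a program's dynamic behavior, especially how it accesses memory.

For example, the tool environment for supporting FPGA accelerator development that was presented at Design Gaia 2017 uses Valgrind and GDB to visualize memory accesses. It looked interesting and made me want to try it myself.

As a first step, I tried out how to check memory accesses with Valgrind + GDB.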

Trying Valgrind

Let's start with Valgrind on its own. The target is a simple program that forgets to call free.

#include <stdlib.h>
#include <strings.h>

#define N (100)
#define M (128)

void dut()
{
  char *ptr;
  for(int i = 0; i < N; i++) {
    ptr = (char*)malloc(sizeof(char)*M);
    bzero(ptr, sizeof(char)*M);
  }
  return;
}

int main(int argc, char **argv)
{
  dut();
}

Compile it with gcc or clang.

$ gcc -g -Wall -o hoge hoge.c   # or: clang -g -Wall -o hoge hoge.c

Check it with Valgrind. I used the Valgrind package installed with apt on Ubuntu 20.04.

$ valgrind --tool=memcheck --leak-check=full --log-file=hoge.log ./hoge

The result is written to hoge.log.

==17663== Memcheck, a memory error detector
==17663== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==17663== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==17663== Command: ./hoge
==17663== Parent PID: 6710
==17663==
==17663==
==17663== HEAP SUMMARY:
==17663==     in use at exit: 12,800 bytes in 100 blocks
==17663==   total heap usage: 100 allocs, 0 frees, 12,800 bytes allocated
==17663==
==17663== 12,800 bytes in 100 blocks are definitely lost in loss record 1 of 1
==17663==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==17663==    by 0x109167: dut (hoge.c:11)
==17663==    by 0x1091D3: main (hoge.c:19)
==17663==
==17663== LEAK SUMMARY:
==17663==    definitely lost: 12,800 bytes in 100 blocks
==17663==    indirectly lost: 0 bytes in 0 blocks
==17663==      possibly lost: 0 bytes in 0 blocks
==17663==    still reachable: 0 bytes in 0 blocks
==17663==         suppressed: 0 bytes in 0 blocks
==17663==
==17663== For lists of detected and suppressed errors, rerun with: -s
==17663== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

The program allocates 128 bytes 100 times and never frees them, so 12,800 bytes are leaked.
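
When the check is run repeatedly, it can be handy to pull the numbers out of the log with a script. The following is a minimal Python sketch that extracts the LEAK SUMMARY rows from Memcheck output. The names parse_leak_summary, LEAK_ROW and SAMPLE_LOG are hypothetical and only for illustration (they are not part of Valgrind), and the sample text is copied from the log above.

```python
import re

# One LEAK SUMMARY row, e.g. "==17663==    definitely lost: 12,800 bytes in 100 blocks".
# The optional "(+128)" groups cover the delta form printed by "monitor leak_check".
LEAK_ROW = re.compile(
    r"^==\d+==\s+"
    r"(?P<kind>definitely lost|indirectly lost|possibly lost|still reachable|suppressed):\s+"
    r"(?P<bytes>[\d,]+)(?: \([^)]*\))? bytes in "
    r"(?P<blocks>[\d,]+)(?: \([^)]*\))? blocks"
)


def parse_leak_summary(log_text):
    """Return {kind: (bytes, blocks)} for every LEAK SUMMARY row found in log_text."""
    summary = {}
    for line in log_text.splitlines():
        match = LEAK_ROW.match(line)
        if match:
            summary[match.group("kind")] = (
                int(match.group("bytes").replace(",", "")),
                int(match.group("blocks").replace(",", "")),
            )
    return summary


# Sample rows copied from hoge.log above.
SAMPLE_LOG = """\
==17663== LEAK SUMMARY:
==17663==    definitely lost: 12,800 bytes in 100 blocks
==17663==    indirectly lost: 0 bytes in 0 blocks
==17663==      possibly lost: 0 bytes in 0 blocks
==17663==    still reachable: 0 bytes in 0 blocks
==17663==         suppressed: 0 bytes in 0 blocks
"""

if __name__ == "__main__":
    for kind, (n_bytes, n_blocks) in parse_leak_summary(SAMPLE_LOG).items():
        print(f"{kind}: {n_bytes} bytes in {n_blocks} blocks")
```

In practice the text would come from reading hoge.log instead of the embedded sample. The regular expression is written against the output format of Valgrind 3.15.0 shown here, so other versions may need adjustments.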

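Using Valgrind Together with GDB

Next, let's drive the program from GDB through Valgrind's built-in gdbserver. First, start Valgrind with the gdbserver enabled. The gdbserver is enabled when --vgdb is set to full or yes. full is more precise; with yes, some accesses are missed when counting memory accesses later in this post. --vgdb-error=0 makes Valgrind stop and wait for GDB to attach before the program starts running.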

$ valgrind --leak-check=full --vgdb=full --vgdb-error=0 ./hoge

In another terminal, start gdb and connect to Valgrind.

$ gdb
...(omitted)...
(gdb) file ./hoge
Reading symbols from ./hoge...
(gdb) target remote | vgdb
Remote debugging using | vgdb
...
(gdb)

Set a breakpoint, then alternate between continue and monitor leak_check to watch the leak grow.

(gdb) break 11
Breakpoint 1 at 0x10915e: file hoge.c, line 11.
(gdb) c
Continuing.

Breakpoint 1, dut () at hoge.c:11
11	    ptr = (char*)malloc(sizeof(char)*M);
(gdb) monitor leak_check
==18230== All heap blocks were freed -- no leaks are possible
==18230== 
(gdb) c
Continuing.

Breakpoint 1, dut () at hoge.c:11
11	    ptr = (char*)malloc(sizeof(char)*M);
(gdb) monitor leak_check
==18230== LEAK SUMMARY:
==18230==    definitely lost: 0 (+0) bytes in 0 (+0) blocks
==18230==    indirectly lost: 0 (+0) bytes in 0 (+0) blocks
==18230==      possibly lost: 0 (+0) bytes in 0 (+0) blocks
==18230==    still reachable: 128 (+128) bytes in 1 (+1) blocks
==18230==         suppressed: 0 (+0) bytes in 0 (+0) blocks
==18230== Reachable blocks (those to which a pointer was found) are not shown.
==18230== To see them, add 'reachable any' args to leak_check
==18230== 
(gdb) c
Continuing.

Breakpoint 1, dut () at hoge.c:11
11	    ptr = (char*)malloc(sizeof(char)*M);
(gdb) monitor leak_check
==18230== 128 (+128) bytes in 1 (+1) blocks are definitely lost in loss record 2 of 2
==18230==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==18230==    by 0x109167: dut (hoge.c:11)
==18230==    by 0x1091D3: main (hoge.c:19)
==18230== 
==18230== LEAK SUMMARY:
==18230==    definitely lost: 128 (+128) bytes in 1 (+1) blocks
==18230==    indirectly lost: 0 (+0) bytes in 0 (+0) blocks
==18230==      possibly lost: 0 (+0) bytes in 0 (+0) blocks
==18230==    still reachable: 128 (+0) bytes in 1 (+0) blocks
==18230==         suppressed: 0 (+0) bytes in 0 (+0) blocks
==18230== Reachable blocks (those to which a pointer was found) are not shown.
==18230== To see them, add 'reachable any' args to leak_check
==18230== 
(gdb) c
Continuing.

Breakpoint 1, dut () at hoge.c:11
11	    ptr = (char*)malloc(sizeof(char)*M);
(gdb) monitor leak_check
==18230== 256 (+128) bytes in 2 (+1) blocks are definitely lost in loss record 2 of 2
==18230==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==18230==    by 0x109167: dut (hoge.c:11)
==18230==    by 0x1091D3: main (hoge.c:19)
==18230== 
==18230== LEAK SUMMARY:
==18230==    definitely lost: 256 (+128) bytes in 2 (+1) blocks
==18230==    indirectly lost: 0 (+0) bytes in 0 (+0) blocks
==18230==      possibly lost: 0 (+0) bytes in 0 (+0) blocks
==18230==    still reachable: 128 (+0) bytes in 1 (+0) blocks
==18230==         suppressed: 0 (+0) bytes in 0 (+0) blocks
==18230== Reachable blocks (those to which a pointer was found) are not shown.
==18230== To see them, add 'reachable any' args to leak_check
==18230== 
(gdb) delete
(gdb) continue
(gdb) quit
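
The numbers in this session follow a simple pattern. Each time the breakpoint is hit, the block allocated in the previous iteration is still pointed to by ptr, so Memcheck reports it as still reachable, while every earlier block has lost its last pointer and is reported as definitely lost. The following Python sketch is only a model of that pattern for checking the arithmetic. expected_leak_state is a hypothetical helper, and the explanation is my reading of the output above rather than something Valgrind prints.

```python
M = 128  # block size in bytes, same as in hoge.c


def expected_leak_state(hit):
    """Model the leak_check result at the hit-th stop on line 11 (hit starts at 1).

    Returns (definitely_lost_bytes, still_reachable_bytes).
    """
    allocated = hit - 1                    # malloc calls completed so far
    reachable = 1 if allocated > 0 else 0  # the newest block is still held by ptr
    lost = allocated - reachable           # older blocks have no pointer left
    return lost * M, reachable * M


if __name__ == "__main__":
    for hit in range(1, 5):
        print(hit, expected_leak_state(hit))
```

For hits 1 to 4 the model gives the same definitely lost and still reachable values as the four leak_check results above. Once dut() has returned, ptr no longer exists, which is consistent with all 100 blocks being reported as definitely lost in hoge.log.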

Driving GDB from Python

GDB can be scripted in Python. The commands typed above can be written as the following script.

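import sys

import gdb  # available only inside GDB's embedded Python interpreter

gdb.execute("file ./hoge")
gdb.execute("set pagination off")
gdb.execute("target remote | vgdb")  # attach to the waiting Valgrind gdbserver
gdb.Breakpoint("hoge.c:11")          # the malloc call in dut()
while True:
    try:
        gdb.execute("monitor leak_check")
        gdb.execute("continue")
    except gdb.error:
        # Typically reached once the program has exited and nothing is left to control.
        print(sys.exc_info())
        break
gdb.execute("quit")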

Save the script as hoge.py and run it with GDB. Of course, remember to start Valgrind first.

$ gdb -x hoge.py

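This shows that memory checks can be run while stopping and resuming the program from GDB.

Checking Memory Accesses

GDB's awatch and rwatch commands trap read/write accesses and read accesses, respectively. Consider the following program.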

volatile int global_a = 0;
volatile int global_b[100];
volatile int global_c[100];
volatile int global_d[100];
volatile int global_e[100];

void dut(int n)
{
  for(int i = 0; i < n; i++){
    global_a = global_a + 1;
    global_b[i] = i;
    global_c[i] = i;
    global_d[i] = i;
    global_e[i] = i;
  }
}

int main(int argc, char **argv)
{
  dut(10);
}

For example, accesses to global_a can be hooked as follows.

$ gdb fefe
...
(gdb) awatch global_a
Hardware access (read/write) watchpoint 1: global_a
(gdb) rwatch global_a
Hardware read watchpoint 2: global_a
(gdb) run
Starting program: /home/miyo/tmp/fefe 

Hardware access (read/write) watchpoint 1: global_a

Value = 0

Hardware read watchpoint 2: global_a

Value = 0
0x0000555555555143 in dut (n=10) at fefe.c:11
11	    global_a = global_a+1;
(gdb) c
Continuing.

Hardware access (read/write) watchpoint 1: global_a

Old value = 0
New value = 1
dut (n=10) at fefe.c:12
12	    global_b[i] = i;
(gdb)

After running the program to the end, info watchpoints reports:

(gdb) info watchpoints
Num     Type            Disp Enb Address    What
1       acc watchpoint  keep y              global_a
	breakpoint already hit 20 times
2       read watchpoint keep y              global_a
	breakpoint already hit 10 times
(gdb) 
Num     Type            Disp Enb Address    What
1       acc watchpoint  keep y              global_a
	breakpoint already hit 20 times
2       read watchpoint keep y              global_a
	breakpoint already hit 10 times
(gdb)

So there were 20 read-or-write accesses, 10 of which were reads (and therefore 10 were writes).
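
The counts can be checked with a small model of the loop. Each iteration of global_a = global_a + 1 reads global_a once and writes it once, so awatch fires twice per iteration and rwatch once. The Python sketch below only reproduces this arithmetic; count_watch_hits is a hypothetical name and the code does not talk to GDB.

```python
def count_watch_hits(n):
    """Count watchpoint hits for n iterations of `global_a = global_a + 1`."""
    access_hits = 0  # awatch: triggers on reads and writes
    read_hits = 0    # rwatch: triggers on reads only
    for _ in range(n):
        # Read global_a to compute global_a + 1.
        access_hits += 1
        read_hits += 1
        # Write the result back to global_a.
        access_hits += 1
    return access_hits, read_hits


if __name__ == "__main__":
    access_hits, read_hits = count_watch_hits(10)
    print(access_hits, read_hits, access_hits - read_hits)
```

With n = 10, as in dut(10), the model gives the same 20 and 10 hits that info watchpoints reported, and the difference is the number of writes.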

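Watchpoints are convenient, but when the program runs natively, only as many can be set as there are hardware watchpoints (four on an x86-64 machine), and each one can only cover a type of up to 8 bytes. Here is what happens when I try anyway: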

$ gdb ./fefe
...
(gdb) awatch global_a
Hardware access (read/write) watchpoint 1: global_a
(gdb) awatch global_b
Hardware access (read/write) watchpoint 2: global_b
(gdb) awatch global_c
Hardware access (read/write) watchpoint 3: global_c
(gdb) awatch global_d
Hardware access (read/write) watchpoint 4: global_d
(gdb) awatch global_e
Hardware access (read/write) watchpoint 5: global_e
(gdb) run
Starting program: /home/miyo/tmp/fefe 
Error in re-setting breakpoint 2: Expression cannot be implemented with read/access watchpoint.
Error in re-setting breakpoint 3: Expression cannot be implemented with read/access watchpoint.
Error in re-setting breakpoint 4: Expression cannot be implemented with read/access watchpoint.
Error in re-setting breakpoint 5: Expression cannot be implemented with read/access watchpoint.
Warning:
Could not insert hardware watchpoint 2.
Could not insert hardware watchpoint 3.
Could not insert hardware watchpoint 4.
Could not insert hardware watchpoint 5.
Could not insert hardware breakpoints:
You may have requested too many hardware breakpoints/watchpoints.

(gdb)

As shown, GDB reports errors.
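
A rough way to see why only watchpoint 1 survives is to estimate how many hardware slots each variable would need. The Python sketch below is a back-of-the-envelope model that uses the limits mentioned above (four slots, 8 bytes each) and ignores alignment. It is not how GDB actually decides, and the names registers_needed and VARIABLES are hypothetical.

```python
import math

NUM_DEBUG_REGISTERS = 4     # hardware watchpoint slots on x86-64, as stated above
MAX_BYTES_PER_REGISTER = 8  # largest region a single slot can cover


def registers_needed(size_in_bytes):
    """Slots needed to cover a region, ignoring alignment (simplifying assumption)."""
    return math.ceil(size_in_bytes / MAX_BYTES_PER_REGISTER)


# Sizes of the variables in fefe.c, assuming a 4-byte int.
VARIABLES = {
    "global_a": 4,
    "global_b": 400,
    "global_c": 400,
    "global_d": 400,
    "global_e": 400,
}

if __name__ == "__main__":
    for name, size in VARIABLES.items():
        needed = registers_needed(size)
        verdict = "fits" if needed <= NUM_DEBUG_REGISTERS else "does not fit"
        print(f"{name}: {size} bytes, {needed} slot(s), {verdict}")
```

Under this model global_a needs one slot, while each 400-byte array would need far more than the four available, which lines up with watchpoints 2 to 5 being rejected.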

So let's run the program under Valgrind instead. Start Valgrind with its gdbserver enabled:

$ valgrind --leak-check=full --vgdb=full --vgdb-error=0 ./fefe

In another terminal, start GDB, connect to Valgrind, and set the awatch watchpoints.

$ gdb
(gdb) file fefe
Reading symbols from fefe...
(gdb) target remote | vgdb
...
(gdb) awatch global_a
Hardware access (read/write) watchpoint 1: global_a
(gdb) awatch global_b
Hardware access (read/write) watchpoint 2: global_b
(gdb) awatch global_c
Hardware access (read/write) watchpoint 3: global_c
(gdb) awatch global_d
Hardware access (read/write) watchpoint 4: global_d
(gdb) awatch global_e
Hardware access (read/write) watchpoint 5: global_e
(gdb) s
Cannot find bounds of current function
(gdb) c
Continuing.

Hardware access (read/write) watchpoint 1: global_a

Value = 0
dut (n=10) at fefe.c:11
11	    global_a = global_a+1;
(gdb) c
Continuing.

Hardware access (read/write) watchpoint 1: global_a

Old value = 0
New value = 1
dut (n=10) at fefe.c:12
12	    global_b[i] = i;
(gdb) c
Continuing.

Hardware access (read/write) watchpoint 2: global_b

Value = {0 <repeats 100 times>}
dut (n=10) at fefe.c:13
13	    global_c[i] = i;
(gdb)

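With Valgrind in between, all five watchpoints are accepted and the memory accesses are detected.

Checking Memory Accesses by Address

awatch and rwatch can also be set on addresses, which is useful for regions whose address is only determined at run time. For a global variable, the address is fixed when the binary is built, so let's use that address to check memory accesses.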

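On Ubuntu 20.04, gcc produces PIE binaries by default, so the addresses read from the binary do not match the run-time addresses and cannot be used for watchpoints. The binary therefore has to be built with PIE disabled:

$ gcc -no-pie -g -o fefe fefe.c

Looking at the binary compiled with -no-pie using readelf shows the addresses of the global variables.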

$ readelf -s fefe | grep global_
    33: 00000000004010d0     0 FUNC    LOCAL  DEFAULT   12 __do_global_dtors_aux
    35: 0000000000403e58     0 OBJECT  LOCAL  DEFAULT   18 __do_global_dtors_aux_fin
    48: 0000000000404060   400 OBJECT  GLOBAL DEFAULT   23 global_b
    49: 0000000000404200   400 OBJECT  GLOBAL DEFAULT   23 global_d
    51: 0000000000404044     4 OBJECT  GLOBAL DEFAULT   23 global_a
    59: 00000000004043a0   400 OBJECT  GLOBAL DEFAULT   23 global_c
    60: 0000000000404540   400 OBJECT  GLOBAL DEFAULT   23 global_e

Let's use 0x404044 and 0x404060 to detect accesses to global_a and global_b.

(gdb) file fefe
Reading symbols from fefe...
(gdb) target remote | vgdb
...
(gdb) awatch *(int*)0x404044
Hardware access (read/write) watchpoint 1: *(int*)0x404044
(gdb) awatch *(int (*)[100])0x404060
Hardware access (read/write) watchpoint 2: *(int (*)[100])0x404060
(gdb) c
Continuing.

Hardware access (read/write) watchpoint 1: *(int*)0x404044

Value = 0
dut (n=10) at fefe.c:11
11	    global_a = global_a+1;
(gdb) c
Continuing.

Hardware access (read/write) watchpoint 1: *(int*)0x404044

Old value = 0
New value = 1
dut (n=10) at fefe.c:12
12	    global_b[i] = i;
(gdb) c
Continuing.

Hardware access (read/write) watchpoint 2: *(int (*)[100])0x404060

Value = {0 <repeats 100 times>}
dut (n=10) at fefe.c:13
13	    global_c[i] = i;
(gdb) c
Continuing.

Hardware access (read/write) watchpoint 1: *(int*)0x404044

Value = 1
dut (n=10) at fefe.c:11
11	    global_a = global_a+1;
(gdb)

The memory accesses were detected without any trouble.
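
Typing these watch expressions by hand gets tedious as the number of variables grows, so they could be generated from the readelf output. The following Python sketch is one possible way to do that. It assumes every watched object is an int or an array of int with 4-byte elements, as in fefe.c. awatch_commands and SAMPLE_READELF are hypothetical names, and the sample rows are copied from the readelf output above.

```python
def awatch_commands(readelf_output, elem_type="int", elem_size=4):
    """Build {symbol: awatch command} from `readelf -s` rows for global objects."""
    commands = {}
    for line in readelf_output.splitlines():
        fields = line.split()
        # Expected row layout: Num, Value, Size, Type, Bind, Vis, Ndx, Name.
        if len(fields) != 8 or fields[3] != "OBJECT" or fields[4] != "GLOBAL":
            continue
        address = int(fields[1], 16)
        count = int(fields[2], 0) // elem_size  # number of elements in the object
        if count <= 1:
            expr = f"*({elem_type}*){address:#x}"
        else:
            expr = f"*({elem_type} (*)[{count}]){address:#x}"
        commands[fields[7]] = f"awatch {expr}"
    return commands


# Rows copied from the readelf output above.
SAMPLE_READELF = """\
    33: 00000000004010d0     0 FUNC    LOCAL  DEFAULT   12 __do_global_dtors_aux
    35: 0000000000403e58     0 OBJECT  LOCAL  DEFAULT   18 __do_global_dtors_aux_fin
    48: 0000000000404060   400 OBJECT  GLOBAL DEFAULT   23 global_b
    49: 0000000000404200   400 OBJECT  GLOBAL DEFAULT   23 global_d
    51: 0000000000404044     4 OBJECT  GLOBAL DEFAULT   23 global_a
    59: 00000000004043a0   400 OBJECT  GLOBAL DEFAULT   23 global_c
    60: 0000000000404540   400 OBJECT  GLOBAL DEFAULT   23 global_e
"""

if __name__ == "__main__":
    for command in awatch_commands(SAMPLE_READELF).values():
        print(command)
```

For global_a and global_b this produces the same two awatch commands that were typed in the session above. The generated lines could then be passed to gdb.execute from a GDB Python script, or pasted into the GDB prompt.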

Summary

So now I understand how to use Valgrind and GDB for the kind of memory access analysis described in the paper mentioned at the beginning.