2011-03-13 74 views
10

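gdb weird backtrace

My program is statically compiled with dietlibc. It is compiled on Ubuntu x64 (built for x86 with the -m32 flag) and runs on CentOS x86.

The compiled binary is only about 100KB. I compile it with -ggdb3 and no optimization flags.

My program uses signal.h to handle the SIGSEGV signal, and the handler then calls abort().

The program runs without problems for days, but occasionally segfaults. That is when I get weird backtraces that I don't understand: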

 
[email protected]:~/Desktop$ gdb -c core.28569 program-name 
GNU gdb (GDB) 7.2 
Copyright (C) 2010 Free Software Foundation, Inc. 
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it. 
There is NO WARRANTY, to the extent permitted by law. Type "show copying" 
and "show warranty" for details. 
This GDB was configured as "--host=x86_64-linux-gnu --target=i386-linux-gnu". 
For bug reporting instructions, please see: 
... 
Reading symbols from program-name...done. 
[New Thread 28569] 
Core was generated by `program-name'. 
Program terminated with signal 6, Aborted. 
#0 0x00914410 in __kernel_vsyscall() 
Setting up the environment for debugging gdb. 
Function "internal_error" not defined. 
Make breakpoint pending on future shared library load? (y or [n]) [answered N; input not from terminal] 
Function "info_command" not defined. 
Make breakpoint pending on future shared library load? (y or [n]) [answered N; input not from terminal] 
.gdbinit:8: Error in sourced command file: 
Argument required (one or more breakpoint numbers). 
(gdb) bt 
#0 0x00914410 in __kernel_vsyscall() 
During symbol reading, incomplete CFI data; unspecified registers (e.g., eax) at 0x914411. 
#1 0x0804d7f4 in __unified_syscall() 
#2 0xbf8966c0 in ??() 
#3 <signal handler called> 
#4 0x2054454e in ??() 
#5 0x20524c43 in ??() 
#6 0x2e352e33 in ??() 
#7 0x32373033 in ??() 
#8 0x2e203b39 in ??() 
#9 0x2054454e in ??() 
#10 0x20524c43 in ??() 
#11 0x2e302e33 in ??() 
#12 0x32373033 in ??() 
#13 0x4d203b39 in ??() 
#14 0x61696465 in ??() 
#15 0x6e654320 in ??() 
#16 0x20726574 in ??() 
#17 0x36204350 in ??() 
#18 0x203b302e in ??() 
#19 0x54454e2e in ??() 
#20 0x43302e34 in ??() 
#21 0x00000029 in ??() 
#22 0xbf8989a8 in ??() 
Backtrace stopped: previous frame inner to this frame (corrupt stack?) 
(gdb) bt full 
#0 0x00914410 in __kernel_vsyscall() 
No symbol table info available. 
#1 0x0804d7f4 in __unified_syscall() 
No symbol table info available. 
#2 0xbf8966c0 in ??() 
No symbol table info available. 
#3 <signal handler called> 
No symbol table info available. 
#4 0x2054454e in ??() 
No symbol table info available. 
#5 0x20524c43 in ??() 
No symbol table info available. 
#6 0x2e352e33 in ??() 
No symbol table info available. 
#7 0x32373033 in ??() 
No symbol table info available. 
#8 0x2e203b39 in ??() 
No symbol table info available. 
#9 0x2054454e in ??() 
No symbol table info available. 
#10 0x20524c43 in ??() 
No symbol table info available. 
#11 0x2e302e33 in ??() 
No symbol table info available. 
#12 0x32373033 in ??() 
No symbol table info available. 
#13 0x4d203b39 in ??() 
No symbol table info available. 
#14 0x61696465 in ??() 
No symbol table info available. 
#15 0x6e654320 in ??() 
No symbol table info available. 
#16 0x20726574 in ??() 
No symbol table info available. 
#17 0x36204350 in ??() 
No symbol table info available. 
#18 0x203b302e in ??() 
No symbol table info available. 
#19 0x54454e2e in ??() 
No symbol table info available. 
#20 0x43302e34 in ??() 
No symbol table info available. 
#21 0x00000029 in ??() 
No symbol table info available. 
#22 0xbf8989a8 in ??() 
No symbol table info available. 
Backtrace stopped: previous frame inner to this frame (corrupt stack?) 
(gdb) quit 

Answers

16

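This is a stack buffer overflow: something has written text over the stack.

#4 0x2054454e in ??() 

This looks like text, "TEN" or "NET" depending on the byte order you read it in.

#5 0x20524c43 in ??() 

"RLC" or "CLR"

and so on.

Read the addresses as text, and see whether you can work out where in your program this text overwrites the stack.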

+0

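Actually, Erik is right. It was a strncat on an uninitialized variable. That is why it segfaulted sometimes and not at other times. By the way, the text really is "NET" and "CLR". Thanks. – 2011-03-13 18:29:40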

+0

Even in this case, my answer is probably *also* correct, and you would be better off fixing dietlibc for the *next* time abort gets called. – 2011-03-13 23:13:14

6

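Your stack trace is actually quite easy to understand:

  • you got SIGSEGV somewhere,
  • your signal handler did whatever it does, then called abort(),
  • which issued the raise(2) system call, which is implemented by calling __unified_syscall()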

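The reason you don't get a usable stack trace in GDB is that:

  • __unified_syscall is implemented in assembly, and
  • it does not use a frame pointer, and
  • it does not have correct CFI directives to describe how to unwind from it.

I would consider this a bug in dietlibc, and quite an easy one to fix. See whether this (untested) patch fixes it for you: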

--- dietlibc-0.31/i386/unified.S.orig 2011-03-13 10:16:23.000000000 -0700 
+++ dietlibc-0.31/i386/unified.S 2011-03-13 10:21:32.000000000 -0700 
@@ -31,8 +31,14 @@ __unified_syscall: 
    movzbl %al, %eax 
.L1: 
    push %edi 
+  cfi_adjust_cfa_offset (4) 
+  cfi_rel_offset (edi, 0) 
    push %esi 
+  cfi_adjust_cfa_offset (4) 
+  cfi_rel_offset (esi, 0) 
    push %ebx 
+  cfi_adjust_cfa_offset (4) 
+  cfi_rel_offset (ebx, 0) 
    movl %esp,%edi 
    /* we use movl instead of pop because otherwise a signal would 
     destroy the stack frame and crash the program, although it 
@@ -61,8 +67,11 @@ __unified_syscall: 
#endif 
.Lnoerror: 
    pop %ebx 
+  cfi_adjust_cfa_offset (-4) 
    pop %esi 
+  cfi_adjust_cfa_offset (-4) 
    pop %edi 
+  cfi_adjust_cfa_offset (-4) 

/* here we go and "reuse" the return for weak-void functions */ 
#include "dietuglyweaks.h" 

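If you can't rebuild dietlibc, or if the patch is incorrect, you may still be able to get more out of the stack trace. As far as I can tell, __unified_syscall doesn't touch %ebp, so you may be able to get a reasonable stack trace by walking the frame-pointer chain yourself: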

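define xbt 
    # Walk the saved-%ebp chain starting from the frame pointer in $arg0. 
    set $xbp = (void **)$arg0 
    while $xbp != 0 
        # Show the saved %ebp and the return address stored in this frame. 
        x/2a $xbp 
        set $xbp = (void **)$xbp[0] 
    end 
end 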

xbt $ebp 

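Note: if xbt works, it will likely go into the weeds around the SIGSEGV signal frame (that frame doesn't use a frame pointer). This may produce complete garbage, or it may skip a frame or two (the ones in which the SIGSEGV happened).

So you are really much better off getting correct unwind descriptors into dietlibc.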