2013-05-31 40 views
0

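SIGILL after fork in Sage/Python

I am doing some calculations with Sage and experimenting with fork. I have a very simple test case, which is basically this (see `_fork_test_func()` further below for the matrix calculation):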

def fork_test():
    import os
    pid = os.fork()
    if pid != 0:
        print "parent, child: %i" % pid
        os.waitpid(pid, 0)
    else:
        print "child"
        try:
            # some dummy matrix calculation
            pass
        finally:
            os._exit(0)

And I am getting:

------------------------------------------------------------------------ 
Unhandled SIGILL: An illegal instruction occurred in Sage. 
This probably occurred because a *compiled* component of Sage has a bug 
in it and is not properly wrapped with sig_on(), sig_off(). You might 
want to run Sage under gdb with 'sage -gdb' to debug this. 
Sage will now terminate. 
------------------------------------------------------------------------ 

With this (incomplete) backtrace:

Crashed Thread: 0 Dispatch queue: com.apple.root.default-priority 

Exception Type: EXC_BAD_INSTRUCTION (SIGILL) 
Exception Codes: 0x0000000000000001, 0x0000000000000000 

Application Specific Information: 
BUG IN LIBDISPATCH: flawed group/semaphore logic 

Thread 0 Crashed:: Dispatch queue: com.apple.root.default-priority 
0 libsystem_kernel.dylib   0x00007fff8c6d1d46 __kill + 10 
1 libcsage.dylib     0x0000000101717f33 sigdie + 124 
2 libcsage.dylib     0x0000000101717719 sage_signal_handler + 364 
3 libsystem_c.dylib    0x00007fff86b1094a _sigtramp + 26 
4 libdispatch.dylib    0x00007fff89a66c74 _dispatch_thread_semaphore_signal + 27 
5 libdispatch.dylib    0x00007fff89a66f3e _dispatch_apply2 + 143 
6 libdispatch.dylib    0x00007fff89a66e30 dispatch_apply_f + 440 
7 libBLAS.dylib     0x00007fff906ca435 APL_dtrsm + 1963 
8 libBLAS.dylib     0x00007fff906702b6 cblas_dtrsm + 882 
9 matrix_modn_dense_double.so  0x0000000108612615 void FFLAS::Protected::ftrsmRightLowerNoTransUnit<double>::delayed<FFPACK::Modular<double> >(FFPACK::Modular<double> const&, unsigned long, unsigned long, FFPACK::Modular<double>::Element*, unsigned long, FFPACK::Modular<double>::Element*, unsigned long, unsigned long, unsigned long) + 2853 
10 matrix_modn_dense_double.so  0x0000000108611daa void FFLAS::Protected::ftrsmRightLowerNoTransUnit<double>::delayed<FFPACK::Modular<double> >(FFPACK::Modular<double> const&, unsigned long, unsigned long, FFPACK::Modular<double>::Element*, unsigned long, FFPACK::Modular<double>::Element*, unsigned long, unsigned long, unsigned long) + 698 
11 matrix_modn_dense_double.so  0x0000000108612ccf void FFLAS::Protected::ftrsmRightLowerNoTransUnit<double>::operator()<FFPACK::Modular<double> >(FFPACK::Modular<double> const&, unsigned long, unsigned long, FFPACK::Modular<double>::Element*, unsigned long, FFPACK::Modular<double>::Element*, unsigned long) + 831 
12 ???        0x00007f99e481a028 0 + 140298940424232 

Thread 1: 
0 libsystem_kernel.dylib   0x00007fff8c6d26d6 __workq_kernreturn + 10 
1 libsystem_c.dylib    0x00007fff86b24f4c _pthread_workq_return + 25 
2 libsystem_c.dylib    0x00007fff86b24d13 _pthread_wqthread + 412 
3 libsystem_c.dylib    0x00007fff86b0f1d1 start_wqthread + 13 

Thread 2: 
0 libsystem_kernel.dylib   0x00007fff8c6d26d6 __workq_kernreturn + 10 
1 libsystem_c.dylib    0x00007fff86b24f4c _pthread_workq_return + 25 
2 libsystem_c.dylib    0x00007fff86b24d13 _pthread_wqthread + 412 
3 libsystem_c.dylib    0x00007fff86b0f1d1 start_wqthread + 13 

Thread 0 crashed with X86 Thread State (64-bit): 
    rax: 0x0000000000000000 rbx: 0x00007fff5ec8e418 rcx: 0x00007fff5ec8df28 rdx: 0x0000000000000000 
    rdi: 0x000000000000b8f7 rsi: 0x0000000000000004 rbp: 0x00007fff5ec8df40 rsp: 0x00007fff5ec8df28 
    r8: 0x00007fff5ec8e418 r9: 0x0000000000000000 r10: 0x000000000000000a r11: 0x0000000000000202 
    r12: 0x00007f99ea500de0 r13: 0x0000000000000003 r14: 0x00007fff5ec8e860 r15: 0x00007fff906ca447 
    rip: 0x00007fff8c6d1d46 rfl: 0x0000000000000202 cr2: 0x00007fff74a29848 
Logical CPU: 0 

Is there anything special I need to do when forking? I looked at Sage's own fork decorator, and it appears to do basically the same thing.
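The test cases here discard the status returned by `os.waitpid`. For diagnosis it can help to decode that status, so the parent reports whether the child exited normally or was killed by a signal. The following is only a minimal sketch using the standard `os` module; `describe_exit` and `fork_and_report` are hypothetical helper names, not part of Sage, and the lambda that kills itself merely stands in for a crashing computation.

```python
import os
import signal

def describe_exit(status):
    """Turn a waitpid() status into a readable description."""
    if os.WIFSIGNALED(status):
        return "killed by signal %d" % os.WTERMSIG(status)
    if os.WIFEXITED(status):
        return "exited with code %d" % os.WEXITSTATUS(status)
    return "unknown status %d" % status

def fork_and_report(child_func):
    """Fork, run child_func in the child, and describe how the child ended."""
    pid = os.fork()
    if pid == 0:
        # Child: run the function, then leave without running parent cleanup.
        try:
            child_func()
        finally:
            os._exit(0)
    # Parent: wait for the child and decode its termination status.
    _, status = os.waitpid(pid, 0)
    return describe_exit(status)

# Stand-in for a crashing child: it sends SIGKILL to itself.
print(fork_and_report(lambda: os.kill(os.getpid(), signal.SIGKILL)))
```

This only reports how the child died; it does nothing to make the child's work safe.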

The crash also happens with Sage's own fork decorator. Another test case:

def fork_test2():
    def test():
        # do some stuff
        pass
    from sage.parallel.decorate import fork
    test_ = fork(test, verbose=True)
    test_()

An even simpler test case:

def _fork_test_func():
    while True:
        m = matrix(QQ, 100, [randrange(-100,100) for i in range(100*100)])
        m.right_kernel()

def fork_test():
    import os
    pid = os.fork()
    if pid != 0:
        print "parent, child: %i" % pid
        os.waitpid(pid, 0)
    else:
        print "child"
        try:
            _fork_test_func()
        finally:
            os._exit(0)

This results in a slightly different crash:

python(48672) malloc: *** error for object 0x11185f000: pointer being freed already on death-row 
*** set a breakpoint in malloc_error_break to debug 

With this backtrace:

Crashed Thread: 1 Dispatch queue: com.apple.root.default-priority 

Exception Type: EXC_CRASH (SIGABRT) 
Exception Codes: 0x0000000000000000, 0x0000000000000000 

Application Specific Information: 
*** error for object 0x11185f000: pointer being freed already on death-row 


Thread 0:: Dispatch queue: com.apple.main-thread 
0 matrix2.so      0x0000000107fa403f __pyx_pw_4sage_6matrix_7matrix2_6Matrix_71right_kernel_matrix + 27551 
1 ???        0x000000000000000d 0 + 13 

Thread 1 Crashed:: Dispatch queue: com.apple.root.default-priority 
0 libsystem_kernel.dylib   0x00007fff8c6d239a __semwait_signal_nocancel + 10 
1 libsystem_c.dylib    0x00007fff86b17e1b nanosleep$NOCANCEL + 138 
2 libsystem_c.dylib    0x00007fff86b7b9a8 usleep$NOCANCEL + 54 
3 libsystem_c.dylib    0x00007fff86b67eca __abort + 203 
4 libsystem_c.dylib    0x00007fff86b67dff abort + 192 
5 libsystem_c.dylib    0x00007fff86b43905 szone_error + 580 
6 libsystem_c.dylib    0x00007fff86b43f7d free_large + 229 
7 libsystem_c.dylib    0x00007fff86b3b8f8 free + 199 
8 libBLAS.dylib     0x00007fff906b0431 __APL_dgemm_block_invoke_0 + 132 
9 libdispatch.dylib    0x00007fff89a65f01 _dispatch_call_block_and_release + 15 
10 libdispatch.dylib    0x00007fff89a620b6 _dispatch_client_callout + 8 
11 libdispatch.dylib    0x00007fff89a631fa _dispatch_worker_thread2 + 304 
12 libsystem_c.dylib    0x00007fff86b24d0b _pthread_wqthread + 404 
13 libsystem_c.dylib    0x00007fff86b0f1d1 start_wqthread + 13 

The same also happens with this:

def fork_test2(): 
    from sage.parallel.decorate import fork 
    test_ = fork(_fork_test_func, verbose=True) 
    test_() 

- but only if some other matrix calculation was done beforehand.


This test case reproduces the crash even in a fresh Sage session:

def _fork_test_func(iterator=None):
    if not iterator:
        import itertools
        iterator = itertools.count()
    for i in iterator:
        m = matrix(QQ, 100, [randrange(-100,100) for i in range(100*100)])
        m.right_kernel()

def fork_test():
    _fork_test_func(range(10))
    import os
    pid = os.fork()
    if pid != 0:
        print "parent, child: %i" % pid
        os.waitpid(pid, 0)
    else:
        print "child"
        try:
            _fork_test_func()
        finally:
            os._exit(0)

I downloaded the Mac OS X 64-bit binaries of Sage 5.8.

(Note that I have also asked this on ask.sagemath.org.)

Answers

1

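Both of these crash reports indicate that a multithreaded process has called fork(). This severely limits the set of operations that can safely be performed in the child: you are basically only allowed to call execve() and the like, along with a few other functions from the standard's list of async-signal-safe functions.

This is also documented in the CAVEATS section of the fork(2) man page:

A process shall be created with a single thread. If a multi-threaded process calls fork(), the new process shall contain a replica of the calling thread and its entire address space, possibly including the states of mutexes and other resources. Consequently, to avoid errors, the child process may only execute async-signal-safe operations until such time as one of the exec functions is called.

Since many APIs in the Mac OS X frameworks cause a process to become multithreaded, if you want the forked child to be fully usable, you must restrict your parent process, prior to the fork, to APIs that are documented not to make a process multithreaded (basically only the POSIX APIs).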
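The "fork, then immediately exec" pattern described above can be sketched in Python with the standard `subprocess` module, which on POSIX systems creates the child and execs a new program right away. This is only an illustration under stated assumptions: `run_in_fresh_process` and `CHILD_CODE` are hypothetical names, the one-line computation is a stand-in for the matrix work, and running real Sage code this way would require launching a Sage-aware interpreter rather than `sys.executable`, which is not shown here.

```python
import subprocess
import sys

# Hypothetical worker source; stands in for the real matrix computation.
CHILD_CODE = "print(sum(i * i for i in range(10)))"

def run_in_fresh_process(code):
    """Run `code` in a brand-new interpreter and return its stdout."""
    # The child execs a fresh interpreter right after it is created, so it
    # never runs library code that relies on threads or locks inherited
    # from the (possibly multithreaded) parent.
    out = subprocess.check_output([sys.executable, "-c", code])
    return out.decode().strip()

result = run_in_fresh_process(CHILD_CODE)
print(result)
```

The trade-off is that the child starts with no in-memory state from the parent, so inputs and results have to be passed explicitly, for example through arguments, pipes, or files.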

+0

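Hm, well, Sage and Python are supposed to support `fork`. If they don't, there is a bug somewhere that needs fixing. I am trying to find where it goes wrong, and I am looking for a workaround or a fix. Also, Sage and Python shouldn't be using any Mac OS X APIs. – Albert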

+1

Both backtraces show a frame in libBLAS, which is definitely not a library that is safe to use after a fork of a multithreaded process. – das

+0

So what would be needed to make it safe? (Note that I really want to make this possible; Sage should support this.) – Albert