Segmentation fault in malloc

Here is the piece of code where the segfault occurs (perror is not called):

job = malloc(sizeof(task_t)); 
if(job == NULL) 
    perror("malloc");

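To be more precise, GDB says the segfault occurs in a call to __int_malloc, a subroutine that malloc calls internally.

Since malloc is called in parallel from other threads, I initially thought that might be the problem. I am using glibc version 2.19.

The data structures: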

typedef struct rv_thread thread_wrapper_t; 

typedef struct future 
{ 
    pthread_cond_t wait; 
    pthread_mutex_t mutex; 
    long completed; 
} future_t; 

typedef struct task 
{ 
    future_t * f; 
    void * data; 
    void * (*fun)(thread_wrapper_t *, void *);
} task_t; 

typedef struct 
{ 
    queue_t * queue; 
} pool_worker_t; 

typedef struct 
{ 
    task_t * t; 
} sfuture_t; 

struct rv_thread 
{ 
    pool_worker_t * pool; 
};

Now the future implementation:

future_t * 
create_future() 
{ 
    future_t * new_f = malloc(sizeof(future_t)); 
    if(new_f == NULL)
        perror("malloc");
    new_f->completed = 0; 
    pthread_mutex_init(&(new_f->mutex), NULL); 
    pthread_cond_init(&(new_f->wait), NULL); 
    return new_f; 
} 

int 
wait_future(future_t * f) 
{ 
    pthread_mutex_lock(&(f->mutex)); 
    while (!f->completed)
    {
        pthread_cond_wait(&(f->wait),&(f->mutex));
    }
    pthread_mutex_unlock(&(f->mutex)); 
    return 0; 
} 

void 
complete(future_t * f) 
{ 
    pthread_mutex_lock(&(f->mutex)); 
    f->completed = 1; 
    pthread_mutex_unlock(&(f->mutex)); 
    pthread_cond_broadcast(&(f->wait)); 
}

The thread pool itself:

pool_worker_t * 
create_work_pool(int threads) 
{ 
    pool_worker_t * new_p = malloc(sizeof(pool_worker_t)); 
    if(new_p == NULL)
        perror("malloc");
    threads = 1;
    new_p->queue = create_queue();
    int i;
    for (i = 0; i < threads; i++){
        thread_wrapper_t * w = malloc(sizeof(thread_wrapper_t));
        if(w == NULL)
            perror("malloc");
        w->pool = new_p;
        pthread_t n;
        pthread_create(&n, NULL, work, w);
    }
    return new_p; 
} 

task_t * 
try_get_new_task(thread_wrapper_t * thr) 
{ 
    task_t * t = NULL; 
    try_dequeue(thr->pool->queue, t); 
    return t; 
} 

void 
submit_job(pool_worker_t * p, task_t * t) 
{ 
    enqueue(p->queue, t); 
} 

void * 
work(void * data) 
{ 
    thread_wrapper_t * thr = (thread_wrapper_t *) data; 
    while (1){
        task_t * t = NULL;
        while ((t = (task_t *) try_get_new_task(thr)) == NULL);
        future_t * f = t->f;
        (*(t->fun))(thr,t->data);
        complete(f);
    }
    pthread_exit(NULL); 
}

And finally task.c:

pool_worker_t * 
create_tpool() 
{ 
    return (create_work_pool(8)); 
} 

sfuture_t * 
async(pool_worker_t * p, thread_wrapper_t * thr,
      void * (*fun)(thread_wrapper_t *, void *), void * data)
{ 
    task_t * job = NULL; 
    job = malloc(sizeof(task_t)); 
    if(job == NULL)
        perror("malloc");
    job->data = data; 
    job->fun = fun; 
    job->f = create_future(); 
    submit_job(p, job); 
    sfuture_t * new_t = malloc(sizeof(sfuture_t)); 
    if(new_t == NULL)
        perror("malloc");
    new_t->t = job; 
    return (new_t); 
} 

void 
mywait(thread_wrapper_t * thr, sfuture_t * sf) 
{ 
    if (sf == NULL)
        return;
    if (thr != NULL)
    {
        while (!sf->t->f->completed)
        {
            task_t * t_n = try_get_new_task(thr);
            if (t_n != NULL)
            {
                future_t * f = t_n->f;
                (*(t_n->fun))(thr,t_n->data);
                complete(f);
            }
        }
        return;
    }
    wait_future(sf->t->f);
    return;
}

The queue is the lfds lock-free queue.

#define enqueue(q,t) {                              \
    if(!lfds611_queue_enqueue(q->lq, t))            \
    {                                               \
        lfds611_queue_guaranteed_enqueue(q->lq, t); \
    }                                               \
}

#define try_dequeue(q,t) {                          \
    lfds611_queue_dequeue(q->lq, &t);               \
}

The problem occurs whenever the number of calls to async is very high.

Valgrind output:

Process terminating with default action of signal 11 (SIGSEGV) 
==12022== Bad permissions for mapped region at address 0x5AF9FF8 
==12022== at 0x4C28737: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)

Source

2014-02-26 guilhermemtr

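Is there anything else that could be messing up malloc's bookkeeping? – cnicutar

This sounds like memory is being corrupted somewhere else. – imreal

That is the only explanation. I will post the whole code. (This really is a minimal model, with memory leaks and so on.) – guilhermemtr

I have figured out what the problem was: a stack overflow. First, let me explain why the stack overflow happened inside malloc (which is probably why you are reading this). While my program ran, the stack kept growing every time it started executing another task (recursively, because of the way I wrote the program). For each of those tasks I also had to allocate a new task with malloc. But malloc makes subroutine calls of its own, which push the stack deeper than the plain call that executes another task does. So even without malloc I would eventually have gotten a stack overflow. Because of malloc, however, the stack overflowed inside malloc first, before it could overflow by making another recursive call. The illustration below shows what happened:

Initial stack state (during a call to malloc):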

------------------------- 
| recursive call n - 3 | 
------------------------- 
| recursive call n - 2 | 
------------------------- 
| recursive call n - 1 | 
------------------------- 
|  malloc   | 
------------------------- 
|  __int_malloc  | <- If the stack passes this point, the stack overflows. 
-------------------------

Then malloc returns and the stack shrinks again, just before my code enters a new recursive call:

------------------------- 
| recursive call n - 3 | 
------------------------- 
| recursive call n - 2 | 
------------------------- 
| recursive call n - 1 | 
------------------------- 
|  garbage  | 
------------------------- 
|  garbage  | <- If the stack passes this point, the stack overflows. 
-------------------------

Stack after entering the new recursive call, before the next call to malloc:

------------------------- 
| recursive call n - 3 | 
------------------------- 
| recursive call n - 2 | 
------------------------- 
| recursive call n - 1 | 
------------------------- 
| recursive call n  | 
------------------------- 
|  garbage  | <- If the stack passes this point, the stack overflows. 
-------------------------

Then it calls malloc again inside this new recursive call. This time, however, it overflows:

------------------------- 
| recursive call n - 3 | 
------------------------- 
| recursive call n - 2 | 
------------------------- 
| recursive call n - 1 | 
------------------------- 
| recursive call n  | 
------------------------- 
|  malloc   | <- If the stack passes this point, the stack overflows. 
------------------------- 
|  __int_malloc  | <- This is when the stack overflow occurs. 
-------------------------
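
The growth shown in these diagrams can be observed directly by comparing the addresses of local variables at different recursion depths. What follows is only a minimal sketch: `recurse` and `stack_used_at_depth` are hypothetical names that do not appear in the original program, and the code assumes each thread's stack is one contiguous region (it works whether the stack grows up or down).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: recurses `depth` more levels, then returns the
 * distance in bytes between `base` (an address in an outer frame) and a
 * local variable in the innermost frame. */
static size_t recurse(int depth, uintptr_t base)
{
    volatile char marker = 0;               /* lives in this call's frame */
    uintptr_t here = (uintptr_t)&marker;

    if (depth <= 0) {
        /* Handle both growth directions by taking the absolute distance. */
        return (size_t)(here > base ? here - base : base - here);
    }

    /* The result is used after the call returns, so this is not a tail
     * call and the frame has to stay on the stack while we recurse. */
    size_t used = recurse(depth - 1, base);
    return used + (size_t)marker;           /* marker is 0 */
}

/* Hypothetical entry point: approximate stack bytes consumed by
 * `depth` nested calls of recurse(). */
size_t stack_used_at_depth(int depth)
{
    volatile char base_marker = 0;
    return recurse(depth, (uintptr_t)&base_marker);
}
```

Comparing `stack_used_at_depth(n)` for a few values of n against the stack limit gives a rough idea of how many nested calls fit before any call, whether to malloc or to the next task, runs past the end of the stack.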

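[The rest of this answer focuses on why I had this problem in my particular code.]

Normally, when you compute Fibonacci recursively for some number n, the stack size grows linearly with n. In this case, however, I am creating tasks, storing them in a queue, and dequeuing a (fib) task to execute. If you draw this on paper, you will see that the number of tasks grows exponentially with n rather than linearly. (Note also that if I had used a stack to store the tasks as they were created, the number of pending tasks and the size of the stack would only grow linearly with n.) So what happens is that the stack grows exponentially with n, which leads to the stack overflow.

Now for the part about why the overflow happens inside the call to malloc. As I explained above, the stack overflow happens inside the malloc call because that is where the stack is at its largest. The stack was already almost exhausted, and since malloc calls its own internal functions, the stack grows further there than it does with just the calls to mywait and fib.

Thank you all! If it weren't for your help, I would not have been able to figure it out!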
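
To make the linear versus exponential contrast concrete, here is a small sketch that counts both quantities for a naive fib. It assumes one task per call and a base case of n < 2; `fib_task_count` and `fib_call_depth` are illustrative names, not functions from the original program.

```c
/* Number of tasks created by a naive, one-task-per-call fib(n):
 * one for the call itself plus the tasks of its two children. */
unsigned long fib_task_count(unsigned n)
{
    if (n < 2)
        return 1;
    return 1 + fib_task_count(n - 1) + fib_task_count(n - 2);
}

/* Longest chain of nested calls in plain recursive fib(n):
 * the fib(n - 1) branch is always the deepest one. */
unsigned fib_call_depth(unsigned n)
{
    if (n < 2)
        return 1;
    return 1 + fib_call_depth(n - 1);
}
```

For n = 10 this gives 177 tasks but a call depth of only 10, and for n = 20 it gives 21891 tasks with a depth of 20. If executing a dequeued task nests on top of the current call, as it does in mywait, the stack depth can follow the task count rather than the call depth.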

Source

2014-02-27 11:43:53 guilhermemtr

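Should I mark my own answer as correct? – guilhermemtr

That is what I suspected, because I could not find any problem. But to make sure this is the issue, could you dump the output of top to a file and check how memory usage grows? +1 for the answer and the question. – Jekyll

When I removed all the threads, valgrind said it might be a stack overflow, although that seemed unlikely. I set ulimit to a larger value, and then I could run larger fib numbers. When I doubled the stack size, I could only add 1 to the previous number. But I will do as you say, just to confirm. – guilhermemtr

A SIGSEGV (segmentation fault) raised inside malloc is usually caused by heap corruption. Heap corruption does not cause a segmentation fault by itself, so you only see the error when malloc tries to access the corrupted data. The problem is that the code that corrupted the heap may be anywhere, far away from the call to malloc. Usually it is the next-block pointer inside malloc that the corruption has changed to an invalid address, so when you call malloc the invalid pointer is dereferenced and you get a segmentation fault.

I think you could try isolating parts of your code from the rest of the program to narrow down where the bug is.

Also, I see that you never free the memory here, so there may be a memory leak.

To check for a memory leak, you can run the top command, top -b -n 1, and check: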
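
The stack limit mentioned in this comment can also be inspected, and set per thread, from inside the program. The sketch below uses the POSIX calls `getrlimit` and `pthread_attr_setstacksize`. The helper names `current_stack_limit`, `create_thread_with_stack`, and `echo_worker` are hypothetical, and the original pool creates its workers with default attributes.

```c
#include <pthread.h>
#include <stddef.h>
#include <sys/resource.h>

/* Soft stack limit in bytes for the main thread (`ulimit -s` reports the
 * same limit in KiB). Returns 0 on error; may be RLIM_INFINITY. */
rlim_t current_stack_limit(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return 0;
    return rl.rlim_cur;
}

/* Hypothetical worker used only for illustration: returns its argument. */
void *echo_worker(void *arg)
{
    return arg;
}

/* Start a thread with an explicit stack size instead of the default.
 * Returns 0 on success or a pthread error number. */
int create_thread_with_stack(pthread_t *out, size_t stack_bytes,
                             void *(*fn)(void *), void *arg)
{
    pthread_attr_t attr;
    int rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;

    rc = pthread_attr_setstacksize(&attr, stack_bytes);
    if (rc == 0)
        rc = pthread_create(out, &attr, fn, arg);

    pthread_attr_destroy(&attr);
    return rc;
}
```

A larger stack only postpones the overflow here: with exponential growth, doubling the stack buys roughly one more step of n, which matches the observation in the comment.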
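
As one possible way to address the missing frees, a future could get a destroy function that mirrors its creation. This is only a sketch: `future_new` and `future_destroy` are hypothetical names (the original code has `create_future` and no destroy function), and the struct is repeated here so that the example stands alone.

```c
#include <pthread.h>
#include <stdlib.h>

typedef struct future
{
    pthread_cond_t wait;
    pthread_mutex_t mutex;
    long completed;
} future_t;

/* Hypothetical variant of create_future() that returns NULL on failure
 * instead of continuing with an unusable pointer. */
future_t *future_new(void)
{
    future_t *f = malloc(sizeof *f);
    if (f == NULL)
        return NULL;

    f->completed = 0;
    if (pthread_mutex_init(&f->mutex, NULL) != 0) {
        free(f);
        return NULL;
    }
    if (pthread_cond_init(&f->wait, NULL) != 0) {
        pthread_mutex_destroy(&f->mutex);
        free(f);
        return NULL;
    }
    return f;
}

/* Hypothetical counterpart: releases the synchronization objects and the
 * memory. Returns 0 on success or the first pthread error number. */
int future_destroy(future_t *f)
{
    if (f == NULL)
        return 0;

    int rc_cond = pthread_cond_destroy(&f->wait);
    int rc_mutex = pthread_mutex_destroy(&f->mutex);
    free(f);
    return rc_cond != 0 ? rc_cond : rc_mutex;
}
```

A future should only be destroyed once no thread can still be waiting on it or completing it, for example after mywait has returned for the corresponding task. The task_t and sfuture_t allocations would need a matching free at the same point.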

RPRVT - resident private address space size 
RSHRD - resident shared address space size 
RSIZE - resident memory size 
VPRVT - private address space size 
VSIZE - total memory size

Source

2014-02-26 20:48:48 Jekyll

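The problem is that the segmentation fault only occurs after a lot of calls. – guilhermemtr

Have you checked whether there is a memory leak? I don't see any free here... do you have one? – Jekyll

I would expect a problem sooner or later if the memory is never freed... since this program only allocates here... – Jekyll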
