批评我的并发队列

这是我编写的并发队列，我打算在我写的线程池中使用。我想知道是否有任何性能改进。如果您好奇，atomic_counter被粘贴在下面！批评我的并发队列

#ifndef NS_CONCURRENT_QUEUE_HPP_INCLUDED 
#define NS_CONCURRENT_QUEUE_HPP_INCLUDED 

#include <ns/atomic_counter.hpp> 
#include <boost/noncopyable.hpp> 
#include <boost/smart_ptr/detail/spinlock.hpp> 
#include <cassert> 
#include <cstddef> 

namespace ns { 
    template<typename T, 
      typename mutex_type = boost::detail::spinlock, 
      typename scoped_lock_type = typename mutex_type::scoped_lock> 
    class concurrent_queue : boost::noncopyable { 
     struct node { 
      node * link; 
      T const value; 
      explicit node(T const & source) : link(0), value(source) { } 
     }; 
     node * m_front; 
     node * m_back; 
     atomic_counter m_counter; 
     mutex_type m_mutex; 
    public: 
     // types 
     typedef T value_type; 

     // construction 
     concurrent_queue() : m_front(0), m_mutex() { } 
     ~concurrent_queue() { clear(); } 

     // capacity 
     std::size_t size() const { return m_counter; } 
     bool empty() const { return (m_counter == 0); } 

     // modifiers 
     void push(T const & source); 
     bool try_pop(T & destination); 
     void clear(); 
    }; 

    template<typename T, typename mutex_type, typename scoped_lock_type> 
    void concurrent_queue<T, mutex_type, scoped_lock_type>::push(T const & source) { 
     node * hold = new node(source); 
     scoped_lock_type lock(m_mutex); 
     if (empty()) 
      m_front = hold; 
     else 
      m_back->link = hold; 
     m_back = hold; 
     ++m_counter; 
    } 

    template<typename T, typename mutex_type, typename scoped_lock_type> 
    bool concurrent_queue<T, mutex_type, scoped_lock_type>::try_pop(T & destination) { 
     node const * hold; 
     { 
      scoped_lock_type lock(m_mutex); 
      if (empty()) 
       return false; 
      hold = m_front; 
      if (m_front == m_back) 
       m_front = m_back = 0; 
      else 
       m_front = m_front->link; 
      --m_counter; 
     } 
     destination = hold->value; 
     delete hold; 
     return true; 
    } 

    template<typename T, typename mutex_type, typename scoped_lock_type> 
    void concurrent_queue<T, mutex_type, scoped_lock_type>::clear() { 
     node * hold; 
     { 
      scoped_lock_type lock(m_mutex); 
      hold = m_front; 
      m_front = 0; 
      m_back = 0; 
      m_counter = 0; 
     } 
     if (hold == 0) 
      return; 
     node * it; 
     while (hold != 0) { 
      it = hold; 
      hold = hold->link; 
      delete it; 
     } 
    } 
} 

#endif

atomic_counter.hpp

#ifndef NS_ATOMIC_COUNTER_HPP_INCLUDED 
#define NS_ATOMIC_COUNTER_HPP_INCLUDED 

#include <boost/interprocess/detail/atomic.hpp> 
#include <boost/noncopyable.hpp> 

namespace ns { 
    class atomic_counter : boost::noncopyable { 
     volatile boost::uint32_t m_count; 
    public: 
     explicit atomic_counter(boost::uint32_t value = 0) : m_count(value) { } 

     operator boost::uint32_t() const { 
      return boost::interprocess::detail::atomic_read32(const_cast<volatile boost::uint32_t *>(&m_count)); 
     } 

     void operator=(boost::uint32_t value) { 
      boost::interprocess::detail::atomic_write32(&m_count, value); 
     } 

     void operator++() { 
      boost::interprocess::detail::atomic_inc32(&m_count); 
     } 

     void operator--() { 
      boost::interprocess::detail::atomic_dec32(&m_count); 
     } 
    }; 
} 

#endif

来源

2010-07-07 Nick Strupat

您有具体的问题要问，还是需要解决的地方？ SO不是代码评论网站。 – 2010-07-07 05:08:12

我做了编辑，表明我对提高性能的建议感兴趣。 – 2010-07-07 05:13:33

你见过香草萨特的DDJ文章（http://www.drdobbs.com/cpp/211601363）吗？他讨论了一些性能改进（对push和pop使用不同的锁，对缓存行大小使用padding成员变量）。更一般地说，为什么你实现自己的而不是使用，例如TBB的'concurrent_queue'？ – stephan 2010-07-07 07:05:03

我想你会遇到性能问题，在这种情况下，因为调用new为每个新节点的链接列表。这不仅仅是因为调用动态内存分配器很慢。这是因为调用它经常引入很多并发开销，因为免费存储必须在多线程环境中保持一致。

我会使用一个你调整大小的矢量，因为它太小而无法保持队列。我永远不会调整它更小。

我会安排正面和背面的值，所以向量是一个环形缓冲区。这将需要您在调整大小时移动元素。但这应该是一个相当罕见的事件，并且可以通过在施工时提供建议的矢量大小在一定程度上减轻。

或者，您可以保留链接列表结构，但不要销毁节点。只需将其添加到空闲节点的队列中即可。不幸的是，免费节点的队列需要锁定才能正常管理，而且我不确定你是否真的处于比删除更新更好的状态，并且始终处于新状态。

您还将获得更好的向量参考位置。但我不太肯定这将如何与高速缓存线交互在CPU之间来回传递。

其他一些人建议::std::deque，我不认为这是一个坏主意，但我怀疑环形缓冲区向量是一个更好的主意。

来源

2010-07-07 05:42:15 Omnifarious

我认为环形缓冲区不需要被设计为支持可变容量。就像Windows消息队列一样，它的容量也是固定的。我很少看到人们抱怨这件事。 – albert 2012-07-12 05:17:46

香草萨特提出了一个无锁队列的实现，肯定会超越你的:)

主要的想法是使用一个缓冲圈，队列的运行过程中完全放弃的内存动态分配。这意味着队列可能已满（因此您可能需要等待放入元素），这在您的情况下可能不可接受。

正如Omnifarious指出的那样，最好不要使用链表（用于缓存局部性），除非您为池分配内存。我会尝试使用std::deque作为后端，它对内存更友好，并且保证只要您正常情况下只在队列中弹出和推送（在前端和后端），就不会有任何重新分配。

来源

2010-07-07 08:15:52

我不知道赫伯特萨特的想法。 :-)我不得不看看细节，但我敢打赌，你可以实现同样的扩展（但从来没有收缩）缓冲区，我建议在它之上。 – Omnifarious 2010-07-07 12:43:06

也许，请看Cliff博士的这个Google Talk点击实现Java的无锁哈希表。核心思想当然是一个动态数组：http://video.google.com/videoplay?docid=2139967204534450862#，有什么好处是重新分配的成本分散在多个写入中，所以当你不冻结一个线程时需要重新分配kick in。但是你仍然需要记忆，这总是需要一些时间。 – 2010-07-07 13:03:15

只是一个简单的说明，Herb没有将环作为通用并发队列，而是一个简单的缓冲环实现仅适用于单个读写器，单写例。如果你想要多个生产者/消费者，问题变得更复杂的循环缓冲区。 – creatiwit 2012-03-16 20:06:42

批评我的并发队列

回答

相关问题