如何让矢量元素独特？（删除不相邻的重复项）

举一个简单的例子，考虑：

2 1 6 1 4 6 2 1 1

我试图通过删除不相邻的重复和维护元素的顺序，使这个vector独特。

结果将是：

2 1 6 4

我尝试的解决方案是：

将插入到一个std ::设置，但这种方法的问题是它会干扰元素的顺序。
使用std :: sort和std :: unique的组合。但同样的顺序问题。

手动消除重复：

Define a temporary vector TempVector. 
    for (each element in a vector) 
    { 
     if (the element does not exists in TempVector) 
     { 
      add to TempVector; 
     } 
    } 
    swap orginial vector with TempVector.

我的问题是：

是否有任何STL算法可以从载体中删除不相邻的重复？它的复杂性是什么？

来源

2009-09-21 aJ.

由于从元素构造一个集合需要花费O（n log n），所以不可能构造一个集合并且保持所看到的顺序。所以，选择一个O（n log n）然后去。 – 2012-06-19 11:35:02

我想你会做这样的：

我会在矢量使用两个迭代器：

第一个的读取数据，并将其插入一个临时组。

当读取的数据不在集合中时，将其从第一个迭代器复制到第二个迭代器并将其增加。

最后你只保留数据直到第二个迭代器。

复杂度为O（n .log（n）），因为重复元素的查找使用集合而不是矢量。

#include <vector> 
#include <set> 
#include <iostream> 

int main(int argc, char* argv[]) 
{ 
    std::vector<int> k ; 

    k.push_back(2); 
    k.push_back(1); 
    k.push_back(6); 
    k.push_back(1); 
    k.push_back(4); 
    k.push_back(6); 
    k.push_back(2); 
    k.push_back(1); 
    k.push_back(1); 

{ 
    std::vector<int>::iterator r , w ; 

    std::set<int> tmpset ; 

    for(r = k.begin() , w = k.begin() ; r != k.end() ; ++r) 
    { 
     if(tmpset.insert(*r).second) 
     { 
      *w++ = *r ; 
     } 
    } 

    k.erase(w , k.end()); 
} 


    { 
     std::vector<int>::iterator r ; 

     for(r = k.begin() ; r != k.end() ; ++r) 
     { 
      std::cout << *r << std::endl ; 
     } 
    } 
}

来源

2009-09-21 08:14:38

使用'find'然后'insert'效率不高。如果值已插入，则'tmpset.insert（* r）.second'将为true;如果该值已在集合中，则为false。 – cjm 2009-09-21 08:47:04

我忘记了，我改正了 – 2009-09-21 08:48:55

这是关于时间，我们被允许写'std :: vector k（{1,2,3}）;'...... – xtofl 2009-09-21 08:51:10

My question is:

Is there any STL algorithm which can remove the non-adjacent duplicates from the vector ? what is its complexity?

STL的选项是你提到的那些：把项目的std::set，或致电std::sort，std::unique并呼吁在容器上erase()。这些都不符合您的要求，即“删除不相邻的副本并维护元素的顺序”。

那么为什么STL不提供其他选择？没有标准的图书馆会为每个用户的需求提供一切。 STL的设计考虑因素包括“对几乎所有用户都足够快”，“对几乎所有用户都有用”，“尽可能地提供例外安全性”（以及“对标准委员会来说足够小”写的很大，Stroustrup剔除了2/3的东西）。

我能想到的最简单的办法是这样的：

// Note: an STL-like method would be templatized and use iterators instead of 
// hardcoding std::vector<int> 
std::vector<int> stable_unique(const std::vector<int>& input) 
{ 
    std::vector<int> result; 
    result.reserve(input.size()); 
    for (std::vector<int>::iterator itor = input.begin(); 
            itor != input.end(); 
            ++itor) 
     if (std::find(result.begin(), result.end(), *itor) == result.end()) 
      result.push_back(*itor); 
     return result; 
}

这个解决方案应该是O（M * N），其中M是唯一的元素数，N为元素的总数。

来源

2009-09-21 08:42:34

砍伐是一项协同努力:) STL令人印象深刻，但它并不是一个简单的复制和粘贴工作，以达到标准。只要看看获得Boost的东西所需的努力，那个项目就是为了让东西进入标准而设计的。因此，“砍伐”只是写出最重要的1/3的完整规格的副作用。 – MSalters 2009-09-21 09:51:47

没有STL算法做你想要保留序列的原始顺序。

您可以在向量中创建一个迭代器或索引的std::set，比较谓词使用引用的数据而不是迭代器/索引来排序内容。然后，您从集合中未引用的矢量中删除所有内容。（当然，你也可以同样使用其他std::vector迭代器/索引，std::sort和std::unique的是，并以此作为参考，以什么来保持。）

来源

2009-09-21 08:44:55 sbi

据我知道有没有在STL。看看reference。

来源

2009-09-21 08:47:12 Yelonek

您可以删除一些循环使用remove_copy_iffa's答案：

class NotSeen : public std::unary_function <int, bool> 
{ 
public: 
    NotSeen (std::set<int> & seen) : m_seen (seen) { } 

    bool operator()(int i) const { 
    return (m_seen.insert (i).second); 
    } 

private: 
    std::set<int> & m_seen; 
}; 

void removeDups (std::vector<int> const & iv, std::vector<int> & ov) { 
    std::set<int> seen; 
    std::remove_copy_if (iv.begin() 
     , iv.end() 
     , std::back_inserter (ov) 
     , NotSeen (seen)); 
}

这没有对算法的复杂性影响（即书面它也是为O（n log n）的）。你可以使用unordered_set来改进，或者如果你的值的范围足够小，你可以简单地使用一个数组或一个bitarray。

来源

2009-09-21 10:34:46

U_STD是从确定知道std :: namespace将被调用之前的宏吗？ – 2009-09-21 13:18:25

@onebyone：差不多！这是一个宏，我们真的不需要再使用它并返回到使用旧的g ++编译器（<= 2.95.3）的时候。习惯的力量让我无处不在！ – 2009-09-23 07:55:51

正确使用''。然而（这同样适用于@RichardCorden和@fa。的答案：除非OP有*非常大的数据集有很多独特的条目，'set'可能比使用输出'vector'慢由于更好的缓存行为（对于像[btree_set]（https://code.google.com/p/cpp-btree/）这样的大型数据集）对于小数据集可能更好，）。 – Joe 2013-06-21 21:06:50

，不使用临时set这是可能的（可能）要做到这一点的表现有些茫然：

template<class Iterator> 
Iterator Unique(Iterator first, Iterator last) 
{ 
    while (first != last) 
    { 
     Iterator next(first); 
     last = std::remove(++next, last, *first); 
     first = next; 
    } 

    return last; 
}

用作：

vec.erase(Unique(vec.begin(), vec.end()), vec.end());

对于较小的数据集，实施简单和缺少额外的分配可能会抵消使用额外的set的理论上更高的复杂性。尽管如此，使用具有代表性的输入进行测量是确保唯一的方法。

来源

2009-09-23 08:23:18

优秀的解决方案。我建议名称Unique是不是一个显示，因为我想。怎么样RemoveNonUnique？ – 2009-09-23 08:50:46

John Torjo有一篇很好的文章，它以系统的方式处理这个问题。他出现的结果似乎更通用，更高效比任何解决方案，至今这里建议：

http://www.builderau.com.au/program/java/soa/C-Removing-duplicates-from-a-range/0,339024620,320271583,00.htm

https://web.archive.org/web/1/http://articles.techrepublic%2ecom%2ecom/5100-10878_11-1052159.html

不幸的是，约翰的解决方案的完整的代码似乎不再可用，约翰没有回应可能发送电子邮件。因此，我写了我自己的代码，基于类似他的理由，但在一些细节上故意有所不同。如果您愿意，请随时与我联系（vschoech think-cell com）并讨论细节。

为了让代码为你编译，我添加了一些我自己定期使用的库东西。另外，我不用简单的stl，而是使用boost来创建更通用，更高效，更易读的代码。

玩得开心！

#include <vector> 
#include <functional> 

#include <boost/bind.hpp> 
#include <boost/range.hpp> 
#include <boost/iterator/counting_iterator.hpp> 

///////////////////////////////////////////////////////////////////////////////////////////// 
// library stuff 

template< class Rng, class Func > 
Func for_each(Rng& rng, Func f) { 
    return std::for_each(boost::begin(rng), boost::end(rng), f); 
}; 

template< class Rng, class Pred > 
Rng& sort(Rng& rng, Pred pred) { 
    std::sort(boost::begin(rng), boost::end(rng), pred); 
    return rng; // to allow function chaining, similar to operator+= et al. 
} 

template< class T > 
boost::iterator_range< boost::counting_iterator<T> > make_counting_range(T const& tBegin, T const& tEnd) { 
    return boost::iterator_range< boost::counting_iterator<T> >(tBegin, tEnd); 
} 

template< class Func > 
class compare_less_impl { 
private: 
    Func m_func; 
public: 
    typedef bool result_type; 
    compare_less_impl(Func func) 
    : m_func(func) 
    {} 
    template< class T1, class T2 > bool operator()(T1 const& tLeft, T2 const& tRight) const { 
     return m_func(tLeft) < m_func(tRight); 
    } 
}; 

template< class Func > 
compare_less_impl<Func> compare_less(Func func) { 
    return compare_less_impl<Func>(func); 
} 


///////////////////////////////////////////////////////////////////////////////////////////// 
// stable_unique 

template<class forward_iterator, class predicate_type> 
forward_iterator stable_unique(forward_iterator itBegin, forward_iterator itEnd, predicate_type predLess) { 
    typedef std::iterator_traits<forward_iterator>::difference_type index_type; 
    struct SIteratorIndex { 
     SIteratorIndex(forward_iterator itValue, index_type idx) : m_itValue(itValue), m_idx(idx) {} 
     std::iterator_traits<forward_iterator>::reference Value() const {return *m_itValue;} 
     index_type m_idx; 
    private: 
     forward_iterator m_itValue; 
    }; 

    // {1} create array of values (represented by iterators) and indices 
    std::vector<SIteratorIndex> vecitidx; 
    vecitidx.reserve(std::distance(itBegin, itEnd)); 
    struct FPushBackIteratorIndex { 
     FPushBackIteratorIndex(std::vector<SIteratorIndex>& vecitidx) : m_vecitidx(vecitidx) {} 
     void operator()(forward_iterator itValue) const { 
      m_vecitidx.push_back(SIteratorIndex(itValue, m_vecitidx.size())); 
     } 
    private: 
     std::vector<SIteratorIndex>& m_vecitidx; 
    }; 
    for_each(make_counting_range(itBegin, itEnd), FPushBackIteratorIndex(vecitidx)); 

    // {2} sort by underlying value 
    struct FStableCompareByValue { 
     FStableCompareByValue(predicate_type predLess) : m_predLess(predLess) {} 
     bool operator()(SIteratorIndex const& itidxA, SIteratorIndex const& itidxB) { 
      return m_predLess(itidxA.Value(), itidxB.Value()) 
       // stable sort order, index is secondary criterion 
       || !m_predLess(itidxB.Value(), itidxA.Value()) && itidxA.m_idx < itidxB.m_idx; 
     } 
    private: 
     predicate_type m_predLess; 
    }; 
    sort(vecitidx, FStableCompareByValue(predLess)); 

    // {3} apply std::unique to the sorted vector, removing duplicate values 
    vecitidx.erase(
     std::unique(vecitidx.begin(), vecitidx.end(), 
      !boost::bind(predLess, 
       // redundand boost::mem_fn required to compile 
       boost::bind(boost::mem_fn(&SIteratorIndex::Value), _1), 
       boost::bind(boost::mem_fn(&SIteratorIndex::Value), _2) 
      ) 
     ), 
     vecitidx.end() 
    ); 

    // {4} re-sort by index to match original order 
    sort(vecitidx, compare_less(boost::mem_fn(&SIteratorIndex::m_idx))); 

    // {5} keep only those values in the original range that were not removed by std::unique 
    std::vector<SIteratorIndex>::iterator ititidx = vecitidx.begin(); 
    forward_iterator itSrc = itBegin; 
    index_type idx = 0; 
    for(;;) { 
     if(ititidx==vecitidx.end()) { 
      // {6} return end of unique range 
      return itSrc; 
     } 
     if(idx!=ititidx->m_idx) { 
      // original range must be modified 
      break; 
     } 
     ++ititidx; 
     ++idx; 
     ++itSrc; 
    } 
    forward_iterator itDst = itSrc; 
    do { 
     ++idx; 
     ++itSrc; 
     // while there are still items in vecitidx, there must also be corresponding items in the original range 
     if(idx==ititidx->m_idx) { 
      std::swap(*itDst, *itSrc); // C++0x move 
      ++ititidx; 
      ++itDst; 
     } 
    } while(ititidx!=vecitidx.end()); 

    // {6} return end of unique range 
    return itDst; 
} 

template<class forward_iterator> 
forward_iterator stable_unique(forward_iterator itBegin, forward_iterator itEnd) { 
    return stable_unique(itBegin, itEnd, std::less< std::iterator_traits<forward_iterator>::value_type >()); 
} 

void stable_unique_test() { 
    std::vector<int> vecn; 
    vecn.push_back(1); 
    vecn.push_back(17); 
    vecn.push_back(-100); 
    vecn.push_back(17); 
    vecn.push_back(1); 
    vecn.push_back(17); 
    vecn.push_back(53); 
    vecn.erase(stable_unique(vecn.begin(), vecn.end()), vecn.end()); 
    // result: 1, 17, -100, 53 
}

来源

2010-02-12 09:34:05 vschoech

有趣的但...排序意味着O（N *日志N），而O（N）解决方案基于哈希Set/Bloom Filters exists。 – 2010-08-30 14:06:59

由于问题是“是否有任何STL算法...？它的复杂性是什么？“这是有道理的实现像std::unique功能：

template <class FwdIterator> 
inline FwdIterator stable_unique(FwdIterator first, FwdIterator last) 
{ 
    FwdIterator result = first; 
    std::unordered_set<typename FwdIterator::value_type> seen; 

    for (; first != last; ++first) 
     if (seen.insert(*first).second) 
      *result++ = *first; 
    return result; 
}

因此，这是怎么std::unique实施加上一组额外的unordered_set应比普通set更快的所有元素都删除了比较相等的元素。（第一个元素保留，因为我们不能统一到无）迭代器返回指向[first,last)范围内的新结点

编辑：最后一句意思是容器本身不被unique修改。这可能令人困惑。下面的例子是实际的将容器减少到统一集合。

1: std::vector<int> v(3, 5); 
2: v.resize(std::distance(v.begin(), unique(v.begin(), v.end()))); 
3: assert(v.size() == 1);

第1行创建矢量{ 5, 5, 5 }。在第二行中调用unique将第二个元素的迭代器返回，这是第一个不唯一的元素。因此distance返回1并且resize修剪向量。

来源

2012-05-09 16:29:18

好的解决方案使用C++ 11，并在第5行添加typename关键字。 – 2015-03-24 02:12:37

是的，应该提到'std :: unordered_set'是C++ 11。如果不可用（即'__cplusplus <201103L'）只需使用'std :: set'。 – 2015-07-18 07:19:59

基础上@戈登的回答，但使用lambda表达式，而是将删除基于@fa的答案输出向量

set<int> s; 
    vector<int> nodups; 
    remove_copy_if(v.begin(), v.end(), back_inserter(nodups), 
     [&s](int x){ 
      return !s.insert(x).second; //-> .second false means already exists 
     });

来源

2013-06-21 20:53:11 winterlight

写他们的副本。它也可以开始使用STL算法std::stable_partition改写：

struct dupChecker_ { 
    inline dupChecker_() : tmpSet() {} 
    inline bool operator()(int i) { 
     return tmpSet.insert(i).second; 
    } 
private: 
    std::set<int> tmpSet; 
}; 

k.erase(std::stable_partition(k.begin(), k.end(), dupChecker_()), k.end());

这样，它是更紧凑，我们并不需要关心的迭代器。

它似乎甚至没有引入太多的性能损失。我在我的项目中使用它，需要经常处理非常大的载体，它并没有真正的区别。

另一个不错的特性是，可以通过使用std::set<int, myCmp_> tmpSet;来调整唯一性。例如，在我的项目中忽略某些舍入错误。

来源

2013-09-24 14:41:45

鉴于你的输入是vector<int> foo可以使用remove在这里做腿部的工作适合你，那么如果你要收缩的载体只是用erase否则只使用last作为一个过去的最末端迭代器时你想删除重复，但为了矢量保留：

auto last = end(foo); 

for(auto first = begin(foo); first < last; ++first) last = remove(next(first), last, *first); 

foo.erase(last, end(foo));

Live Example

至于时间复杂度，这将是O（nm）的。其中n是元素的个数和m是唯一元素的个数。就空间的复杂性而言，这将使用O（n），因为它会在适当的位置进行移除。

来源

2017-08-11 16:55:44

如何让矢量元素独特？ （删除不相邻的重复项）

回答

相关问题

如何让矢量元素独特？（删除不相邻的重复项）