2015-12-02 182 views
3

我有一个工作表,其中包含约8000行已被过滤。我试图从工作表列中删除重复项来获取值的集合。通过阅读这篇文章,有两种方法可以做到这一点。如果该值不在新集合中,则循环收集并复制到新集合。
或者将列中的数据复制到临时电子表格中,过滤并将数据复制到另一列,然后将其添加到集合中。Excel VBA删除重复vs筛选器

当处理大量数据时,复制过滤器具有最佳性能,但由于必须创建新工作表,因此该过滤器非常笨重。

我还没有看到它,但是有没有办法在内存中做复制过滤器,而不是创建一个工作表来做到这一点?

重述:

Sub GetColumnValues(Ws As Worksheet, Column As Long, CollValues As Collection) 

Dim RowIndex As Long 

    For RowIndex = 1 To GetLastRow(Ws) 
     If CollValues.Count = 0 Then 
      CollValues.Add (Ws.Cells(RowIndex, Column).Value) 
     Else 
      If IsInCollection(CollValues, Ws.Cells(RowIndex, Column).Value) = False Then 
       CollValues.Add (Ws.Cells(RowIndex, Column).Value) 
      End If 
     End If 
    Next RowIndex 

End Sub 

过滤器和复制:

Sub GetColumnValues(Ws As Worksheet, Column As Long, CollValues As Collection) 

Dim rowLast As Long 
Dim c As Range 
Dim tmpWS As Worksheet 
Dim tmpWsName As String 

    tmpWsName = "TempWS" 

    Call DeleteWs(TsWb, tmpWsName) 

    Set tmpWS = TsWb.Sheets.Add() 
    tmpWS.Name = tmpWsName 

    rowLast = GetLastRow(Ws) 

    Ws.Range(Ws.Cells(1, Column), Ws.Cells(rowLast, Column)).Copy 
    tmpWS.Range("A1").PasteSpecial 

    rowLast = GetLastRow(tmpWS) 
    tmpWS.Range(tmpWS.Cells(1, 1), tmpWS.Cells(rowLast, 1)).AdvancedFilter _ 
     Action:=xlFilterCopy, _ 
     CopyToRange:=tmpWS.Range("B1"), _ 
     Unique:=True 

    rowLast = GetLastRow(tmpWS) 

    For Each c In tmpWS.Range(tmpWS.Cells(1, 2), tmpWS.Cells(rowLast, 2)) 
     If Len(c.value) > 0 Then 
      CollValues.Add (c.value) 
     End If 
    Next c 

    Call DeleteWs(TsWb, tmpWsName) 
End Sub 

回答

0

我不知道为什么它必须是一个集合,但要获得快速的让所有值的数组无(过滤列表),双打,你可以做这样的:(非常接近你的第一个例子)

Function GetColVal(Ws As Worksheet, Column As Long) As Variant 
    Dim runner As Variant, outputVal() As Variant, i As Long 
    ReDim outputVal(Ws.Range(Ws.Cells(1, Column), Ws.Cells(GetLastRow(Ws), Column)).SpecialCells(xlCellTypeVisible).Count) 
    For Each runner In Ws.Range(Ws.Cells(1, Column), Ws.Cells(GetLastRow(Ws), Column)).SpecialCells(xlCellTypeVisible) 
    If i = 0 Then 
     outputVal(0) = runner.Value: i = 1 
    Else 
     If IsError(Application.Match(runner.Value, outputVal, 0)) Then outputVal(i) = runner.Value: i = i + 1 
    End If 
    Next 
    ReDim Preserve outputVal(i - 1) 
    GetColVal= outputVal 
End Function 

Application.Match是在VBA最快的功能之一,而IsInCollection可以是非常慢......更好的运行For Each ...循环到集合中添加的一切不是检查集合...

Dim a As Variant 
For Each a in GetColVal(Worksheets("SheetX"),7) 
    MyCollection.Add a 
Next 

应该比你的例子快得多...仍然我推荐不使用集合,特别是如果你只是使用值...如果可能的话,更好地使用GetColVal -array ...
variantVariable = GetColVal(Worksheets("SheetX"),7)然后使用变量变量你想干什么就干什么(你也可以粘贴在片直接某处)

一个简单的输出到工作表是这样的:

Dim a As Variant 
a = GetColVal(Worksheets("Sheet1"),13) 'values from sheet1 column M 
'pasting in one row (starting at A1 in Sheet2) 
ThisWorkbook.Sheets("Sheet2").Range(Cells(1, 1), Cells(1, ubound(a) + 1)).value = a 
'pasting in one column (starting at C5 in Sheet4) 
ThisWorkbook.Sheets("Sheet4").Range(Cells(5, 3), Cells(ubound(a) + 5, 3)).value = Application.Transpose(a) 

编辑

要显示不同的东西:

Function GetColumnValues(ws As Worksheet, Column As Long) As Range 
    With ws 
    Dim srcRng As Range, outRng As Range, runRng1 As Range, runRng2 As Range, dBool As Boolean 
    Set srcRng = .Range(.Cells(1, Column), .Cells(GetLastRow(ws), Column)).SpecialCells(xlCellTypeVisible) 
    For Each runRng1 In a 
     If outRng Is Nothing Then Set outRng = runRng1 
     For Each runRng2 In outRng 
     If Intersect(runRng1, runRng2) Is Nothing Then 
      If runRng2.Value = runRng1.Value Then dBool = True: Exit For 
     End If 
     Next 
     If dBool Then dBool = False Else Set outRng = Union(outRng, runRng1) 
    Next 
    End With 
    Set GetColumnValues = outRng 
End Function 

有了这个功能,你会得到一个范围内的所有可以选择或复制到另一个位置(与格式化和其他一切)的细胞。您仍然可以使用For Each ...将所有元素添加到集合中。我也没有使用Match来避免“Len> 255” - 错误

+0

收集的唯一原因是我从outputVal中删除重复项后删除了其他数据。使用集合而不是重新创建新数组更容易。我能想到的唯一方法是创建一个需要删除的单独索引数组,然后执行redim并一次复制到新数组。 虽然你已经给了我一些想法,但谢谢。 – SteveP65

+0

我仍在调查此问题。我发现的一个问题是,如果一个Variant数组中有一个字符串长于255个字符,则Application.Match会失败,并显示类型不匹配错误。如果我将输出数组更改为字符串数组,则可以处理整个数据范围。 – SteveP65

+0

@ SteveP65我添加了另一个功能,你可能想尝试(应该更快,但我无法确定)...显然你会得到一个范围,它可以保留一些你可能想要使用的好处:) –

0

是的,只是做阵列,然后检查对数组,然后提供您的结果返回到工作表。我个人喜欢在内存中而不是通过应用程序IDE做事情。

它快得多(特别是数万行),您不必担心屏幕刷新,或让用户想知道如何快速移动所有内容。我通常处理内存中的所有内容,将其交回,然后激活我希望用户看到的工作表。

dim set1Array() as String 
dim set2Array() as String 
dim set1Rows as Long 
dim set2Rows as Long 
dim lngX as Long 
dim lngY as Long 
dim blnDebug as Boolean; blnDebug = true ' flag for debugging 

' get count of rows so we know how big to make the arrays 
set1Rows = GetLastRow(Ws1) 
set2Rows = GetLastRow(Ws2) 

' set arrays to the proper size 
redim set1Rows(set1Rows - 1, 1)' 1 represents 2 columns since it's 0 based. the second column is a flag for duplicated. 
redim set2Rows(set2Rows - 1, 0)' 0 represents 1 column since it's 0 based 

' load the arrays with the sheet data 
for lngX = 1 to set1Rows 
    set1Rows(lngX - 1, 0) = Worksheets("Sheet1").range("A" & lngX).Text 
next lngX 

for lngX = 1 to set2Rows 
    set2Rows(lngX - 1, 0) = Worksheets("Sheet2").range("A" & lngX).Text 
next lngX 

' I like to do a debug callout here to see what I got to make sure that I am good to go with the dataset 
if blnDebug then 
    for lngX = 0 to Ubound(set1Rows) 
    debug.print "set1Rows(" & lngX & ") - col1: " & set1Rows(lngX, 0) 
    next lngX 

    for lngX = 0 to Ubound(set2Rows) 
    debug.print "set2Rows(" & lngX & ") - col1: " & set2Rows(lngX, 0) 
    next lngX 

end if 

' now do your comparison 

for lngX = 0 to Ubound(set1Rows) 
    for lngY = 0 to Ubound(set2Rows) 
    if set1Rows(lngX, 0) = set2Rows(lngY, 0) then 
     set1Rows(lngX, 1) = "1" 
    end if 
    next lngY 
next lngX 

' now your duplicates are flagged in the set1Rows array 

for lngX = 0 to Ubound(set1Rows) 
    if set1Rows(lngX, 1) = "1" then 
    ' code for duplicated 
    else 
    ' code for unique 
    end if 
next lngX 
+0

正在对一列数据进行比较。所以理想情况下,我想将列复制到数组中,在该数组中删除任何重复的值。然后我会使用该数组进行额外的处理。 – SteveP65