2017-08-07 89 views
3

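I have this code, but I still can't replace non-English characters, such as Vietnamese or Thai, in my data with a simple "placeholder". How do I write VBA code to remove and replace UTF-8 characters?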

Sub NonLatin() 
Dim cell As Range 
    For Each cell In Range("A1", Cells(Rows.Count, "A").End(xlUp)) 
     s = cell.Value 
      For i = 1 To Len(s) 
       If Mid(s, i, 1) Like "[!@#$%^&* * ]" Then cell.Value = "placeholder"
      Next 
    Next 
End Sub 

Thanks for your help.

+0

Don't you also need an 'i' and a 'cell' after your 'Next' statements? – Luuklag

+0

Look at using 'RegEx' instead – Tom

+1

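@Luuklag You don't *have* to include the counter variable after the 'Next' statement; it's just good practice because it improves readability. See [this question](https://stackoverflow.com/questions/21993482/vba-why-do-people-include-the-variables-name-in-a-next-statement) – Wolfie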

Answers

0

For details on using regular expressions in VBA code, see this question.


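Then use the regular expression in a function like this to process the string. Here I assume you want to replace each invalid character with a placeholder, rather than the whole string. If you want to replace the whole string, you don't need the per-character check: use a + or * quantifier in the regular expression's pattern to cover multiple characters, and test the whole string at once.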

Function LatinString(ByVal str As String) As String
    ' Requires a reference to "Microsoft VBScript Regular Expressions 5.5"
    ' Set up the regular expressions object
    Dim regEx As New RegExp
    With regEx
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
        ' This is the pattern of ALLOWED characters.
        ' Note that special characters should be escaped using a backslash e.g. \$ not $
        .Pattern = "[A-Za-z0-9]"
    End With

    ' Loop through characters in string. Replace disallowed characters with "?"
    ' (str is passed ByVal, so the caller's variable is left unchanged)
    Dim i As Long
    For i = 1 To Len(str)
        If Not regEx.Test(Mid(str, i, 1)) Then
            str = Left(str, i - 1) & "?" & Mid(str, i + 1)
        End If
    Next i
    ' Return output
    LatinString = str
End Function

You can use this in your code via:

Dim cell As Range
For Each cell In Range("A1", Cells(Rows.Count, "A").End(xlUp))
    cell.Value = LatinString(cell.Value)
Next

For a byte-level method that converts Unicode strings to UTF-8 strings without using regular expressions, check out this article.
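If the goal is to replace the whole cell value rather than individual characters, a single test per string is enough. The sketch below is one possible way to do that, not part of the original answer: instead of a + or * quantifier it uses a negated character class, which matches as soon as the string contains one disallowed character. The function name 'ReplaceIfNonLatin', the parameter name 'inputText' and the "placeholder" text are assumptions chosen for illustration. It uses late binding via CreateObject, so no library reference is needed.

```vba
' Hypothetical helper (illustrative only): returns "placeholder" when
' inputText contains any character outside A-Z, a-z and 0-9,
' otherwise returns inputText unchanged.
Function ReplaceIfNonLatin(ByVal inputText As String) As String
    ' Late binding: no reference to the VBScript regex library required
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")

    ' Negated class: matches any single character that is NOT allowed.
    ' Spaces and punctuation count as disallowed here; add them to the
    ' class (e.g. "[^A-Za-z0-9 ]") if they should be kept.
    re.Pattern = "[^A-Za-z0-9]"

    If re.Test(inputText) Then
        ReplaceIfNonLatin = "placeholder"
    Else
        ReplaceIfNonLatin = inputText
    End If
End Function
```

It can be called from the same loop as above, with cell.Value = ReplaceIfNonLatin(cell.Value) in place of the LatinString call.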

+0

Why not ignore case and use a simpler expression? – Tom

+0

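You could do that @Tom; I tried to keep the example similar to [a simplified version of] the OP's pattern, and to the example given in the linked question. If omitted, the 'IgnoreCase = False' I included is the default anyway. I was just showing some of the options! :) – Wolfie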

0

You can replace any characters outside of, e.g., the ASCII range (the first 128 characters) with a placeholder using the code below:

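Option Explicit

Sub Test()

    Dim oCell As Range

    With CreateObject("VBScript.RegExp")
        .Global = True
        ' Match every character outside the ASCII range (code points 0-127)
        .Pattern = "[^\u0000-\u007F]"
        For Each oCell In [A1:C4]
            ' Replace each matched character with "*"
            oCell.Value = .Replace(oCell.Value, "*")
        Next
    End With

End Sub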