5

I have this code, but I still can't seem to replace non-English characters (such as Vietnamese or Thai) in my data with a simple "placeholder".

Sub NonLatin()
Dim cell As Range
    For Each cell In Range("A1", Cells(Rows.Count, "A").End(xlUp))
        s = cell.Value
            For i = 1 To Len(s)
                If Mid(s, i, 1) Like "[!A-Za-z0-9@#$%^&* * ]" Then cell.Value = "placeholder"
            Next
    Next
End Sub

Appreciate your help

  • Also wouldn't you need an i and a cell after your NEXT statements? Commented Aug 7, 2017 at 10:31
  • Have a look at using RegEx instead Commented Aug 7, 2017 at 10:40
  • @Luuklag you don't have to include the counter variable after the Next statement, it's just good practice as it increases readability. See this question Commented Aug 7, 2017 at 10:42
  • @Wilson are you trying to replace the non-English characters with a placeholder, or change the value of the entire cell if it contains a non-English character? You may find this article useful, which contains code to convert strings to UTF-8 characters and in-fill unknown characters with ? Commented Aug 7, 2017 at 10:52
  • @Wolfie Good to know, still not too old to learn something ;) Commented Aug 7, 2017 at 11:30

2 Answers

1

You can replace any characters outside a chosen range, e.g. the ASCII range (the first 128 characters), with a placeholder using the code below:

Option Explicit

Sub Test()

    Dim oCell As Range

    With CreateObject("VBScript.RegExp")
        .Global = True
        ' Matches anything outside U+0000 to U+00F7; use \u007F as the upper bound for strict ASCII
        .Pattern = "[^\u0000-\u00F7]"
        For Each oCell In [A1:C4]
            oCell.Value = .Replace(oCell.Value, "*")
        Next
    End With

End Sub

0

See this question for details about using Regular Expressions in your VBA code.


Then use regular expressions in a function like this one to process strings. Here I am assuming you want to replace each invalid character with a placeholder, rather than the entire string. If you want to replace the entire string, you don't need individual character checks: you can use the + or * quantifiers to match multiple characters in your regular expression's pattern and test the whole string at once.

Function LatinString(ByVal str As String) As String
    ' After including a reference to "Microsoft VBScript Regular Expressions 5.5"
    ' Set up the regular expressions object
    Dim regEx As New RegExp
    With regEx
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
        ' This is the pattern of ALLOWED characters. 
        ' Note that special characters should be escaped using a backslash e.g. \$ not $
        .Pattern = "[A-Za-z0-9]"
    End With

    ' Loop through characters in string. Replace disallowed characters with "?"
    Dim i As Long
    For i = 1 To Len(str)
        If Not regEx.Test(Mid(str, i, 1)) Then
            str = Left(str, i - 1) & "?" & Mid(str, i + 1)
        End If
    Next i
    ' Return output
    LatinString = str
End Function

You can use this in your code like so:

Dim cell As Range
For Each cell In Range("A1", Cells(Rows.Count, "A").End(xlUp))
    cell.Value = LatinString(cell.Value)
Next
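
For the other case mentioned above, where the whole cell is replaced as soon as it contains any disallowed character, here is a minimal sketch. The names HasDisallowedChar and ReplaceWholeCells are hypothetical, and the allowed set (letters, digits and space) is only an assumption to adjust to your data. It uses late binding, so no library reference is needed.

```vba
' Hypothetical helper: True if s contains at least one character
' outside the assumed allowed set (A-Z, a-z, 0-9 and space).
Function HasDisallowedChar(ByVal s As String) As Boolean
    With CreateObject("VBScript.RegExp")
        ' Negated class: matches any single character that is NOT allowed
        .Pattern = "[^A-Za-z0-9 ]"
        HasDisallowedChar = .Test(s)
    End With
End Function

Sub ReplaceWholeCells()
    Dim cell As Range
    For Each cell In Range("A1", Cells(Rows.Count, "A").End(xlUp))
        ' Skip error values such as #N/A, which cannot be converted to a string
        If Not IsError(cell.Value) Then
            If HasDisallowedChar(CStr(cell.Value)) Then cell.Value = "placeholder"
        End If
    Next cell
End Sub
```

A single Test call per cell is enough here, because the negated character class already matches on the first disallowed character it finds.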

For a byte-level method that converts a Unicode string to a UTF-8 string without using regular expressions, check out this article.
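
If you only need the placeholder behaviour and not a UTF-8 conversion, a simpler regex-free route is to check each character's code with AscW. This is a sketch of that different technique, not the method from the article; the name AsciiOnly and the strict ASCII cut-off at 127 are assumptions.

```vba
' Hypothetical helper: replaces every character outside the ASCII range with "?"
Function AsciiOnly(ByVal s As String) As String
    Dim i As Long
    Dim code As Long
    For i = 1 To Len(s)
        ' AscW returns a signed 16-bit value, so code units from
        ' U+8000 upward come back as negative numbers
        code = AscW(Mid$(s, i, 1))
        If code < 0 Or code > 127 Then
            ' The Mid statement overwrites one character in place
            Mid$(s, i, 1) = "?"
        End If
    Next i
    AsciiOnly = s
End Function
```

It can be called the same way as the function above, e.g. cell.Value = AsciiOnly(cell.Value). Characters outside the Basic Multilingual Plane are stored as two code units in VBA strings, so each of them becomes two question marks.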

2 Comments

Why not ignore case and use a simpler expression?
You could well do that @Tom, I was keeping the example as similar as possible to [a simplified version of] the OP's pattern, and the example given in the linked question. It would be even neater to leave out the line I included as IgnoreCase = False is the default - I was just showing some options! :)
