7

How do I remove all special characters which don't fall under ASCII category in VBA?

These are some of the symbols which appear in my string.

Œ œ Š š Ÿ ƒ

There are many more such characters.

These don't belong to ASCII category as you can see here http://www.ascii.cl/htmlcodes.htm

I tried something like this

strName = Replace(strName, ChrW(376), " ")
2
  • Terminology: As you know, Unicode is a superset of ASCII. However, the ASCII characters are in what is called a "block" not a Unicode category; specifically the C0 Controls and Basic Latin block. Commented May 4, 2016 at 16:19
  • FYI The ChrW CharCode argument is a Long that identifies a character, but doesn't allow values greater than 65535 (hex value &HFFFF). It might be of some practical interest to find systematic ways to overcome this restriction - c.f. six solutions at Get UniCode characters with CharCode greater hex FFFF, e.g. in Unicode block 1F300-1F5FF "Miscellaneous symbols and pictographs" :-) Commented Nov 22, 2019 at 15:17

6 Answers 6

8

Would a RegEx solution be of interest to you?

There are plenty of examples for different languages on this site - here's a C# one: How can you strip non-ASCII characters from a string? (in C#).

Try this for VBA:

Private Function GetStrippedText(txt As String) As String
    Dim regEx As Object

    Set regEx = CreateObject("vbscript.regexp")
    regEx.Pattern = "[^\u0000-\u007F]"
    GetStrippedText = regEx.Replace(txt, "")

End Function
Sign up to request clarification or add additional context in comments.

7 Comments

Hey Thanks a lot, it works. Not sure of all conditions and all special Unicode characters, but for now it works well. It just takes too much time for processing.
If you're converting many strings eg in a loop, then try defining the pattern once and just calling the .Replace function inside your loop. That might help with speed.
Thank you :) That did not reduce the time much but anyway it helped to reduce a little bit of time.
Does this cover all the unicode or just 0 to 7F? Will this work for all boundary conditions? Please let me know.
It will replace any character not in the Unicode range 0 to 127.
|
4

Don't need to loop each character

Maybe late, but maybe it helps someone:

Public Function StripNonAsciiChars(ByVal InputString As String) As String
    Dim i As Integer
    Dim RegEx As Object
    Set RegEx = CreateObject("VBScript.RegExp")
    With RegEx
        .Global = True
        .MultiLine = True
        .IgnoreCase = True
        .Pattern = "[^\u0000-\u007F]"
        StripNonAsciiChars = Application.WorksheetFunction.Trim(RegEx.Replace(InputString, " "))
    End With
End Function

2 Comments

wow... Thanks. it work for me without Application.WorksheetFunction.Trim. Worksheet function give me error.
@Mabaega Ofc, you may play with the Replace function too, e.g. you may put dashes, or leave the spaces, this is up to your needs. Lately, starting with the ClickToRun aproatch of MS Office gives 1004 error for the base excel functions in VBA...
3

Try with below

Function ClearUnwantedString(fulltext As String) As String
    Dim output As String
    Dim character As String
    For i = 1 To Len(fulltext)
        character = Mid(fulltext, i, 1)
        If (character >= "a" And character <= "z") Or (character >= "0" And character <= "9") Or (character >= "A" And character <= "Z") Then
            output = output & character
        End If
    Next
    ClearUnwantedString = output
End Function

Sub test()
    a = ClearUnwantedString("dfjŒœŠdskl")
End Sub

4 Comments

I am using something similar, but the moto of my code is not this. It should allow special character which are ASCII, but only remove special character which doesn't come under ASCII code of 0 to 255.
My String looks like LPAIF in excel sheet, but there is a Ÿ char hidden in front which is not visible in excel, but it exist, so the string is ŸLPAIF. Some solutions given above worked when I gave "ŸLPAIF" as a static string, but not when read from excel directly. Anyway we have a solution now, thanks for trying to help.
I tried hard removing strange characters from my string in VBA and this code was very helpfull for me
It removes all unicode characters in a string. However, it also removes spaces.
1

What do you get when you write the following in the immediate window?

?Replace("ŸŸŸŸ", ChrW(376), "ale")

I get: alealealeale

2 Comments

Yes It prints the same for me. I am having Ÿ as special character, but when passed as a string inside Replace method strName = Replace(strName, ChrW(376), " "), where strName is initially ŸLPAIF becomes ?LPAIF. This string then goes to a write to file method, where the code crashes saying "Run-time error '5' Invalid procedure call or argument". I think that ? newly produced is also not an ordinary question mark, but some special character. I want to replace it in my dynamic string, not in a static one as "ŸŸŸŸ" you shown.
Hi, I think it is working in Debug.Print but when using actual value from cell where Ÿ is present, it doesn't do that.
0

try application.clean()

it will remove all unprintable characters

6 Comments

Can you please give me an example for it. I am not able to find any application.clean() in the autotext code in vba.
Please take a look at this pasteboard.co/FtkoMrB.png I also have tried "strName = Clean(strName)" and they did not work.
Try with Application.WorksheetFunction.Clean("üäöaŠsd"). But it will not help you, it is only for non-printable characters. And yours are printable.
@Vityata You are right. This is just a test to see if you are paying attention xD :P But @Jeevan .clean exist under worksheetfunction but it can also be called under application. I'll just leave the answer here for reference.
I did not get exactly what you mean. The "application.clean" does not show with Intellisense, but with "application.worksheetfunction.clean" it is there.
|
0

Supposing you have:

enter image description here

Then the following code will get the String from A1 and let through only ANSI (code 0 to 255) in A2.

Sub test()
 Dim s1 As String, s2 As String, c As String, i As Long, iAsc As Integer

 s1 = Range("A1").Value

 s2 = ""

 For i = 1 To Len(s1)
  c = Mid(s1, i, 1)
  iAsc = AscW(c)
  If iAsc <= 255 Then
   s2 = s2 & c
  End If
 Next

 Range("A2").Value = s2

End Sub

4 Comments

Hi, I tried this code, and it replaced Ÿ with a ? in the output. Which still doesn't solve the issue.
I think while reading itself Ÿ is read as ? by excel. Ÿ is anyway invisible in excel sheet.
As you see in my picture, the "This Œ is œ a Š testš . Ÿ ƒ Blubb." is actual content of my sheet. With this my code works exactly as I have described it. Where do you see the "Ÿ" which is then not visible in the sheet?
I see Ÿ when I copy paste that string from the excel cell to a notepad. Thanks for trying to help. We have a solution now Ambie. Thanks for your efforts.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.