
I need your help. How can I convert Unicode characters like these in C++

Thére Àre sôme spëcial charâcters ïn thìs têxt
عربى

to HTML encoding like this?

Th&eacute;re &Agrave;re s&ocirc;me sp&euml;cial char&acirc;cters &iuml;n th&igrave;s t&ecirc;xt
&#1593;&#1585;&#1576;&#1609;

Your help will be greatly appreciated. Thank you :)

  • Thanks Kevin but this isn't what I want. Commented Aug 5, 2014 at 18:57

1 Answer


Unless you can find a third-party API to handle this for you, you will likely have to code it yourself:

  1. Convert the input string data to codepoint values (i.e., to UTF-32).

  2. For each codepoint value:

    a. if it is in the ASCII text range (the whitespace characters U+0009, U+000A, U+000D, and the printable characters U+0020 through U+007E), store/display the value as-is as an 8-bit ASCII character.

    b. otherwise, check whether an entity name is associated with the codepoint and, if so, store/display that name in &name; format.

    c. otherwise, store/display the codepoint value in &#XXXX; format, where XXXX is the decimal value of the codepoint (or in &#xHHHH; format, where HHHH is its hexadecimal value).
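
The steps above could be sketched roughly as follows. This is only an illustration, not a complete implementation: it assumes the input is UTF-8, the function names (utf8_to_utf32, entity_name, html_encode) are made up for this example, and the entity table holds just a handful of names where a real one would cover the full list of HTML named entities.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Step 1: decode UTF-8 into codepoints (UTF-32).
// Sketch only: malformed sequences become U+FFFD, and overlong forms
// and surrogates are not rejected.
std::vector<char32_t> utf8_to_utf32(const std::string& in) {
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else { out.push_back(0xFFFD); ++i; continue; }  // invalid lead byte
        ++i;
        bool ok = true;
        for (std::size_t k = 0; k < extra; ++k) {
            if (i >= in.size() ||
                (static_cast<unsigned char>(in[i]) & 0xC0) != 0x80) {
                ok = false;  // truncated or broken sequence
                break;
            }
            cp = (cp << 6) | (static_cast<unsigned char>(in[i]) & 0x3F);
            ++i;
        }
        assert(i <= in.size());  // the index never runs past the input
        out.push_back(ok ? cp : char32_t{0xFFFD});
    }
    return out;
}

// Step 2b: look up an entity name. Hypothetical, deliberately tiny table.
const char* entity_name(char32_t cp) {
    static const std::unordered_map<char32_t, const char*> table = {
        {0x00C0, "Agrave"}, {0x00E2, "acirc"}, {0x00E9, "eacute"},
        {0x00EA, "ecirc"},  {0x00EB, "euml"},  {0x00EC, "igrave"},
        {0x00EF, "iuml"},   {0x00F4, "ocirc"},
    };
    auto it = table.find(cp);
    return it == table.end() ? nullptr : it->second;
}

// Step 2: emit each codepoint as-is, as &name;, or as &#NNNN; (decimal).
std::string html_encode(const std::string& utf8) {
    std::string out;
    for (char32_t cp : utf8_to_utf32(utf8)) {
        bool plain = cp == 0x09 || cp == 0x0A || cp == 0x0D ||
                     (cp >= 0x20 && cp <= 0x7E);
        if (plain) {
            out += static_cast<char>(cp);                       // step 2a
        } else if (const char* name = entity_name(cp)) {
            out += '&'; out += name; out += ';';                // step 2b
        } else {
            out += "&#" + std::to_string(static_cast<unsigned long>(cp)) + ";";  // step 2c
        }
    }
    return out;
}
```

With UTF-8 input, html_encode("Th\xC3\xA9re") gives Th&eacute;re, and the UTF-8 bytes of the Arabic word from the question give &#1593;&#1585;&#1576;&#1609;. Note that the sketch follows step (a) literally, so ASCII characters that are significant in markup (&, <, >, ") pass through unchanged; if the result is going to be embedded in an HTML page, those would need to be escaped as well.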


1 Comment

Thank you so much Remy Lebeau :)
