Unicode, Bytes, and String to Integer Conversion

Question

I am writing a program that deals with letters from a foreign alphabet. The program takes as input the Unicode code point of a character, written in hexadecimal. For example, 062A is the code point assigned in Unicode to one such character.

I first ask the user to input a number that corresponds to a specific letter, e.g. 062A. I am now attempting to turn that number into a 16-bit integer that Python can decode, so that I can print the character back to the user.

example:

for \u0394

print(bytes([0x94, 0x03]).decode('utf-16'))

however when I am using

int('062A', '16')

I receive this error:

ValueError: invalid literal for int() with base 10: '062A'

I know it is because I am using A in the string; however, that is the Unicode code point for the symbol. Can anyone help me?

The base parameter shouldn't be a string but an integer: int('062A', 16)
– Marsu, Commented Jun 28, 2020 at 21:58
I don't understand; what is the intended relationship between 0x062A and \u0394?
– Karl Knechtel, Commented Jun 28, 2020 at 21:58
I tested int('062A', '16'), and got the same error as @KarlKnechtel (TypeError: 'str' object cannot be interpreted as an integer). Please ensure that your post contains the entire, correct, error output.
– AMC, Commented Jun 28, 2020 at 22:07
The problem that I had was mostly with the incorrect use of the parameter. I'm still learning how to do basic things, and so I made a newbie mistake. There was no relationship at all between the \u0394 example and 0x062A. This is my first Stack Overflow post, sorry for the mistakes. I'll do better next time, and thank you all.
– D Tee, Commented Jun 28, 2020 at 22:54

Karl Knechtel · Accepted Answer · 2020-06-28 22:02:46Z


however when I am using int('062A', '16'), I receive this error: ValueError: invalid literal for int() with base 10: '062A'

No, you aren't:

>>> int('062A', '16')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' object cannot be interpreted as an integer

It's exactly as it says. The problem is not the '062A', but the '16'. The base should be specified directly as an integer, not a string:

>>> int('062A', 16)
1578

If you want the character for that Unicode code point, then converting through bytes and UTF-16 is too much work. Just ask directly using chr, for example:

>>> chr(int('0394', 16))
'Δ'
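
To tie this back to the question, here is a minimal sketch that wraps the two calls in a helper. The name hex_to_char is made up for illustration; it assumes the input is a bare hexadecimal string such as '062A', like the one the user types in.

```python
def hex_to_char(hex_code: str) -> str:
    """Return the character for a hexadecimal Unicode code point.

    hex_to_char is a hypothetical helper name, not a built-in.
    """
    # The base must be the integer 16, not the string '16'.
    code_point = int(hex_code, 16)
    # chr() maps the integer code point straight to a one-character string.
    return chr(code_point)


print(hex_to_char('0394'))        # Δ
print(ord(hex_to_char('062A')))   # 1578, the same number int('062A', 16) gives
```

Both int and chr raise ValueError on bad input (non-hex digits, or a value above 0x10FFFF), so a program reading user input would probably want to catch that.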


3 Comments

Karl Knechtel Over a year ago

You may or may not be able to display Arabic characters on your terminal. Python can't do anything about that. The simplest cross-platform way to make sure your strings contain the right text is to write them out to a text file and view them in a Unicode-aware text editor.
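
A rough sketch of that check, assuming UTF-8 is an acceptable encoding for the file. The function name write_code_points and the file name unicode_check.txt are placeholders chosen for this example.

```python
import tempfile
from pathlib import Path


def write_code_points(hex_codes, path):
    """Write the characters for the given hex code points to a UTF-8 text file.

    write_code_points is a hypothetical helper, not part of any library.
    """
    text = "".join(chr(int(code, 16)) for code in hex_codes)
    # Pass the encoding explicitly; the platform default may not be UTF-8.
    Path(path).write_text(text + "\n", encoding="utf-8")
    return text


# Placeholder location: a file in the system temp directory.
out_path = Path(tempfile.gettempdir()) / "unicode_check.txt"
written = write_code_points(["062A", "0394"], out_path)
print("wrote", len(written), "characters to", out_path)
```

Opening the resulting file in an editor that understands UTF-8 shows whether the characters are right, regardless of what the terminal can render.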

D Tee Over a year ago

I'm still a newbie when it comes to doing this. You figured it out. I also thought that converting my stuff into bytes and UTF-16 was a lot of work, and it dissuaded me from wanting to work on the project because it felt too hard. Fortunately, after reading your comment and answer I've been able to make some more progress. Thank you.

Karl Knechtel Over a year ago

For what it's worth: to convert from the integer to bytes, use the .to_bytes method of the integer - you need to tell it how many bytes to use, and the endianness. There will, I am sure, eventually be a project where you do need this :)
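
For illustration, a small sketch of that route using the \u0394 example from the question. It assumes two bytes and little-endian order, which reproduces the bytes([0x94, 0x03]) from the original post; this only works for code points up to 0xFFFF, since anything larger needs a surrogate pair in UTF-16.

```python
code_point = int('0394', 16)

# Two bytes, least significant byte first (little-endian).
raw = code_point.to_bytes(2, byteorder='little')
print(raw)   # b'\x94\x03'

# Name the byte order in the codec so the result does not depend on the platform.
text = raw.decode('utf-16-le')
print(text)  # Δ
```

The result is the same string that chr(code_point) returns, which is why chr is the simpler choice here.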
