How to initialize char array using hex numbers?

Question

I use utf8 and have to save a constant in a char array:

const char s[] = {0xE2,0x82,0xAC, 0}; //the euro sign

However, it gives me an error:

test.cpp:15:40: error: narrowing conversion of ‘226’ from ‘int’ to ‘const char’ inside { } [-fpermissive]

I have to cast all the hex numbers to char, which feels tedious and doesn't smell good. Is there a more proper way of doing this?

@AaronMcDaid Look at my first sentence?

– SwiftMango, Commented Oct 31, 2013 at 19:55
Why not const char s[] = u8"\u20AC";?

– Kerrek SB, Commented Oct 31, 2013 at 19:56
As @KerrekSB mentioned, but it's a C++11 feature.

– πάντα ῥεῖ, Commented Oct 31, 2013 at 19:57

Basile Starynkevitch · Accepted Answer · 2021-04-16 04:48:12Z

38

char may be signed or unsigned (and the default is implementation specific). You probably want

  const unsigned char s[] = {0xE2,0x82,0xAC, 0};

or

  const char s[] = "\xe2\x82\xac";

or with many recent compilers (including GCC)

  const char s[] = "€";

(a string literal is an array of char unless you give it some prefix)

See -funsigned-char (or -fsigned-char) option of GCC.

On some implementations a char is unsigned, so CHAR_MAX is 255 and CHAR_MIN is 0. On others chars are signed, so CHAR_MIN is -128 and CHAR_MAX is 127 (e.g. things are different on Linux/PowerPC/32 bits and Linux/x86/32 bits). AFAIK nothing in the standard prohibits 19-bit signed chars.
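A minimal sketch of the first two alternatives above, checking that they hold the same bytes. The names `euro_bytes`, `euro_escaped`, `plain_char_is_signed` and `same_bytes` are made up for illustration, and the code assumes a C++11 (or later) compiler with 8-bit chars.

```cpp
#include <climits>
#include <cstring>

// UTF-8 encoding of U+20AC (the euro sign), spelled two ways.

// unsigned char can hold 0xE2, 0x82 and 0xAC, so brace initialization
// involves no narrowing here.
const unsigned char euro_bytes[] = {0xE2, 0x82, 0xAC, 0};

// Hex escapes inside a string literal yield plain char elements directly;
// the terminating zero is added by the literal itself.
const char euro_escaped[] = "\xe2\x82\xac";

// True when plain char is signed on this implementation.
constexpr bool plain_char_is_signed() { return CHAR_MIN < 0; }

// Compares the two arrays byte for byte, terminator included.
bool same_bytes() {
    static_assert(sizeof euro_bytes == sizeof euro_escaped,
                  "both arrays hold three bytes plus a terminator");
    return std::memcmp(euro_bytes, euro_escaped, sizeof euro_bytes) == 0;
}
```

Whether `plain_char_is_signed()` returns true depends on the compiler and its flags (for GCC, `-funsigned-char` or `-fsigned-char`), but the stored bytes should be the same either way.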

edited Apr 16, 2021 at 4:48

answered Oct 31, 2013 at 19:53

Basile Starynkevitch


23 Comments

Zac Howland Over a year ago

@John If you do not specify the signedness of char, you are using the compiler's default ... which can (and likely will) change between different compiler vendors (or even different versions of the same compiler). When you need a char to be a byte, you should declare it as such (and not make assumptions about what the compiler may or may not do).

John Dibling Over a year ago

@BasileStarynkevitch: Yes, just a few days ago I spent a good while in the depths of the Standard to figure out why my code wasn't working, and I came across this gem from which I realized I needed three overloads, not two. Reference from C++03: 3.9.1 Fundamental types "1/ [...] Plain char, signed char, and unsigned char are three distinct types. [...]"

John Dibling Over a year ago

@ZacHowland: The same clause goes on to say that, "In any particular implementation, a plain char object can take on either the same values as a signed char or an unsigned char; which one is implementation-defined." So char isn't the same as signed char or unsigned char, but they are so close on a fundamental level that in 15 years of programming C++ professionally I only needed to distinguish between them once.

James Kanze Over a year ago

Just my personal opinion, but from a stylistic point of view, if it is text, use char. I've tried to use unsigned char in the past (because I often have to deal with accented characters): it just doesn't work (because so many functions expect char* or std::string, and string literals are char[]), and it confuses the reader.

John Dibling Over a year ago

@ZacHowland: I predict in two years you'll have to write a third overload for something. But then you'll be good for another 15 years. :)


Zac Howland · Answer · 2013-10-31 20:06:59Z

0

The short answer to your question is that you are overflowing a char. On your compiler plain char is evidently signed, with a range of [-128, 127], and 0xE2 = 226 > 127. What you need to use is an unsigned char, which has a range of [0, 255].

unsigned char s[] = {0xE2,0x82,0xAC, 0};
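One practical wrinkle with the unsigned char approach is that most string APIs take `const char*`. A rough sketch of bridging the two, assuming a hypothetical helper named `as_c_str` (not a standard function) and a zero-terminated array:

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// UTF-8 bytes of the euro sign, stored as unsigned char to avoid narrowing.
const unsigned char euro[] = {0xE2, 0x82, 0xAC, 0};

// Hypothetical helper: view the bytes as a C string for APIs that expect
// const char*. Casting between pointers to character types is allowed.
inline const char* as_c_str(const unsigned char* bytes) {
    return reinterpret_cast<const char*>(bytes);
}

// Length in bytes, not counting the terminating zero.
std::size_t euro_length() { return std::strlen(as_c_str(euro)); }

// Copies the bytes into a std::string.
std::string euro_string() { return std::string(as_c_str(euro)); }
```

The cast has to appear somewhere, so this mostly moves the noise from the initializer to the call sites; which trade-off is nicer depends on how often the array is passed around.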

answered Oct 31, 2013 at 20:06

Zac Howland


3 Comments

SwiftMango Over a year ago

So by default if there is no specifier, a char is signed?

Basile Starynkevitch Over a year ago

No, on some implementations a char is unsigned and CHAR_MAX is 255 (and CHAR_MIN is 0). On others char are signed so CHAR_MIN is -128 and CHAR_MAX is 127 (and e.g. things are different on Linux/PowerPC/32 bits and Linux/x86/32 bits).

Zac Howland Over a year ago

@texasbruce It is up to the compiler. On many compilers, the default is signed. If you need an unsigned, you should always specify it explicitly.

Mike Layton · Answer · 2016-11-03 23:08:43Z

0

While it may well be tedious to put lots of casts in your code, it actually smells extremely GOOD to me to use typing that is as strong as possible.

As noted above, when you specify type "char" you are inviting a compiler to choose whatever the compiler writer preferred (signed or unsigned). I'm no expert on UTF-8, but there is no reason to make your code non-portable if you don't need to.

As for your constants, I've used compilers that default constants written that way to signed ints, as well as compilers that consider the context and interpret them accordingly. Note that converting between signed and unsigned can overflow EITHER WAY. For the same number of bits, a negative overflows an unsigned (obviously) and an unsigned with the top bit set overflows a signed, because the top bit means negative.

In this case, your compiler is taking your constants as unsigned 8 bit--OR LARGER--which means they don't fit as signed 8 bit. And we are all grateful that the compiler complains (at least I am).

My perspective is that there is nothing at all bad about casting to show exactly what you intend to happen. And if a compiler lets you assign between signed and unsigned, it should require that you cast, regardless of whether variables or constants are involved, e.g.

const int8_t a = (int8_t) 0xFF; // will be -1

although in my example, it would be better to assign -1. When you are having to add extra casts, they either make sense, or you should code your constants so they make sense for the type you are assigning to.
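To make the "code your constants so they make sense for the type" point concrete for the original array, here is a small sketch comparing explicit casts with character literals. The names `euro_cast`, `euro_lit` and `euro_arrays_match` are illustrative only; the result of converting 0xE2 to a signed char is assumed to be the usual two's-complement value.

```cpp
#include <cstddef>

// Explicit casts: verbose, but the intent is spelled out at every element.
const char euro_cast[] = {
    static_cast<char>(0xE2), static_cast<char>(0x82), static_cast<char>(0xAC), 0
};

// Character literals with hex escapes already have type char,
// so no cast is needed and no narrowing takes place.
const char euro_lit[] = {'\xE2', '\x82', '\xAC', 0};

// Checks that both spellings produce the same four elements.
bool euro_arrays_match() {
    static_assert(sizeof euro_cast == sizeof euro_lit, "same element count");
    for (std::size_t i = 0; i < sizeof euro_cast; ++i) {
        if (euro_cast[i] != euro_lit[i]) return false;
    }
    return true;
}
```

The character-literal form keeps the array typed as plain char while avoiding most of the cast noise.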

answered Nov 3, 2016 at 23:08

Mike Layton


1 Comment

ack Over a year ago

While the stronger type checking is probably good for catching bugs, it causes a lot of hurt for projects which have to deal with legacy code. Initializing char arrays from hex constants spanning 0x00-0xFF is quite common. Cases in point: the X Bitmap (XBM) file format (which is actually a snippet of C source code with precisely such an initialization), along with many X library functions dealing with gradients, color maps, etc., which expect arrays of chars, not arrays of unsigned chars.

KungPhoo · Answer · 2021-12-06 19:13:51Z

0

Is there a way to mix these? I want a define macro FX_RGB(R,G,B) that makes a const string "\x01\xRR\xGG\xBB" so I can do the following: const char* LED_text = "Hello " FX_RGB(0xff, 0xff, 0x80) "World"; and get a string: const char* LED_text = "Hello \x01\xff\xff\x80World";
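As far as I know the preprocessor cannot assemble a `\x` escape from pieces, because escape sequences are interpreted before adjacent string literals are concatenated. One possible workaround, sketched here as a run-time function rather than a macro, builds the sequence in a std::string. The names `fx_rgb` and `led_text` are hypothetical, and the 0x01 prefix is taken from the question.

```cpp
#include <string>

// Hypothetical helper: returns the 4-byte sequence 0x01, R, G, B.
std::string fx_rgb(unsigned char r, unsigned char g, unsigned char b) {
    std::string seq;
    seq += '\x01';               // marker byte from the question
    seq += static_cast<char>(r); // explicit casts keep the conversions visible
    seq += static_cast<char>(g);
    seq += static_cast<char>(b);
    return seq;
}

// Builds the text from the question at run time instead of compile time.
std::string led_text() {
    return "Hello " + fx_rgb(0xff, 0xff, 0x80) + "World";
}
```

Note that a colour component of 0 embeds a zero byte: std::string stores it fine, but anything that later treats `led_text().c_str()` as a zero-terminated string would stop there.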

answered Dec 6, 2021 at 19:13

KungPhoo
