
I am porting a large codebase from Linux/g++ to MacOS/Clang. I hit this compiler error in multiple places in Clang (where g++ builds successfully and does the right thing at run time):

error: initializer-string for char array is too long, array size is 1 but initializer has size X (including the null terminating character)

Note that I'm compiling for C++14.

I've reduced it to a manageable, reproducible case (eliminating all the uninteresting constructors and other methods), where the errors happen in the WTF constructor's assignments:

#include <stddef.h>

struct StringLike
{
    const char  *str;
    size_t      len;
    
    StringLike() : str(NULL), len(0) {}
    template <size_t LEN_> StringLike(const char (&litAry)[LEN_]) noexcept :
        str(litAry), len(LEN_ - 1) {}
    
    StringLike &operator=(const StringLike &rhs)
        {str = rhs.str; len = rhs.len; return *this;}
    template <size_t LEN_> StringLike &operator=(const char (&strLit)[LEN_])
        {str = strLit; len = LEN_ - 1; return *this;}
    const char *data() const {return str;}
    size_t length() const {return len;}
};

struct WTF
{
    StringLike litStrs[3];
    WTF()
    {
        litStrs[0] = {"Is "};
        litStrs[1] = {"this "};
        litStrs[2] = {"legal?"};
    }
};

Yes, I know I could remove the braces from the litStrs[𝒏] assignments, and that does work, but I'd rather not change more lines of code than necessary.

I can't figure out how Clang is hallucinating a char array of size 1?!? I do see that if I comment out the templated StringLike constructor, I get a similar error from g++. In the working case, g++ converts e.g. {"Is "} to a StringLike temporary via that constructor, then passes the temporary to the operator=(const StringLike &rhs) method. (Removing the braces from the WTF constructor's assignments causes the templated operator= method to be invoked directly instead.)

I'm not sure where to look in the standard to figure out how the brace-enclosed assignments in WTF (operator=, not construction) are supposed to be handled, so I can't tell whether Clang or g++ is right. Clang has a better track record IMO, but I'm stumped as to what the correct behavior is here.

I used godbolt to verify that all versions of g++ accept this code, and all versions of Clang complain and give up.

Answers


Clang is correct, although the behavior is quite surprising.

[expr.assign] p8 explains:

A braced-init-list B may appear on the right-hand side of

  • an assignment to a scalar of type T, in which case B shall have at most a single element. The meaning of x = B is x = t, where t is an invented temporary variable declared and initialized as T t = B.
  • an assignment to an object of class type, in which case B is passed as the argument to the assignment operator function selected by overload resolution ([over.assign], [over.match]).

Since StringLike is a class type, the second bullet applies. [over.match.oper] p2 explains that litStrs[0] = {"Is "} would be translated into (litStrs[0]).operator=({"Is "});, which Clang also rejects with the same error message.
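To make the two bullets concrete, here is a small sketch (Point, assignScalar and assignClass are hypothetical names, not from the question). It shows x = {} and x = {7} for a scalar, and a braced list being handed to a class's assignment operator, including the explicit p.operator=({3, 4}) spelling that [over.match.oper] describes:

```cpp
#include <cassert>

// Hypothetical aggregate used only for illustration.
struct Point {
    int x, y;
};

// Scalar case: `x = {}` behaves like `x = int{}` (value 0), and
// `x = {7}` behaves like `x = t` with `int t = {7}`.
int assignScalar() {
    int x = 3;
    x = {};      // x becomes 0
    assert(x == 0);
    x = {7};     // a single-element list is allowed for scalars
    return x;    // 7
}

// Class case: the braced list is passed as the argument of the operator=
// chosen by overload resolution (here an implicitly declared assignment
// operator, whose parameter is list-initialized from the braces).
Point assignClass() {
    Point p{0, 0};
    p = {1, 2};            // operator syntax
    assert(p.x == 1 && p.y == 2);
    p.operator=({3, 4});   // equivalent explicit member-call syntax
    return p;
}
```

The StringLike case follows the same path, except that the overload set also contains the templated operator=, and that is where deduction comes in.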

The problem lies with how function template argument deduction works in this scenario; LEN_ would have to be deduced from the given {"Is "}. [temp.deduct.call] p1 explains:

In the P′[N] case, if N is a constant template parameter, N is deduced from the length of the initializer list.

In the case of {"Is "} and all your other attempts, the length of the initializer list is 1, so Clang is not hallucinating. You're really trying to initialize a const char[1] with a string literal of some other length at this point.
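A minimal sketch of that deduction rule, using a hypothetical boundOf function template: with a string literal argument the bound comes from the literal's array type, but with a braced list it comes from the number of elements. Calling boundOf({"Is "}) would deduce N == 1 and then fail just like the code in the question, so it is deliberately left out:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: reports the array bound N deduced for its argument.
template <std::size_t N>
std::size_t boundOf(const char (&)[N]) { return N; }

// Argument is an expression of type const char[4], so N == 4.
std::size_t fromLiteral() { return boundOf("Is "); }

// Argument is a braced-init-list of 3 elements, so N == 3
// (the P'[N] rule: N comes from the length of the list).
std::size_t fromList() { return boundOf({'I', 's', ' '}); }
```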

Possibly a defect

This behavior is quite surprising and possibly defective, because you can usually add braces around a string literal when initializing an array with it.

[dcl.init.string] explains that an array may be initialized by a string-literal, or by

an appropriately-typed string-literal enclosed in braces ([lex.string])

In other words, initializing the parameter with either "Is " or {"Is "} would be valid if it weren't for template argument deduction breaking this.
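A short sketch of braced string-literal initialization in contexts where no bound is deduced from the list (takesFour and callBraced are hypothetical names). The fixed-bound parameter accepts {"Is "} because nothing is deduced from the list length:

```cpp
#include <cassert>
#include <cstddef>

// Both spellings are valid array initializations per [dcl.init.string].
const char plain[] = "Is ";
const char braced[] = {"Is "};   // also const char[4]
static_assert(sizeof(plain) == 4, "bound includes the terminating null");
static_assert(sizeof(braced) == 4, "braces do not change the bound");

// A non-template parameter with a fixed bound: the reference binds to the
// single string-literal element of the braced list.
std::size_t takesFour(const char (&s)[4]) { return sizeof(s); }

std::size_t callBraced() { return takesFour({"Is "}); }
```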


Comments

I'm not following your argument about const char[1]: I see how the length of the initializer list is 1, but I would expect it to be something like (const unsigned char *)[1] (a single char ptr) or const unsigned char [1][4] since the initializer item is an array of chars, not a single char?
Your constructor takes const char (&litAry)[LEN_], where only LEN_ is deduced, but the element type is fixed. Why would you expect the type of the array to magically change to const char*(&)[LEN_] then? Maybe you could get this to work with a const T(&litAry)[LEN_], but that surely introduces a bunch of other problems.
Sorry, didn't explain it well. I don't "expect the type to magically change"; rather, I expect the input type (an initializer list) to require some type conversion in order to match the signature of the operator=(const char (&strLit)[LEN_]) method. So then I would expect the compiler to either give an error that it couldn't find a conversion path for the initializer list (length 1, with the list element of either type const char * or type const char (&)[someLength]), or else that it couldn't find a matching operator= method...
...Why should the compiler first say "I have an initializer list of length 1, so that's the LEN_", then say "now I am going to dereference the first item of the initializer list to get a string type so I can complain about wrong sizes"? Why doesn't it instead complain that it can't convert an initializer list to a const char array? [I freely admit here I avoid initializer lists in my code except in very special cases, since they make e.g. constructor deduction more opaque, so I don't really understand the rules very well.]
The error output you're seeing seems like a pretty natural consequence of the behavior in this code, which is that "Is " is being used to initialize a single-character array. I suppose it would also be possible that the compiler tells you that initializing a single char using "Is " fails (which also takes place), but I'm not sure how helpful it would be if all such possible ways to fail were printed for a single overload. That could get spammy really quickly when there are multiple overloads.
"Is" is only "being used to initialize a single-character array" due to language lawyering; it's certainly not behavior most people would expect! As such, it doesn't seem like a pretty natural consequence to me. Again, the mystery that I don't understand is why the initializer list's first element gets implicitly "dereferenced" to a character type in order to construct the error; I would expect the compiler to require an explicit conversion to do this (and then an error to indicate when none such was found). Do you have any reference that explains this implicit conversion behavior?
I'm not sure what you mean by "dereferenced". There are also no surprising implicit conversions taking place. The type of the parameter is an array of characters, and the element within the initializer list is an array. Nothing is being dereferenced or converted per-se. The problem is that no one thought of the special case where the element in the initializer list is a string literal. The current template deduction rules make sense if we assume that {} is always being used to list the elements going into the array, like {'a', 'b'}
Let me try again: {"Is"} creates a std::initializer_list<T>, where T is something like char[3] or char * (I guess the latter, or else a list like {"Is", "two"} would be impossible). OK, so std::initializer_list<T> stores a pointer (and count) to something array-like (T *), allowing a function to operate on this array object. But my operator=() doesn't take a std::initializer_list<> argument; rather, it takes a const char (&)[LEN_]. Where in the spec does the compiler get the authority to pretend that std::initializer_list<> and const char (&)[LEN_] are compatible without conversion? In other words, by "dereferenced", I mean that the compiler (after using the std::initializer_list<> length to assign the template parameter LEN_) takes just the first element of the (size=1) initializer_list, sees that it is some form of char array, and then complains about assigning to a too-small char array. What permission does the spec give the compiler to take the type of init_list[0] as the type, rather than std::initializer_list<> itself? It looks like a dereference to me (and what would it do if I had given it {"Is", "two"})?
{"Is"} does not create a std::initializer_list. You're misunderstanding something here; in general, those braces are just initialization syntax for various types. There are some "fallback" cases in the language where braces are interpreted as std::initializer_list, but that never happens for function arguments (unless the parameter is a std::initializer_list).
I guess the fundamental problem I have with this issue is that I cannot answer the question "What is the type of {"Is"} for the purpose of overload resolution/template-argument deduction?" Based on my reading of e.g. eel.is/c++draft/dcl.init.list and the way it sometimes seems to conflate braced-init-list and std::initializer_list<>, I had assumed the latter to be the underlying type. So what is the type of {"Is"}? of {"Is", "two"}? You already indicated it's some type of list/array. Without knowing its type how can I know how overloads will resolve/template parameters be deduced?
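On the question of what type {"Is"} has: a braced-init-list is not an expression and has no type of its own; it is turned into a std::initializer_list only in particular contexts. A minimal sketch of the difference (sumPair and sumList are hypothetical helpers):

```cpp
#include <cassert>
#include <initializer_list>
#include <type_traits>
#include <utility>

// One of the few contexts where braces do produce an initializer_list:
// copy-list-initializing an `auto` variable.
auto il = {1, 2, 3};
static_assert(std::is_same<decltype(il), std::initializer_list<int>>::value,
              "auto deduces std::initializer_list<int> here");

// For an ordinary class parameter, {1, 2} just list-initializes the pair
// through its (first, second) constructor; no initializer_list is involved.
int sumPair(std::pair<int, int> p) { return p.first + p.second; }

// Only when the parameter itself is an initializer_list does one get built.
int sumList(std::initializer_list<int> xs) {
    int total = 0;
    for (int x : xs) total += x;
    return total;
}
```

For the array-reference parameter in the question, neither of these applies: the list's elements are matched against the array elements, which is why the element count becomes LEN_.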

You might want the constructor to look like

WTF() : litStrs{"Is ", "this ", "legal?"} {}
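A self-contained sketch of that suggestion, reusing the question's StringLike (trimmed to the members this example needs, with <cstddef> and nullptr in place of <stddef.h> and NULL). Each array element is copy-initialized from one string literal, so the converting constructor template deduces LEN_ from the literal itself:

```cpp
#include <cassert>
#include <cstddef>

// StringLike from the question, trimmed to what this example uses.
struct StringLike {
    const char *str;
    std::size_t len;

    StringLike() : str(nullptr), len(0) {}
    template <std::size_t LEN_>
    StringLike(const char (&litAry)[LEN_]) noexcept
        : str(litAry), len(LEN_ - 1) {}

    const char *data() const { return str; }
    std::size_t length() const { return len; }
};

struct WTF {
    StringLike litStrs[3];
    // Member initialization, not assignment: no operator= is involved.
    WTF() : litStrs{"Is ", "this ", "legal?"} {}
};
```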

Note that the constructor template

  template <size_t LEN_>
  StringLike(const char (&litAry)[LEN_]) noexcept

is not an implicit converting constructor of the kind you might want to be used here (templates do not get special properties). You should call it explicitly, like

litStrs[0] = StringLike{"Is "};
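A sketch of the explicit-temporary form applied to the question's code (StringLike is the question's class, with <cstddef> and nullptr in place of <stddef.h> and NULL). Naming the type makes each right-hand side an expression of type StringLike, so the templated operator= cannot deduce LEN_ and drops out, leaving operator=(const StringLike &):

```cpp
#include <cassert>
#include <cstddef>

struct StringLike {
    const char *str;
    std::size_t len;

    StringLike() : str(nullptr), len(0) {}
    template <std::size_t LEN_>
    StringLike(const char (&litAry)[LEN_]) noexcept
        : str(litAry), len(LEN_ - 1) {}

    StringLike &operator=(const StringLike &rhs)
        { str = rhs.str; len = rhs.len; return *this; }
    template <std::size_t LEN_>
    StringLike &operator=(const char (&strLit)[LEN_])
        { str = strLit; len = LEN_ - 1; return *this; }

    const char *data() const { return str; }
    std::size_t length() const { return len; }
};

struct WTF {
    StringLike litStrs[3];
    WTF() {
        // Each temporary is built by the constructor template (LEN_ deduced
        // from the literal), then copied in by operator=(const StringLike &).
        litStrs[0] = StringLike{"Is "};
        litStrs[1] = StringLike{"this "};
        litStrs[2] = StringLike{"legal?"};
    }
};
```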

Consider using std::string_view instead of reinventing it with StringLike.

Comment

Three things to note about your answer: 1) The sample code was a stripped-down case, the actual code is more complex (e.g. X-macros are used to define the number and contents of all literal strings, making constructor refactoring not as straightforward). 2) Because I'm porting a large code base, I'd prefer to make as many changes only in the StringLike class rather than in the multiple places that use it. 3) As mentioned, I'm compiling for C++14; std::string_view wasn't added until C++17 (and in fact, the port to Microsoft was a hassle because C++11 support was only added a few years ago).
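Regarding point 2, one possible change confined to StringLike is sketched below. It is an assumption, not something either answer proposes, and it has not been checked against the full code base: it removes the templated operator=, so that {"Is "} can only reach operator=(const StringLike &), whose parameter is list-initialized through the converting constructor template. In that list-initialization the list's elements become the constructor arguments, so LEN_ is deduced from the literal itself, which is the route the question reports g++ taking. The brace-free form keeps compiling too, at the cost of one temporary per assignment. assignWithoutBraces is a hypothetical helper for the usage check.

```cpp
#include <cassert>
#include <cstddef>

struct StringLike {
    const char *str;
    std::size_t len;

    StringLike() : str(nullptr), len(0) {}
    template <std::size_t LEN_>
    StringLike(const char (&litAry)[LEN_]) noexcept
        : str(litAry), len(LEN_ - 1) {}

    StringLike &operator=(const StringLike &rhs)
        { str = rhs.str; len = rhs.len; return *this; }
    // Assumed change: the templated operator=(const char (&)[LEN_]) is removed,
    // so no candidate deduces LEN_ from the length of a braced list.

    const char *data() const { return str; }
    std::size_t length() const { return len; }
};

// Unchanged call sites from the question.
struct WTF {
    StringLike litStrs[3];
    WTF() {
        litStrs[0] = {"Is "};
        litStrs[1] = {"this "};
        litStrs[2] = {"legal?"};
    }
};

// The brace-free form still works, via implicit conversion through the
// same constructor template followed by operator=(const StringLike &).
StringLike assignWithoutBraces() {
    StringLike s;
    s = "legal?";
    return s;
}
```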
