I am interested to know how the string class implements copying from a character array for initialization of its contents.

My guess would be something like:

1: Find the length of the character array, N. (How is this done? A crude method would be to look at each character in turn until the null character is found. Is a better method used?)

2: Allocate N bytes of storage.

3: Use strcpy to copy each element byte by byte.

Obviously this is not a very complicated question; I was just interested to know whether the following are (essentially or approximately) equivalent:

std::string program_name(argv[0]);

and

std::string program_name;
int length = 0;
while(*(argv[0] + length) != '\0')
    ++length;
++length; // Depends on whether string contains the null character - usually I don't think it does?
program_name.resize(length); // Maybe use reserve instead?
std::memcpy(program_name.data(), argv[0], length - 1); // Don't copy the null character at the end

Something like that anyway. I haven't attempted to compile the above pseudocode because I am interested in the concept of the method, not the fine detail of how this operation is done.

  • The implementation of std::string is available in the headers that come with your C++ compiler. Use the source, Luke. Spoiler alert: it's pretty much the way you envision, there's really nothing clever going on. Commented Feb 1, 2015 at 14:46
  • Your steps are semantically equivalent but won't be as efficient as the optimized standard library implementation. Your manual loop will be much slower than strlen and the resizing needlessly initializes the buffer just to immediately overwrite it again. Commented Feb 1, 2015 at 14:49
  • Expanding on Igor's comment, there cannot be anything clever going on. A c-style string is defined as a contiguous character array terminated by a NUL character. It doesn't keep a dedicated length field. To answer your question about whether or not std::basic_string stores a zero-terminator: Since c_str() is required to return in O(1), storing a zero-terminator is pretty much mandatory. Commented Feb 1, 2015 at 14:50
  • Your solution is conceptually similar, but wrong: "Modifying the character array accessed through data is undefined behavior." Commented Feb 1, 2015 at 15:06
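
The points raised in these comments can be put into a small compilable sketch: measure the length with std::strlen rather than a hand-written loop, and use reserve followed by append so the buffer is not zero-filled just to be overwritten. The helper name copy_from_c_string is hypothetical, and this only illustrates the idea; it is not how any particular standard library is written.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: builds a std::string from a C string "by hand".
std::string copy_from_c_string(const char* s)
{
    const std::size_t n = std::strlen(s);  // scans for '\0'; the terminator is not counted
    std::string result;
    result.reserve(n);                     // obtain capacity without filling it
    result.append(s, n);                   // copy exactly n characters
    return result;                         // c_str() is still null-terminated
}
```

Called as copy_from_c_string(argv[0]), this should produce the same contents as std::string program_name(argv[0]), with size() equal to the length reported by strlen.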

1 Answer

In short, your implementation is pretty much how it works.

Ignoring the fact that std::string is an alias for std::basic_string<char>, a template that copes with various character types (notably "wide characters"), the std::string constructor from const char* could be written something like this:

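std::string(const char* init_value)
{
    // m_len and m_storage stand for the class's data members
    m_len = strlen(init_value);          // length without the terminator
    m_storage = new char[m_len + 1];     // one extra char for the terminator
    std::copy(init_value, init_value + m_len + 1, m_storage);  // copy including '\0'
}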

Of course, the actual implementation will be more indirect [probably has a specific function to "grow/allocate", for example], due to the inheritance and templated nature of the real implementation.
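
To make the simplified constructor concrete, here is a self-contained sketch of a toy class built the same way. SimpleString and its members are hypothetical names for illustration; the class always allocates with new[] and ignores allocators, traits, and copying, none of which the real std::string can ignore.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>

// Hypothetical toy class; not the real std::string.
class SimpleString
{
public:
    explicit SimpleString(const char* init_value)
        : m_len(std::strlen(init_value)),   // length without the terminator
          m_storage(new char[m_len + 1])    // one extra char for the terminator
    {
        // std::copy takes (first, last, destination); the +1 copies the '\0' too.
        std::copy(init_value, init_value + m_len + 1, m_storage);
    }

    ~SimpleString() { delete[] m_storage; }

    // Copying is disabled to keep the sketch short and free of double deletes.
    SimpleString(const SimpleString&) = delete;
    SimpleString& operator=(const SimpleString&) = delete;

    std::size_t size() const { return m_len; }
    const char* c_str() const { return m_storage; }

private:
    std::size_t m_len;   // declared first, so it is initialized before m_storage
    char* m_storage;
};
```

Constructing SimpleString s("hello") gives s.size() == 5, and s.c_str() points at a null-terminated copy of the input.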

Here's a REAL implementation out of libcxx:

template <class _CharT, class _Traits, class _Allocator>
inline _LIBCPP_INLINE_VISIBILITY
basic_string<_CharT, _Traits, _Allocator>::basic_string(const value_type* __s)
{
    _LIBCPP_ASSERT(__s != nullptr, "basic_string(const char*) detected nullptr");
    __init(__s, traits_type::length(__s));
#if _LIBCPP_DEBUG_LEVEL >= 2
    __get_db()->__insert_c(this);
#endif
}

where __init does this:

template <class _CharT, class _Traits, class _Allocator>
void
basic_string<_CharT, _Traits, _Allocator>::__init(const value_type* __s, size_type __sz)
{
    if (__sz > max_size())
        this->__throw_length_error();
    pointer __p;
    if (__sz < __min_cap)
    {
        __set_short_size(__sz);
        __p = __get_short_pointer();
    }
    else
    {
        size_type __cap = __recommend(__sz);
        __p = __alloc_traits::allocate(__alloc(), __cap+1);
        __set_long_pointer(__p);
        __set_long_cap(__cap+1);
        __set_long_size(__sz);
    }
    traits_type::copy(_VSTD::__to_raw_pointer(__p), __s, __sz);
    traits_type::assign(__p[__sz], value_type());
}

It uses the short string optimization to store short strings inside the string object itself rather than in a separate allocation [and otherwise allocates with the relevant allocator, which may not be new], and it explicitly writes the end marker [traits_type::assign(__p[__sz], value_type());], because __init may be called with an argument other than a C-style string, so an end marker in the source is not guaranteed.
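
One rough way to observe the short/long split from the outside is to check whether the character buffer lies within the bytes of the string object. The helper stored_inline below is a hypothetical name, and the check is only a sketch: the standard does not require any implementation to store short strings inline, so the result for short strings is implementation-dependent.

```cpp
#include <functional>
#include <string>

// Hypothetical helper: true if the character buffer lies inside the
// bytes of the std::string object itself (i.e. no separate allocation).
bool stored_inline(const std::string& s)
{
    const char* begin = reinterpret_cast<const char*>(&s);
    const char* end = begin + sizeof(std::string);
    const char* data = s.data();
    std::less<const char*> before;  // gives a total order over unrelated pointers
    return !before(data, begin) && before(data, end);
}
```

For a long string such as std::string(1000, 'x') this returns false, because the buffer must be allocated separately. For a short string such as std::string("hi") it is expected to return true on implementations that use the optimization.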

For char, traits_type::length() is strlen:

template <>
struct _LIBCPP_TYPE_VIS_ONLY char_traits<char>
{
...
    static inline size_t length(const char_type* __s) {return strlen(__s);}
....
};
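
As a quick check of that equivalence, the sketch below compares the "crude" character-by-character loop from the question with std::strlen and std::char_traits<char>::length. The helper names manual_length and lengths_agree are hypothetical; the three results should match for any valid C string, although strlen is typically much faster than the naive loop.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: the naive loop from the question.
std::size_t manual_length(const char* s)
{
    std::size_t n = 0;
    while (s[n] != '\0')  // stop at the terminator, which is not counted
        ++n;
    return n;
}

// Hypothetical helper: true if all three ways of measuring agree.
bool lengths_agree(const char* s)
{
    const std::size_t n = manual_length(s);
    return n == std::strlen(s)
        && n == std::char_traits<char>::length(s);
}
```

The same traits interface exists for other character types, for example std::char_traits<wchar_t>::length for wide strings, which is why basic_string calls traits_type::length rather than strlen directly.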

Of course, other standard library implementations may differ in the details, but they work roughly like my simplified example, only a bit more obfuscated in order to cope with many character types and to reuse code.

5 Comments

std::string is not derived from std::basic_string; it's an alias for std::basic_string<char>.
@T.C.: Reworded to reflect.
The "obfuscation" is somewhat required: nothing non-public can use a name that you could legally #define, for example, and the class is required to use the allocator and the traits. The code you present also seems to be using the short string optimization.
@JamesKanze: Sure, it's obfuscated for a decent reason, but it's still more complicated than what you'd write if you just did it as a hobby project at home with 1 intended user... ;)
@MatsPetersson Or even what I'd do most of the time professionally:-).
