Efficient string concatenation in C++

Question

I heard a few people expressing worries about "+" operator in std::string and various workarounds to speed up concatenation. Are any of these really necessary? If so, what is the best way to concatenate strings in C++?

Basically, + is NOT a concatenation operator (as it generates a new string). Use += for concatenation.
– Loki Astari, Commented Mar 4, 2009 at 17:40
Since C++11, there's an important point: operator+ can modify one of its operands & return it by-move if that operand was passed by rvalue reference. libstdc++ does this, for example. So, when calling operator+ with temporaries, it can achieve almost-as-good performance - perhaps an argument in favour of defaulting to it, for the sake of readability, unless one has benchmarks showing it is a bottleneck. However, a Standardised variadic append() would be both optimal and readable...
– underscore_d, Commented Jun 25, 2017 at 17:45
This thread sums up perfectly everything that's wrong with SO right now. Highly upvoted answers, long-since outdated by changes to the C++ standard, with commenters like @underscore_d doing their best to combat the misinformation but not actually able to fix the answers most users will see.
– PBS, Commented Sep 4, 2024 at 7:31

Daniel Griscom · Accepted Answer · 2022-04-13 09:02:12Z

104

The extra work is probably not worth it, unless you really really need efficiency. You probably will have much better efficiency simply by using operator += instead.

Now after that disclaimer, I will answer your actual question...

The efficiency of the STL string class depends on the implementation of STL you are using.

You could guarantee efficiency and have greater control yourself by doing the concatenation manually via C built-in functions.

Why operator+ is not efficient:

Take a look at this interface:

template <class charT, class traits, class Alloc>
basic_string<charT, traits, Alloc>
operator+(const basic_string<charT, traits, Alloc>& s1,
          const basic_string<charT, traits, Alloc>& s2)

You can see that a new object is returned after each +. That means that a new buffer is used each time. If you are doing a ton of extra + operations it is not efficient.

Why you can make it more efficient:

You are guaranteeing efficiency instead of trusting a delegate to do it efficiently for you.
The std::string class knows nothing about the maximum size of your string, nor how often you will be concatenating to it. You may have this knowledge and can act on it, which leads to fewer reallocations.
You will be controlling the buffers manually so you can be sure that you won't copy the whole string into new buffers when you don't want that to happen.
You can use the stack for your buffers instead of the heap which is much more efficient.
string + operator will create a new string object and return it hence using a new buffer.

Considerations for implementation:

Keep track of the string length.
Keep a pointer to the end of the string and the start, or just the start and use the start + the length as an offset to find the end of the string.
Make sure the buffer you are storing your string in is big enough, so you don't need to reallocate data.
Use strcpy (writing at the tracked end of the string) instead of strcat, so you don't need to scan the whole string to find its end.
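
The considerations above can be sketched roughly as follows. This is only an illustration, not code from the answer: the FixedBuffer type, its kCapacity bound of 256 bytes and the use of memcpy (rather than strcpy) are all assumptions made for the example.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical fixed-capacity builder; all names are illustrative only.
struct FixedBuffer {
    static constexpr std::size_t kCapacity = 256;  // assumed upper bound
    char data[kCapacity];    // lives wherever the object lives (e.g. the stack)
    std::size_t length = 0;  // tracked, so the end is always data + length

    FixedBuffer() { data[0] = '\0'; }

    // Appends n bytes of s. Returns false (and changes nothing) if the
    // bytes plus the terminating '\0' would not fit.
    bool append(const char* s, std::size_t n) {
        if (n >= kCapacity - length) return false;
        std::memcpy(data + length, s, n);  // write at the known end, no scan
        length += n;
        data[length] = '\0';
        return true;
    }

    // Convenience overload for NUL-terminated input.
    bool append(const char* s) { return append(s, std::strlen(s)); }
};
```

A caller would create a FixedBuffer as a local variable and call append() repeatedly; the price of avoiding the heap is that the maximum size has to be known up front and overflow has to be handled by the caller.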

Rope data structure:

If you need really fast concatenations consider using a rope data structure.

edited Apr 13, 2022 at 9:02

Daniel Griscom

answered Mar 4, 2009 at 16:14

Brian R. Bondy

10 Comments

James Curran Over a year ago

Note: "STL" refers to a completely separate open-source library, originally by HP, some parts of which were used as a basis for parts of the ISO Standard C++ Library. "std::string", however, was never part of HP's STL, so it's completely wrong to reference "STL" and "string" together.

Brian R. Bondy Over a year ago

I wouldn't say it's wrong to use STL and string together. See sgi.com/tech/stl/table_of_contents.html

James Curran Over a year ago

When SGI took over maintenance of the STL from HP, it was retro-fitted to match the Standard Library (which is why I said "never part of HP's STL"). Nevertheless, the originator of std::string is the ISO C++ Committee.

James Curran Over a year ago

Side note: The SGI employee who was in charge of maintaining the STL for many years was Matt Austern, who, at the same time, headed the Library subgroup of the ISO C++ Standardization Committee.

h7r Over a year ago

Can you please clarify or explain why "You can use the stack for your buffers instead of the heap which is much more efficient"? Where does this efficiency difference come from?

Carlos A. Ibarra · Accepted Answer · 2009-03-04 16:29:49Z

94

Reserve your final space first, then use the append method with a buffer. For example, say you expect your final string length to be 1 million characters:

std::string s;
s.reserve(1000000);

while (whatever)
{
  s.append(buf,len);
}
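
A self-contained variant of the same idea might look like the sketch below. The join_chunks name and the vector-of-strings input are assumptions for illustration; the point is only that the total size is computed first, so reserve() is called once.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: joins chunks after reserving the final size once.
std::string join_chunks(const std::vector<std::string>& chunks) {
    std::size_t total = 0;
    for (const std::string& c : chunks) total += c.size();

    std::string s;
    s.reserve(total);                  // capacity for the whole result up front
    for (const std::string& c : chunks)
        s.append(c.data(), c.size());  // same (buf, len) form as in the loop above
    return s;
}
```

When the final length is not known exactly, reserving a reasonable estimate still avoids most of the intermediate reallocations.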

answered Mar 4, 2009 at 16:29

Carlos A. Ibarra

3 Comments

CPlus Over a year ago

A million? When are strings ever a million characters?

NoWar Over a year ago

@CPlus Hey... Sometimes we need to process things like DNA. The fern was found to have a record-breaking genome size of 160 billion base pairs of DNA, which when unravelled would stretch out to about 100 metres. By comparison, the human genome contains about three billion base pairs and would stretch to about two metres.

CPlus Over a year ago

@NoWar I might be wrong but base pairs are essentially bits, having 2 possible states, AT or CG. Meaning you should be able to store the fern genome in 20 GB, right?

Johannes Schaub - litb · Accepted Answer · 2009-03-04 17:16:19Z

22

I would not worry about it. If you do it in a loop, strings will always preallocate memory to minimize reallocations - just use operator+= in that case. And if you do it manually, something like this or longer

a + " : " + c

Then it's creating temporaries - even if the compiler could eliminate some return value copies. That is because in a successively called operator+ it does not know whether the reference parameter references a named object or a temporary returned from a sub operator+ invocation. I would rather not worry about it before having profiled. But let's take an example to show that. We first introduce parentheses to make the binding clear. For clarity, I put the arguments directly after the declaration of the function that is used. Below that, I show what the resulting expression then is:

((a + " : ") + c) 
calls string operator+(string const&, char const*)(a, " : ")
  => (tmp1 + c)

Now, in that addition, tmp1 is what was returned by the first call to operator+ with the shown arguments. We assume the compiler is really clever and optimizes out the return value copy. So we end up with one new string that contains the concatenation of a and " : ". Now, this happens:

(tmp1 + c)
calls string operator+(string const&, string const&)(tmp1, c)
  => tmp2 == <end result>

Compare that to the following:

std::string f = "hello";
(f + c)
calls string operator+(string const&, string const&)(f, c)
  => tmp1 == <end result>

It's using the same function for a temporary and for a named string! So the compiler has to copy the argument into a new string and append to that and return it from the body of operator+. It cannot take the memory of a temporary and append to that. The bigger the expression is, the more copies of strings have to be done.

The next versions of Visual Studio and GCC will support C++1x's move semantics (complementing copy semantics) and rvalue references as an experimental addition. That allows figuring out whether the parameter references a temporary or not. This will make such additions amazingly fast, as all the above will end up in one "add-pipeline" without copies.

If it turns out to be a bottleneck, you can still do

 std::string(a).append(" : ").append(c) ...

The append calls append the argument to *this and then return a reference to themselves. So no copying of temporaries is done there. Or alternatively, the operator+= can be used, but you would need ugly parentheses to fix precedence.
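
As a minimal sketch of the two alternatives just mentioned (the function names are made up for the example), both versions copy a once into a working string and then append in place. A named local is used here so that the function can return it directly instead of copying from the reference that append() returns.

```cpp
#include <string>

// Illustrative only: builds a + " : " + c with chained append() calls.
std::string with_append(const std::string& a, const std::string& c) {
    std::string result(a);           // the single copy of a
    result.append(" : ").append(c);  // append() returns *this, so calls chain
    return result;
}

// Illustrative only: the same with operator+=.
std::string with_plus_equals(const std::string& a, const std::string& c) {
    std::string result(a);
    (result += " : ") += c;  // parentheses needed: += groups right-to-left
    return result;
}
```

Both functions produce the same text as a + " : " + c; whether they are measurably faster depends on the standard library and on whether operator+ already reuses temporaries there.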

edited Mar 4, 2009 at 17:16

answered Mar 4, 2009 at 16:22

Johannes Schaub - litb

2 Comments

underscore_d Over a year ago

I had to check stdlib implementors really do this. :P libstdc++ for operator+(string const& lhs, string&& rhs) does return std::move(rhs.insert(0, lhs)). Then if both are temporaries, its operator+(string&& lhs, string&& rhs) if lhs has sufficient capacity available will just directly append(). Where I think this risks being slower than operator+= is if lhs does not have enough capacity, as then it falls back to rhs.insert(0, lhs), which not only must extend the buffer & add the new contents like append(), but also needs to shift along the original contents of rhs right.

underscore_d Over a year ago

The other piece of overhead compared to operator+= is that operator+ still must return a value, so it has to move() whichever operand it appended to. Still, I guess that's a fairly minor overhead (copying a couple of pointers/sizes) compared to deep-copying the entire string, so it's good!

JasonMArcher · Accepted Answer · 2015-05-22 16:56:31Z

16

std::string operator+ allocates a new string and copies the two operand strings every time. Repeat that many times and it gets expensive: each call is O(n) in the length of the result.

std::string append and operator+=, on the other hand, bump the capacity by 50% every time the string needs to grow, which significantly reduces the number of memory allocations and copy operations: O(log n) reallocations to reach length n.
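
One way to observe this on a given standard library is to count how often capacity() changes while appending. The helper below is a sketch with a made-up name; the growth factor is implementation-defined, so the exact count will differ between libraries.

```cpp
#include <cstddef>
#include <string>

// Counts how often capacity() changes while appending n characters one by one.
std::size_t count_reallocations(std::size_t n) {
    std::string s;
    std::size_t reallocations = 0;
    std::size_t last_capacity = s.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        s += 'x';
        if (s.capacity() != last_capacity) {  // the buffer was regrown
            ++reallocations;
            last_capacity = s.capacity();
        }
    }
    return reallocations;
}
```

With geometric growth, a million single-character appends should trigger only a few dozen reallocations rather than a million.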

edited May 22, 2015 at 16:56

JasonMArcher

answered May 22, 2015 at 16:31

timmerov

2 Comments

underscore_d Over a year ago

I'm not quite sure why this was downvoted. The 50% figure is not required by the Standard, but IIRC that or 100% are common measures of growth in practice. Everything else in this answer seems unobjectionable.

underscore_d Over a year ago

Months later, I suppose it's not all that accurate, since it was written long after C++11 debuted, and overloads of operator+ where one or both arguments is passed by rvalue reference can avoid allocating a new string altogether by concatenating into the existing buffer of one of the operands (albeit they might have to realloc if it has insufficient capacity).

Tim · Accepted Answer · 2009-03-04 16:16:28Z

7

perhaps std::stringstream instead?

But I agree with the sentiment that you should probably just keep it maintainable and understandable and then profile to see if you are really having problems.

answered Mar 4, 2009 at 16:16

Tim

2 Comments

ArtemGr Over a year ago

stringstream is slow, see groups.google.com/d/topic/comp.lang.c++.moderated/aiFIGb6za0w

mloskot Over a year ago

@ArtemGr stringstream may be fast, see codeproject.com/Articles/647856/…

Pesto · Accepted Answer · 2009-03-04 16:15:18Z

6

For most applications, it just won't matter. Just write your code, blissfully unaware of how exactly the + operator works, and only take matters into your own hands if it becomes an apparent bottleneck.

answered Mar 4, 2009 at 16:15

Pesto

6 Comments

Brian R. Bondy Over a year ago

Of course it's not worth it for most cases, but this doesn't really answer his question.

Johannes Schaub - litb Over a year ago

yeah. i agree just saying "profile then optimize" can be put as comment on the question :)

Brian R. Bondy Over a year ago

Fair enough, but it is definitely needed for some applications. So in those applications the answer reduces to: 'take matters into your own hands'

Brian R. Bondy Over a year ago

Sorry to be so critical. I just thought an explanation of why operator+ was not efficient would be needed for him to determine if in his case he needed to do it.

MrFox Over a year ago

@Pesto There's a perverted notion in the programming world that performance doesn't matter and we can just ignore the whole deal because computers keep getting faster. The thing is, that's not why people program in C++ and that's not why they post questions on stack overflow about efficient string concatenation.

James Curran · Accepted Answer · 2009-03-04 16:23:33Z

6

Unlike .NET System.Strings, C++'s std::strings are mutable, and therefore can be built through simple concatenation just as fast as through other methods.

answered Mar 4, 2009 at 16:23

James Curran

5 Comments

Mark Ransom Over a year ago

Especially if you use reserve() to make the buffer big enough for the result before you start.

Johannes Schaub - litb Over a year ago

i think he is talking about operator+= . it's also concatenating, although it's a degenerate case. james was a vc++ mvp so i expect he has some clue of c++ :p

Brian R. Bondy Over a year ago

I don't doubt for a second that he has extensive knowledge on C++, just that there was a misunderstanding about the question. The question asked about the efficiency of operator+ which returns new string objects each time it is called, and hence uses new char buffers.

Johannes Schaub - litb Over a year ago

yeah. but then he asked for the case operator+ is slow, what the best way is to do a concatenation. and here operator+= comes into game. but i agree james' answer is a little short. it makes it sound like we all could use operator+ and it's top efficient :p

underscore_d Over a year ago

@BrianR.Bondy operator+ does not have to return a new string. Implementors can return one of its operands, modified, if that operand was passed by rvalue reference. libstdc++ does this, for example. So, when calling operator+ with temporaries, it can achieve the same or almost as good performance - which might be another argument in favour of defaulting to it unless one has benchmarks showing that it represents a bottleneck.
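
To make the rvalue case concrete, here is a small sketch (the label function is hypothetical). Since C++11 there are operator+ overloads taking std::string&&, so when the left operand is an rvalue the library is allowed to append into its buffer and move it out rather than allocate a fresh string. Whether the buffer is actually reused depends on the implementation and on the available capacity.

```cpp
#include <string>
#include <utility>

// Illustrative only: builds "<prefix>: <msg>" starting from a by-value parameter.
std::string label(std::string prefix, const std::string& msg) {
    // Make room up front so appending does not need to regrow the buffer.
    prefix.reserve(prefix.size() + 2 + msg.size());
    // std::move(prefix) selects operator+(std::string&&, const char*). Its
    // result is a temporary, so the second + also receives an rvalue.
    return std::move(prefix) + ": " + msg;
}
```

Called as label("warn", "disk full"), this yields "warn: disk full"; the chain reads like plain operator+ while giving the library the chance to work in place.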

Luc Hermitte · Accepted Answer · 2009-03-04 17:04:00Z

5

In Imperfect C++, Matthew Wilson presents a dynamic string concatenator that pre-computes the length of the final string in order to have only one allocation before concatenating all parts. We can also implement a static concatenator by playing with expression templates.

That kind of idea has been implemented in the STLport std::string implementation -- which does not conform to the standard because of this precise hack.

answered Mar 4, 2009 at 17:04

Luc Hermitte

1 Comment

underscore_d Over a year ago

Glib::ustring::compose() from the glibmm bindings to GLib does that: estimates and reserve()s the final length based upon the provided format string and the varargs, then append()s each (or its formatted replacement) in a loop. I expect this is a pretty common way of working.

Pete Kirkham · Accepted Answer · 2009-03-04 16:22:13Z

3

As with most things, it's easier not to do something than to do it.

If you want to output large strings to a GUI, it may be that whatever you're outputting to can handle the strings in pieces better than as a large string (for example, concatenating text in a text editor - usually they keep lines as separate structures).

If you want to output to a file, stream the data rather than creating a large string and outputting that.

I've never found it necessary to make concatenation faster once I had removed unnecessary concatenation from slow code.

answered Mar 4, 2009 at 16:22

Pete Kirkham

1 Comment

Dr Phil Nov 5 at 0:51

Pretty underrated answer. When you create a large string, what are you going to do with it? Maybe do that directly.

Mykola Golubyev · Accepted Answer · 2009-03-04 16:20:05Z

2

For small strings it doesn't matter. If you have big strings, you'd better store them as parts in a vector or some other collection, and adapt your algorithm to work with such a set of data instead of one big string.

I prefer std::ostringstream for complex concatenation.
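
One possible reading of "complex" here is a result that mixes text with formatted numbers, which std::ostringstream handles without manual conversions. The describe function and its parameters below are invented for the example.

```cpp
#include <sstream>
#include <string>

// Hypothetical example: mixes strings and numbers in one result.
std::string describe(const std::string& name, int count, double ratio) {
    std::ostringstream out;
    out << name << ": " << count << " items, ratio " << ratio;
    return out.str();  // copies the accumulated text out of the stream
}
```

Streams are convenient for this kind of formatting, though, as the comments on another answer note, their speed relative to std::string appends varies between implementations.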

answered Mar 4, 2009 at 16:20

Mykola Golubyev

1 Comment

Rishabh Bhatnagar Over a year ago

what is a complex concatenation?

LanDenLabs · Accepted Answer · 2019-01-27 16:59:13Z

2

You will probably get the best performance if you pre-allocate (reserve) space in the resultant string.

#include <cstring>
#include <initializer_list>
#include <string>

// Requires at least one argument, all convertible to const char*.
template<typename... Args>
std::string concat(Args const&... args)
{
    std::size_t len = 0;
    for (auto s : {args...})  len += std::strlen(s);

    std::string result;
    result.reserve(len);    // <--- preallocate result
    for (auto s : {args...})  result += s;
    return result;
}

Usage:

std::string merged = concat("This ", "is ", "a ", "test!");

answered Jan 27, 2019 at 16:59

LanDenLabs

1 Comment

MSalters Over a year ago

This works only with const char*. If you use std::string_view(s).size() it will work with many more types, including third-party types that support string_view. Even smarter, accept an initializer_list<string_view> and then result +=s will also use the string_view. This saves you from calculating strlen(s) twice (there's an implicit strlen in std::string += const char*)
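
The suggestion in this comment could be sketched as follows, assuming C++17 for std::string_view. The concat_views name is made up for the example.

```cpp
#include <cstddef>
#include <initializer_list>
#include <string>
#include <string_view>

// Sketch of the initializer_list<string_view> idea; the name is illustrative.
std::string concat_views(std::initializer_list<std::string_view> parts) {
    std::size_t len = 0;
    for (std::string_view p : parts) len += p.size();

    std::string result;
    result.reserve(len);                           // one allocation for the result
    for (std::string_view p : parts) result += p;  // length known, no second strlen
    return result;
}
```

It would be called with a braced list, for example concat_views({"This ", std::string("is "), "a ", "test!"}), and accepts anything that converts implicitly to std::string_view.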

voltento · Accepted Answer · 2020-01-30 15:45:19Z

1

You can try this one with memory reservations for each item:

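#include <cstddef>
#include <string>
#include <utility>

namespace {
// Length of anything with a size() member (std::string, std::string_view, ...).
// Named str_size so that it cannot clash with std::size found through ADL.
template<class C>
constexpr auto str_size(const C& c) -> decltype(c.size()) {
  return static_cast<std::size_t>(c.size());
}

// Length of a NUL-terminated C string; string literals decay to this overload.
constexpr std::size_t str_size(const char* string) {
  std::size_t size = 0;
  while (*(string + size) != '\0') {
    ++size;
  }
  return size;
}
}

template<typename... Args>
std::string concatStrings(Args&&... args) {
  // Binary fold with an initial value, so an empty pack is valid too (C++17).
  const std::size_t s = (std::size_t{0} + ... + str_size(args));
  std::string result;
  result.reserve(s);
  (result.append(std::forward<Args>(args)), ...);
  return result;  // return the named local rather than a reference from append()
}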

answered Jan 30, 2020 at 15:45

voltento

Pedro Vicente · Accepted Answer · 2017-06-02 18:13:32Z

0

A simple array of characters, encapsulated in a class that keeps track of array size and number of allocated bytes is the fastest.

The trick is to do just one large allocation at start.

Available at https://github.com/pedro-vicente/table-string

Benchmarks

For Visual Studio 2015, x86 debug build, a substantial improvement over C++ std::string.

| API                   | Seconds |
| --------------------- | ------- |
| SDS                   | 19      |
| std::string           | 11      |
| std::string (reserve) | 9       |
| table_str_t           | 1       |

edited Jun 2, 2017 at 18:13

answered Jun 1, 2017 at 19:59

Pedro Vicente

1 Comment

underscore_d Over a year ago

The OP is interested in how to efficiently concatenate std::string. They are not asking for an alternative string class.

user1050755 · Accepted Answer · 2022-06-25 09:34:28Z

Benchmark as per Visual Studio C/C++ 17.2.5 and Boost 1.79.0 on Ryzen 5600x:

n iter = 10
n parts = 10000000
total string result length = 70000000
Boost join: 00:00:02.105006
std::string append (Reserve): 00:00:00.485498
std::string append (simple): 00:00:00.679999
Note: times are cumulative sums over all iterations.

Conclusion: the Boost implementation does not perform well here. Using std::string's reserve has little impact unless the final string length is at least several tens of megabytes.

The simple append (without reserve) might even be faster in practice, because the benchmark used an already initialized vector of string parts. In practice, that vector is often only necessary for the reserve and Boost join variants, and therefore imposes an additional performance penalty on them.

Another run:

n iter = 100
n parts = 1000000
total string result length = 6000000
Boost join: 00:00:01.953999
std::string append (Reserve): 00:00:00.535502
std::string append (simple): 00:00:00.679002
Note: times are cumulative sums over all iterations.
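
For readers who want to repeat this kind of measurement, a harness along the following lines could be used. It is not the code behind the numbers above: the Timed struct and append_parts function are assumptions for illustration, and the Boost variant is left out.

```cpp
#include <chrono>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Illustrative result type: the built string plus the time it took.
struct Timed {
    std::string result;
    std::chrono::steady_clock::duration elapsed;
};

// Appends all parts, optionally reserving the final size first.
Timed append_parts(const std::vector<std::string>& parts, bool use_reserve) {
    const auto start = std::chrono::steady_clock::now();
    std::string out;
    if (use_reserve) {
        std::size_t total = 0;
        for (const std::string& p : parts) total += p.size();
        out.reserve(total);  // the extra pass over parts is included in the timing
    }
    for (const std::string& p : parts) out += p;
    const auto stop = std::chrono::steady_clock::now();
    return Timed{std::move(out), stop - start};
}
```

Timings from such a harness only mean something in an optimized build, repeated several times, and ideally with the result used afterwards so the compiler cannot discard the work.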
