The sum() function works only with numbers, not strings.

  int_lst = [1, 2]
  sum(int_lst)
=> 3

  str_lst = ['a', 'b']
  sum(str_lst)
Traceback (most recent call last):
  File "python", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'int' and 'str'

I found this behaviour strange because the sum() function is just a more Pythonic way to write reduce(lambda acc, x: acc + x, lst). reduce() lets me concatenate strings, but sum() doesn't.
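
For reference, the reduce() comparison can be written out in full. This is a minimal sketch assuming Python 3, where reduce() has to be imported from functools; the names int_total and str_total are only illustrative.

```python
from functools import reduce  # in Python 3, reduce() is no longer a builtin

int_lst = [1, 2]
str_lst = ['a', 'b']

# reduce() only relies on the + operator, so it accepts both lists
int_total = reduce(lambda acc, x: acc + x, int_lst)
str_total = reduce(lambda acc, x: acc + x, str_lst)

print(int_total)  # 3
print(str_total)  # ab
```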

From the Python documentation for sum():

The iterable's items are normally numbers, and the start value is not allowed to be a string.

So why?

OOP teaches us to make polymorphic stuff because it's flexible.

1 Answer

Summing strings is very inefficient; each concatenation in the loop creates a new string object, which is destroyed again as soon as the next string is concatenated with that result.

For example, for summing ['foo', 'bar', 'baz', 'spam', 'ham', 'eggs'] you'd create 'foobar', then 'foobarbaz', then 'foobarbazspam', then 'foobarbazspamham', then finally 'foobarbazspamhameggs', discarding all but the last string object.

You'd use the str.join() method instead:

''.join(str_list)

which creates one new string and copies in the contents of the constituent strings.
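
As a rough sketch of the difference, the loop below spells out the repeated concatenation described above, next to the str.join() call. The variable names looped and joined are illustrative, and how much slower the loop is in practice depends on the interpreter and the input size.

```python
str_list = ['foo', 'bar', 'baz', 'spam', 'ham', 'eggs']

# Repeated concatenation: every step produces a new intermediate string
looped = ''
for s in str_list:
    looped = looped + s

# str.join(): the result is built in a single call
joined = ''.join(str_list)

print(looped == joined)  # True
print(joined)            # foobarbazspamhameggs
```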

Note that sum() uses a default starting value of 0, which is why you get your specific exception message:

>>> 0 + ''
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'int' and 'str'

You can give sum() a different starting value as the second argument; for strings that'll give you a more meaningful error message:

>>> sum(['foo', 'bar'], '')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: sum() can't sum strings [use ''.join(seq) instead]

The function is otherwise not limited to just numbers; you can use it for any other type that defines __add__ operations, but you have to specify a sensible start value. You could 'sum' lists, for example:

>>> sum([['foo', 'bar'], ['ham', 'spam']], [])
['foo', 'bar', 'ham', 'spam']

but note the [] value for the second (start) argument! This is also just as inefficient as summing strings; the efficient method would be using list(itertools.chain.from_iterable(list_of_lists)).
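
A short sketch of the two approaches side by side, using the same sample data; flat_sum and flat_chain are illustrative names. Both give the same flattened list, the difference is only in how much intermediate copying takes place.

```python
from itertools import chain

list_of_lists = [['foo', 'bar'], ['ham', 'spam']]

# sum() with an explicit [] start value: builds a new list at every step
flat_sum = sum(list_of_lists, [])

# chain.from_iterable() yields the items lazily; list() collects them once
flat_chain = list(chain.from_iterable(list_of_lists))

print(flat_sum)                # ['foo', 'bar', 'ham', 'spam']
print(flat_sum == flat_chain)  # True
```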

6 Comments

list(chain.from_iterable(l)) would be faster again
@PadraicCunningham: did you time that? Because either way the list is going to be iterated over.
I have always found it to be faster. On l = [['foo', 'bar'], ['ham', 'spam']] * 100000 it is 12ms vs 8.3ms.
@PadraicCunningham: yeah, just confirmed from_iterable is marginally faster.
@PadraicCunningham: I used testdata = [[None] * random.randint(1, 10) for _ in range(100)] and 10k repeats to get 0.15 vs 0.22.