514

Is there a short syntax for joining a list of lists into a single list (or iterator) in Python?

For example, I have a list as follows and I want to iterate over a, b and c.

x = [["a","b"], ["c"]]

The best I can come up with is as follows:

result = []
[result.extend(el) for el in x]

for el in result:
    print(el)

15 Answers

737
import itertools
a = [['a','b'], ['c']]
print(list(itertools.chain.from_iterable(a)))

This gives

['a', 'b', 'c']
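A minimal sketch contrasting the two spellings of chain, since both show up in answers here:

```python
import itertools

a = [['a', 'b'], ['c']]

# chain(*a) unpacks the outer list into separate arguments; this builds an
# intermediate tuple of the sublists before chaining them.
flat1 = list(itertools.chain(*a))

# chain.from_iterable(a) consumes the outer iterable lazily, so it also works
# when the outer sequence is itself a generator of unknown length.
flat2 = list(itertools.chain.from_iterable(a))

assert flat1 == flat2 == ['a', 'b', 'c']
```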

9 Comments

no need to list() it! for item in itertools.chain(*a): do something with item
A bit of explanation would also be nice. docs.python.org/library/itertools.html#itertools.chain
result = []; map(result.extend, a) is ~30% faster than itertools.chain. But chain.from_iterable is a tiny bit faster than map+extend. [Python 2.7, x86_64]
This explains what's happening with the *a: stackoverflow.com/questions/5239856/foggy-on-asterisk-in-python (it sends the elements of a as arguments to chain, like removing the outer [ and ]).
chain.from_iterable is significantly faster if you have many iterables to concatenate. For me it was ~50% faster when creating ctypes arrays of OpenGL vertices from 100s of python lists containing 10s or 100s of vertices each. The '*' operator converts your iterable into an intermediate tuple that it passes in to chain.
385
x = [["a","b"], ["c"]]

result = sum(x, [])
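sum works here because + concatenates lists and the explicit [] replaces the default start value of 0; a minimal sketch of what it expands to:

```python
x = [["a", "b"], ["c"]]

# sum(x, []) evaluates as (([] + ["a", "b"]) + ["c"]); each + builds a
# brand-new list, which is why this approach is O(n^2) for many sublists.
flat = sum(x, [])
assert flat == [] + ["a", "b"] + ["c"]
```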

9 Comments

@Aaron, explain for a noob python learner please: Is O(n^2) good or bad in this case? ;-)
O(n^2) here basically means that the time required for this function to execute is proportional to the square of the length of the inputs. So if you double the inputs, you quadruple the time required. This is a Bad Thing if you have large inputs, but for small ones it should be fine. But a faster method will be better.
Can somebody explain how and why sum(x, []) works?
What an awful thing to read! The more I use python the more it's half finished state hits home. I can't imagine why the standard library doesn't have a flat() function built in.
@Requin sum(iterable, start) adds all elements in iterable to start. Adding lists concatenates them. start defaults to 0, which would break if iterable consists of lists, as you can't add a list to an int, so we pass an empty list as start.
281

If you're only going one level deep, a nested comprehension will also work:

>>> x = [["a","b"], ["c"]]
>>> [inner
...     for outer in x
...         for inner in outer]
['a', 'b', 'c']

On one line, that becomes:

>>> [j for i in x for j in i]
['a', 'b', 'c']
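The clause order reads left-to-right exactly like the equivalent nested for loops, which is the easiest way to keep it straight:

```python
x = [["a", "b"], ["c"]]

# [j for i in x for j in i] is shorthand for:
result = []
for i in x:
    for j in i:
        result.append(j)

assert result == [j for i in x for j in i]
```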

5 Comments

Very cool, so for the next depth-level it will become [i for ll in x for l in ll for i in l] - at this point it starts getting a bit lame for the reader, but nevertheless cool :)
For three levels, it gets nasty: >>> x = [[["a", "b"], ["c"]], [["d"]]] >>> [k for i in x for j in i for k in j] ['a', 'b', 'c', 'd']
Listception.. this is definitely unpythonic / against the zen of python in that it is not the simplest or most explicit way to do it. You end up hard coding recursion. Still cool though.
@ZachEstela, I'm happy to see someone call this unpythonic. It seems like many techniques others like to call pythonic are not easily understood at first glance. Readability is one of the things that makes Python attractive to me. This solution is cool, and probably the fastest, but the sum(x, []) solution is much more Pythonic.
Those "more pythonic" answers are just wrong. The question wasn't about recursive joining, but joining a list of lists, which means there are no more depth levels to join.
57
flat_list = []
map(flat_list.extend, list_of_lists)

shortest!
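Note that in Python 3 map is lazy, so the extend calls never run unless the map object is consumed; a sketch of one way to force them:

```python
flat_list = []
list_of_lists = [["a", "b"], ["c"]]

# In Python 3, map() returns a lazy iterator, so flat_list.extend is never
# actually called until the map object is consumed. Wrapping it in list()
# (or any other consumer) forces the side effects to happen.
list(map(flat_list.extend, list_of_lists))
```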

6 Comments

sum(listoflists,[]) # shorter!
@recursive Shorter but different functionally = much worse performance-wise, see comments on other variants for explanation
This tiny snippet appears to be the fastest way around for non-recursive flatten. Needs more upvotes.
in Python 3.1+, wrap map with list(), or else you'll see <map object at 0x0000...> when you print the result
This solution does not work for me.
33

If you need a list, not a generator, use list():

from itertools import chain
x = [["a","b"], ["c"]]
y = list(chain(*x))

3 Comments

s/x/*x/ (or chain.from_iterable(x) preferably)
I do not understand what it does. join is supposed to have a separator.
@Val chain makes a generator that will output 'a', 'b', 'c'. list converts it into a list.
33

This is known as flattening, and there are a LOT of implementations out there.

How about this, although it will only work for 1 level deep nesting:

>>> x = [["a","b"], ["c"]]
>>> for el in sum(x, []):
...     print(el)
...
a
b
c

Of the many implementations out there, apparently the most complete-fast-elegant-etc is the following:

def flatten(l, ltypes=(list, tuple)):
    ltype = type(l)
    l = list(l)
    i = 0
    while i < len(l):
        while isinstance(l[i], ltypes):
            if not l[i]:
                l.pop(i)
                i -= 1
                break
            else:
                l[i:i + 1] = l[i]
        i += 1
    return ltype(l)

4 Comments

Ah, 'sum(L,I)' is shorthand for 'reduce(plus_operator, L, I)'. That's kinda cool.
your "most complete-elegant-etc" is not "elegant" at all!! see the docs for itertools.chain to see true elegance!
@hasen j: I believe he means best for arbitrary nested lists. chain assumes a consistent, one-deep list of lists (which is probably all the question needs), but flatten handles things like [a,b,[c], [d,[e,f]],[[[g]]]].
Unfortunately this breaks if you're using pylab, because numpy's sum gets imported into the global namespace, and that function doesn't work that way.
30

A performance comparison:

import itertools
import timeit
big_list = [[0]*1000 for i in range(1000)]
timeit.repeat(lambda: list(itertools.chain.from_iterable(big_list)), number=100)
timeit.repeat(lambda: list(itertools.chain(*big_list)), number=100)
timeit.repeat(lambda: (lambda b: map(b.extend, big_list))([]), number=100)
timeit.repeat(lambda: [el for list_ in big_list for el in list_], number=100)
[100*x for x in timeit.repeat(lambda: sum(big_list, []), number=1)]

Producing:

>>> import itertools
>>> import timeit
>>> big_list = [[0]*1000 for i in range(1000)]
>>> timeit.repeat(lambda: list(itertools.chain.from_iterable(big_list)), number=100)
[3.016212113769325, 3.0148865239060227, 3.0126415732791028]
>>> timeit.repeat(lambda: list(itertools.chain(*big_list)), number=100)
[3.019953987082083, 3.528754223385439, 3.02181439266457]
>>> timeit.repeat(lambda: (lambda b: map(b.extend, big_list))([]), number=100)
[1.812084445152557, 1.7702404451095965, 1.7722977998725362]
>>> timeit.repeat(lambda: [el for list_ in big_list for el in list_], number=100)
[5.409658160700605, 5.477502077679354, 5.444318360412744]
>>> [100*x for x in timeit.repeat(lambda: sum(big_list, []), number=1)]
[399.27587954973444, 400.9240571138051, 403.7521153804846]

This is with Python 2.7.1 on Windows XP 32-bit, but @temoto in the comments above got from_iterable to be faster than map+extend, so it's quite platform and input dependent.

Stay away from sum(big_list, [])

2 Comments

Super helpful. Thanks! Note that in Python3, we need a list() around the map() version, otherwise the results are too good to be true.
There's a few downvotes. I can't figure out what they're referring to. If you see a mistake, could you point it out? If there is a mistake, it should be easy to fix, which would be nice for the future generations of visitors.
16

This works recursively for infinitely nested elements:

def iterFlatten(root):
    if isinstance(root, (list, tuple)):
        for element in root:
            for e in iterFlatten(element):
                yield e
    else:
        yield root

Result:

>>> b = [["a", ("b", "c")], "d"]
>>> list(iterFlatten(b))
['a', 'b', 'c', 'd']

3 Comments

>>> a = [] >>> a.append(a) >>> b = iterFlatten(a) >>> next(b) RuntimeError: maximum recursion depth exceeded in __instancecheck__ :)
@Darthfett would you expect a meaningful result for flattening an "infinitely-nested list"? :-)
@Kos A version that checks for such cases (by using a stack/set to check for self-references in a list) could be preferable than simply continuing to flatten until reaching the recursion depth limit. This could bypass the issue by simply giving the value, instead of trying to flatten it.
8

Late to the party but ...

I'm new to python and come from a lisp background. This is what I came up with (check out the var names for lulz):

def flatten(lst):
    if lst:
        car, *cdr = lst
        if isinstance(car, (list, tuple)):
            if cdr: return flatten(car) + flatten(cdr)
            return flatten(car)
        if cdr: return [car] + flatten(cdr)
        return [car]

Seems to work. Test:

flatten((1,2,3,(4,5,6,(7,8,(((1,2)))))))

returns:

[1, 2, 3, 4, 5, 6, 7, 8, 1, 2]

4 Comments

You come from a lisp background? I never would have guessed from the code... haha
Nice, been doing Python for some time now and I haven't seen var-arg tuple unpacking like you did with car, *cdr. (edit: probably because it's Python 3 and I'm still digging 2 for some reason :-))
What's the point of the if lst:?
it's 4 a.m. and I was hoping to find something that handles lists nested 3-4 levels deep; thank you for writing this since I am not able to code right now :D.
5

What you're describing is known as flattening a list, and with this new knowledge you'll be able to find many solutions to this on Google (there is no built-in flatten method). Here is one of them, from http://www.daniel-lemire.com/blog/archives/2006/05/10/flattening-lists-in-python/:

def flatten(x):
    ans = []
    for i in x:
        if i.__class__ is list:
            ans = ans + flatten(i)
        else:
            ans.append(i)
    return ans

2 Comments

This method works nicely for a mix of lists of strings and strings (e.g. [['some', 'string'], 'and', 'another'] ), while the itertools techniques do not. This works well for my needs.
this method does not work when you have something like this [[[a, b, c], [d, e, f]], [[g, h, l]]]
4

For one-level flatten, if you care about speed, this is faster than any of the previous answers under all conditions I tried. (That is, if you need the result as a list. If you only need to iterate through it on the fly then the chain example is probably better.) It works by pre-allocating a list of the final size and copying the parts in by slice (which is a lower-level block copy than any of the iterator methods):

def join(a):
    """Joins a sequence of sequences into a single sequence.  (One-level flattening.)
    E.g., join([(1,2,3), [4, 5], [6, (7, 8, 9), 10]]) = [1,2,3,4,5,6,(7,8,9),10]
    This is very efficient, especially when the subsequences are long.
    """
    n = sum([len(b) for b in a])
    l = [None]*n
    i = 0
    for b in a:
        j = i+len(b)
        l[i:j] = b
        i = j
    return l

Sorted times list with comments:

[(0.5391559600830078, 'flatten4b'), # join() above. 
(0.5400412082672119, 'flatten4c'), # Same, with sum(len(b) for b in a) 
(0.5419249534606934, 'flatten4a'), # Similar, using zip() 
(0.7351131439208984, 'flatten1b'), # list(itertools.chain.from_iterable(a)) 
(0.7472689151763916, 'flatten1'), # list(itertools.chain(*a)) 
(1.5468521118164062, 'flatten3'), # [i for j in a for i in j] 
(26.696547985076904, 'flatten2')] # sum(a, [])

5 Comments

Can you add timings to confirm that this is faster than the other methods presented?
You skipped map(result.extend, a)
There's a benchmark ideone.com/9q3mrp
@Kos, you are right! I'm lame. I probably omitted it originally because it "obviously" has bad O() time due to multiple copies, but now that I add it to my test, in practice it looks like it is successfully using realloc() to avoid this, and so it is winning hands down under all conditions. I remain skeptical, though, that it might revert to horrible behavior in a real working environment with fragmented memory. In a simple test app like this, with a clean slate of memory, it is free to keep extending the array without moving it. Thoughts?
3

There's always reduce (being deprecated to functools):

>>> x = [ [ 'a', 'b'], ['c'] ]
>>> for el in reduce(lambda a,b: a+b, x, []):
...  print el
...
__main__:1: DeprecationWarning: reduce() not supported in 3.x; use functools.reduce()
a
b
c
>>> import functools
>>> for el in functools.reduce(lambda a,b: a+b, x, []):
...   print el
...
a
b
c
>>>

Unfortunately the plus operator for list concatenation can't be used directly as a function -- or fortunately, if you prefer lambdas to be ugly for the sake of improved visibility.
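A hedged sketch of an alternative that reduces with list.extend instead of +, avoiding the temporary concatenated lists (the helper name extend_and_return is mine):

```python
from functools import reduce

def extend_and_return(acc, part):
    # list.extend returns None, so wrap it to return the accumulator,
    # making it usable with reduce; no intermediate concatenated lists
    # are created along the way.
    acc.extend(part)
    return acc

flat = reduce(extend_and_return, [["a", "b"], ["c"]], [])
```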

4 Comments

GAH, I cannot believe they are deprecating it to functools. Anyway, you don't need the extra empty list, this will work just fine: reduce(lambda a,b: a+b, x)
Versions of the operators are defined as functions in the operator module, which are faster and less ugly than the lambda: "functools.reduce(operator.add, [[1,2,3],[4,5]],[])". Alternatively, just use sum()
Personally, I think the lambda way is quite pretty. :-)
If you want to do a reduce, then reduce over extend not add to avoid spamming the memory with temporary lists. Wrap extend with a function that extends then returns the list itself.
2

Or a recursive operation:

def flatten(input):
    ret = []
    if not isinstance(input, (list, tuple)):
        return [input]
    for i in input:
        if isinstance(i, (list, tuple)):
            ret.extend(flatten(i))
        else:
            ret.append(i)
    return ret

Comments

1

Sadly, Python doesn't have a simple way to flatten lists. Try this:

def flatten(some_list):
    for element in some_list:
        if type(element) in (tuple, list):
            for item in flatten(element):
                yield item
        else:
            yield element

Which will recursively flatten a list; you can then do

for el in flatten(x):
    print(el)

Comments

1

I had a similar problem when I had to create a dictionary that contained the elements of an array and their counts. The answer is relevant because I flatten a list of lists, get the elements I need, and then do a group and count. I used Python's map function to produce a tuple of each element and its count, and groupby over the array. Note that groupby takes the array element itself as the keyfunc. As a relatively new Python coder, I find this easier to comprehend, while being Pythonic as well.

Before I discuss the code, here is a sample of data I had to flatten first:

{ "_id" : ObjectId("4fe3a90783157d765d000011"), "status" : [ "opencalais" ],
  "content_length" : 688, "open_calais_extract" : { "entities" : [
  {"type" :"Person","name" : "Iman Samdura","rel_score" : 0.223 }, 
  {"type" : "Company",  "name" : "Associated Press",    "rel_score" : 0.321 },          
  {"type" : "Country",  "name" : "Indonesia",   "rel_score" : 0.321 }, ... ]},
  "title" : "Indonesia Police Arrest Bali Bomb Planner", "time" : "06:42  ET",         
  "filename" : "021121bn.01", "month" : "November", "utctime" : 1037836800,
  "date" : "November 21, 2002", "news_type" : "bn", "day" : "21" }

It is a query result from Mongo. The code below flattens a collection of such lists.

def flatten_list(items):
  return sorted([entity['name'] for entity in
    [entities for sublist in
     [item['open_calais_extract']['entities'] for item in items]
     for entities in sublist]])

First, I extract all the "entities" collections, and then for each entities collection, iterate over the dictionaries and extract the name attribute.
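For readers newer to comprehensions, here is a hypothetical step-by-step equivalent (the function name flatten_list_loop is mine, and it assumes each item has the nested shape of the sample document above):

```python
def flatten_list_loop(items):
    names = []
    for item in items:
        # Each item keeps its entities under item['open_calais_extract']['entities'].
        for entity in item['open_calais_extract']['entities']:
            names.append(entity['name'])
    return sorted(names)
```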

Comments
