Skip to main content
We’ve updated our Terms of Service. A new AI Addendum clarifies how Stack Overflow utilizes AI interactions.
obviously, the problem with getting list comprehensions right starts with nesting them, that's why that's not trivial
Source Link
Wolf
  • 10.3k
  • 8
  • 72
  • 116

A list of lists named xss can be flattened using a nested list comprehension:

flat_list = [
    x
    for xs in xss
    for x in xs
]

The above is equivalent to:

flat_list = []

for xs in xss:
    for x in xs:
        flat_list.append(x)

Here is the corresponding function:

def flatten(xss):
    return [x for xs in xss for x in xs]

This is the fastest method. As evidence, using the timeit module in the standard library, we see:

$ python -mtimeit -s'xss=[[1,2,3],[4,5,6],[7],[8,9]]*99' '[x for xs in xss for x in xs]'
10000 loops, best of 3: 143 usec per loop

$ python -mtimeit -s'xss=[[1,2,3],[4,5,6],[7],[8,9]]*99' 'sum(xss, [])'
1000 loops, best of 3: 969 usec per loop

$ python -mtimeit -s'xss=[[1,2,3],[4,5,6],[7],[8,9]]*99' 'reduce(lambda xs, ys: xs + ys, xss)'
1000 loops, best of 3: 1.1 msec per loop

Explanation: the methods based on + (including the implied use in sum) are, of necessity, O(L**2) when there are L sublists -- as the intermediate result list keeps getting longer, at each step a new intermediate result list object gets allocated, and all the items in the previous intermediate result must be copied over (as well as a few new ones added at the end). So, for simplicity and without actual loss of generality, say you have L sublists of M items each: the first M items are copied back and forth L-1 times, the second M items L-2 times, and so on; total number of copies is M times the sum of x for x from 1 to L excluded, i.e., M * (L**2)/2.

The list comprehension just generates one list, once, and copies each item over (from its original place of residence to the result list) also exactly once.

A list of lists named xss can be flattened using a list comprehension:

flat_list = [
    x
    for xs in xss
    for x in xs
]

The above is equivalent to:

flat_list = []

for xs in xss:
    for x in xs:
        flat_list.append(x)

Here is the corresponding function:

def flatten(xss):
    return [x for xs in xss for x in xs]

This is the fastest method. As evidence, using the timeit module in the standard library, we see:

$ python -mtimeit -s'xss=[[1,2,3],[4,5,6],[7],[8,9]]*99' '[x for xs in xss for x in xs]'
10000 loops, best of 3: 143 usec per loop

$ python -mtimeit -s'xss=[[1,2,3],[4,5,6],[7],[8,9]]*99' 'sum(xss, [])'
1000 loops, best of 3: 969 usec per loop

$ python -mtimeit -s'xss=[[1,2,3],[4,5,6],[7],[8,9]]*99' 'reduce(lambda xs, ys: xs + ys, xss)'
1000 loops, best of 3: 1.1 msec per loop

Explanation: the methods based on + (including the implied use in sum) are, of necessity, O(L**2) when there are L sublists -- as the intermediate result list keeps getting longer, at each step a new intermediate result list object gets allocated, and all the items in the previous intermediate result must be copied over (as well as a few new ones added at the end). So, for simplicity and without actual loss of generality, say you have L sublists of M items each: the first M items are copied back and forth L-1 times, the second M items L-2 times, and so on; total number of copies is M times the sum of x for x from 1 to L excluded, i.e., M * (L**2)/2.

The list comprehension just generates one list, once, and copies each item over (from its original place of residence to the result list) also exactly once.

A list of lists named xss can be flattened using a nested list comprehension:

flat_list = [
    x
    for xs in xss
    for x in xs
]

The above is equivalent to:

flat_list = []

for xs in xss:
    for x in xs:
        flat_list.append(x)

Here is the corresponding function:

def flatten(xss):
    return [x for xs in xss for x in xs]

This is the fastest method. As evidence, using the timeit module in the standard library, we see:

$ python -mtimeit -s'xss=[[1,2,3],[4,5,6],[7],[8,9]]*99' '[x for xs in xss for x in xs]'
10000 loops, best of 3: 143 usec per loop

$ python -mtimeit -s'xss=[[1,2,3],[4,5,6],[7],[8,9]]*99' 'sum(xss, [])'
1000 loops, best of 3: 969 usec per loop

$ python -mtimeit -s'xss=[[1,2,3],[4,5,6],[7],[8,9]]*99' 'reduce(lambda xs, ys: xs + ys, xss)'
1000 loops, best of 3: 1.1 msec per loop

Explanation: the methods based on + (including the implied use in sum) are, of necessity, O(L**2) when there are L sublists -- as the intermediate result list keeps getting longer, at each step a new intermediate result list object gets allocated, and all the items in the previous intermediate result must be copied over (as well as a few new ones added at the end). So, for simplicity and without actual loss of generality, say you have L sublists of M items each: the first M items are copied back and forth L-1 times, the second M items L-2 times, and so on; total number of copies is M times the sum of x for x from 1 to L excluded, i.e., M * (L**2)/2.

The list comprehension just generates one list, once, and copies each item over (from its original place of residence to the result list) also exactly once.

As revision #10 says, using (l, I, 1) in the same answer is confusing. Revisit Haskell-like notation. Easier to grok when generic names indicate their kind / types (x == T) (xs = list[T]) (xss = list[list[T]]). More evidence: the comment "[leaf for tree in forest for leaf in tree]".
Source Link
Mateen Ulhaq
  • 27.8k
  • 21
  • 121
  • 155

A list of lists named lxss can be flattened using a list comprehension:

flat_list = [
    itemx
    for sublistxs in lxss
    for itemx in sublistxs
]

The above is equivalent to:

flat_list = [] 

for sublistxs in lxss:
    for itemx in sublistxs:
        flat_list.append(itemx)

Here is the corresponding function:

def flatten(lxss):
    return [item[x for sublistxs in lxss for itemx in sublist]xs]

This is the fastest method. As evidence, using the timeit module in the standard library, we see:

$ python -mtimeit -s'l=[[1s'xss=[[1,2,3],[4,5,6], [7], [8,9]]*99' '[item'[x for sublistxs in lxss for itemx in sublist]'xs]'
10000 loops, best of 3: 143 usec per loop

$ python -mtimeit -s'l=[[1s'xss=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'sum(lxss, [])'
1000 loops, best of 3: 969 usec per loop

$ python -mtimeit -s'l=[[1s'xss=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'reduce(lambda xxs,y ys: x+yxs + ys,l xss)'
1000 loops, best of 3: 1.1 msec per loop

Explanation: the methods based on + (including the implied use in sum) are, of necessity, O(L**2) when there are L sublists -- as the intermediate result list keeps getting longer, at each step a new intermediate result list object gets allocated, and all the items in the previous intermediate result must be copied over (as well as a few new ones added at the end). So, for simplicity and without actual loss of generality, say you have L sublists of IM items each: the first IM items are copied back and forth L-1L-1 times, the second IM items L-2L-2 times, and so on; total number of copies is IM times the sum of x for x from 1 to L excluded, i.e., IM * (L**2)/2.

The list comprehension just generates one list, once, and copies each item over (from its original place of residence to the result list) also exactly once.

A list of lists l can be flattened using a list comprehension:

flat_list = [
    item
    for sublist in l
    for item in sublist
]

The above is equivalent to:

flat_list = []
for sublist in l:
    for item in sublist:
        flat_list.append(item)

Here is the corresponding function:

def flatten(l):
    return [item for sublist in l for item in sublist]

This is the fastest method. As evidence, using the timeit module in the standard library, we see:

$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' '[item for sublist in l for item in sublist]'
10000 loops, best of 3: 143 usec per loop

$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'sum(l, [])'
1000 loops, best of 3: 969 usec per loop

$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'reduce(lambda x,y: x+y,l)'
1000 loops, best of 3: 1.1 msec per loop

Explanation: the methods based on + (including the implied use in sum) are, of necessity, O(L**2) when there are L sublists -- as the intermediate result list keeps getting longer, at each step a new intermediate result list object gets allocated, and all the items in the previous intermediate result must be copied over (as well as a few new ones added at the end). So, for simplicity and without actual loss of generality, say you have L sublists of I items each: the first I items are copied back and forth L-1 times, the second I items L-2 times, and so on; total number of copies is I times the sum of x for x from 1 to L excluded, i.e., I * (L**2)/2.

The list comprehension just generates one list, once, and copies each item over (from its original place of residence to the result list) also exactly once.

A list of lists named xss can be flattened using a list comprehension:

flat_list = [
    x
    for xs in xss
    for x in xs
]

The above is equivalent to:

flat_list = [] 

for xs in xss:
    for x in xs:
        flat_list.append(x)

Here is the corresponding function:

def flatten(xss):
    return [x for xs in xss for x in xs]

This is the fastest method. As evidence, using the timeit module in the standard library, we see:

$ python -mtimeit -s'xss=[[1,2,3],[4,5,6],[7],[8,9]]*99' '[x for xs in xss for x in xs]'
10000 loops, best of 3: 143 usec per loop

$ python -mtimeit -s'xss=[[1,2,3],[4,5,6],[7],[8,9]]*99' 'sum(xss, [])'
1000 loops, best of 3: 969 usec per loop

$ python -mtimeit -s'xss=[[1,2,3],[4,5,6],[7],[8,9]]*99' 'reduce(lambda xs, ys: xs + ys, xss)'
1000 loops, best of 3: 1.1 msec per loop

Explanation: the methods based on + (including the implied use in sum) are, of necessity, O(L**2) when there are L sublists -- as the intermediate result list keeps getting longer, at each step a new intermediate result list object gets allocated, and all the items in the previous intermediate result must be copied over (as well as a few new ones added at the end). So, for simplicity and without actual loss of generality, say you have L sublists of M items each: the first M items are copied back and forth L-1 times, the second M items L-2 times, and so on; total number of copies is M times the sum of x for x from 1 to L excluded, i.e., M * (L**2)/2.

The list comprehension just generates one list, once, and copies each item over (from its original place of residence to the result list) also exactly once.

Add documentation link: [`timeit`].
Source Link
Mateen Ulhaq
  • 27.8k
  • 21
  • 121
  • 155

A list of lists l can be flattened using a list comprehension:

flat_list = [
    item
    for sublist in l
    for item in sublist
]

The above is equivalent to:

flat_list = []
for sublist in l:
    for item in sublist:
        flat_list.append(item)

Here is the corresponding function:

def flatten(l):
    return [item for sublist in l for item in sublist]

This is the fastest method. As evidence, using the timeittimeit module in the standard library, we see:

$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' '[item for sublist in l for item in sublist]'
10000 loops, best of 3: 143 usec per loop

$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'sum(l, [])'
1000 loops, best of 3: 969 usec per loop

$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'reduce(lambda x,y: x+y,l)'
1000 loops, best of 3: 1.1 msec per loop

Explanation: the methods based on + (including the implied use in sum) are, of necessity, O(L**2) when there are L sublists -- as the intermediate result list keeps getting longer, at each step a new intermediate result list object gets allocated, and all the items in the previous intermediate result must be copied over (as well as a few new ones added at the end). So, for simplicity and without actual loss of generality, say you have L sublists of I items each: the first I items are copied back and forth L-1 times, the second I items L-2 times, and so on; total number of copies is I times the sum of x for x from 1 to L excluded, i.e., I * (L**2)/2.

The list comprehension just generates one list, once, and copies each item over (from its original place of residence to the result list) also exactly once.

A list of lists l can be flattened using a list comprehension:

flat_list = [
    item
    for sublist in l
    for item in sublist
]

The above is equivalent to:

flat_list = []
for sublist in l:
    for item in sublist:
        flat_list.append(item)

Here is the corresponding function:

def flatten(l):
    return [item for sublist in l for item in sublist]

This is the fastest method. As evidence, using the timeit module in the standard library, we see:

$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' '[item for sublist in l for item in sublist]'
10000 loops, best of 3: 143 usec per loop

$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'sum(l, [])'
1000 loops, best of 3: 969 usec per loop

$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'reduce(lambda x,y: x+y,l)'
1000 loops, best of 3: 1.1 msec per loop

Explanation: the methods based on + (including the implied use in sum) are, of necessity, O(L**2) when there are L sublists -- as the intermediate result list keeps getting longer, at each step a new intermediate result list object gets allocated, and all the items in the previous intermediate result must be copied over (as well as a few new ones added at the end). So, for simplicity and without actual loss of generality, say you have L sublists of I items each: the first I items are copied back and forth L-1 times, the second I items L-2 times, and so on; total number of copies is I times the sum of x for x from 1 to L excluded, i.e., I * (L**2)/2.

The list comprehension just generates one list, once, and copies each item over (from its original place of residence to the result list) also exactly once.

A list of lists l can be flattened using a list comprehension:

flat_list = [
    item
    for sublist in l
    for item in sublist
]

The above is equivalent to:

flat_list = []
for sublist in l:
    for item in sublist:
        flat_list.append(item)

Here is the corresponding function:

def flatten(l):
    return [item for sublist in l for item in sublist]

This is the fastest method. As evidence, using the timeit module in the standard library, we see:

$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' '[item for sublist in l for item in sublist]'
10000 loops, best of 3: 143 usec per loop

$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'sum(l, [])'
1000 loops, best of 3: 969 usec per loop

$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'reduce(lambda x,y: x+y,l)'
1000 loops, best of 3: 1.1 msec per loop

Explanation: the methods based on + (including the implied use in sum) are, of necessity, O(L**2) when there are L sublists -- as the intermediate result list keeps getting longer, at each step a new intermediate result list object gets allocated, and all the items in the previous intermediate result must be copied over (as well as a few new ones added at the end). So, for simplicity and without actual loss of generality, say you have L sublists of I items each: the first I items are copied back and forth L-1 times, the second I items L-2 times, and so on; total number of copies is I times the sum of x for x from 1 to L excluded, i.e., I * (L**2)/2.

The list comprehension just generates one list, once, and copies each item over (from its original place of residence to the result list) also exactly once.

Add documentation link. Correct syntax highlighting (for bash). Format for readability by adding newlines. Reorder the "fastest method" claim to right before the timing is done.
Source Link
Mateen Ulhaq
  • 27.8k
  • 21
  • 121
  • 155
Loading
added 12 characters in body
Source Link
user3064538
user3064538
Loading
Rollback to Revision 9
Source Link
user3064538
user3064538
Loading
Rollback to Revision 12
Source Link
user3064538
user3064538
Loading
Reword.
Source Link
Mateen Ulhaq
  • 27.8k
  • 21
  • 121
  • 155
Loading
Fix ordering. Added documentation link.
Source Link
Mateen Ulhaq
  • 27.8k
  • 21
  • 121
  • 155
Loading
Format and reword for clarity. Use Haskell-like notation "x" to signify single item, "xs" to signify list, and "xss" to signify list of lists. https://stackoverflow.com/q/6267735/365102 Also help introduce a new generation of programmers to this "type" of type-thinking. :D
Source Link
Mateen Ulhaq
  • 27.8k
  • 21
  • 121
  • 155
Loading
Named lambdas are bad practice. Use a def instead.
Source Link
wjandrea
  • 33.8k
  • 10
  • 69
  • 105
Loading
added 14 characters in body
Source Link
Richard
  • 62.7k
  • 40
  • 198
  • 277
Loading
PEP8 states [quote] Never use the characters 'l' (lowercase letter el), 'O' (uppercase letter oh), or 'I' (uppercase letter eye) as single character variable names.[/quote] I replaced: "l" (ell) -> "t", "L"->"T", "I" (eye) ->"k" (https://www.python.org/dev/peps/pep-0008/#names-to-avoid)
Source Link
Z4-tier
  • 8k
  • 3
  • 31
  • 47
Loading
add initialization of vaiable (otherwise we get NameError: name 'flat_list' is not defined)
Source Link
Loading
added 117 characters in body
Source Link
Mona Jalal
  • 38.9k
  • 75
  • 264
  • 434
Loading
Added the flatten function
Source Link
Guillaume Jacquenot
  • 11.8k
  • 6
  • 45
  • 50
Loading
added 5 characters in body
Source Link
Mateen Ulhaq
  • 27.8k
  • 21
  • 121
  • 155
Loading
clarification of some code (since I was not able to figure it out immediately)
Source Link
Loading
Added evidence and explanation as requested.
Source Link
Alex Martelli
  • 887.2k
  • 175
  • 1.3k
  • 1.4k
Loading
Source Link
Alex Martelli
  • 887.2k
  • 175
  • 1.3k
  • 1.4k
Loading