Revisions to How do I make a flat list out of a list of lists?

obviously, the problem with getting list comprehensions right starts with nesting them, that's why that's not trivial

Source Link

edited Mar 5, 2024 at 11:24

Wolf

10.3k
8
72
116

A list of lists named xss can be flattened using a nested list comprehension:

flat_list = [
    x
    for xs in xss
    for x in xs
]

The above is equivalent to:

flat_list = []

for xs in xss:
    for x in xs:
        flat_list.append(x)

Here is the corresponding function:

def flatten(xss):
    return [x for xs in xss for x in xs]

This is the fastest method. As evidence, using the timeit module in the standard library, we see:

$ python -mtimeit -s'xss=[[1,2,3],[4,5,6],[7],[8,9]]*99' '[x for xs in xss for x in xs]'
10000 loops, best of 3: 143 usec per loop

$ python -mtimeit -s'xss=[[1,2,3],[4,5,6],[7],[8,9]]*99' 'sum(xss, [])'
1000 loops, best of 3: 969 usec per loop

$ python -mtimeit -s'xss=[[1,2,3],[4,5,6],[7],[8,9]]*99' 'reduce(lambda xs, ys: xs + ys, xss)'
1000 loops, best of 3: 1.1 msec per loop

Explanation: the methods based on + (including the implied use in sum) are, of necessity, O(L**2) when there are L sublists -- as the intermediate result list keeps getting longer, at each step a new intermediate result list object gets allocated, and all the items in the previous intermediate result must be copied over (as well as a few new ones added at the end). So, for simplicity and without actual loss of generality, say you have L sublists of M items each: the first M items are copied back and forth L-1 times, the second M items L-2 times, and so on; total number of copies is M times the sum of x for x from 1 to L excluded, i.e., M * (L**2)/2.

The list comprehension just generates one list, once, and copies each item over (from its original place of residence to the result list) also exactly once.

A list of lists named xss can be flattened using a list comprehension:

flat_list = [
    x
    for xs in xss
    for x in xs
]

The above is equivalent to:

flat_list = []

for xs in xss:
    for x in xs:
        flat_list.append(x)

Here is the corresponding function:

def flatten(xss):
    return [x for xs in xss for x in xs]

This is the fastest method. As evidence, using the timeit module in the standard library, we see:

$ python -mtimeit -s'xss=[[1,2,3],[4,5,6],[7],[8,9]]*99' '[x for xs in xss for x in xs]'
10000 loops, best of 3: 143 usec per loop

$ python -mtimeit -s'xss=[[1,2,3],[4,5,6],[7],[8,9]]*99' 'sum(xss, [])'
1000 loops, best of 3: 969 usec per loop

$ python -mtimeit -s'xss=[[1,2,3],[4,5,6],[7],[8,9]]*99' 'reduce(lambda xs, ys: xs + ys, xss)'
1000 loops, best of 3: 1.1 msec per loop

Explanation: the methods based on + (including the implied use in sum) are, of necessity, O(L**2) when there are L sublists -- as the intermediate result list keeps getting longer, at each step a new intermediate result list object gets allocated, and all the items in the previous intermediate result must be copied over (as well as a few new ones added at the end). So, for simplicity and without actual loss of generality, say you have L sublists of M items each: the first M items are copied back and forth L-1 times, the second M items L-2 times, and so on; total number of copies is M times the sum of x for x from 1 to L excluded, i.e., M * (L**2)/2.

The list comprehension just generates one list, once, and copies each item over (from its original place of residence to the result list) also exactly once.

A list of lists named xss can be flattened using a nested list comprehension:

flat_list = [
    x
    for xs in xss
    for x in xs
]

The above is equivalent to:

flat_list = []

for xs in xss:
    for x in xs:
        flat_list.append(x)

Here is the corresponding function:

def flatten(xss):
    return [x for xs in xss for x in xs]

This is the fastest method. As evidence, using the timeit module in the standard library, we see:

$ python -mtimeit -s'xss=[[1,2,3],[4,5,6],[7],[8,9]]*99' '[x for xs in xss for x in xs]'
10000 loops, best of 3: 143 usec per loop

$ python -mtimeit -s'xss=[[1,2,3],[4,5,6],[7],[8,9]]*99' 'sum(xss, [])'
1000 loops, best of 3: 969 usec per loop

$ python -mtimeit -s'xss=[[1,2,3],[4,5,6],[7],[8,9]]*99' 'reduce(lambda xs, ys: xs + ys, xss)'
1000 loops, best of 3: 1.1 msec per loop

Explanation: the methods based on + (including the implied use in sum) are, of necessity, O(L**2) when there are L sublists -- as the intermediate result list keeps getting longer, at each step a new intermediate result list object gets allocated, and all the items in the previous intermediate result must be copied over (as well as a few new ones added at the end). So, for simplicity and without actual loss of generality, say you have L sublists of M items each: the first M items are copied back and forth L-1 times, the second M items L-2 times, and so on; total number of copies is M times the sum of x for x from 1 to L excluded, i.e., M * (L**2)/2.

The list comprehension just generates one list, once, and copies each item over (from its original place of residence to the result list) also exactly once.

As revision #10 says, using (l, I, 1) in the same answer is confusing. Revisit Haskell-like notation. Easier to grok when generic names indicate their kind / types (x == T) (xs = list[T]) (xss = list[list[T]]). More evidence: the comment "[leaf for tree in forest for leaf in tree]".

Source Link

edited Dec 12, 2023 at 2:45

Mateen Ulhaq

27.8k
21
121
155

A list of lists named lxss can be flattened using a list comprehension:

flat_list = [
    itemx
    for sublistxs in lxss
    for itemx in sublistxs
]

The above is equivalent to:

flat_list = [] 

for sublistxs in lxss:
    for itemx in sublistxs:
        flat_list.append(itemx)

Here is the corresponding function:

def flatten(lxss):
    return [item[x for sublistxs in lxss for itemx in sublist]xs]

This is the fastest method. As evidence, using the timeit module in the standard library, we see:

$ python -mtimeit -s'l=[[1s'xss=[[1,2,3],[4,5,6], [7], [8,9]]*99' '[item'[x for sublistxs in lxss for itemx in sublist]'xs]'
10000 loops, best of 3: 143 usec per loop

$ python -mtimeit -s'l=[[1s'xss=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'sum(lxss, [])'
1000 loops, best of 3: 969 usec per loop

$ python -mtimeit -s'l=[[1s'xss=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'reduce(lambda xxs,y ys: x+yxs + ys,l xss)'
1000 loops, best of 3: 1.1 msec per loop

Explanation: the methods based on + (including the implied use in sum) are, of necessity, O(L**2) when there are L sublists -- as the intermediate result list keeps getting longer, at each step a new intermediate result list object gets allocated, and all the items in the previous intermediate result must be copied over (as well as a few new ones added at the end). So, for simplicity and without actual loss of generality, say you have L sublists of IM items each: the first IM items are copied back and forth L-1L-1 times, the second IM items L-2L-2 times, and so on; total number of copies is IM times the sum of x for x from 1 to L excluded, i.e., IM * (L**2)/2.

The list comprehension just generates one list, once, and copies each item over (from its original place of residence to the result list) also exactly once.

A list of lists l can be flattened using a list comprehension:

flat_list = [
    item
    for sublist in l
    for item in sublist
]

The above is equivalent to:

flat_list = []
for sublist in l:
    for item in sublist:
        flat_list.append(item)

Here is the corresponding function:

def flatten(l):
    return [item for sublist in l for item in sublist]

This is the fastest method. As evidence, using the timeit module in the standard library, we see:

$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' '[item for sublist in l for item in sublist]'
10000 loops, best of 3: 143 usec per loop

$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'sum(l, [])'
1000 loops, best of 3: 969 usec per loop

$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'reduce(lambda x,y: x+y,l)'
1000 loops, best of 3: 1.1 msec per loop

Explanation: the methods based on + (including the implied use in sum) are, of necessity, O(L**2) when there are L sublists -- as the intermediate result list keeps getting longer, at each step a new intermediate result list object gets allocated, and all the items in the previous intermediate result must be copied over (as well as a few new ones added at the end). So, for simplicity and without actual loss of generality, say you have L sublists of I items each: the first I items are copied back and forth L-1 times, the second I items L-2 times, and so on; total number of copies is I times the sum of x for x from 1 to L excluded, i.e., I * (L**2)/2.

The list comprehension just generates one list, once, and copies each item over (from its original place of residence to the result list) also exactly once.

A list of lists named xss can be flattened using a list comprehension:

flat_list = [
    x
    for xs in xss
    for x in xs
]

The above is equivalent to:

flat_list = [] 

for xs in xss:
    for x in xs:
        flat_list.append(x)

Here is the corresponding function:

def flatten(xss):
    return [x for xs in xss for x in xs]

This is the fastest method. As evidence, using the timeit module in the standard library, we see:

$ python -mtimeit -s'xss=[[1,2,3],[4,5,6],[7],[8,9]]*99' '[x for xs in xss for x in xs]'
10000 loops, best of 3: 143 usec per loop

$ python -mtimeit -s'xss=[[1,2,3],[4,5,6],[7],[8,9]]*99' 'sum(xss, [])'
1000 loops, best of 3: 969 usec per loop

$ python -mtimeit -s'xss=[[1,2,3],[4,5,6],[7],[8,9]]*99' 'reduce(lambda xs, ys: xs + ys, xss)'
1000 loops, best of 3: 1.1 msec per loop

Explanation: the methods based on + (including the implied use in sum) are, of necessity, O(L**2) when there are L sublists -- as the intermediate result list keeps getting longer, at each step a new intermediate result list object gets allocated, and all the items in the previous intermediate result must be copied over (as well as a few new ones added at the end). So, for simplicity and without actual loss of generality, say you have L sublists of M items each: the first M items are copied back and forth L-1 times, the second M items L-2 times, and so on; total number of copies is M times the sum of x for x from 1 to L excluded, i.e., M * (L**2)/2.

The list comprehension just generates one list, once, and copies each item over (from its original place of residence to the result list) also exactly once.

Add documentation link: [`timeit`].

Source Link

edited Dec 12, 2023 at 2:35

Mateen Ulhaq

27.8k
21
121
155

A list of lists l can be flattened using a list comprehension:

flat_list = [
    item
    for sublist in l
    for item in sublist
]

The above is equivalent to:

flat_list = []
for sublist in l:
    for item in sublist:
        flat_list.append(item)

Here is the corresponding function:

def flatten(l):
    return [item for sublist in l for item in sublist]

This is the fastest method. As evidence, using the timeittimeit module in the standard library, we see:

$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' '[item for sublist in l for item in sublist]'
10000 loops, best of 3: 143 usec per loop

$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'sum(l, [])'
1000 loops, best of 3: 969 usec per loop

$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'reduce(lambda x,y: x+y,l)'
1000 loops, best of 3: 1.1 msec per loop

Explanation: the methods based on + (including the implied use in sum) are, of necessity, O(L**2) when there are L sublists -- as the intermediate result list keeps getting longer, at each step a new intermediate result list object gets allocated, and all the items in the previous intermediate result must be copied over (as well as a few new ones added at the end). So, for simplicity and without actual loss of generality, say you have L sublists of I items each: the first I items are copied back and forth L-1 times, the second I items L-2 times, and so on; total number of copies is I times the sum of x for x from 1 to L excluded, i.e., I * (L**2)/2.

The list comprehension just generates one list, once, and copies each item over (from its original place of residence to the result list) also exactly once.

A list of lists l can be flattened using a list comprehension:

flat_list = [
    item
    for sublist in l
    for item in sublist
]

The above is equivalent to:

flat_list = []
for sublist in l:
    for item in sublist:
        flat_list.append(item)

Here is the corresponding function:

def flatten(l):
    return [item for sublist in l for item in sublist]

This is the fastest method. As evidence, using the timeit module in the standard library, we see:

$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' '[item for sublist in l for item in sublist]'
10000 loops, best of 3: 143 usec per loop

$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'sum(l, [])'
1000 loops, best of 3: 969 usec per loop

$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'reduce(lambda x,y: x+y,l)'
1000 loops, best of 3: 1.1 msec per loop

Explanation: the methods based on + (including the implied use in sum) are, of necessity, O(L**2) when there are L sublists -- as the intermediate result list keeps getting longer, at each step a new intermediate result list object gets allocated, and all the items in the previous intermediate result must be copied over (as well as a few new ones added at the end). So, for simplicity and without actual loss of generality, say you have L sublists of I items each: the first I items are copied back and forth L-1 times, the second I items L-2 times, and so on; total number of copies is I times the sum of x for x from 1 to L excluded, i.e., I * (L**2)/2.

The list comprehension just generates one list, once, and copies each item over (from its original place of residence to the result list) also exactly once.

A list of lists l can be flattened using a list comprehension:

flat_list = [
    item
    for sublist in l
    for item in sublist
]

The above is equivalent to:

flat_list = []
for sublist in l:
    for item in sublist:
        flat_list.append(item)

Here is the corresponding function:

def flatten(l):
    return [item for sublist in l for item in sublist]

This is the fastest method. As evidence, using the timeit module in the standard library, we see:

$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' '[item for sublist in l for item in sublist]'
10000 loops, best of 3: 143 usec per loop

$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'sum(l, [])'
1000 loops, best of 3: 969 usec per loop

$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'reduce(lambda x,y: x+y,l)'
1000 loops, best of 3: 1.1 msec per loop

Explanation: the methods based on + (including the implied use in sum) are, of necessity, O(L**2) when there are L sublists -- as the intermediate result list keeps getting longer, at each step a new intermediate result list object gets allocated, and all the items in the previous intermediate result must be copied over (as well as a few new ones added at the end). So, for simplicity and without actual loss of generality, say you have L sublists of I items each: the first I items are copied back and forth L-1 times, the second I items L-2 times, and so on; total number of copies is I times the sum of x for x from 1 to L excluded, i.e., I * (L**2)/2.

The list comprehension just generates one list, once, and copies each item over (from its original place of residence to the result list) also exactly once.

Add documentation link. Correct syntax highlighting (for bash). Format for readability by adding newlines. Reorder the "fastest method" claim to right before the timing is done.

Source Link

edited Dec 12, 2023 at 2:24

Mateen Ulhaq

27.8k
21
121
155

Loading

added 12 characters in body

Source Link

edited Jul 27, 2022 at 6:55

user3064538

Loading

Rollback to Revision 9

Source Link

edited Jul 27, 2022 at 6:55

user3064538

Loading

Rollback to Revision 12

Source Link

edited Jul 27, 2022 at 6:54

user3064538

Loading

Reword.

Source Link

edited Jun 5, 2022 at 6:34

Mateen Ulhaq

27.8k
21
121
155

Loading

Fix ordering. Added documentation link.

Source Link

edited Jun 5, 2022 at 6:26

Mateen Ulhaq

27.8k
21
121
155

Loading

Format and reword for clarity. Use Haskell-like notation "x" to signify single item, "xs" to signify list, and "xss" to signify list of lists. https://stackoverflow.com/q/6267735/365102 Also help introduce a new generation of programmers to this "type" of type-thinking. :D

Source Link

edited Jun 5, 2022 at 6:20

Mateen Ulhaq

27.8k
21
121
155

Loading

Named lambdas are bad practice. Use a def instead.

Source Link

edited Jul 4, 2021 at 18:40

wjandrea

33.8k
10
69
105

Loading

added 14 characters in body

Source Link

edited Jan 6, 2021 at 23:17

Richard

62.7k
40
198
277

Loading

PEP8 states [quote] Never use the characters 'l' (lowercase letter el), 'O' (uppercase letter oh), or 'I' (uppercase letter eye) as single character variable names.[/quote] I replaced: "l" (ell) -> "t", "L"->"T", "I" (eye) ->"k" (https://www.python.org/dev/peps/pep-0008/#names-to-avoid)

Source Link