Extract lists within lists containing a string in python

Question

I'm trying to divide a nested list into two nested lists using list comprehensions. I am unable to do so without converting the inner lists to strings, which in turn ruins my ability to access/print/control the values later on.

I tried this:

paragraphs3 = [['Page: 2', 'Bib: Something', 'Derived:  This n that'], ['Page: 3', 'Bib: Something', "Argument: Wouldn't you like to know?"], ...]

derived = [k for k in paragraphs3 if 'Derived:' in k]
therest = [k for k in paragraphs3 if 'Derived:' not in k]
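Why this misbehaves: `in` applied to a list checks whether some element is *equal* to 'Derived:', while `in` applied to a string is a substring test. A minimal sketch using the first sublist from the question (the names `exact_match` and `substring_match` are only for illustration):

```python
para = ['Page: 2', 'Bib: Something', 'Derived:  This n that']

# `in` on a list compares each element with ==, so this is False:
# no element is exactly the string 'Derived:'.
exact_match = 'Derived:' in para

# `in` on a string is a substring test, so checking each element works.
substring_match = any('Derived:' in item for item in para)

print(exact_match, substring_match)  # False True
```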

What happens is that all of paragraphs3 ends up in therest, unless I do something like this:

paragraphs4 = []
for i in paragraphs3:
    i = str(i)
    paragraphs4.append(i)

If I then feed paragraphs4 to the list comprehensions, I get two lists, just like I want. But they are no longer nested lists, so this:

for i in therest:
    g.write('\n'.join(i))
    g.write('\n\n')

writes each character of therest on a separate line:

'
P
a
g
e
:

2
'
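The character-per-line output comes from two facts: `str()` turns the whole inner list into a single string, and `'\n'.join()` iterates over whatever it is given, which for a string means character by character. A small sketch, assuming a shortened two-item paragraph:

```python
para = ['Page: 2', 'Bib: Something']

as_string = str(para)          # "['Page: 2', 'Bib: Something']"
print('\n'.join(as_string))    # one character per line

print('\n'.join(para))         # one item per line, as intended
```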

So I'm looking for a better way to split paragraphs3. Or maybe the solution lies elsewhere? The end result I'm looking for is:

Page: 2
Bib: Something
Derived: This n that

Page: 3
Bib: Something
.
.
.

Can you please describe the desired output better? My impression is that your input is already what you want as output. – Pynchia, Commented Jan 11, 2016 at 12:47
@Pynchia: It is; I'm just trying to separate two groups of items, because I write them to file separately later on. – treakec, Commented Jan 11, 2016 at 12:53
@Lav: Fixed. That is, paragraphs3 is always a list of lists, which never contain any sublists. – treakec, Commented Jan 11, 2016 at 12:54

PM 2Ring · Accepted Answer · 2016-01-11 13:10:13Z


This code separates the sublists based on whether they contain a string that starts with 'Derived:'.

paragraphs3 = [['Page: 2', 'Bib: Something', 'Derived:  This n that'], ['Page: 3', 'Bib: Something', "Argument: Wouldn't you like to know?"], ]

def show(paragraphs):
    for para in paragraphs:
        print('\n'.join(para), '\n')

derived = []
therest = []

print('---input---')
show(paragraphs3)

for para in paragraphs3:
    if any(item.startswith('Derived:') for item in para):
        derived.append(para)
    else:
        therest.append(para)

print('---derived---')
show(derived)

print('---therest---')
show(therest)

Output:

---input---
Page: 2
Bib: Something
Derived:  This n that 

Page: 3
Bib: Something
Argument: Wouldn't you like to know? 

---derived---
Page: 2
Bib: Something
Derived:  This n that 

---therest---
Page: 3
Bib: Something
Argument: Wouldn't you like to know?

The most important part of this code is:

`any(item.startswith('Derived:') for item in para)`

This iterates over the individual strings in para (the current paragraph), and returns True as soon as it finds a string that starts with 'Derived:'.
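The short-circuiting can be observed directly. This sketch uses a hypothetical helper, `starts_derived`, that records each item it tests, and a reordered paragraph so the match comes first:

```python
checked = []

def starts_derived(item):
    # Record every item we test so we can see where any() stopped.
    checked.append(item)
    return item.startswith('Derived:')

para = ['Derived:  This n that', 'Page: 2', 'Bib: Something']
result = any(starts_derived(item) for item in para)

print(result, checked)  # True ['Derived:  This n that']
```

Only the first item is tested; the generator never produces the remaining two.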

FWIW, that for loop can be condensed down to:

for para in paragraphs3:
    (therest, derived)[any(item.startswith('Derived:') for item in para)].append(para)

because False and True evaluate to 0 and 1 respectively, so they can be used to index the (therest, derived) tuple. However, many people would consider that verging on unreadable. :)
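The trick works because `bool` is a subclass of `int` in Python. A minimal sketch with two shortened sample paragraphs (the list `paragraphs` is illustrative only):

```python
# bool is a subclass of int, so False and True can index a sequence.
print(isinstance(True, int), int(False), int(True))  # True 0 1

paragraphs = [['Page: 2', 'Derived:  This n that'],
              ['Page: 3', "Argument: Wouldn't you like to know?"]]
therest, derived = [], []
for para in paragraphs:
    # Index 0 (False) picks therest, index 1 (True) picks derived.
    (therest, derived)[any(s.startswith('Derived:') for s in para)].append(para)

print(len(derived), len(therest))  # 1 1
```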

edited Jan 11, 2016 at 13:10

answered Jan 11, 2016 at 12:58

PM 2Ring


3 Comments

treakec Over a year ago

I checked your answer first and it worked! Thank you. I will try the others a bit later, and I'm sure many are correct, but I am most comfortable with the good old for loop. I hear it's the slowest option, though?

PM 2Ring Over a year ago

@treakec: Thanks! The good old for loop with append is slightly slower than the equivalent list comprehension, but not by much. However, for this application a single loop with append works out faster than two list comprehensions, since the list comp versions have to scan and test everything twice: once for the derived list and once again for the therest list.
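To check the one-pass versus two-pass claim on your own data, a rough `timeit` sketch like this can help. The generated `paragraphs` list (10,000 items, every third one derived) is a made-up workload, and the timings it prints will vary by machine and Python version, so none are quoted here:

```python
import timeit

# Hypothetical test data: 10,000 paragraphs, every third one "derived".
paragraphs = [['Page: %d' % n, 'Bib: Something',
               'Derived:  x' if n % 3 == 0 else 'Argument: y']
              for n in range(10000)]

def is_derived(para):
    return any(item.startswith('Derived:') for item in para)

def one_pass():
    # A single scan: each paragraph is tested exactly once.
    derived, therest = [], []
    for para in paragraphs:
        (derived if is_derived(para) else therest).append(para)
    return derived, therest

def two_comprehensions():
    # Two scans: each paragraph is tested once per comprehension.
    derived = [p for p in paragraphs if is_derived(p)]
    therest = [p for p in paragraphs if not is_derived(p)]
    return derived, therest

for fn in (one_pass, two_comprehensions):
    print(fn.__name__, timeit.timeit(fn, number=10))
```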

PM 2Ring Over a year ago

@treakec: And as I mentioned in my answer, any on a generator expression returns True as soon as it finds a match, so it only has to scan the whole list if it doesn't find one.

JRodDynamite · Accepted Answer · 2016-01-11 12:57:09Z

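The code you've written is almost correct. You need to check whether 'Derived:' is present in the 3rd element of each inner list. Each k is one inner list of paragraphs3 (on the first iteration, paragraphs3[0]), so 'Derived:' in k only checks whether some item is exactly equal to 'Derived:'.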

>>> paragraphs3 = [['Page: 2', 'Bib: Something', 'Derived:  This n that'], ['Page: 3', 'Bib: Something', 'Argument: Wouldn\'t you like to know?']]
>>> paragraphs3[0]
['Page: 2', 'Bib: Something', 'Derived:  This n that']
>>> paragraphs3[0][2] # Here is where you want to check the condition
'Derived:  This n that'

So all you have to do is change the condition to if 'Derived:' in k[2].

>>> [k for k in paragraphs3 if 'Derived:' in k[2]]
[['Page: 2', 'Bib: Something', 'Derived:  This n that']]

>>> [k for k in paragraphs3 if 'Derived:' not in k[2]]
[['Page: 3', 'Bib: Something', "Argument: Wouldn't you like to know?"]]
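The k[2] check assumes every inner list has at least three items with 'Derived:' always in the third slot. If that might not hold, a position-independent test looks at every item instead. A sketch, where the third paragraph is a hypothetical two-item list added only to show the difference:

```python
paragraphs3 = [['Page: 2', 'Bib: Something', 'Derived:  This n that'],
               ['Page: 3', 'Bib: Something', "Argument: Wouldn't you like to know?"],
               ['Page: 4', 'Derived:  Moved up']]  # hypothetical short paragraph

# Position-independent version of the same test: look at every item.
derived = [k for k in paragraphs3 if any('Derived:' in s for s in k)]
therest = [k for k in paragraphs3 if not any('Derived:' in s for s in k)]
print(len(derived), len(therest))  # 2 1

# The k[2] version raises IndexError on the two-item paragraph.
k2_failed = False
try:
    [k for k in paragraphs3 if 'Derived:' in k[2]]
except IndexError:
    k2_failed = True
print(k2_failed)  # True
```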

Lav · Accepted Answer · 2016-01-11 13:08:34Z

Solution

derived = [l for l in paragraphs3 if any(filter(lambda k: 'Derived: ' in k, l))]
therest = [l for l in paragraphs3 if not any(filter(lambda k: 'Derived: ' in k, l))]

Detailed explanation

Copying the entire list:

[l for l in paragraphs3]

Copying list with condition:

[l for l in paragraphs3 if sublist_contains('Derived: ', l)]

Function sublist_contains is not implemented yet, so let's implement it.

Retrieve only items which match the condition_check:

filter(condition_check, l)

condition_check can be expressed as a lambda function:

filter(lambda k: 'Derived: ' in k, l)

Converting the result to a boolean (True if at least one item matches the condition):

any(filter(lambda k: 'Derived: ' in k, l))

And replacing sublist_contains with resulting inline code:

derived = [l for l in paragraphs3 if any(filter(lambda k: 'Derived: ' in k, l))]
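Putting the steps together, here is a runnable sketch in which the placeholder sublist_contains is given a real body (the function is this sketch's own, not part of any library). In Python 3, filter returns a lazy iterator, so any() stops pulling items at the first match. For therest, the whole test has to be negated: "some item lacks 'Derived: '" is true for almost every paragraph, because 'Page: 2' never contains it.

```python
paragraphs3 = [['Page: 2', 'Bib: Something', 'Derived:  This n that'],
               ['Page: 3', 'Bib: Something', "Argument: Wouldn't you like to know?"]]

def sublist_contains(text, sublist):
    # filter() yields only the items containing `text`; any() stops at the
    # first one (a matching item is a non-empty string, hence truthy).
    return any(filter(lambda k: text in k, sublist))

derived = [l for l in paragraphs3 if sublist_contains('Derived: ', l)]
# Negate the whole test: "no item contains it", not "some item lacks it".
therest = [l for l in paragraphs3 if not sublist_contains('Derived: ', l)]

print(derived)
print(therest)
```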

Daenyth · Accepted Answer · 2016-01-11 13:21:16Z

It seems like your inner list has structure; the list itself is one value, not just a list of unrelated values. With that in mind, you could write a class to represent that data.

paragraphs3 = [['Page: 2', 'Bib: Something', 'Derived:  This n that'], ['Page: 3', 'Bib: Something', 'Argument: Wouldn\'t you like to know?'], ...]

class Paragraph(object):
    def __init__(self, page, bib, extra):
        self.page = page
        self.bib = bib
        self.extra = extra

    @property
    def is_derived(self):
        return 'Derived: ' in self.extra

paras = [Paragraph(*p) for p in paragraphs3]

You can then use the partition recipe from the itertools documentation to split that one list into two iterators.

from itertools import filterfalse, tee

def partition(pred, iterable):
    'Use a predicate to partition entries into false entries and true entries'
    # partition(is_odd, range(10)) --> 0 2 4 6 8   and  1 3 5 7 9
    t1, t2 = tee(iterable)
    return filterfalse(pred, t1), filter(pred, t2)

(not_derived_paras, derived_paras) = partition(lambda p: p.is_derived, paras)
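Assembled into one script, the approach runs as below. This sketch assumes Python 3 (where filterfalse lives in itertools and filter is lazy) and unpacks each three-item inner list into the constructor with *p. Because partition returns iterators, they are materialized with list() here so they can be looped over more than once:

```python
from itertools import filterfalse, tee

class Paragraph(object):
    def __init__(self, page, bib, extra):
        self.page = page
        self.bib = bib
        self.extra = extra

    @property
    def is_derived(self):
        return 'Derived: ' in self.extra

def partition(pred, iterable):
    'Use a predicate to partition entries into false entries and true entries'
    t1, t2 = tee(iterable)
    return filterfalse(pred, t1), filter(pred, t2)

paragraphs3 = [['Page: 2', 'Bib: Something', 'Derived:  This n that'],
               ['Page: 3', 'Bib: Something', "Argument: Wouldn't you like to know?"]]

# Unpack each three-item list into the constructor's arguments.
paras = [Paragraph(*p) for p in paragraphs3]

# partition() returns lazy iterators; list() materializes them.
not_derived_paras, derived_paras = partition(lambda p: p.is_derived, paras)
not_derived_paras = list(not_derived_paras)
derived_paras = list(derived_paras)

for p in derived_paras:
    print(p.page, p.bib, p.extra, sep='\n')
```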

Rick · Accepted Answer · 2016-01-11 15:15:25Z


This seems to me like the most straightforward way of doing it:

[p for p in paragraphs3 if 'Derived:' in '\n'.join(p)]
[p for p in paragraphs3 if 'Derived:' not in '\n'.join(p)]

However, if you'd like, you can get a lot fancier and pull this off in a single line (though it will be more complicated than necessary):

result = {k:[p for p in paragraphs3 if ('Derived:' in '\n'.join(p)) == test]  for k,test in {'derived': True, 'therest': False}.items()}

This produces a dict with 'derived' and 'therest' as keys. Now you can do this:

for k, p in result.items():
    print(k)
    for i in p:
        print('\n'.join(i))
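The dict comprehension scans paragraphs3 once per key. A plain loop can build the same dict in a single pass, in line with the speed point made in the comments on the first answer. A minimal sketch, keeping this answer's join-and-search test:

```python
paragraphs3 = [['Page: 2', 'Bib: Something', 'Derived:  This n that'],
               ['Page: 3', 'Bib: Something', "Argument: Wouldn't you like to know?"]]

result = {'derived': [], 'therest': []}
for p in paragraphs3:
    # Same test as above, but each paragraph is joined and searched only once.
    key = 'derived' if 'Derived:' in '\n'.join(p) else 'therest'
    result[key].append(p)

for k, p in result.items():
    print(k)
    for i in p:
        print('\n'.join(i))
```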

edited Jan 11, 2016 at 15:15

answered Jan 11, 2016 at 13:11

Rick

