I'm trying to divide a nested list into two nested lists using list comprehensions. I am unable to do so without converting the inner lists to strings, which in turn ruins my ability to access/print/control the values later on.

I tried this:

paragraphs3 = [['Page: 2', 'Bib: Something', 'Derived:  This n that'], ['Page: 3', 'Bib: Something', "Argument: Wouldn't you like to know?"], ...]

derived = [k for k in paragraphs3 if 'Derived:' in k]
therest = [k for k in paragraphs3 if 'Derived:' not in k]

What happens is that every sublist of paragraphs3 ends up in therest, and derived stays empty, unless I first do something like this:

paragraphs4 = []
for i in paragraphs3:
    i = str(i)
    paragraphs4.append(i)

If I then feed paragraphs4 to the list comprehensions, I get two lists, just like I want. But they are no longer nested lists, because this:

    for i in therest:
        g.write('\n'.join(i))
        g.write('\n\n') 

writes each character of therest on a separate line:

'
P
a
g
e
:

2
'

Thus I'm looking for a better way to split paragraphs3 ... Or maybe the solution lies elsewhere? The end result/output I'm looking for is:

Page: 2
Bib: Something
Derived: This n that

Page: 3
Bib: Something
.
.
.
4
  • Can you please describe the desired output better? My impression is that your input is already what you want as output. Commented Jan 11, 2016 at 12:47
  • Is the nested list depth fixed or arbitrary? Commented Jan 11, 2016 at 12:47
  • @Pynchia: It is. I'm just trying to separate two groups of items, because I write them to file separately later on. Commented Jan 11, 2016 at 12:53
  • @Lav: Fixed - that is, paragraphs3 is always a list of lists, which never contain any sublists. Commented Jan 11, 2016 at 12:54

5 Answers 5

2

This code separates the sublists based on whether they contain a string that starts with 'Derived:'.

paragraphs3 = [['Page: 2', 'Bib: Something', 'Derived:  This n that'], ['Page: 3', 'Bib: Something', "Argument: Wouldn't you like to know?"], ]

def show(paragraphs):
    for para in paragraphs:
        print('\n'.join(para), '\n')

derived = []
therest = []

print('---input---')
show(paragraphs3)

for para in paragraphs3:
    if any(item.startswith('Derived:') for item in para):
        derived.append(para)
    else:
        therest.append(para)

print('---derived---')
show(derived)

print('---therest---')
show(therest)

output

---input---
Page: 2
Bib: Something
Derived:  This n that 

Page: 3
Bib: Something
Argument: Wouldn't you like to know? 

---derived---
Page: 2
Bib: Something
Derived:  This n that 

---therest---
Page: 3
Bib: Something
Argument: Wouldn't you like to know? 

The most important part of this code is

`any(item.startswith('Derived:') for item in para)`

This iterates over the individual strings in para (the current paragraph), and returns True as soon as it finds a string that starts with 'Derived:'.
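The short-circuit behaviour can be seen in a minimal sketch. The helper `starts_with_derived` and the `checked` list are hypothetical names, added here only to record which items actually get tested; they are not part of the code above.

```python
# Hypothetical bookkeeping: remember every item that gets tested.
checked = []

def starts_with_derived(item):
    checked.append(item)
    return item.startswith('Derived:')

# The matching string is placed first to make the early exit visible.
para = ['Derived:  This n that', 'Page: 2', 'Bib: Something']

# any() stops consuming the generator at the first True result.
found = any(starts_with_derived(item) for item in para)

print(found)    # True
print(checked)  # ['Derived:  This n that']
```

Only the first string is tested. If no string matched, every item would be tested once and `any` would return False.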


FWIW, that for loop can be condensed down to:

for para in paragraphs3:
    (therest, derived)[any(item.startswith('Derived:') for item in para)].append(para)

because False and True evaluate to 0 and 1 respectively, so they can be used to index the (therest, derived) tuple. However, many people would consider that verging on unreadable. :)
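As a small illustration of why the indexing trick works, and one possibly more readable spelling of the same loop: the names `pair`, `is_derived` and `target` below are made up for this sketch.

```python
# bool is a subclass of int, so False == 0 and True == 1.
assert isinstance(True, int)

pair = ('therest', 'derived')
print(pair[False])  # therest
print(pair[True])   # derived

paragraphs3 = [
    ['Page: 2', 'Bib: Something', 'Derived:  This n that'],
    ['Page: 3', 'Bib: Something', "Argument: Wouldn't you like to know?"],
]

derived, therest = [], []
for para in paragraphs3:
    is_derived = any(item.startswith('Derived:') for item in para)
    # A conditional expression picks the target list explicitly.
    target = derived if is_derived else therest
    target.append(para)
```

This still makes a single pass over paragraphs3, at the cost of two extra lines.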

3 Comments

I checked your answer first and it worked! Thank you. I will try the others a bit later; I'm sure many are correct, but I am most comfortable with the good old for loop, though I hear it's the slowest option?
@treakec: Thanks! The good old for loop with append is slightly slower than the equivalent list comprehension, but not much. However, for this application using append works out much faster than doing two list comprehensions, since the list comp versions have to scan and test everything twice: once for the derived list and once again for the therest list.
@treakec: And as I mentioned in my answer, using any on a generator expression returns True as soon as it finds a match; it only has to scan the whole list if it doesn't find a match.
0

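The code you've written is almost correct. In your comprehension, k is each sublist of paragraphs3 in turn, so 'Derived:' in k asks whether one of the sublist's elements is exactly equal to 'Derived:', which is never the case. Instead, you need to check whether 'Derived:' is present in the 3rd element of the sublist.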

>>> paragraphs3 = [['Page: 2', 'Bib: Something', 'Derived:  This n that'], ['Page: 3', 'Bib: Something', 'Argument: Wouldn\'t you like to know?']]
>>> paragraphs3[0]
['Page: 2', 'Bib: Something', 'Derived:  This n that']
>>> paragraphs3[0][2] # Here is where you want to check the condition
'Derived:  This n that'

So all you have to do is change the condition to if 'Derived:' in k[2].

>>> [k for k in paragraphs3 if 'Derived:' in k[2]]
[['Page: 2', 'Bib: Something', 'Derived:  This n that']]

>>> [k for k in paragraphs3 if 'Derived:' not in k[2]]
[['Page: 3', 'Bib: Something', "Argument: Wouldn't you like to know?"]]
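The difference between list membership and substring membership can be sketched as follows. The variable names are only for illustration, and the last check is a hedged alternative for the case where the 'Derived:' item is not guaranteed to sit at index 2.

```python
k = ['Page: 2', 'Bib: Something', 'Derived:  This n that']

# On a list, `in` compares whole elements for equality.
exact_match = 'Derived:' in k

# On a string, `in` looks for a substring.
substring_match = 'Derived:' in k[2]

# Position-independent variant: test every string in the sublist.
anywhere = any('Derived:' in item for item in k)

print(exact_match, substring_match, anywhere)  # False True True
```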

0

Solution

derived = [l for l in paragraphs3 if any(filter(lambda k: 'Derived: ' in k, l))]
therest = [l for l in paragraphs3 if not any(filter(lambda k: 'Derived: ' in k, l))]

Detailed explanation

Copying the entire list:

[l for l in paragraphs3]

Copying list with condition:

[l for l in paragraphs3 if sublist_contains('Derived: ', l)]

Function sublist_contains is not implemented yet, so let's implement it.

Retrieve only items which match the condition_check:

filter(condition_check, l)

Since condition_check can be expressed as a lambda function:

filter(lambda k: 'Derived: ' in k, l)

Converting result to boolean (will be True if at least one item is found matching the condition):

any(filter(lambda k: 'Derived: ' in k, l))

And replacing sublist_contains with resulting inline code:

derived = [l for l in paragraphs3 if any(filter(lambda k: 'Derived: ' in k, l))]
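If a named helper reads better than the inline lambda, sublist_contains could be sketched like this. It swaps filter for a generator expression, which is an assumption about style rather than part of the derivation above; the complementary list negates the same test.

```python
def sublist_contains(needle, sublist):
    """Return True if at least one string in sublist contains needle."""
    return any(needle in item for item in sublist)

paragraphs3 = [
    ['Page: 2', 'Bib: Something', 'Derived:  This n that'],
    ['Page: 3', 'Bib: Something', "Argument: Wouldn't you like to know?"],
]

derived = [l for l in paragraphs3 if sublist_contains('Derived: ', l)]
# Negate the whole test, so a sublist lands in exactly one of the two lists.
therest = [l for l in paragraphs3 if not sublist_contains('Derived: ', l)]

print(derived)
print(therest)
```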

0

It seems like your inner list has structure; the list itself is one value, not just a list of unrelated values. With that in mind, you could write a class to represent that data.

paragraphs3 = [['Page: 2', 'Bib: Something', 'Derived:  This n that'], ['Page: 3', 'Bib: Something', 'Argument: Wouldn\'t you like to know?'], ...]

class Paragraph(object):
    def __init__(self, page, bib, extra):
        self.page = page
        self.bib = bib
        self.extra = extra

    @property
    def is_derived(self):
        return 'Derived: ' in self.extra

paras = [Paragraph(*p) for p in paragraphs3]  # unpack each 3-item sublist into page, bib, extra

You can then use the partition recipe from the itertools documentation to split that one list into two iterators.

from itertools import filterfalse, tee  # Python 3; Python 2 calls it ifilterfalse

def partition(pred, iterable):
    'Use a predicate to partition entries into false entries and true entries'
    # partition(is_odd, range(10)) --> 0 2 4 6 8   and  1 3 5 7 9
    t1, t2 = tee(iterable)
    return filterfalse(pred, t1), filter(pred, t2)

(not_derived_paras, derived_paras) = partition(lambda p: p.is_derived, paras)
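Put together, the approach might look like the following self-contained sketch. It assumes Python 3 and a paragraphs3 with exactly the two three-item sublists shown (no elided entries). The iterators are turned into lists at the end so that they can be printed or traversed more than once.

```python
from itertools import filterfalse, tee

class Paragraph:
    def __init__(self, page, bib, extra):
        self.page = page
        self.bib = bib
        self.extra = extra

    @property
    def is_derived(self):
        return 'Derived: ' in self.extra

def partition(pred, iterable):
    """Split iterable into (entries failing pred, entries passing pred)."""
    t1, t2 = tee(iterable)
    return filterfalse(pred, t1), filter(pred, t2)

paragraphs3 = [
    ['Page: 2', 'Bib: Something', 'Derived:  This n that'],
    ['Page: 3', 'Bib: Something', "Argument: Wouldn't you like to know?"],
]

# Each sublist is unpacked into the three constructor arguments.
paras = [Paragraph(*p) for p in paragraphs3]

not_derived_iter, derived_iter = partition(lambda p: p.is_derived, paras)

# The results are lazy iterators; materialise them for repeated use.
not_derived_paras = list(not_derived_iter)
derived_paras = list(derived_iter)

print([p.page for p in derived_paras])      # ['Page: 2']
print([p.page for p in not_derived_paras])  # ['Page: 3']
```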

0

This seems to me like the most straightforward way of doing it:

[p for p in paragraphs3 if 'Derived:' in '\n'.join(p)]
[p for p in paragraphs3 if 'Derived:' not in '\n'.join(p)]

However, if you'd like, you can get a lot fancier, and pull this off in a single line (though it will be more complicated than necessary).

result = {k:[p for p in paragraphs3 if ('Derived:' in '\n'.join(p)) == test]  for k,test in {'derived': True, 'therest': False}.items()}

This produces a dict with 'derived' and 'therest' as keys. Now you can do this:

for k,p in result.items():
    print(k)
    for i in p:
        print('\n'.join(i))  # one item per line, as in the desired output
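Since the goal in the question is to write the two groups out separately, here is a hedged sketch of that last step. The `groups` dict is written out by hand in the shape that `result` is described as having, `write_paragraphs` is a hypothetical helper, and `io.StringIO` stands in for the real file object.

```python
import io

# Hand-written stand-in with 'derived' and 'therest' keys.
groups = {
    'derived': [['Page: 2', 'Bib: Something', 'Derived:  This n that']],
    'therest': [['Page: 3', 'Bib: Something', "Argument: Wouldn't you like to know?"]],
}

def write_paragraphs(paragraphs, out):
    """Write each paragraph one item per line, followed by a blank line."""
    for para in paragraphs:
        out.write('\n'.join(para))
        out.write('\n\n')

# StringIO behaves like a text file opened for writing.
buffer = io.StringIO()
write_paragraphs(groups['therest'], buffer)
print(buffer.getvalue())
```

Because every paragraph is still a list of strings, '\n'.join puts one item on each line instead of one character.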
