I am working with list comprehensions and ran into a problem as the number of repeated comprehensions grows. My code so far is as follows:

selected_sheet_names = []
selected_sheet_names.append([x for x in sheet_names if x.endswith("b1")])
selected_sheet_names.append([x for x in sheet_names if x.endswith("b2")])
selected_sheet_names.append([x for x in sheet_names if x.endswith("b3")])

The sheet_names list contains strings, all of which end with b1, b2, or b3. If you want to test with it in your own code:

sheet_names = ['0.5C_1_b1', '0.5C_2_b1', '1C_1_b1', '1C_2_b1', '1C_3_b1', '1C_4_b1', '1C_5_b1', 
'0.11C_1_b2', '0.57C_1_b2', '1.14C_1_b2', '1.14C_2_b2', '1.14C_3_b2', '1.14C_4_b2', '1.14C_5_b2', 
'1.14C_6_b2', '1.14C_7_b2', '1.14C_8_b2', '1C_1_b3', '1C_2_b3', '1C_3_b3', '1C_4_b3', '1C_5_b3', 
'1C_6_b3', '1C_7_b3', '1C_8_b3']

Calling print(selected_sheet_names) gives the following result:

[
    ['0.5C_1_b1', '0.5C_2_b1', '1C_1_b1', '1C_2_b1', '1C_3_b1', '1C_4_b1', '1C_5_b1'], 
    ['0.11C_1_b2', '0.57C_1_b2', '1.14C_1_b2', '1.14C_2_b2', '1.14C_3_b2', '1.14C_4_b2', '1.14C_5_b2', '1.14C_6_b2', '1.14C_7_b2', '1.14C_8_b2'], 
    ['1C_1_b3', '1C_2_b3', '1C_3_b3', '1C_4_b3', '1C_5_b3', '1C_6_b3', '1C_7_b3', '1C_8_b3']
]

This is exactly what I expected. However, if I need more x.endswith(some_string) checks like those in the first code block, the code becomes too bulky. I therefore think I should replace the repeated selected_sheet_names.append([x for x in sheet_names if x.endswith(some_string)]) calls with a single, more complex list comprehension that iterates over some_list and does the same thing.

some_list = ["b1", "b2", "b3" ... ]

Could someone please suggest an approach?

EDIT 1: I know that I can implement this with a for loop, but here I am specifically interested in a list comprehension implementation, if possible. The for loop version could look like this:

selected_sheet_names = []
for ending in some_list:
    selected_sheet_names.append([x for x in sheet_names if x.endswith(ending)])

EDIT 2 (Thanks to Pedro Maia):

If the data is contiguous (which is not my case), you can go with:

from itertools import groupby

selected_sheet_names = [list(l[1]) for l in groupby(sheet_names, lambda x: x[-2:])]

It is my mistake that the list I showed you was contiguous. If your data is not contiguous, the output may look something like this:

[
    ['0.11C_1_b2'], 
    ['0.5C_1_b1'], 
    ['0.57C_1_b2'], 
    ['0.5C_2_b1', '1C_1_b1', '1C_2_b1', '1C_3_b1', '1C_4_b1', '1C_5_b1'], 
    ['1.14C_1_b2', '1.14C_2_b2', '1.14C_3_b2', '1.14C_4_b2', '1.14C_5_b2', '1.14C_6_b2', '1.14C_7_b2', '1.14C_8_b2'], 
    ['1C_1_b3', '1C_2_b3', '1C_3_b3', '1C_4_b3', '1C_5_b3', '1C_6_b3', '1C_7_b3', '1C_8_b3']
]

However, if your data IS contiguous, this method seems better.

Thank you all for the replies!

2 Answers

A simple nested listcomp matching your suggested form would loop over an anonymous tuple of the strings to check for:

selected_sheet_names = [[x for x in sheet_names if x.endswith(some_string)]
                        for some_string in ("b1", "b2", "b3")]

If you get some_list from somewhere else, or it gets too long to comfortably define inline, you can replace the anonymous tuple with some_list if it's already defined.
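As a minimal sketch of that substitution, the snippet below uses a shortened, deliberately non-contiguous sample of sheet_names (made up for illustration, not the full list from the question) and assumes some_list holds exactly the three endings shown:

```python
# Shortened, non-contiguous sample data (illustrative only).
sheet_names = ['0.5C_1_b1', '0.11C_1_b2', '1C_1_b3', '0.5C_2_b1', '1.14C_1_b2']
some_list = ["b1", "b2", "b3"]

# Outer comprehension walks the endings; inner one filters sheet_names for each.
selected_sheet_names = [
    [x for x in sheet_names if x.endswith(ending)]
    for ending in some_list
]

print(selected_sheet_names)
# [['0.5C_1_b1', '0.5C_2_b1'], ['0.11C_1_b2', '1.14C_1_b2'], ['1C_1_b3']]
```

Because each ending gets its own full pass over sheet_names, the result is grouped correctly even when matching names are scattered through the list.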

5 Comments

In addition to this answer, I would like to mention that you could have looped.
@Tarik: Sure. Every listcomp can be transformed to an equivalent for loop, but in this case, may as well just build it all at once. And apparently the OP just clarified they want the listcomp, so all's well that ends well.
@ShadowRanger That worked excellently for me! Thank you very much!
@Tarik Thanks for your comment! That's actually true, I have included a possible loop implementation in the question in my last edit!
Sure ShadowRanger, your answer is obviously more elegant than a loop. I just wanted the OP to be aware of it.
Alternatively, you can use groupby from the standard-library itertools module:

from itertools import groupby

selected_sheet_names = [list(l[1]) for l in groupby(sheet_names, lambda x: x[-2:])]

This gives cleaner and faster code, since it makes a single pass over sheet_names instead of one pass per suffix.
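groupby only merges adjacent items with equal keys, so if the names are not already grouped by suffix, one possible workaround is to sort by the same key first. The sketch below is only an illustration: the sample data and the suffix helper are made up for this example, and it assumes every suffix is exactly two characters long, as x[-2:] does above.

```python
from itertools import groupby

# Shortened, non-contiguous sample data (illustrative only).
sheet_names = ['0.5C_1_b1', '0.11C_1_b2', '1C_1_b3', '0.5C_2_b1', '1.14C_1_b2']

def suffix(name):
    # Hypothetical helper: the key is the last two characters, e.g. "b1".
    return name[-2:]

# sorted() is stable, so names keep their original relative order within a group.
ordered = sorted(sheet_names, key=suffix)
selected_sheet_names = [list(group) for _, group in groupby(ordered, key=suffix)]

print(selected_sheet_names)
# [['0.5C_1_b1', '0.5C_2_b1'], ['0.11C_1_b2', '1.14C_1_b2'], ['1C_1_b3']]
```

The sort costs O(n log n), so this gives up part of the single-pass advantage in exchange for not depending on the input order.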

2 Comments

This assumes the entries to group are, and always will be, contiguous, and you want all of them (or can use the key to easily identify the ones to discard). That said, yes, this is the better solution if the number of endings to handle gets large enough (current implementation is O(n * m) where n is len(sheet_names) and m is len(some_list); your implementation, if it required a pre-sort, is O(n log n), or O(n) if it's assumed sorted). Up-voted regardless.
Thanks for your reply! As mentioned by @ShadowRanger, it is a good way to go if the data is contiguous. Actually, it is my mistake that the data I shared in sheet_names was contiguous, whereas in reality the entries are not. I am adding an edit with your code! Thank you very much again! It seems like a good approach as well.
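To illustrate the complexity trade-off raised in the comments, here is a rough sketch of a third option that neither answer proposes: bucketing the names by suffix in one pass with a dictionary, then reading the buckets out with a list comprehension. It assumes two-character suffixes and uses made-up, non-contiguous sample data; the names groups and some_list are just for this example.

```python
from collections import defaultdict

# Shortened, non-contiguous sample data (illustrative only).
sheet_names = ['0.5C_1_b1', '0.11C_1_b2', '1C_1_b3', '0.5C_2_b1', '1.14C_1_b2']
some_list = ["b1", "b2", "b3"]

# One pass over sheet_names: bucket every name under its two-character suffix.
groups = defaultdict(list)
for name in sheet_names:
    groups[name[-2:]].append(name)

# One pass over some_list: pick the buckets in the requested order.
# .get() avoids creating empty entries for endings that never occurred.
selected_sheet_names = [groups.get(ending, []) for ending in some_list]

print(selected_sheet_names)
# [['0.5C_1_b1', '0.5C_2_b1'], ['0.11C_1_b2', '1.14C_1_b2'], ['1C_1_b3']]
```

This does roughly O(n + m) work and needs neither contiguous nor sorted input, at the cost of an explicit loop before the comprehension.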
