5

I have an array containing file names like the ones below:

['001_1.png', '001_2.png', '001_3.png', '002_1.png','002_2.png', '003_1.png', '003_2.png', '003_3.png', '003_4.png', ....]

I want to quickly group these files into multiple arrays like this:

[['001_1.png', '001_2.png', '001_3.png'], ['002_1.png', '002_2.png'], ['003_1.png', '003_2.png', '003_3.png', '003_4.png'], ...]

Could anyone tell me how to do this in a few lines of Python?

3
  • 2
    In your desired output, should the third element be 001_3.png? Commented May 4, 2018 at 7:37
  • Is it always like this, I mean ordered ? Commented May 4, 2018 at 7:37
  • The third one should be 001_3.png, right? Commented May 4, 2018 at 7:37

6 Answers

6

If your data is already sorted by the file name, you can use itertools.groupby:

files = ['001_1.png', '001_2.png', '001_3.png', '002_1.png','002_2.png',
        '003_1.png', '003_2.png', '003_3.png']

import itertools

keyfunc = lambda filename: filename[:3]

# this creates an iterator that yields `(group, filenames)` tuples,
# but `filenames` is another iterator
grouper = itertools.groupby(files, keyfunc)

# to get the result as a nested list, we iterate over the grouper,
# discard the group keys, and turn the `filenames` iterators into lists
result = [list(filenames) for _, filenames in grouper]

print(result)
# [['001_1.png', '001_2.png', '001_3.png'],
#  ['002_1.png', '002_2.png'],
#  ['003_1.png', '003_2.png', '003_3.png']]

Otherwise, you can base your code on this recipe, which is more efficient than sorting the list and then using groupby.

  • Input: Your input is a flat list, so use a regular ol' loop to iterate over it:

    for filename in files:
    
  • Group identifier: The files are grouped by the first 3 characters of the file name:

    group = filename[:3]
    
  • Output: The output should be a nested list rather than a dict, which can be done with

    result = list(groupdict.values())
    

Putting it together:

files = ['001_1.png', '001_2.png', '001_3.png', '002_1.png','002_2.png',
        '003_1.png', '003_2.png', '003_3.png']

import collections

groupdict = collections.defaultdict(list)
for filename in files:
    group = filename[:3]
    groupdict[group].append(filename)

result = list(groupdict.values())

print(result)
# [['001_1.png', '001_2.png', '001_3.png'],
#  ['002_1.png', '002_2.png'],
#  ['003_1.png', '003_2.png', '003_3.png']]

Read the recipe answer for more details.
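
As a minimal sketch of how the loop above could be wrapped for reuse, the helper below takes the key function as a parameter. The names `group_files` and `prefix_before_underscore` are hypothetical and do not come from the recipe, and the key function assumes the group identifier is everything before the first underscore rather than a fixed three characters.

```python
import collections


def group_files(filenames, key):
    """Group filenames into lists, one per distinct key, in first-seen order."""
    groups = collections.defaultdict(list)
    for filename in filenames:
        groups[key(filename)].append(filename)
    return list(groups.values())


def prefix_before_underscore(filename):
    # Assumption: the group identifier is the text before the first "_".
    return filename.split('_', 1)[0]


files = ['002_1.png', '001_1.png', '002_2.png', '001_2.png', '003_1.png']
print(group_files(files, prefix_before_underscore))
# [['002_1.png', '002_2.png'], ['001_1.png', '001_2.png'], ['003_1.png']]
```

The input does not need to be sorted; groups come out in the order their prefix first appears (dictionaries keep insertion order in Python 3.7+).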


2 Comments

Beautiful answer, upvoted. I know the feeling when you write a long, correct answer and nobody upvotes it to recognize the time and effort invested in it.
@MihaiAlexandru-Ionut Thanks :) I couldn't stand the low text/code ratio in this thread, so I decided to do something about it. Your answer was the only one that had a significant amount of explanation.
4

Something like this should work:

import itertools


mylist = [...]
[list(v) for k,v in itertools.groupby(mylist, key=lambda x: x[:3])]

If the input list isn't sorted, then use something like this:

import itertools


mylist = [...]
keyfunc = lambda x:x[:3]
mylist = sorted(mylist, key=keyfunc)
[list(v) for k,v in itertools.groupby(mylist, key=keyfunc)]
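
To illustrate why the sort matters, here is a small sketch (the sample list is made up for the demonstration). `itertools.groupby` only merges consecutive items with equal keys, so an unsorted list yields fragmented groups:

```python
import itertools

keyfunc = lambda x: x[:3]
unsorted_files = ['001_1.png', '002_1.png', '001_2.png', '002_2.png']

# Without sorting, each run of equal keys becomes its own group.
fragmented = [list(v) for _, v in itertools.groupby(unsorted_files, key=keyfunc)]
print(fragmented)
# [['001_1.png'], ['002_1.png'], ['001_2.png'], ['002_2.png']]

# Sorting by the same key first brings equal keys together.
sorted_files = sorted(unsorted_files, key=keyfunc)
grouped = [list(v) for _, v in itertools.groupby(sorted_files, key=keyfunc)]
print(grouped)
# [['001_1.png', '001_2.png'], ['002_1.png', '002_2.png']]
```

Because `sorted` is stable, files within each group keep their original relative order.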


1

You can do it using a dictionary.

files = ['001_1.png', '001_2.png', '003_3.png', '002_1.png', '002_2.png', '003_1.png', '003_2.png', '003_3.png', '003_4.png']

# use names that do not shadow the built-in `list` and `dict`
groups = {}
for item in files:
  if item[:3] not in groups:
    groups[item[:3]] = []
  groups[item[:3]].append(item)

Then sort the dictionary by key.

groups = {k: v for k, v in sorted(groups.items())}

The last step is to use a list comprehension to build the nested list.

result = [v for k, v in groups.items()]
print(result)

Output

[['001_1.png', '001_2.png'], ['002_1.png', '002_2.png'], ['003_3.png', '003_1.png', '003_2.png', '003_3.png', '003_4.png']]
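
A possible shortcut, sketched under the assumption of Python 3.7+ (where dictionaries keep insertion order): instead of rebuilding the dictionary in sorted order, the result can be read out by iterating over the sorted keys. The variable names and the shortened sample list here are illustrative only.

```python
files = ['001_1.png', '001_2.png', '003_3.png', '002_1.png', '002_2.png', '003_1.png']

groups = {}
for item in files:
    # setdefault returns the existing list for this prefix,
    # or stores and returns a new empty one
    groups.setdefault(item[:3], []).append(item)

# Build the result in sorted key order without rebuilding the dictionary.
result = [groups[key] for key in sorted(groups)]
print(result)
# [['001_1.png', '001_2.png'], ['002_1.png', '002_2.png'], ['003_3.png', '003_1.png']]
```

Note that only the groups are ordered by key; files inside each group stay in the order they appeared in the input.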


0

Using a simple iteration and dictionary.

Ex:

l = ['001_1.png', ' 001_2.png', ' 003_3.png', ' 002_1.png', ' 002_2.png', ' 003_1.png', ' 003_2.png', ' 003_3.png', ' 003_4.png']
r = {}
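for i in l:
    # key on the whole prefix before "_" (not just its last digit), ignoring stray spaces
    v = i.split("_")[0].strip()
    if v not in r:
        r[v] = []
    r[v].append(i)
print(list(r.values()))  # list(...) so Python 3 prints a plain list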

Output:

[['001_1.png', ' 001_2.png'], [' 003_3.png', ' 003_1.png', ' 003_2.png', ' 003_3.png', ' 003_4.png'], [' 002_1.png', ' 002_2.png']]


0

If your list is ordered like this, here is a short script for the task.

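# `a` is the list of file names from the question
myList = []
for i in a:
    # a name ending in "_1" (before the extension) starts a new group
    if i[:-4].endswith('_1'):
        myList.append([i])
    else:
        myList[-1].append(i)

# [['001_1.png', '001_2.png', '001_3.png'], ['002_1.png', '002_2.png'], ...]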


0
#IYN

mini_list = []
p = ['001_1.png', '001_2.png', '001_3.png', '002_1.png','002_2.png', '003_1.png', '003_2.png', '003_3.png', '003_4.png']
new_p = []

for index, element in enumerate(p):
    if index == len(p)-1:
        mini_list.append(element)
        new_p.append(mini_list)
        break

    if element[0:3]==p[index+1][0:3]:
        mini_list.append(element)

    else:
        mini_list.append(element)
        new_p.append(mini_list)
        mini_list = []

print(new_p)

The code above will cut the initial list into sub lists and append them as individual lists into a resulting, larger list. Note: not a few lines, but you can convert this to a function.

def list_cutter(ls):
    mini_list = []
    new_list = []

    for index, element in enumerate(ls):
        if index == len(ls)-1:
            mini_list.append(element)
            new_list.append(mini_list)
            break

        if element[0:3]==ls[index+1][0:3]:
            mini_list.append(element)

        else:
            mini_list.append(element)
            new_list.append(mini_list)
            mini_list = []

    return new_list
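
For comparison, here is a hedged sketch of the same consecutive-run splitting written more compactly, by comparing each element with the last one already placed instead of looking ahead. `split_runs` is a hypothetical name; like the function above, it assumes the list is already ordered so that files with the same 3-character prefix are adjacent.

```python
def split_runs(ls):
    result = []
    for element in ls:
        # Extend the current sublist while the prefix matches the previous
        # element's prefix; otherwise start a new sublist.
        if result and result[-1][-1][:3] == element[:3]:
            result[-1].append(element)
        else:
            result.append([element])
    return result


p = ['001_1.png', '001_2.png', '001_3.png', '002_1.png', '002_2.png', '003_1.png']
print(split_runs(p))
# [['001_1.png', '001_2.png', '001_3.png'], ['002_1.png', '002_2.png'], ['003_1.png']]
```

An empty input returns an empty list, so no special case is needed for the last element.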
