Python list data filtering

Question

I have a list of filenames, some of which are identical except for their timestamp section. Each entry has the format [name-subname-timestamp], for example:

myList = ['name1-001-20211202811.txt', 'name1-001-202112021010.txt', 'name1-002-202112021010.txt', 'name2-002-202112020811.txt']

What I need is a list that holds, for every name-subname pair, the most recent file according to its timestamp. I have started by building the set of every [name-subname]:

name_subname_list = []
for row in myList:
    # rpartition('-') splits at the last hyphen; [0] is everything before it
    name_subname_list.append(row.rpartition('-')[0])
name_subname_list = set(name_subname_list)  # {'name1-001', 'name2-002', 'name1-002'}

I'm not sure this is the right approach, and I'm not sure how to continue. Any ideas?
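One possible way to continue from the set of prefixes built above: for each prefix, collect the files that share it and keep the one whose timestamp parses to the latest datetime. This is a minimal sketch, assuming every timestamp follows the %Y%m%d%H%M layout; the helper `timestamp_of` is hypothetical.

```python
from datetime import datetime

myList = ['name1-001-20211202811.txt', 'name1-001-202112021010.txt',
          'name1-002-202112021010.txt', 'name2-002-202112020811.txt']

def timestamp_of(filename):
    # Hypothetical helper: 'name1-001-202112021010.txt' -> datetime(2021, 12, 2, 10, 10)
    stamp = filename.rpartition('-')[2].split('.')[0]
    return datetime.strptime(stamp, '%Y%m%d%H%M')

# Every distinct 'name-subname' prefix, as built in the question
name_subname_list = {row.rpartition('-')[0] for row in myList}

result = []
for prefix in sorted(name_subname_list):
    # All files that share exactly this prefix
    candidates = [row for row in myList if row.rpartition('-')[0] == prefix]
    # Keep the file with the latest parsed timestamp
    result.append(max(candidates, key=timestamp_of))

print(result)
```

This rescans the list once per prefix, which is fine for short lists; a single pass over the list with a dictionary, as in the answers, avoids that.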

Fatemeh Sangin · Accepted Answer · 2021-12-02 10:54:11Z

1

This code does what you asked: for each name-subname, it keeps the file with the newest timestamp.

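from datetime import datetime as dt

dic = {}  # maps 'name-subname' -> newest timestamp string seen so far
for i in myList:
    sp = i.split('-')
    name_subname = sp[0] + '-' + sp[1]
    mytime = sp[2].split('.')[0]  # drop the '.txt' extension
    if name_subname not in dic:
        dic[name_subname] = mytime
    # Compare parsed datetimes instead of raw strings, so '20211202811' (08:11)
    # is correctly treated as older than '202112021010' (10:10)
    elif dt.strptime(mytime, "%Y%m%d%H%M") > dt.strptime(dic[name_subname], "%Y%m%d%H%M"):
        dic[name_subname] = mytime

result = []
for name_subname in dic:
    # Rebuild the filename; this assumes every file has a '.txt' extension
    result.append(name_subname + '-' + dic[name_subname] + '.txt')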

After running it, result is:

['name1-001-202112021010.txt',
 'name1-002-202112021010.txt',
 'name2-002-202112020811.txt']
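A possible variant of this answer, sketched under the same %Y%m%d%H%M assumption: store the parsed datetime together with the original filename, so the result keeps each file's own extension instead of re-appending '.txt'. The function name `newest_per_prefix` is hypothetical. Because strptime treats the leading zero of %H as optional, '20211202811' parses as 08:11 here.

```python
from datetime import datetime

def newest_per_prefix(files, fmt='%Y%m%d%H%M'):
    # 'name-subname' -> (parsed timestamp, original filename)
    newest = {}
    for f in files:
        prefix, _, rest = f.rpartition('-')
        stamp = datetime.strptime(rest.split('.')[0], fmt)
        # Replace the stored entry only if this file is strictly newer
        if prefix not in newest or stamp > newest[prefix][0]:
            newest[prefix] = (stamp, f)
    # Dicts preserve insertion order, so prefixes appear in first-seen order
    return [f for _, f in newest.values()]

myList = ['name1-001-20211202811.txt', 'name1-001-202112021010.txt',
          'name1-002-202112021010.txt', 'name2-002-202112020811.txt']
print(newest_per_prefix(myList))
```

Using rpartition also means the prefix is taken correctly even if a name itself contains hyphens, since only the last hyphen separates the timestamp.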


2 Comments

Fatemeh Sangin Over a year ago

The other answer gives the wrong result for the name-subname 'name1-001'.

Kahalon Over a year ago

Thank you. I guess this is a safer approach than converting the timestamps to integers to work around the max() issue.

Mohsen Karimi · Accepted Answer · 2021-12-02 10:20:02Z

1

Try this:

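myList = ['name1-001-20211202811.txt', 'name1-001-202112021010.txt', 'name1-002-202112021010.txt', 'name2-002-202112020811.txt']
dic = {}
for name in myList:
    parts = name.split('-')
    # Group the timestamp parts (e.g. '202112021010.txt') under their 'name-subname' key
    dic.setdefault(parts[0] + '-' + parts[1], []).append(parts[2])

unique_list = []
for key, value in dic.items():
    # max() compares strings lexicographically, so this picks the newest file
    # only if every timestamp has the same fixed width
    unique_list.append(key + '-' + max(value))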


2 Comments

Kahalon Over a year ago

Thank you very much, it's almost perfect. The only thing is that max() behaves strangely: for example, max(['20211202811', '202112021010']) returns '20211202811', which is not the latest one (I assume it ignores the zero at the end). I will fix it by converting the list to ints. Thanks
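Converting to int happens to work for this pair, because an 11-digit number is always smaller than a 12-digit one. That same property makes it unreliable in general: an unpadded timestamp always loses, whatever time it encodes. The sketch below compares int-based and strptime-based max() on a second pair that uses a hypothetical 07:05 file from the same day; the helper `parse` is also hypothetical.

```python
from datetime import datetime

def parse(stamp):
    # Hypothetical helper; strptime accepts the single-digit hour in '20211202811'
    return datetime.strptime(stamp, '%Y%m%d%H%M')

stamps_ok = ['20211202811', '202112021010']    # 08:11 vs 10:10
stamps_bad = ['20211202811', '202112020705']   # 08:11 vs 07:05 (hypothetical)

by_int_ok = max(stamps_ok, key=int)       # '202112021010' (correct)
by_int_bad = max(stamps_bad, key=int)     # '202112020705' (wrong: 07:05 is older than 08:11)
by_parse_bad = max(stamps_bad, key=parse) # '20211202811' (correct)

print(by_int_ok, by_int_bad, by_parse_bad)
```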

Mohsen Karimi Over a year ago

You're welcome. When max() compares strings, it compares them character by character, and the first non-matching character determines the result; length only matters when one string is a prefix of the other. For example, max('9', '899') returns '9', and max('1234', '1233666') returns '1234'. So in your example, if you use the complete date-time format (I mean '0811' instead of '811'), max() will behave as expected. You can also convert your date-time strings to datetime objects and then compare them, as Fatemeh Sangin does in her answer.
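The comparison rules described in this comment can be checked directly. The last two results show that the unpadded '811' stamp wins on its first differing character, while with fixed-width, zero-padded timestamps plain string max() picks the newest one.

```python
# Strings compare character by character; the first differing character decides
assert max('9', '899') == '9'            # '9' > '8' at index 0
assert max('1234', '1233666') == '1234'  # '4' > '3' at index 3

# The question's data: '8' > '1' at index 8, so the unpadded 08:11 stamp wins
unpadded = max(['20211202811', '202112021010'])
# Zero-padded to a fixed width, lexicographic order matches chronological order
padded = max(['202112020811', '202112021010'])

print(unpadded)  # 20211202811
print(padded)    # 202112021010
```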
