I have a list of file names, some of which are identical except for the timestamp section. Each entry has the format [name-subname-timestamp], for example:

myList = ['name1-001-20211202811.txt', 'name1-001-202112021010.txt', 'name1-002-202112021010.txt', 'name2-002-202112020811.txt']

What I need is a list that holds, for every name and subname, the most recent file according to its timestamp. I have started by collecting every [name-subname]:

name_subname_list = []
for row in myList:
    name_subname_list.append((row.rpartition('-')[0]))
name_subname_list = set(name_subname_list) # {'name1-001', 'name2-002', 'name1-002'}

I am not sure this is the right approach, and I do not know how to continue from here. Any ideas?

2 Answers

This code does what you asked for. For each name-subname, it keeps the corresponding newest file:

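from datetime import datetime as dt

# Map each 'name-subname' key to the newest timestamp string seen so far.
dic = {}
for i in myList:
    sp = i.split('-')
    name_subname = sp[0] + '-' + sp[1]
    mytime = sp[2].split('.')[0]  # drop the '.txt' extension
    if name_subname not in dic:
        dic[name_subname] = mytime
    else:
        # Compare as datetimes rather than as plain strings.
        if dt.strptime(mytime, "%Y%m%d%H%M") > dt.strptime(dic[name_subname], "%Y%m%d%H%M"):
            dic[name_subname] = mytime

# Rebuild each file name from its key and newest timestamp.
result = []
for name_subname in dic:
    result.append(name_subname + '-' + dic[name_subname] + '.txt')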

which sets result to:

['name1-001-202112021010.txt',
 'name1-002-202112021010.txt',
 'name2-002-202112020811.txt']
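
As a variation on the same idea, here is a minimal sketch that stores the whole file name next to its parsed timestamp, so the '.txt' extension does not have to be hardcoded. The names parse_name and newest_per_group are hypothetical and not part of the answer above. It assumes every file name ends in -timestamp.extension and that the timestamp follows the %Y%m%d%H%M layout.

```python
from datetime import datetime

def parse_name(filename):
    """Hypothetical helper: split 'name-subname-timestamp.ext' into (key, datetime)."""
    stem = filename.rsplit('.', 1)[0]        # drop the extension
    key, _, stamp = stem.rpartition('-')     # key is 'name-subname'
    return key, datetime.strptime(stamp, "%Y%m%d%H%M")

def newest_per_group(filenames):
    """Return the newest file name for each 'name-subname' key."""
    newest = {}
    for filename in filenames:
        key, when = parse_name(filename)
        # Keep this file if the key is new or the timestamp is later.
        if key not in newest or when > newest[key][0]:
            newest[key] = (when, filename)
    return [filename for _, filename in newest.values()]

files = ['name1-001-20211202811.txt', 'name1-001-202112021010.txt',
         'name1-002-202112021010.txt', 'name2-002-202112020811.txt']
result = newest_per_group(files)
print(result)
```

Because the original file name is stored as-is, this sketch returns the names exactly as they appeared in the input list.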

2 Comments

The other answer gives the wrong result for the name-subname 'name1-001'.
Thank you. I guess this is a safer approach than converting the timestamps to integers to work around the max() issue.

Try this:

myList = ['name1-001-20211202811.txt', 'name1-001-202112021010.txt', 'name1-002-202112021010.txt', 'name2-002-202112020811.txt']
dic = {}
for name in myList:
    parts = name.split('-')
    dic.setdefault(parts[0] + '-' + parts[1], []).append(parts[2])

unique_list = []
for key,value in dic.items():
    unique_list.append(key + '-' + max(value))

2 Comments

Thank you very much, it's almost perfect. The only thing is that the max() function behaves strangely: for example, max(['20211202811', '202112021010']) returns '20211202811', which is not the later timestamp (I assume it ignores the zero at the end). I will fix it by converting the list to ints. Thanks
You're welcome. When max() is used with strings, the length of the items does not matter; the result is decided by the first position at which the characters differ. For example, max('9', '899') returns '9', and max('1234', '1233666') returns '1234'. So in your example, if you use the complete format for the date and time (I mean '0811' instead of '811'), max() will behave as expected. You can also convert your timestamp strings to datetime objects and then compare those, as Fatemeh-Sangin mentioned in her answer.
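
To make the comparison rule above concrete, here is a small sketch using the two timestamps from the question. The helper name timestamp_key is hypothetical, and the sketch assumes the timestamps follow the %Y%m%d%H%M layout used in the other answer.

```python
from datetime import datetime

stamps = ['20211202811', '202112021010']

# Plain string comparison: the first 8 characters match, then '8' beats '1'.
string_max = max(stamps)

# Numeric comparison: the shorter string is the smaller number here.
int_max = max(stamps, key=int)

def timestamp_key(stamp):
    """Hypothetical helper: parse a timestamp string into a datetime."""
    return datetime.strptime(stamp, "%Y%m%d%H%M")

# Datetime comparison: compares the actual points in time.
datetime_max = max(stamps, key=timestamp_key)

print(string_max)    # 20211202811
print(int_max)       # 202112021010
print(datetime_max)  # 202112021010
```

Passing key= to max() keeps the original strings in the result, so nothing has to be converted back afterwards.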
