
I have a list of strings:

fileList = ['YMML.2019.09.10-Run.1-Final.pdf',
            'YMML.2019.09.10-Run.2-Initial.pdf',
            'YMML.2019.09.11-Run.2-Initial.pdf',
            'YMML.2019.09.11-Run.1-Final.pdf',
            'YMML.2019.09.12-Run.2-Initial.pdf',
            'YMML.2019.09.13-Run.2-Initial.pdf',
            'YMML.2019.09.12-Run.1-Final.pdf',
            'YMML.2019.09.13-Run.1-Final.pdf',
            'YMML.2019.09.14-Run.1-Final.pdf',]

and I'd like to confirm that there is both a Run.1-Final and a Run.2-Initial file for each date.

I've tried something like:

for i in range(len(fileList)):
    if fileList[i][5:15] != fileList[i + 1][5:15]:
        print(fileList[i] + ' is missing.')
    i += 2

and I'd like the output to be

YMML.2019.09.14-Run.2-Initial.pdf is missing.

Perhaps something like

import collections

dates = [name[5:15] for name in fileList]
counter = collections.Counter(dates)

But then I'm having trouble extracting the incomplete dates from the counter.

  • No, file list isn't always sorted. I had this thought as I posted. See edit. Commented Sep 20, 2019 at 15:52
  • Your method is almost finished. Just filter the dates from counter where the count is not 2. For example: [d for d, cnt in counter.items() if cnt < 2] Commented Sep 20, 2019 at 15:59
  • @pault Yeah, OP was close and good point, but restoring the full original file name is still a bit problematic. Commented Sep 20, 2019 at 16:15
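
The Counter idea from the comments can be sketched as follows. This is only an illustration: it assumes, as the question does, that the date always occupies characters 5 to 14 of each name, and `incomplete_dates` is a name made up for this example.

```python
from collections import Counter

file_list = [
    'YMML.2019.09.10-Run.1-Final.pdf',
    'YMML.2019.09.10-Run.2-Initial.pdf',
    'YMML.2019.09.11-Run.2-Initial.pdf',
    'YMML.2019.09.11-Run.1-Final.pdf',
    'YMML.2019.09.14-Run.1-Final.pdf',
]

# Count how many files share each date (characters 5..14 of the name).
counter = Counter(name[5:15] for name in file_list)

# Any date that does not have exactly two files is incomplete.
incomplete_dates = [date for date, count in counter.items() if count != 2]
print(incomplete_dates)  # ['2019.09.14']
```

This finds the incomplete date but not which of the two files is absent, which is the limitation the last comment points out.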

4 Answers

1

To make it more readable, you could create a list of dates first, then loop over those.

file_list = ['YMML.2019.09.10-Run.1-Final.pdf',
            'YMML.2019.09.10-Run.2-Initial.pdf',
            'YMML.2019.09.11-Run.2-Initial.pdf',
            'YMML.2019.09.11-Run.1-Final.pdf',
            'YMML.2019.09.12-Run.2-Initial.pdf',
            'YMML.2019.09.13-Run.2-Initial.pdf',
            'YMML.2019.09.12-Run.1-Final.pdf',
            'YMML.2019.09.13-Run.1-Final.pdf',
            'YMML.2019.09.14-Run.1-Final.pdf',]

# Unique dates (characters 5..14 of each name)
dates = {item[5:15] for item in file_list}

for date in dates:
    if 'YMML.' + date + '-Run.1-Final.pdf' not in file_list:
        print('YMML.' + date + '-Run.1-Final.pdf is missing')
    if 'YMML.' + date + '-Run.2-Initial.pdf' not in file_list:
        print('YMML.' + date + '-Run.2-Initial.pdf is missing')

Building a set keeps only the unique dates, so each date is checked once instead of once per file.


2 Comments

Just FYI, this is O(n^2) because of not in file_list, which does a linear search over all of the original files. You could create a second set for lookups.
Right you are @ggorlen! Thanks for the tip.
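
A minimal sketch of the set-based lookup suggested in the comment above. The function name `find_missing` and the `suffixes` tuple are illustrative and not part of the original answer; the sketch keeps the answer's assumption that every name starts with 'YMML.' followed by a ten-character date.

```python
def find_missing(file_list):
    """Return the names of expected files that are absent from file_list."""
    present = set(file_list)  # set membership tests are O(1) on average
    # sorted() is only here to make the output order deterministic
    dates = sorted({name[5:15] for name in file_list})
    suffixes = ('-Run.1-Final.pdf', '-Run.2-Initial.pdf')
    return [
        'YMML.' + date + suffix
        for date in dates
        for suffix in suffixes
        if 'YMML.' + date + suffix not in present
    ]


print(find_missing([
    'YMML.2019.09.13-Run.2-Initial.pdf',
    'YMML.2019.09.13-Run.1-Final.pdf',
    'YMML.2019.09.14-Run.1-Final.pdf',
]))
# ['YMML.2019.09.14-Run.2-Initial.pdf']
```

Each membership test now hits the set instead of scanning the list, so the check itself no longer grows quadratically with the number of files.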
1

I'm kind of late, but here's what I found to be the simplest way, though maybe not the most efficient:

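for file in fileList:
    # file[:20] is the 'YMML.<date>-Run.' prefix; the rest identifies the run.
    if file[20:27] == "1-Final":
        if (file[0:20] + "2-Initial.pdf") not in fileList:
            print(file[0:20] + "2-Initial.pdf is missing")
    # Compare with ==, not `is`, and slice from index 20 to get the whole suffix.
    elif file[20:] == "2-Initial.pdf":
        if (file[0:20] + "1-Final.pdf") not in fileList:
            print(file[0:20] + "1-Final.pdf is missing")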


1

Here's an O(n) solution which collects items into a defaultdict by date, then filters on quantity seen, restoring original names from the remaining value:

from collections import defaultdict

files = [
    'YMML.2019.09.10-Run.1-Final.pdf',
    'YMML.2019.09.10-Run.2-Initial.pdf',
    'YMML.2019.09.11-Run.2-Initial.pdf',
    'YMML.2019.09.11-Run.1-Final.pdf',
    'YMML.2019.09.12-Run.2-Initial.pdf',
    'YMML.2019.09.13-Run.2-Initial.pdf',
    'YMML.2019.09.12-Run.1-Final.pdf',
    'YMML.2019.09.13-Run.1-Final.pdf',
    'YMML.2019.09.14-Run.1-Final.pdf',
]

seen = defaultdict(list)

for x in files:
    seen[x[5:15]].append(x)

missing = [v[0] for k, v in seen.items() if len(v) < 2]
print(missing) # => ['YMML.2019.09.14-Run.1-Final.pdf']

Getting names of partners can be done with a conditional:

names = [
    x[:20] + "2-Initial.pdf" if x[20] == "1" else
    x[:20] + "1-Final.pdf" for x in missing
]
print(names) # => ['YMML.2019.09.14-Run.2-Initial.pdf']
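
As a variation on the same grouping idea, here is a hedged sketch that collects the suffixes seen for each prefix and subtracts them from an expected set, so the missing partner falls out directly. `EXPECTED` and `missing_partners` are hypothetical names, and the sketch assumes the first '-' in every name separates the 'YMML.<date>' prefix from the run suffix.

```python
from collections import defaultdict

# Assumption: every date should have exactly these two suffixes.
EXPECTED = {'Run.1-Final.pdf', 'Run.2-Initial.pdf'}


def missing_partners(files):
    """Return the full names of expected files that do not appear in files."""
    seen = defaultdict(set)
    for name in files:
        # Split at the first '-': 'YMML.2019.09.14' / 'Run.1-Final.pdf'
        prefix, _, suffix = name.partition('-')
        seen[prefix].add(suffix)
    return [
        prefix + '-' + suffix
        for prefix, suffixes in seen.items()
        for suffix in sorted(EXPECTED - suffixes)
    ]


print(missing_partners([
    'YMML.2019.09.13-Run.2-Initial.pdf',
    'YMML.2019.09.13-Run.1-Final.pdf',
    'YMML.2019.09.14-Run.1-Final.pdf',
]))
# ['YMML.2019.09.14-Run.2-Initial.pdf']
```

This stays a single pass over the files and avoids the hard-coded index 20, at the cost of relying on the position of the first hyphen.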


0

This works:

fileList = ['YMML.2019.09.10-Run.1-Final.pdf',
            'YMML.2019.09.10-Run.2-Initial.pdf',
            'YMML.2019.09.11-Run.2-Initial.pdf',
            'YMML.2019.09.11-Run.1-Final.pdf',
            'YMML.2019.09.12-Run.2-Initial.pdf',
            'YMML.2019.09.13-Run.2-Initial.pdf',
            'YMML.2019.09.12-Run.1-Final.pdf',
            'YMML.2019.09.13-Run.1-Final.pdf',
            'YMML.2019.09.14-Run.1-Final.pdf',]

initial_set = {filename[:15] for filename in fileList if 'Initial' in filename}
final_set = {filename[:15] for filename in fileList if 'Final' in filename}

for filename in final_set - initial_set:
    print(filename + '-Run.2-Initial.pdf is missing.')
for filename in initial_set - final_set:
    print(filename + '-Run.1-Final.pdf is missing.')
