
I am trying to check whether certain files exist in a folder using Python. The expected file names are stored in one list (expected_file_names) and the actual file names read from the folder are returned in another list (actual_file_names). I am able to get the file names from the folder, but how do I iterate over each item in actual_file_names and check whether a substring of it matches an item in the other list?

Goal

I want to find out whether a file whose name starts with DMAMiddleware exists in a folder. The actual file name looks like DMAMiddleware10.20.20.jar. I want to check whether the substring DMAMiddleware from the expected list occurs in any entry of the actual list.

Problem

I am not clear on how to compare substrings across two lists.

Can someone provide an example of how this can be achieved? Thanks in advance.

actual_file_names = ['', 'python', 'cmdb_dma_map.json', 'mappings.json', 'vendor_provided_binaries.json',
                     'vendor_provided_binaries_custom.json', 'DMAPremiumDatabase10.50.000.000.jar',
                     'DMAPremiumMiddleware10.50.000.000.jar', 'DMAPremiumUtilities10.50.000.000.jar',
                     'dma_oo_client_bin_linux.zip', 'dma_oo_client_bin_linux.zip.MD5', 'dma_oo_client_code_linux.zip',
                     'dma_oo_client_code_linux.zip.MD5', 'DCAFlowUtilities1.0.0.0.jar', 'DCAKafkaWriter1.0.0.0.jar',
                     'DCAUtilities1.0.0.0.jar']

expected_file_names = ['python', 'cmdb_dma_map.json', 'mappings.json', 'vendor_provided_binaries.json',
                       'vendor_provided_binaries_custom.json', 'DMAPremiumDatabase.jar', 'DMAPremiumMiddleware.jar',
                       'DMAPremiumUtilities.jar']

for f in expected_file_names:
    for g in actual_file_names:
        if f in g:
            print "All file names exists in " + g

        else:
            print "file name "+g+" doesn't exists"
  • What's wrong with the code you posted? Commented Jul 31, 2017 at 10:06
  • I am not sure what you are trying to do exactly. Please be more clear, this is very confusing. What do you want to do and what are you getting instead? Commented Jul 31, 2017 at 10:10
  • I want to find out whether a file whose name starts with DMAMiddleware exists in a folder. The actual file name looks like DMAMiddleware10.20.20.jar. I want to check whether the substring DMAMiddleware exists or not. I hope this is clear; I am not sure how to compare substrings in a list. Commented Jul 31, 2017 at 10:16
  • If you want to compare only parts of the file names, you need some kind of recipe for finding the relevant part of the file name, for instance cutting off everything after the first period (.) or erasing all digits. Commented Jul 31, 2017 at 10:17
  • @ThomasKühn Can you please provide an example or snippet showing how to achieve this? That would be helpful for me. Commented Jul 31, 2017 at 10:23

2 Answers

Here is one way to do it. The code includes two examples: the first truncates the expected file name at the first period (.), the second additionally removes all digits from the truncated name. With your input, both examples produce the same result.

import re

actual_file_names = ['', 'python', 'cmdb_dma_map.json', 'mappings.json', 
                     'vendor_provided_binaries.json',
                     'vendor_provided_binaries_custom.json', 
                     'DMAPremiumDatabase10.50.000.000.jar',
                     'DMAPremiumMiddleware10.50.000.000.jar', 
                     'DMAPremiumUtilities10.50.000.000.jar',
                     'dma_oo_client_bin_linux.zip', 
                     'dma_oo_client_bin_linux.zip.MD5', 
                     'dma_oo_client_code_linux.zip',
                     'dma_oo_client_code_linux.zip.MD5', 
                     'DCAFlowUtilities1.0.0.0.jar', 'DCAKafkaWriter1.0.0.0.jar',
                     'DCAUtilities1.0.0.0.jar']

expected_file_names = ['python', 'cmdb_dma_map.json', 'mappings.json', 
                       'vendor_provided_binaries.json',
                       'vendor_provided_binaries_custom.json', 
                       'DMAPremiumDatabase.jar', 'DMAPremiumMiddleware.jar',
                       'DMAPremiumUtilities.jar']


## compare using the part of the expected name before the first period:
for expected in expected_file_names:
    part = expected.split('.',1)[0]
    ##print(part)
    matched = False
    for actual in actual_file_names:
        if part in actual:
            print('{} matches {}'.format(expected,actual))
            matched = True

    if not matched:
        print('{} could not be matched'.format(expected))

print('-'*50)

## additionally remove all digits from that part:
for expected in expected_file_names:
    part = re.sub('[0123456789]','',expected.split('.',1)[0])
    ##print(part)
    matched = False
    for actual in actual_file_names:
        if part in actual:
            print('{} matches {}'.format(expected,actual))
            matched = True

    if not matched:
        print('{} could not be matched'.format(expected))

The result is:

python matches python
cmdb_dma_map.json matches cmdb_dma_map.json
mappings.json matches mappings.json
vendor_provided_binaries.json matches vendor_provided_binaries.json
vendor_provided_binaries.json matches vendor_provided_binaries_custom.json
vendor_provided_binaries_custom.json matches vendor_provided_binaries_custom.json
DMAPremiumDatabase.jar matches DMAPremiumDatabase10.50.000.000.jar
DMAPremiumMiddleware.jar matches DMAPremiumMiddleware10.50.000.000.jar
DMAPremiumUtilities.jar matches DMAPremiumUtilities10.50.000.000.jar
--------------------------------------------------
python matches python
cmdb_dma_map.json matches cmdb_dma_map.json
mappings.json matches mappings.json
vendor_provided_binaries.json matches vendor_provided_binaries.json
vendor_provided_binaries.json matches vendor_provided_binaries_custom.json
vendor_provided_binaries_custom.json matches vendor_provided_binaries_custom.json
DMAPremiumDatabase.jar matches DMAPremiumDatabase10.50.000.000.jar
DMAPremiumMiddleware.jar matches DMAPremiumMiddleware10.50.000.000.jar
DMAPremiumUtilities.jar matches DMAPremiumUtilities10.50.000.000.jar

Tested on Python 3.5
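The substring test above also pairs vendor_provided_binaries.json with vendor_provided_binaries_custom.json, which may not be wanted. A stricter variant could look like the sketch below. It assumes that an actual file name is the expected name with an optional run of digits and periods (the version) inserted just before the extension; that naming scheme is only inferred from the sample lists, and the helper name find_versioned is made up for illustration.

```python
import os
import re

def find_versioned(expected, actual_names):
    """Return the actual names equal to `expected` apart from an optional version."""
    stem, ext = os.path.splitext(expected)
    # Assumed naming scheme: <stem><digits and periods><ext>
    pattern = re.compile(re.escape(stem) + r'[\d.]*' + re.escape(ext))
    return [name for name in actual_names if pattern.fullmatch(name)]

actual_file_names = ['', 'python', 'vendor_provided_binaries.json',
                     'vendor_provided_binaries_custom.json',
                     'DMAPremiumMiddleware10.50.000.000.jar']

for expected in ['python', 'vendor_provided_binaries.json',
                 'DMAPremiumMiddleware.jar']:
    print('{} -> {}'.format(expected, find_versioned(expected, actual_file_names)))
```

With this pattern, vendor_provided_binaries.json no longer matches the _custom file, because the whole actual name has to fit the pattern rather than merely contain the stem.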


3 Comments

Thanks @ThomasKühn. This is what I was expecting.
I was thinking also of the first version, which can be accomplished without regex. Depending on the criteria, it might do the job well.
@cezar You are right. I only used re in the second example (also not strictly necessary), the first one only uses split.

Here are two ways of achieving this. The first is a little more complicated; the second is the traditional way of doing it.

actual_file_names = ['', 'python', 'cmdb_dma_map.json', 'mappings.json', 'vendor_provided_binaries.json',
                     'vendor_provided_binaries_custom.json', 'DMAPremiumDatabase10.50.000.000.jar',
                     'DMAPremiumMiddleware10.50.000.000.jar', 'DMAPremiumUtilities10.50.000.000.jar',
                     'dma_oo_client_bin_linux.zip', 'dma_oo_client_bin_linux.zip.MD5', 'dma_oo_client_code_linux.zip',
                     'dma_oo_client_code_linux.zip.MD5', 'DCAFlowUtilities1.0.0.0.jar', 'DCAKafkaWriter1.0.0.0.jar',
                     'DCAUtilities1.0.0.0.jar']

expected_file_names = ['python', 'cmdb_dma_map.json', 'mappings.json', 'vendor_provided_binaries.json',
                       'vendor_provided_binaries_custom.json', 'DMAPremiumDatabase.jar', 'DMAPremiumMiddleware.jar',
                       'DMAPremiumUtilities.jar']

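# 1st way: one list comprehension that labels every actual file name
print([str(afn)+" is valid" if any(efn.split(".")[0] in afn for efn in expected_file_names) else str(afn)+"N/A" for afn in actual_file_names])

# 2nd way: nested loops that print every actual file name matching an expected one
for efn in expected_file_names:
    for afn in actual_file_names:
        if efn.split(".")[0] in afn:
            print(afn)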

Output:

['N/A', 'python is valid', 'cmdb_dma_map.json is valid', 'mappings.json is valid', 'vendor_provided_binaries.json is valid', 'vendor_provided_binaries_custom.json is valid', 'DMAPremiumDatabase10.50.000.000.jar is valid', 'DMAPremiumMiddleware10.50.000.000.jar is valid', 'DMAPremiumUtilities10.50.000.000.jar is valid', 'dma_oo_client_bin_linux.zipN/A', 'dma_oo_client_bin_linux.zip.MD5N/A', 'dma_oo_client_code_linux.zipN/A', 'dma_oo_client_code_linux.zip.MD5N/A', 'DCAFlowUtilities1.0.0.0.jarN/A', 'DCAKafkaWriter1.0.0.0.jarN/A', 'DCAUtilities1.0.0.0.jarN/A']
python
cmdb_dma_map.json
mappings.json
vendor_provided_binaries.json
vendor_provided_binaries_custom.json
vendor_provided_binaries_custom.json
DMAPremiumDatabase10.50.000.000.jar
DMAPremiumMiddleware10.50.000.000.jar
DMAPremiumUtilities10.50.000.000.jar
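Both ways above list the files that were found. If the goal is to learn which expected files are absent from the folder, the same prefix idea can be turned around, as in this minimal sketch. It assumes the part of the expected name before the first period is the prefix to look for and uses str.startswith instead of a plain substring test; the function name missing_files and the shortened sample lists are made up for illustration.

```python
def missing_files(expected_names, actual_names):
    """Return the expected names for which no actual name starts with the same prefix."""
    missing = []
    for expected in expected_names:
        prefix = expected.split('.', 1)[0]  # part before the first period
        if not any(actual.startswith(prefix) for actual in actual_names):
            missing.append(expected)
    return missing

actual = ['', 'python', 'mappings.json', 'DMAPremiumMiddleware10.50.000.000.jar']
expected = ['python', 'mappings.json', 'DMAPremiumMiddleware.jar',
            'DMAPremiumDatabase.jar']

print(missing_files(expected, actual))  # ['DMAPremiumDatabase.jar']
```

An empty result means every expected file has at least one counterpart in the folder.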
