listA = ['Leonardo_da_Vinci', 'Napoleon', 'Cao_Cao', 'Elton_John']
listB = ['123_Leonardo_da_Vinci_abc.csv', '456_Cao_Cao_def.csv']

listC = ['Napoleon', 'Elton_John']

I would like to check whether the items in listB contain the values in listA, and return listC (i.e. the values from listA that are missing from listB). For this purpose I would like to use regex (and no other heuristics), for example, to check Leonardo_da_Vinci: .*Leonardo_da_Vinci.*

The reason for using regex is that the example above is just the simplest mock-up, and the data I use is much bigger. It would also be good to have generalised code that works on other data in the future.

  • Is there a specific reason why you want to use regex? It's overkill for your use case and will make the code more complicated than simple "in" checks in Python. Commented Mar 6, 2023 at 14:00
  • The reason for using regex is that the example above is just the simplest mock-up, and the data I use is much bigger. It would also be good to have generalised code that works on other data in the future. Commented Mar 6, 2023 at 15:22
  • Alright, that makes more sense. I'd still be careful about using a dynamically generated regex like this. It's incredibly slow compared to the alternatives, at least if you have a large amount of data. Otherwise you'll be fine. Commented Mar 7, 2023 at 14:21
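
For comparison, the plain "in" check mentioned in the comments above might look like the following sketch. The helper name missing_names is hypothetical and not from the thread, and no timings were measured here; it only shows the substring test applied to the lists from the question.

```python
def missing_names(names, filenames):
    """Return the names that occur as a substring of none of the filenames."""
    return [
        name
        for name in names
        if not any(name in filename for filename in filenames)
    ]


listA = ['Leonardo_da_Vinci', 'Napoleon', 'Cao_Cao', 'Elton_John']
listB = ['123_Leonardo_da_Vinci_abc.csv', '456_Cao_Cao_def.csv']

listC = missing_names(listA, listB)
print(listC)  # ['Napoleon', 'Elton_John']
```

This treats every name as literal text, so it only fits cases where no pattern syntax is needed.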

1 Answer


Something like this:

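import re

def exists_csv_with_name(name: str, source_list: list) -> bool:
    # Escape the name so that characters such as "." or "(" are matched literally.
    regex = re.compile(fr'.*{re.escape(name)}.*')
    # True as soon as one entry of source_list contains the name.
    return any(regex.match(source_str) for source_str in source_list)

# Names from listA that appear in none of the entries of listB.
listC = [name for name in listA if not exists_csv_with_name(name, listB)]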
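
One caveat when a name is interpolated into a pattern: any regex metacharacters in the name are read as regex syntax unless the name is passed through re.escape first. The sketch below uses made-up names and filenames (they are not from the question) to show the difference.

```python
import re

# Hypothetical data: the "." in the name is a regex metacharacter.
name = "J.S.Bach"
wrong_file = "789_JXSXBach.csv"   # should NOT count as a match
right_file = "790_J.S.Bach.csv"   # should count as a match

unescaped = re.compile(f".*{name}.*")
escaped = re.compile(f".*{re.escape(name)}.*")

# Unescaped, each "." matches any character, so the wrong file is accepted.
print(bool(unescaped.match(wrong_file)))  # True
# Escaped, the dots are literal, so only the right file is accepted.
print(bool(escaped.match(wrong_file)))    # False
print(bool(escaped.match(right_file)))    # True
```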

5 Comments

If there are alternative ways, you are more than welcome to share them! Thanks in advance.
@user7665853 In what sense alternative ways? What in this solution does not fully satisfy you?
It was meant to invite more people to offer other nice-to-have solutions. Now I have tested a bit further, and it seems I have a problem with values in UTF-8. Somehow values like Frédéric_Chopin were excluded from the outcome. Is there a way to handle this in your code? (Although I am not entirely sure the encoding is the actual problem.)
If I understood you right, when you have listA = ['Frédéric_Chopin'] and listB = ['110_Frédéric_Chopin.csv'], then 'Frédéric_Chopin' is not in listC. That is exactly the behavior you asked for!
Sorry, I think this is a separate problem with UTF-8 encoding on my machine. Something is wrong with matching the filename in UTF-8, and I am just puzzled... I think your code is fine for the original purpose.
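
The thread does not establish what went wrong with the accented names. One possibility, offered purely as an assumption, is a Unicode normalization mismatch: "é" can be stored as a single code point or as "e" followed by a combining accent, and the two forms look identical but do not compare equal. A minimal sketch of normalizing both sides with unicodedata.normalize before matching (the strings below are constructed for illustration):

```python
import re
import unicodedata


def nfc(text):
    """Return text in NFC form (accented letters as single code points)."""
    return unicodedata.normalize("NFC", text)


# Assumed scenario: the name uses composed "é" (U+00E9), while the
# filename uses "e" followed by a combining acute accent (U+0301).
name = "Fr\u00e9d\u00e9ric_Chopin"
filename = "110_Fre\u0301de\u0301ric_Chopin.csv"

plain_match = bool(re.search(re.escape(name), filename))
normalized_match = bool(re.search(re.escape(nfc(name)), nfc(filename)))

print(plain_match)       # False
print(normalized_match)  # True
```

If the real data turns out to use the same form on both sides, this step changes nothing and the cause lies elsewhere.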
