1

I have two lists, shortened for this example:

l1 = ['Chase Bank', 'Bank of America']

l2 = ['Chase Mobile: Bank & Invest', 'Elevations Credit Union Mobile']

I am trying to generate a list of the items in l1 that have no match in l2. In this case, 'Bank of America' would be the only item returned.

Chase Bank (from l1) and Chase Mobile: Bank & Invest (from l2) are the same because they both contain the keyword 'Chase', so they wouldn't go into the exclusion list. But Bank of America should go into the list, even though 'Bank' appears both in 'Bank of America' and 'Bank & Invest'.

I have tried using set, a plain for loop with if/in, and any with a list comprehension. I have also tried regex, but matching substrings from one list against the other is proving very difficult for me.

Is this possible with Python or should I broaden my approach?

1
  • First you need to decide the rules for how every possible bank should be matched from l1 to l2. This is not a programming problem, until you have defined the rules - then you can write a program to implement them. Commented May 31, 2022 at 17:43

3 Answers

2

Use list comprehension and re.sub to remove all undesired substrings from the elements of your first list. Here, I remove bank, case-insensitively, with optional whitespace before and after it. Then use another list comprehension, this time to remove everything that is found in the second list. Use enumerate to get both the index and the element from the list. Also, use sets, which is optional and makes the code faster for long and/or repetitive lists.

import re

lst1 = ['Chase Bank', 'Chase bank', 'Bank of America']
lst2 = ['Chase Mobile: Bank & Invest', 'Elevations Credit Union Mobile']
lst1_short = [re.sub(r'(?i)\s*\bbank\b\s*', '', s) for s in lst1]
print(lst1_short)
# ['Chase', 'Chase', 'of America']

lst2_unique = set(lst2)  # optional: deduplicate lst2 once, up front
lst1 = [s for i, s in enumerate(lst1)
        if not any(lst1_short[i] in x for x in lst2_unique)]
print(lst1)
# ['Bank of America']

Note: you can extend your list of stop words (here, only bank) using regular expressions. For example:

re.sub(r'(?i)\s*\b(bank|credit union|institution for savings)\b\s*', '', s)
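As a minimal sketch of that extension, the pattern can be built from a plain list of stop words instead of being written by hand. The names STOP_WORDS, STOP_PATTERN, strip_stop_words and unmatched are hypothetical and not part of the code above; the stop words are only the ones mentioned in the note.

```python
import re

# Hypothetical stop-word list; extend it to suit your own data.
STOP_WORDS = ['bank', 'credit union', 'institution for savings']

# Build one case-insensitive pattern; re.escape guards against
# stop words that contain regex metacharacters.
STOP_PATTERN = re.compile(
    r'\s*\b(?:' + '|'.join(map(re.escape, STOP_WORDS)) + r')\b\s*',
    re.IGNORECASE,
)

def strip_stop_words(name):
    """Return name with every stop word (and adjacent whitespace) removed."""
    return STOP_PATTERN.sub('', name)

def unmatched(names, candidates):
    """Return the names whose stripped form is not a substring of any candidate."""
    result = []
    for name in names:
        key = strip_stop_words(name)
        if not any(key in c for c in candidates):
            result.append(name)
    return result

lst1 = ['Chase Bank', 'Chase bank', 'Bank of America']
lst2 = ['Chase Mobile: Bank & Invest', 'Elevations Credit Union Mobile']
print(unmatched(lst1, lst2))
# ['Bank of America']
```

Keeping the stop words in a list makes it easier to review and test them as the data grows.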

2 Comments

This is a nice answer to a bad question and I've upvoted it. But, for example, what if lst1 contains 'ECUM' for 'Elevations Credit Union Mobile' (or some other abbreviation)? Or what if lst1 contains "Bank Credit Union" - it will erroneously match "Elevations Credit Union Mobile". All I am saying is the OP's real problem is probably not solvable with a single approach and needs careful testing to avoid incorrect results.
I've upvoted this answer too. Thanks, everyone, for the help. I agree it was a bad question, hopefully in the sense that there may be more to the answer than one solution rather than in how I presented it, and I also agree that any single approach will need to be tested. Thanks again!
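
The false positive described in the first comment is easy to reproduce. The sketch below reuses the answer's single stop word and adds one possible guard, skipping keys that become empty after stripping. is_matched is a hypothetical helper, and the guard is an assumption about the desired behaviour, not a rule stated anywhere in this thread.

```python
import re

def is_matched(name, candidates):
    """Return True if name, with 'bank' stripped, is a substring of any candidate."""
    key = re.sub(r'(?i)\s*\bbank\b\s*', '', name)
    if not key:
        # An empty key is a substring of every string, so treat it as no match.
        return False
    return any(key in c for c in candidates)

lst2 = ['Chase Mobile: Bank & Invest', 'Elevations Credit Union Mobile']

print(is_matched('Bank Credit Union', lst2))  # True: the false positive from the comment
print(is_matched('ECUM', lst2))               # False: abbreviations are not recognized
print(is_matched('Bank', lst2))               # False: the key is empty after stripping
```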
0

You can do it with a list comprehension:

l2_chase = any('Chase' in j for j in l2)
[i for i in l1 if not ('Chase' in i and l2_chase)]

Output:

['Bank of America']
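
The same idea extends beyond a single hard-coded keyword. The sketch below assumes a hand-maintained list of distinguishing keywords (the hypothetical KEYWORDS); an item from l1 is dropped when one of its keywords also appears somewhere in l2.

```python
l1 = ['Chase Bank', 'Bank of America']
l2 = ['Chase Mobile: Bank & Invest', 'Elevations Credit Union Mobile']

# Hypothetical keyword list; 'Chase' is the only one taken from the question.
KEYWORDS = ['Chase', 'Elevations']

# Keep only the keywords that occur in at least one l2 entry.
present_in_l2 = [k for k in KEYWORDS if any(k in j for j in l2)]

# An l1 item is excluded from the result if it shares a keyword with l2.
result = [i for i in l1 if not any(k in i for k in present_in_l2)]
print(result)
# ['Bank of America']
```

This still depends on someone choosing the keywords, which is the rule-definition step raised in the comment on the question.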


0

You should try something like this:

l1 = ['Chase Bank', 'Bank of America']
l2 = ['Chase Mobile: Bank & Invest', 'Elevations Credit Union Mobile']

def similar_substrings(l1, l2):
    word1 = [l1[i].split(" ") for i in range(len(l1))]
    word2 = [l2[i].split(" ") for i in range(len(l2))]
    words_in = []

    for string in l1:
        for string2 in l2:
            is_in = True
            for word in string:
                if word not in string2:
                    is_in = False
            if is_in:
                words_in.append(string)

    return words_in

print(similar_substrings(l1, l2))

I only checked whether the strings from l1 are contained in the strings from l2, but you can modify it pretty easily to check both inclusions.

4 Comments

this outputs an empty list?
now it outputs ['Chase Bank'] when the desired output is ['Bank of America'] and word1 and word2 are not accessed.
It depends on if you’re looking for the list of the words to reject or to accept
well surely you're looking for the list of words to accept? why would you want a list of words to reject
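
For reference, a word-level variant that returns the items to accept might look like the sketch below. It assumes that generic words such as 'Bank' should be ignored (the GENERIC set is hypothetical) and that two names match when they share any remaining word. Neither rule comes from the answer above.

```python
l1 = ['Chase Bank', 'Bank of America']
l2 = ['Chase Mobile: Bank & Invest', 'Elevations Credit Union Mobile']

# Hypothetical set of words too generic to identify an institution.
GENERIC = {'bank', 'of', 'mobile', '&'}

def keywords(name):
    """Lowercase the name, split on whitespace, strip punctuation, drop generic words."""
    words = (w.strip(':,.').lower() for w in name.split())
    return {w for w in words if w and w not in GENERIC}

def unmatched(names, candidates):
    """Return the names that share no keyword with any candidate."""
    candidate_words = set()
    for c in candidates:
        candidate_words |= keywords(c)
    return [n for n in names if not keywords(n) & candidate_words]

print(unmatched(l1, l2))
# ['Bank of America']
```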
