I am trying to search one list for the values of another and print each value as either a match or a non-match.

import re

array = ['brasil','argentina','chile','canada']
array2 = ['brasil.sao_paulo','chile','argentina']

for x,y in zip(array,array2):
  if re.search('\\{}\\b'.format(x), y, re.IGNORECASE):
    print("Match: {}".format(x))
  else:
    print("Not match: {}".format(y))

Output:

Not match: brasil.sao_paulo
Not match: chile
Traceback (most recent call last):
  File "main.py", line 7, in <module>
    if re.search('\\{}\\b'.format(x), y, re.IGNORECASE):
  File "/usr/local/lib/python3.7/re.py", line 183, in search
re.error: bad escape \c at position 0

Desired output:

Match: brasil
Match: argentina
Match: chile
Not match: canada
  • The regex which fails is \chile\b. I imagine that is not what you wanted to search for. Commented Dec 18, 2019 at 20:13
  • I would like it to look up regardless of array order Commented Dec 18, 2019 at 20:14
  • Example: Array 1 -> Line1 == Array2 -> All lines Commented Dec 18, 2019 at 20:16
  • What is the purpose of the initial `\\` ? Commented Dec 18, 2019 at 20:17
  • I'm new to regex, just trying to do it, suggest otherwise? Commented Dec 18, 2019 at 20:17
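To make the first comment concrete, here is a minimal sketch of what the format string from the question actually produces for each name. The helper `pattern_compiles` is a hypothetical name used only for illustration, and the `r'\b{}\b'` form at the end is only a guess at what was intended.

```python
import re

def pattern_compiles(pattern):
    """Return True if `pattern` is a valid regex (hypothetical helper)."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

for name in ['brasil', 'argentina', 'chile']:
    # '\\{}\\b' puts a single backslash directly in front of the name,
    # so the first letter of the name becomes part of an escape sequence.
    pattern = '\\{}\\b'.format(name)
    print(repr(pattern), "compiles" if pattern_compiles(pattern) else "fails")

# \brasil\b    -> \b (word boundary) followed by "rasil": valid, but never matches "brasil"
# \argentina\b -> \a (bell character) followed by "rgentina": valid, but not what was meant
# \chile\b     -> \c is not a known escape, hence "bad escape \c at position 0"

# Assumed intent: a word boundary on both sides of the literal name.
intended = r'\b{}\b'.format(re.escape('chile'))
print(bool(re.search(intended, 'chile', re.IGNORECASE)))
```

This also explains why the first two iterations print "Not match" instead of failing: their patterns happen to be valid regexes, they just mean something else.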

3 Answers 3

4

If I understand correctly, you don't need regex here.

group_1 = ['brasil','argentina','chile','canada']
group_2 = ['brasil.sao_paulo','chile','argentina']

for x in group_1:
    # For group 2 only, this picks out the part of the string that appears before the first ".".
    if x in [y.split('.')[0] for y in group_2]:
        print("Match: {}".format(x))
    else:
        print("Not match: {}".format(x))

which returns

Match: brasil
Match: argentina
Match: chile
Not match: canada
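
A possible variation on the same idea, sketched under two assumptions that are not in the question: that the comparison should ignore case (the original code passed re.IGNORECASE), and that only the part before the first "." matters. It builds the set of prefixes once instead of rebuilding the list on every iteration. The function name `report_matches` is made up for this example.

```python
group_1 = ['brasil', 'argentina', 'chile', 'canada']
group_2 = ['brasil.sao_paulo', 'chile', 'argentina']

def report_matches(names, candidates):
    """Pair each name with True/False depending on whether it equals
    the text before the first "." of any candidate (case-insensitive)."""
    # Build the lookup set once; set membership is a constant-time check.
    prefixes = {c.split('.')[0].casefold() for c in candidates}
    return [(name, name.casefold() in prefixes) for name in names]

for name, found in report_matches(group_1, group_2):
    print("{}: {}".format("Match" if found else "Not match", name))
```

For lists this small the difference is negligible; the set mainly helps when the second list is large.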

2

If you zip, you'll only get pairwise matches. Given the nature of your search, you can just join up the haystack into a space-delimited string and join needles into a pattern with alternation and let findall chug away:

>>> import re
>>> needles = ['brasil', 'argentina', 'chile', 'canada']
>>> haystack = ['brasil.sao_paulo', 'chile', 'argentina']
>>> re.findall(r"\b(?:%s)\b" % "|".join(needles), " ".join(haystack), re.I)
['brasil', 'chile', 'argentina']

The intent behind \\ in the original regex is unclear, so I assume you want \b on both sides of the pattern.
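
If the "Match" / "Not match" report from the question is wanted rather than a bare list of hits, the findall approach could be extended roughly as follows. This is a sketch, not a definitive version: `find_whole_words` is a hypothetical name, the alternation is wrapped in `(?:...)` so that `\b` applies to every alternative, and each needle goes through `re.escape` on the assumption that the needles are literal text rather than patterns.

```python
import re

needles = ['brasil', 'argentina', 'chile', 'canada']
haystack = ['brasil.sao_paulo', 'chile', 'argentina']

def find_whole_words(needles, haystack):
    """Pair each needle with True/False depending on whether it occurs
    as a whole word anywhere in the haystack (case-insensitive)."""
    alternation = "|".join(map(re.escape, needles))
    pattern = re.compile(r"\b(?:%s)\b" % alternation, re.I)
    # Normalise the hits so the membership test below ignores case too.
    found = {hit.casefold() for hit in pattern.findall(" ".join(haystack))}
    return [(needle, needle.casefold() in found) for needle in needles]

for needle, matched in find_whole_words(needles, haystack):
    print("{}: {}".format("Match" if matched else "Not match", needle))
```

Note that `\b` treats "_" as a word character, so a needle such as "sao" would not be found inside "sao_paulo".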

1

A simple solution using the built-in any function:

array = ['brasil', 'argentina', 'chile', 'canada']
array2 = ['brasil.sao_paulo', 'chile', 'argentina']

for x in array:
    if any(x.casefold() in y.casefold() for y in array2):
        print("Match:", x)
    else:
        print("Not match:", x)

Edit: Using casefold() to make it case-insensitive.
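
One thing to keep in mind with `in` on strings is that it tests for a substring anywhere in the candidate, not for a whole word or a prefix. The sketch below contrasts the two behaviours; both function names are made up for the comparison, and the prefix rule (text before the first ".") is an assumption based on the sample data.

```python
def substring_match(name, candidates):
    # True if `name` occurs anywhere inside any candidate.
    return any(name.casefold() in c.casefold() for c in candidates)

def prefix_match(name, candidates):
    # True only if `name` equals the text before the first "." of a candidate.
    return any(c.casefold().split('.')[0] == name.casefold() for c in candidates)

candidates = ['brasil.sao_paulo', 'chile', 'argentina']

print(substring_match('brasil', candidates), prefix_match('brasil', candidates))
print(substring_match('paulo', candidates), prefix_match('paulo', candidates))
```

Here "paulo" counts as a match under the substring test but not under the prefix test, so which one is right depends on what should count as a match.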
