1

Let's assume I have two lists of strings as follows.

lst_1 = ['foo','bar','Invoice No: SME2324-AA']
lst_2 = ['trincas','hotel park','delivery date 12-sept-2019','invoice no: 11245']

Objective: I want to extract the invoice number from these two lists.

My Approach so far:

lst_3 = [lst_1,lst_2]
txt=[]
for inv_no in lst_3:
    for i in inv_no:
         z = i
         inv = re.search(r'Invoice (\S+) (.+?)',' '.join(z))
         txt.append(inv)

When I inspect the output, i.e. txt, I get:

[None, None, None, None, None, None, None, None]

What I am looking for is

['SME2324-AA','11245']

What am I missing here? Any help would be appreciated.

  • Why do you do z = i and ' '.join(z)? Why not just inv = re.search(r'Invoice (\S+) (.+?)', i)? That will get you closer to a solution. Commented Sep 5, 2019 at 5:35
  • l3 = [*l1, *l2] Commented Sep 5, 2019 at 5:48

3 Answers

2

Without using regex, you can try in this way:

lst_3 = lst_1 + lst_2
txt=[]
for i in lst_3:
    if 'invoice' in i.lower():
        txt.append(i.split()[-1])
print(txt)

Output:

['SME2324-AA', '11245']

2 Comments

Pretty neat, I tried this earlier but got confused by the [-1] part. Just to know, why are we using [-1]?
split() separates the string wherever there is whitespace and returns the parts as a list. The [-1] takes the last of these parts.
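
To make the split()/[-1] explanation concrete, here is a minimal sketch using one of the strings from the question. It assumes the invoice number is always the last whitespace-separated token on the line, which holds for the sample data but not necessarily for other input.

```python
line = 'Invoice No: SME2324-AA'

# split() with no arguments splits on runs of whitespace
parts = line.split()
print(parts)      # ['Invoice', 'No:', 'SME2324-AA']

# index -1 is the last element of the list
print(parts[-1])  # SME2324-AA
```

If anything follows the invoice number on the same line, [-1] would pick up that trailing text instead, so a regex may be safer there.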
1

We can try joining your lists together to form a single string, then using re.findall to find all invoice numbers:

import re

lst_1 = ['foo','bar','Invoice No: SME2324-AA']
lst_2 = ['trincas','hotel park','delivery date 12-sept-2019','invoice no: 11245']
lst_all = lst_1 + lst_2
inp = " ".join(lst_all)
invoices = re.findall(r'\binvoice no: (\S+)', inp, flags=re.IGNORECASE)
print(invoices)

This prints:

['SME2324-AA', '11245']

2 Comments

This only prints the first invoice number i.e. SME2324-AA
@pythondumb No, it prints every invoice number contained in your two lists. Maybe you need to reload your page?
1
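  • First of all, z is already a single string, so ' '.join(z) inserts a space between every character ('Invoice' becomes 'I n v o i c e') and the pattern can never match.
  • Secondly, (.+?) is non-greedy and nothing follows it in the pattern, so it stops after a single character. Also, r'Invoice...' is case-sensitive and is bound to fail on the lower-cased invoice.
  • Thirdly, append(inv) appends the match object (or None), not the matched text; you need to specify the group: if inv: txt.append(inv.group(2))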
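
The three points above can be checked in isolation. The following is a small sketch using the strings from the question; the variable names (s, spaced, lazy, fixed) are only for illustration.

```python
import re

s = 'Invoice No: SME2324-AA'

# 1. ' '.join on a str iterates over its characters,
#    so the word 'Invoice' no longer appears in the result.
spaced = ' '.join(s)
print(spaced.startswith('I n v o i c e'))               # True
print(re.search(r'Invoice (\S+) (.+?)', spaced))        # None

# 2a. A trailing non-greedy group matches as little as possible: one character.
lazy = re.search(r'Invoice (\S+) (.+?)', s)
print(lazy.group(2))                                    # S

# 2b. The pattern is case-sensitive, so lower-case 'invoice' does not match.
print(re.search(r'Invoice (\S+) (.+?)', 'invoice no: 11245'))  # None

# 3. With a greedy group and [Ii], group(2) holds the invoice number.
fixed = re.search(r'[Ii]nvoice (\S+) (.+)', s)
print(fixed.group(2))                                   # SME2324-AA
```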

Fixing all issues:

import re

lst_3 = [lst_1, lst_2]
txt = []
for inv_no in lst_3:
    for i in inv_no:
        # i is already a string, so search it directly (no ' '.join needed)
        inv = re.search(r'[Ii]nvoice (\S+) (.+)', i)
        #                            ^group(1) ^group(2)
        if inv:
            txt.append(inv.group(2))
txt

Output:

['SME2324-AA', '11245']

You can make it simpler by using re.findall with re.IGNORECASE:

import re

res = []
for i in lst_1 + lst_2:
    res.extend(re.findall('invoice no: (.+)', i, re.IGNORECASE))
res

Output:

['SME2324-AA', '11245']
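
If the extraction is needed in more than one place, the findall approach can be wrapped in a small function. This is only a sketch: the names INVOICE_RE and extract_invoice_numbers are hypothetical, and the pattern assumes the number is a single run of non-whitespace characters directly after "invoice no:".

```python
import re

# Hypothetical helper names; the pattern is an assumption based on the sample data.
INVOICE_RE = re.compile(r'\binvoice no:\s*(\S+)', re.IGNORECASE)

def extract_invoice_numbers(*lists):
    """Return every invoice number found in the given lists of strings."""
    found = []
    for lst in lists:
        for item in lst:
            # findall returns the captured group for each match in item
            found.extend(INVOICE_RE.findall(item))
    return found

lst_1 = ['foo', 'bar', 'Invoice No: SME2324-AA']
lst_2 = ['trincas', 'hotel park', 'delivery date 12-sept-2019', 'invoice no: 11245']

result = extract_invoice_numbers(lst_1, lst_2)
print(result)  # ['SME2324-AA', '11245']
```

Using (\S+) rather than (.+) stops the capture at the first whitespace, so trailing text on the same line is not included in the result.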
