Python: How to extract a string right after another specified string

Question

Let's assume I have two lists of strings as follows.

lst_1 = ['foo','bar','Invoice No: SME2324-AA']
lst_2 = ['trincas','hotel park','delivery date 12-sept-2019','invoice no: 11245']

Objective: I want to extract the invoice number from these two lists.

My Approach so far:

lst_3 = [lst_1,lst_2]
txt=[]
for inv_no in lst_3:
    for i in inv_no:
         z = i
         inv = re.search(r'Invoice (\S+) (.+?)',' '.join(z))
         txt.append(inv)

When I print the output, i.e. txt, I get:

[None, None, None, None, None, None, None, None]

What I am looking for is

['SME2324-AA','11245']

What am I missing here? Any help would be appreciated.

Why do you do z=i and ' '.join(z)? Why not just inv = re.search(r'Invoice (\S+) (.+?)', i)? That will get you closer to a solution.
– Tomerikoo, Commented Sep 5, 2019 at 5:35

Joe · Accepted Answer · 2019-09-05 05:48:59Z

2

Without using regex, you can do it this way:

lst_3 = lst_1 + lst_2
txt=[]
for i in lst_3:
    if 'invoice' in i.lower():
        txt.append(i.split()[-1])
print(txt)

Output:

['SME2324-AA', '11245']


2 Comments

pythondumb Over a year ago

Pretty neat, I tried this earlier but got confused by the [-1] part. Why are we using [-1]?

Joe Over a year ago

split() separates the string wherever there is whitespace; [-1] takes the last of these parts.
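
To make the [-1] explanation concrete, here is a minimal sketch. The first string is taken from the question; the second one, with a trailing "(paid)", is a made-up input added only to show a limitation of taking the last token.

```python
line = 'Invoice No: SME2324-AA'

# split() with no arguments cuts the string on runs of whitespace
parts = line.split()
print(parts)      # ['Invoice', 'No:', 'SME2324-AA']

# a negative index counts from the end, so [-1] is the last part
print(parts[-1])  # SME2324-AA

# Caveat (hypothetical input): [-1] is simply the last token,
# whatever it happens to be
trailing = 'invoice no: 11245 (paid)'.split()[-1]
print(trailing)   # (paid)
```

So this approach works as long as the invoice number is the last whitespace-separated word in the string.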

Tim Biegeleisen · Accepted Answer · 2019-09-05 05:36:43Z

1

We can try joining your lists together to form a single string, then using re.findall to find all invoice numbers:

import re

lst_1 = ['foo','bar','Invoice No: SME2324-AA']
lst_2 = ['trincas','hotel park','delivery date 12-sept-2019','invoice no: 11245']
lst_all = lst_1 + lst_2
inp = " ".join(lst_all)
invoices = re.findall(r'\binvoice no: (\S+)', inp, flags=re.IGNORECASE)
print(invoices)

This prints:

['SME2324-AA', '11245']


2 Comments

pythondumb Over a year ago

This only prints the first invoice number i.e. SME2324-AA

Tim Biegeleisen Over a year ago

@pythondumb No, it prints every invoice number contained in your two lists. Maybe you need to reload your page?
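
The comments do not say what was actually run, so the following is only a guess at how a single result could come out. It is a small sketch comparing re.findall with re.search, and with the re.IGNORECASE flag left off, on the same joined string as in the answer.

```python
import re

inp = ("foo bar Invoice No: SME2324-AA trincas hotel park "
       "delivery date 12-sept-2019 invoice no: 11245")
pattern = r'\binvoice no: (\S+)'

# re.search stops at the first match
first_only = re.search(pattern, inp, flags=re.IGNORECASE)
print(first_only.group(1))  # SME2324-AA

# re.findall returns the captured group of every match
all_matches = re.findall(pattern, inp, flags=re.IGNORECASE)
print(all_matches)          # ['SME2324-AA', '11245']

# without re.IGNORECASE only the lower-cased "invoice no:" matches
case_sensitive = re.findall(pattern, inp)
print(case_sensitive)       # ['11245']
```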

Chris · Accepted Answer · 2019-09-05 05:41:05Z

First of all, z is a string, so ' '.join(z) puts a space between every character ('Invoice' becomes 'I n v o i c e') and the pattern can no longer match.
Secondly, (.+?) is non-greedy, so at the end of the pattern it stops after a single character, and r'Invoice...' is bound to fail with lower-cased invoice.
Thirdly, append(inv) appends the match object (or None), not the matched text; you need to pick the group and skip non-matches: if inv: txt.append(inv.group(2))
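
Each of the three points can be checked in isolation. This is a small sketch using the strings from the question; the variable names are only for illustration.

```python
import re

s = 'Invoice No: SME2324-AA'

# 1. str.join iterates over its argument; iterating a string yields characters
spaced_demo = ' '.join('Inv')
print(spaced_demo)  # I n v
no_match = re.search(r'Invoice (\S+) (.+?)', ' '.join(s))
print(no_match)     # None

# 2. a lazy (.+?) at the end of a pattern is satisfied by one character
lazy = re.search(r'Invoice (\S+) (.+?)', s).group(2)
greedy = re.search(r'Invoice (\S+) (.+)', s).group(2)
print(lazy)         # S
print(greedy)       # SME2324-AA

# 2b. the pattern is case-sensitive, so the lower-cased entry never matches
lower_case = re.search(r'Invoice (\S+) (.+)', 'invoice no: 11245')
print(lower_case)   # None

# 3. re.search returns a match object; the text lives in its groups
m = re.search(r'Invoice (\S+) (.+)', s)
print(m.group(1))   # No:
print(m.group(2))   # SME2324-AA
```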

Fixing all issues:

import re

lst_3 = [lst_1,lst_2]
txt=[]
for inv_no in lst_3:
    for i in inv_no:
        inv = re.search(r'[Ii]nvoice (\S+) (.+)', i)
        #                      group(1)^    ^group(2)
        if inv:
            txt.append(inv.group(2))
txt

Output:

['SME2324-AA', '11245']

You can make it simpler by using re.findall with re.IGNORECASE:

import re

res = []
for i in lst_1 + lst_2:
    res.extend(re.findall('invoice no: (.+)', i, re.IGNORECASE))
res

Output:

['SME2324-AA', '11245']
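
If the marker text is not always "invoice no:", the same idea can be wrapped in a small helper. This is only a sketch: extract_after is a hypothetical name, not something from the answers above, and it assumes the wanted value is the first whitespace-delimited token after the marker (unlike (.+), which takes the rest of the string).

```python
import re

def extract_after(strings, prefix):
    """Return the first token that follows `prefix` in each string containing it.

    Hypothetical helper: matching is case-insensitive and `prefix` is
    treated as literal text via re.escape.
    """
    pattern = re.compile(re.escape(prefix) + r'\s*(\S+)', re.IGNORECASE)
    results = []
    for s in strings:
        m = pattern.search(s)
        if m:  # skip strings that do not contain the prefix
            results.append(m.group(1))
    return results

lst_1 = ['foo', 'bar', 'Invoice No: SME2324-AA']
lst_2 = ['trincas', 'hotel park', 'delivery date 12-sept-2019', 'invoice no: 11245']

invoice_numbers = extract_after(lst_1 + lst_2, 'invoice no:')
print(invoice_numbers)  # ['SME2324-AA', '11245']
```

re.escape keeps characters such as ":" or "." in the marker from being read as regex syntax.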
