
I am interested in extracting a number that appears after a set of characters ('AA='). However, the issues are: (i) I don't know how long the number is; (ii) I don't know what appears right after the number (it could be a blank space or ANY character except 0-9; I don't know what these characters could be, but they are definitely not 0-9); (iii) the number can be in exponential form (Lines 4 and 5 below).

Below are a few of the many inputs I can have.

Line 1: 123 NUBA AA=1.2345 $BB=1234.55
Line 2: 123 NUBA MM AA=1.2345678&BB=1234.55
Line 3: 123 NUBA RRNJH AA=1.2#ALPHA
Line 4: 123 NUBA ABCD AA=1.2E-5 GBRO
Line 5: 123 NUBA ABCD AA=1.245E-7$ MN
...

The result should be 1.2345, 1.2345678, 1.2, 1.2e-5 and 1.245e-7 for the respective lines above.

PS: I know how to use .find to get the starting position of AA=, but that alone does not handle the conditions above. I also understand that one way would be to loop through each character after AA= and break as soon as a blank space or anything other than a digit, '.', 'E' or '-' is seen, but that is clumsy and takes unnecessary space in my code. I am looking for a neater way of doing this.
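For comparison, the character-by-character loop described above might look like the following sketch. The function name `extract_after_loop` is hypothetical, and the allowed character set (digits, '.', 'E'/'e', '+', '-') is an assumption based on the sample inputs. It also accepts malformed text such as `1.2.3` and would wrongly absorb a trailing 'E' or '-' that is not part of the number, which is part of why a regular expression is the neater fit.

```python
def extract_after_loop(line, key="AA="):
    """Hypothetical helper: collect characters after `key` until one falls
    outside the assumed number alphabet (digits, '.', 'E'/'e', '+', '-')."""
    start = line.find(key)
    if start == -1:
        return None  # key not present on this line
    allowed = set("0123456789.Ee+-")
    chars = []
    for ch in line[start + len(key):]:
        if ch not in allowed:
            break  # first character that cannot be part of the number
        chars.append(ch)
    return "".join(chars) or None


lines = [
    "Line 1: 123 NUBA AA=1.2345 $BB=1234.55",
    "Line 2: 123 NUBA MM AA=1.2345678&BB=1234.55",
    "Line 3: 123 NUBA RRNJH AA=1.2#ALPHA",
    "Line 4: 123 NUBA ABCD AA=1.2E-5 GBRO",
    "Line 5: 123 NUBA ABCD AA=1.245E-7$ MN",
]
for line in lines:
    print(extract_after_loop(line))
```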

  • The neat way is to use a regular expression, that's what they were invented for. Start with the re module. Commented Jan 15, 2021 at 22:55
  • @MarkRansom: Thanks, can you please share a simple relevant example? Commented Jan 15, 2021 at 23:01

2 Answers


You could use a single pattern with a capture group. With re.findall, for example, only the value of the capture group is returned when the pattern contains a single group.

\bAA=(\d+(?:\.\d+)?(?:[eE][-+]?[0-9]+)?)

Explanation

  • \bAA= A word boundary, then match AA=
  • ( Capture group 1
    • \d+ Match 1+ digits
    • (?:\.\d+)? Match an optional decimal part
    • (?:[eE][-+]?[0-9]+)? Match an optional exponential part
  • ) Close group 1
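When the lines arrive one at a time (for example while reading a file), the same pattern can be applied per line with re.search, which returns None when AA= is absent. A minimal sketch; the helper name `extract_aa` is hypothetical:

```python
import re

# Same pattern as above, compiled once for reuse
AA_PATTERN = re.compile(r"\bAA=(\d+(?:\.\d+)?(?:[eE][-+]?[0-9]+)?)")


def extract_aa(line):
    """Hypothetical helper: return the number after AA= as a string, or None."""
    match = AA_PATTERN.search(line)
    return match.group(1) if match else None


print(extract_aa("Line 4: 123 NUBA ABCD AA=1.2E-5 GBRO"))   # 1.2E-5
print(extract_aa("Line 5: 123 NUBA ABCD AA=1.245E-7$ MN"))  # 1.245E-7
print(extract_aa("no key on this line"))                    # None
```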


import re

regex = r"\bAA=(\d+(?:\.\d+)?(?:[eE][-+]?[0-9]+)?)"

s = ("Line 1: 123 NUBA AA=1.2345 $BB=1234.55\n"
    "Line 2: 123 NUBA MM AA=1.2345678&BB=1234.55\n"
    "Line 3: 123 NUBA RRNJH AA=1.2#ALPHA\n"
    "Line 4: 123 NUBA ABCD AA=1.2E-5 GBRO\n"
    "Line 5: 123 NUBA ABCD AA=1.245E-7$ MN")

print(re.findall(regex, s))

Output

['1.2345', '1.2345678', '1.2', '1.2E-5', '1.245E-7']
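Because re.findall returns strings, they can be converted with float if actual numbers are needed; Python then normalises the exponent, so 1.2E-5 is displayed as 1.2e-05. If the exact text form from the question (1.2e-5) is wanted instead, lowercasing the matched string is enough. A short sketch reusing the strings from the output above:

```python
values = ['1.2345', '1.2345678', '1.2', '1.2E-5', '1.245E-7']

# float() accepts both 'E' and 'e' as the exponent marker
numbers = [float(v) for v in values]
print(numbers)  # [1.2345, 1.2345678, 1.2, 1.2e-05, 1.245e-07]

# Keep the text but match the lowercase 'e' shown in the question
lowered = [v.lower() for v in values]
print(lowered)  # ['1.2345', '1.2345678', '1.2', '1.2e-5', '1.245e-7']
```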



2 Comments

Interesting, thanks! I am stuck at some other places which I forgot to include in my question before. Please see my updated question. How can Lines 4 and 5 be handled as well?
Thanks for updating your answer. That explanation is very exhaustive and helpful for someone who has never used regex before. Thanks!!

This will give you the output you want for Lines 1 to 3:

import re

string1 = '123 NUBA AA=1.2345 $BB=1234.55'
string2 = '123 NUBA MM AA=1.2345678&BB=1234.55'
string3 = '123 NUBA RRNJH AA=1.2#ALPHA'

# Slice from "AA=" onward, then take the first number:
# digits, an optional decimal point, then any further digits
for s in (string1, string2, string3):
    print(re.findall(r'\d+\.?\d*', s[s.find("AA="):])[0])

Output

1.2345
1.2345678
1.2
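The pattern above stops at the 'E', so for Lines 4 and 5 it returns only 1.2 and 1.245. A minimal sketch of the same slice-then-findall idea with an optional exponent added; the exponent part is an assumption modelled on the first answer's pattern, and it uses a non-capturing group (?:...) so findall still returns the whole match. As in the original, find returns -1 when AA= is missing, so this assumes every line contains it.

```python
import re

string4 = '123 NUBA ABCD AA=1.2E-5 GBRO'
string5 = '123 NUBA ABCD AA=1.245E-7$ MN'

# digits, optional decimal part, optional exponent such as E-5 or e+07
number = r'\d+\.?\d*(?:[eE][-+]?\d+)?'

for s in (string4, string5):
    print(re.findall(number, s[s.find("AA="):])[0])
```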

1 Comment

This works! But I am stuck at some other places which I forgot to include in my question before. Please see my updated question. How can Lines 4 and 5 be handled?
