I'm trying to get the first number (int or float) that follows a specific pattern:

strings = ["Building 38 House 10",
           "Building : 10.5 house 900"]
for x in strings:
    print(<rule>)

Wanted result:

'38'
'10.5'

I tried:

for x in strings:
    print(re.findall(r"(?<=Building).+\d+", x))
    print(re.findall(r"(?<=Building).+(\d+.?\d+)", x))
[' 38 House 10']
['10']
[' : 10.5 house 900']
['00']

But I'm missing something.

3 Answers

You could use a capture group:

\bBuilding[\s:]+(\d+(?:\.\d+)?)\b

Explanation

  • \bBuilding A word boundary, then the literal text Building
  • [\s:]+ Match 1+ whitespace chars or colons
  • (\d+(?:\.\d+)?) Capture group 1, match 1+ digits with an optional decimal part
  • \b A word boundary

Regex demo

import re
strings = ["Building 38 House 10",
           "Building : 10.5 house 900"]
pattern = r"\bBuilding[\s:]+(\d+(?:\.\d+)?)\b"
for x in strings:
    m = re.search(pattern, x)
    if m:
        print(m.group(1))

Output

38
10.5
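
Since the question asks for ints and floats, the captured text could also be converted after matching. A minimal sketch, assuming a hypothetical helper named first_number_after (not part of the answer above) that escapes the keyword with re.escape and returns an int, a float, or None when nothing matches:

```python
import re

def first_number_after(keyword, text):
    """Return the first number following `keyword` as int or float, else None.

    `first_number_after` is a hypothetical helper name used for illustration.
    """
    # re.escape guards against regex metacharacters in the keyword.
    pattern = rf"\b{re.escape(keyword)}[\s:]+(\d+(?:\.\d+)?)\b"
    m = re.search(pattern, text)
    if m is None:
        return None
    value = m.group(1)
    # A decimal point in the captured text means float; otherwise int.
    return float(value) if "." in value else int(value)

strings = ["Building 38 House 10",
           "Building : 10.5 house 900"]
for x in strings:
    print(first_number_after("Building", x))  # prints 38, then 10.5
```

The leading \b assumes the keyword starts with a word character, as Building does.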

Another idea is to use \D (the negation of \d) to skip any non-digits in between, then capture the number:

Building\D*\b([\d.]+)

See this demo at regex101 or Python demo at tio.run

Also, put word boundaries \b around Building if it should only match as a whole word.
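
As a rough sketch of how this could look in Python with the word boundaries added (the variable names pattern and results are only illustrative):

```python
import re

strings = ["Building 38 House 10",
           "Building : 10.5 house 900"]
# \bBuilding\b matches the whole word only; \D* skips any non-digits.
pattern = re.compile(r"\bBuilding\b\D*\b([\d.]+)")
results = []
for x in strings:
    m = pattern.search(x)
    if m:
        results.append(m.group(1))
print(results)  # ['38', '10.5']
```

Note that [\d.]+ accepts any run of digits and dots, so a number followed by a full stop, as in "Building 38. Next", is captured as "38." with the dot included.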

re.findall(r"(?<![a-zA-Z:])[-+]?\d*\.?\d+", x)

This will find all numbers in the given string.

If you want the first number only you can access it simply through indexing:

re.findall(r"(?<![a-zA-Z:])[-+]?\d*\.?\d+", x)[0]
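
Indexing with [0] raises IndexError when the string contains no number, so a guard may be useful. A minimal sketch, assuming a hypothetical helper named first_number (not from the answer itself):

```python
import re

# Same pattern as above, compiled once for reuse.
number = re.compile(r"(?<![a-zA-Z:])[-+]?\d*\.?\d+")

def first_number(text):
    """Return the first number found in `text` as a string, or None."""
    found = number.findall(text)
    return found[0] if found else None

print(number.findall("Building 38 House 10"))     # ['38', '10']
print(first_number("Building : 10.5 house 900"))  # 10.5
print(first_number("no digits here"))             # None
```

This pattern is not anchored to Building, so it returns the first number anywhere in the string: for "House 10 Building 38" the result would be '10'.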

2 Comments

That would include 10 and 900, which OP doesn't want.
You are right, edited the answer.
