I have a string from which I want to extract a value: 'data - hk = "136 HK"'. I know that data - hk = will always precede the value, so I can use Python's str.split(), but that is not a neat solution. I would like to get more familiar with regex, but my attempts have failed so far.

Here is what I have tried:

        import re

        text = 'data - hk = "136 HK"'

        # With simple text split - returns 136
        int(text.split("data - hk = ")[1].split('"')[1].split(" HK")[0])

        # With regex - returns None (no match)
        re.search(r"[\n\r].*data - hk:\s*([^\n\r]*)", text)

Can someone tell me what I need to change in the regex?

  • It could be something like re.findall(r'^data - hk = "(\d+) HK"', text). But would the captured substring always be an integer? It looks like it, judging by your own attempt at split. Commented Apr 10, 2021 at 5:39
  • Thank you, yes, it will always be an integer. Commented Apr 10, 2021 at 5:51
  • I forgot the end-of-string anchor in my previous comment, @Rva92. Did it end up working for you? Commented Apr 10, 2021 at 5:52
  • Yes, it did, thank you very much. Post it as an answer and I'll mark it. Commented Apr 10, 2021 at 5:54

2 Answers

It seems as though you can just validate your string and extract your integer using re.findall:

re.findall(r'^data - hk = "(\d+) HK"$', text)

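  • ^ - Start-of-string anchor (start of each line only if re.MULTILINE is set).
  • data - hk = " - A literal match for the fixed prefix, including the opening quote.
  • (\d+) - A capture group that retrieves the integer (one or more digits).
  •  HK" - A literal match for the suffix: a space, HK, and the closing quote.
  • $ - End-of-string anchor (also matches just before a trailing newline).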
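
Since re.findall returns a list of strings, a little extra work is needed to get an actual int. The following is only a minimal sketch of one way to do that, using re.search with the same pattern; the helper name extract_hk is hypothetical and not part of the original answer.

```python
import re

def extract_hk(text):
    """Return the integer from 'data - hk = "<digits> HK"', or None if text does not match.

    extract_hk is a hypothetical helper name used only for illustration.
    """
    match = re.search(r'^data - hk = "(\d+) HK"$', text)
    # group(1) holds the digits captured by (\d+); convert them to int.
    return int(match.group(1)) if match else None

print(extract_hk('data - hk = "136 HK"'))  # 136
print(extract_hk('data - hk: 136'))        # None
```

For comparison, the pattern in the question finds nothing because it requires a newline or carriage return before the text and a colon after hk, whereas the string has no line break and uses " = ".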

Use

data - hk\D*(\d+(?:\.\d+)?)

EXPLANATION

--------------------------------------------------------------------------------
  data - hk                'data - hk'
--------------------------------------------------------------------------------
  \D*                      non-digits (all but 0-9) (0 or more times
                           (matching the most amount possible))
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    \d+                      digits (0-9) (1 or more times (matching
                             the most amount possible))
--------------------------------------------------------------------------------
    (?:                      group, but do not capture (optional
                             (matching the most amount possible)):
--------------------------------------------------------------------------------
      \.                       '.'
--------------------------------------------------------------------------------
      \d+                      digits (0-9) (1 or more times
                               (matching the most amount possible))
--------------------------------------------------------------------------------
    )?                       end of grouping
--------------------------------------------------------------------------------
  )                        end of \1

Code:

import re
s = 'data - hk = "136 HK"'
match = re.search(r'data - hk\D*(\d+(?:\.\d+)?)', s)
if match is not None:
    print(match.group(1))

Results: 136.
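
Because the capture group also allows an optional fractional part, the matched text may be either an integer or a decimal. As a rough sketch (the helper extract_number and the decimal sample string are assumptions made for illustration, not from the question), the captured text could be converted like this:

```python
import re

# Same pattern as above, compiled once for reuse.
PATTERN = re.compile(r'data - hk\D*(\d+(?:\.\d+)?)')

def extract_number(s):
    """Return the captured value as int or float, or None when nothing matches.

    extract_number is a hypothetical helper name used only for illustration.
    """
    match = PATTERN.search(s)
    if match is None:
        return None
    value = match.group(1)
    # A dot in the captured text means the optional (?:\.\d+)? part matched.
    return float(value) if '.' in value else int(value)

print(extract_number('data - hk = "136 HK"'))    # 136
print(extract_number('data - hk = "136.5 HK"'))  # 136.5 (hypothetical input)
```

Note that this pattern is looser than the anchored one: \D* skips any non-digit characters after data - hk, so it does not validate the rest of the string.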
