
Let's import the re module.

import re

Assume there's a string containing some data.

data = '''Mike: Jan 25.1, Feb 24.3, Mar 29.0
   Rob: Jan 22.3, Feb 20.0, Mar 22.0
   Nick: Jan 23.4, Feb 22.0, Mar 23.4'''

Say we want to extract the floats from Rob's line only.

name = 'Rob'

I'd do it like this:

def data_extractor(name, data):
    return re.findall(r'\d+\.\d+', re.findall(r'{}.*'.format(name),data)[0])

The output is ['22.3', '20.0', '22.0'].

Is my approach Pythonic, or should it be improved somehow? It does the job, but I'm not sure whether code like this is appropriate.

Thanks for your time.

  • Personally, I'd put the two re.findall calls on separate lines: the first sets a value, the second uses it. Sure, you can one-line it, but for reading down the road I like it a little more explicit. Just my 2 cents. Commented Jul 25, 2017 at 15:31
  • A possible problem is that each time data_extractor() is called it searches data from the beginning for the name. If it's an ad hoc query for a few arbitrary names, that's ok. But if you will be using all the names, this is not efficient, because it runs through the same text territory every time. Commented Jul 25, 2017 at 17:08
  • Also, pythex is a good tool for testing python regex: pythex.org Commented Jul 25, 2017 at 21:31
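
To illustrate the efficiency point raised in the comments, here is a minimal sketch that walks the text once and builds a lookup table, so that later queries by name do not rescan the data. The names parse_all, FLOAT_RE and values_by_name are hypothetical, and the sketch assumes every record has the form "name: values" on its own line.

```python
import re

data = '''Mike: Jan 25.1, Feb 24.3, Mar 29.0
   Rob: Jan 22.3, Feb 20.0, Mar 22.0
   Nick: Jan 23.4, Feb 22.0, Mar 23.4'''

# Compiled once and reused for every line
FLOAT_RE = re.compile(r'\d+\.\d+')

def parse_all(data):
    """Map each name to the list of float strings found on its line (single pass)."""
    result = {}
    for line in data.splitlines():
        # Assumption: the name is everything before the first colon
        name, sep, rest = line.strip().partition(':')
        if sep:  # skip lines that have no "name:" prefix
            result[name] = FLOAT_RE.findall(rest)
    return result

values_by_name = parse_all(data)
print(values_by_name['Rob'])
# => ['22.3', '20.0', '22.0']
```

For a single ad hoc lookup the original function is fine; the dictionary only pays off when many names are queried against the same text.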

1 Answer

One approach that avoids using a regex to locate the line is to split the data into lines, strip them, check which one starts with Rob, and then grab all the float values from it:

import re
data = '''Mike: Jan 25.1, Feb 24.3, Mar 29.0
   Rob: Jan 22.3, Feb 20.0, Mar 22.0
   Nick: Jan 23.4, Feb 22.0, Mar 23.4'''
name = 'Rob'
lines = [line.strip() for line in data.split("\n")]
for line in lines:
    if line.startswith(name):
        print(re.findall(r'\d+\.\d+', line))
# => ['22.3', '20.0', '22.0']


If you want a purely regex-based approach, you can use the PyPI regex module with a \G-based pattern:

import regex
data = '''Mike: Jan 25.1, Feb 24.3, Mar 29.0
   Rob: Jan 22.3, Feb 20.0, Mar 22.0
   Nick: Jan 23.4, Feb 22.0, Mar 23.4'''
name = 'Rob'
rx = r'(?:\G(?!\A)|{}).*?(\d+\.\d+)'.format(regex.escape(name))
print(regex.findall(rx, data))


This pattern matches:

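  • (?:\G(?!\A)|{}) - either the end of the previous successful match (\G, with (?!\A) ruling out the very start of the string) or the name itself ({} is the placeholder that format fills with the escaped name)
  • .*? - any 0+ chars other than line break chars, as few as possible
  • (\d+\.\d+) - Group 1 (the only value findall returns here) matching 1+ digits, a dot, and 1+ digits.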

The regex.escape(name) call escapes characters such as ( and ) that might appear in the name, so they are matched literally.
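
The same escaping idea applies to the standard library, where the equivalent call is re.escape. The sketch below is only an illustration: the sample data and the name 'Rob (Jr.)' are made up to show a name containing metacharacters, and anchoring on the colon plus returning an empty list when the name is absent are assumptions about the desired behaviour, not part of the original function.

```python
import re

# Hypothetical data: the second name contains regex metacharacters
data = '''Mike: Jan 25.1, Feb 24.3
   Rob (Jr.): Jan 22.3, Feb 20.0'''

def data_extractor(name, data):
    """Return the float strings on the line that starts with `name`, or [] if absent."""
    # re.escape makes "(", ")" and "." in the name match literally
    pattern = r'^\s*{}:.*'.format(re.escape(name))
    line = re.search(pattern, data, re.MULTILINE)
    return re.findall(r'\d+\.\d+', line.group()) if line else []

print(data_extractor('Rob (Jr.)', data))
# => ['22.3', '20.0']
print(data_extractor('Nobody', data))
# => []
```

Without the escape, the parentheses would be read as a capturing group and the dot as a wildcard, so the pattern would no longer match the literal name. Checking the search result before using it also avoids the IndexError that indexing with [0] raises when the name is missing.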


1 Comment

Thank you for your answer. I have never dealt with PyPi regex module so it's something to dive into.
