I have a long string of data which looks like:

dstgfsda12345.123gsrsvrvsdfcsd23456.234tsrsd

Notice that the '12345.123' pattern (five digits, a dot, three digits) recurs. I want to split the string on it using Python (something like s.split(<regex>)).

What would be the appropriate regex?

'[0-9]{5}.[0-9]{3}'

does not work; I presume it expects whitespace around it(?).

3 Answers

4

Just escape ., and you are done:

\d{5}\.\d{3}

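You can use the regex token \d as shorthand for [0-9]. (In Python 3, \d in a str pattern also matches other Unicode decimal digits unless the re.ASCII flag is set.)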

Example:

>>> re.split(r'\d{5}\.\d{3}', 'dstgfsda12345.123gsrsvrvsdfcsd23456.234tsrsd')
['dstgfsda', 'gsrsvrvsdfcsd', 'tsrsd']
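One point about the s.split(<regex>) idea in the question: str.split treats its argument as a literal separator, not as a regular expression, so a pattern only takes effect through re.split. A minimal sketch of the difference, using the string from the question (the variable names are just for illustration):

```python
import re

s = 'dstgfsda12345.123gsrsvrvsdfcsd23456.234tsrsd'
pattern = r'\d{5}\.\d{3}'

# str.split searches for the literal text of the pattern, which never
# occurs in s, so the string comes back as a single unsplit element.
literal_parts = s.split(pattern)

# re.split interprets the same text as a regular expression.
regex_parts = re.split(pattern, s)

print(literal_parts)  # ['dstgfsda12345.123gsrsvrvsdfcsd23456.234tsrsd']
print(regex_parts)    # ['dstgfsda', 'gsrsvrvsdfcsd', 'tsrsd']
```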

1

I don't understand exactly what you need, but it seems you want your regex to isolate each occurrence of 5 digits, a dot, and 3 digits.

So instead of '[0-9]{5}.[0-9]{3}' you must use '[0-9]{5}\.[0-9]{3}', because . matches any character (except a newline, by default), while \. matches only a dot.
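To see the difference in practice, here is a small sketch. The sample string is hypothetical (it is not from the question) and has an 'x' where the dot would normally be:

```python
import re

unescaped = r'[0-9]{5}.[0-9]{3}'   # '.' matches any character
escaped = r'[0-9]{5}\.[0-9]{3}'    # '\.' matches only a literal dot

# Hypothetical input: five digits, an 'x', then three digits.
sample = 'ab12345x123cd'

loose = re.split(unescaped, sample)   # the 'x' is accepted as the separator
strict = re.split(escaped, sample)    # no literal dot, so nothing is split

print(loose)   # ['ab', 'cd']
print(strict)  # ['ab12345x123cd']
```

On the exact string from the question both patterns happen to give the same result, since the character between the digit groups is always a dot; the escaped version simply rules out accidental matches like the one above.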

1

Your regex should be '\d{5}\.\d{3}'.

Note the use of \. rather than . here. By default, '.' matches any character except a newline (see the documentation for Python's re module), whereas \. matches a literal dot in your string.

For example:

import re
my_string = 'dstgfsda12345.123gsrsvrvsdfcsd23456.234tsrsd'
my_regex = r'\d{5}\.\d{3}'  # raw string, so the backslashes reach the regex engine unchanged
re.split(my_regex, my_string)
# returns: ['dstgfsda', 'gsrsvrvsdfcsd', 'tsrsd']

Explanation on how '\d{5}\.\d{3}' works:

\d matches any digit from 0 to 9, so \d{5} matches 5 consecutive digits. \. matches a single literal dot. Finally, \d{3} matches the 3 digits after the dot.
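The same breakdown can be written into the pattern itself. The following is only a sketch of one way to do it, using the re.VERBOSE flag, which lets a pattern span several lines and carry comments:

```python
import re

# re.VERBOSE ignores whitespace in the pattern and allows '#' comments,
# so each piece of the regex can be annotated in place.
pattern = re.compile(r"""
    \d{5}   # five consecutive digits
    \.      # a literal dot
    \d{3}   # three consecutive digits
""", re.VERBOSE)

parts = pattern.split('dstgfsda12345.123gsrsvrvsdfcsd23456.234tsrsd')
print(parts)  # ['dstgfsda', 'gsrsvrvsdfcsd', 'tsrsd']
```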
