I am having trouble removing the end of strings. I tried .partition without success, and I am now trying regex, also without success. All the strings follow the format: some random words **X*.* Some more words, where * is a digit and X is a literal X. For example, 21X2.5. Everything after this dynamic string should be removed. I am trying to use re.sub('\d\d\X\d.\d', string). Can someone point me in the right direction with regex and how to split the string?

The expected output should read: some random words 21X2.5

Thanks!

1
  • What is your expected output? Do you want to replace 21X2.5 with something else, or remove the end of the strings? Commented Mar 19, 2015 at 3:35

3 Answers

2

Use the following regex:

re.search(r"(.*?\d\dX\d\.\d)", "some random words 21X2.5 Some more words").groups()[0]

Output:

'some random words 21X2.5'

1 Comment

Instead of groups()[0] you could use group(), and the parentheses are not needed in the pattern.
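
A minimal sketch of what this comment suggests, reusing the sample string from the answer. The variable names and the fallback to the unchanged input when nothing matches are assumptions for illustration, not part of the original answer:

```python
import re

text = "some random words 21X2.5 Some more words"

# Lazy prefix, then two digits, a literal X, a digit, an escaped dot, a digit.
# No capturing parentheses: group() with no argument returns the whole match.
match = re.search(r".*?\d\dX\d\.\d", text)

# re.search returns None when nothing matches, so guard before calling group().
# Falling back to the original text is an assumption made for this sketch.
result = match.group() if match else text
print(result)
```

Here group() returns everything from the start of the string through the code, which is the output the question asks for.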
0

Your regex is not correct. The biggest problem is that you need to escape the period. Otherwise, the regex treats the period as a wildcard that matches any character. To match just that pattern, you can use something like:

re.findall(r'[\d]{2}X\d\.\d', 'asb12X4.4abc')

[\d]{2} matches a sequence of two digits, X matches the literal X, \d matches a single digit, \. matches the literal ., and \d matches the final digit.

This will match only 12X4.4 and return it in a one-element list, ['12X4.4'].
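
To see why escaping the period matters, here is a small comparison. The sample string is made up for this sketch; the two patterns differ only in whether the dot is escaped:

```python
import re

loose = r"\d\dX\d.\d"    # unescaped dot: matches any character except a newline
strict = r"\d\dX\d\.\d"  # escaped dot: matches a literal period only

sample = "12X4-4 and 12X4.4"  # hypothetical input, not from the question

loose_hits = re.findall(loose, sample)
strict_hits = re.findall(strict, sample)

print(loose_hits)   # ['12X4-4', '12X4.4']
print(strict_hits)  # ['12X4.4']
```

The unescaped version also accepts 12X4-4, which is probably not what you want.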

It sounds like you instead want to remove everything after the matched expression. To get your desired output, you can do something like:

re.split(r'(.*?[\d]{2}X\d\.\d)', 'some random words 21X2.5  Some more words')[1]

which will return some random words 21X2.5. This expression pulls everything before and including the matched regex and returns it, discarding the end.
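
If it helps to see where the [1] comes from, this sketch prints the full list that re.split produces when the pattern contains a capturing group. The variable names and the second input string are assumptions for illustration:

```python
import re

pattern = r"(.*?\d{2}X\d\.\d)"
text = "some random words 21X2.5  Some more words"

# With a capturing group, re.split keeps the captured text in the result list.
parts = re.split(pattern, text)
print(parts)  # ['', 'some random words 21X2.5', '  Some more words']

kept = parts[1]

# When the pattern is absent, the list holds only the original string,
# so indexing [1] would raise IndexError. Check the length first.
unmatched = re.split(pattern, "no code here")
print(len(unmatched))  # 1
```

The empty first element is whatever precedes the match, which is nothing here because the lazy .*? starts at the beginning of the string.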

Let me know if this works.

1 Comment

Thanks Jason! worked like a charm - I wasn't sure about the regex.
0

To remove everything after the pattern, i.e. to do exactly as you say:

s = re.sub(r'(\d\dX\d\.\d).*', r'\1', s)

Of course, if you mean something other than what you said, something different will be needed! E.g., if you also want to remove the pattern itself, not just (as you said) what's after it:

s = re.sub(r'\d\dX\d\.\d.*', r'', s)

and so forth, depending on exactly what your specs are!-)
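
For a quick side-by-side of the two substitutions, here is a small sketch. The function names and the PATTERN constant are hypothetical, introduced only for this example:

```python
import re

PATTERN = r"\d\dX\d\.\d"  # two digits, literal X, digit, literal dot, digit


def keep_through_code(s):
    # Capture the code, match the rest of the line, put back only the capture.
    return re.sub(r"(" + PATTERN + r").*", r"\1", s)


def drop_from_code(s):
    # Match the code and the rest of the line, replace both with nothing.
    return re.sub(PATTERN + r".*", "", s)


sample = "some random words 21X2.5 Some more words"
print(repr(keep_through_code(sample)))  # 'some random words 21X2.5'
print(repr(drop_from_code(sample)))     # 'some random words '
```

Both leave a string unchanged when the pattern is not found. Note that .* stops at a newline unless you pass flags=re.DOTALL, so multi-line input would only be trimmed to the end of the matching line.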
