3

This might be a duplicate, but I'm trying to replace all but a certain string pattern. Here is a sample of strings:

'dkas;6-17'
'dsajdl 10'
'dsjalkdj16-20'

The goal here is to replace anything that is not numbers-numbers with nothing. So what I'd get from the strings above are:

'6-17'
''
'16-20'

The second string would yield nothing because it didn't match the pattern. I know the regular expression to match my pattern, but I'm confused about how I'd use regexp_replace to match all but that pattern. The following is what I have, but this replaces the pattern I want to actually keep.

re.sub(r'[0-9]{1,2}-[0-9]{1,2}', '', text)

3 Answers

2

If the second string should yield nothing, you could match one or more characters that are not a digit or a newline, followed by the pattern captured in a group, and replace the match with that group.

To make the substitution leave an empty string for lines that don't contain the pattern, add an alternation that matches the whole line.

[^\d\r\n]+(\d{1,2}-\d{1,2})|.+

In parts

  • [^\d\r\n]+ Match 1+ times any char except a digit or a newline
  • (\d{1,2}-\d{1,2}) Capture group 1, match 1-2 digits, - and 1-2 digits
  • | Or
  • .+ Match any char except a newline 1 or more times

Example code

import re

lines = [
    'dkas;6-17',
    'dsajdl 10',
    'dsjalkdj16-20'
]

for text in lines:
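    # Raw string so the backslashes reach the regex engine unchanged;
    # group 1 is empty when the .+ branch matches, so the line becomes ''
    print(re.sub(r'[^\d\r\n]+(\d{1,2}-\d{1,2})|.+', r'\1', text))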

Output

6-17

16-20
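
Because neither branch of the pattern can consume a newline, the same substitution can also be applied to multi-line text in one call. The following is only a sketch: the helper name keep_range is hypothetical, and it assumes Python 3.5 or later, where an unmatched group in the replacement becomes an empty string. It also shows an edge case to be aware of: a line that starts with the range has no leading non-digit character for the first branch, so the .+ branch takes the whole line.

```python
import re

# Pattern from the answer above, compiled once as a raw string.
PATTERN = re.compile(r'[^\d\r\n]+(\d{1,2}-\d{1,2})|.+')

def keep_range(text):
    # Hypothetical helper: replace each match with group 1,
    # which is empty whenever the .+ branch matched.
    return PATTERN.sub(r'\1', text)

multi = 'dkas;6-17\ndsajdl 10\ndsjalkdj16-20'
print(repr(keep_range(multi)))       # '6-17\n\n16-20'

# Edge case: the range is at the very start of the line, so the
# first branch cannot match and the whole line is dropped.
print(repr(keep_range('6-17abc')))   # ''
```

If ranges at the start of a line should be kept too, changing the leading + to * is one possible adjustment.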
2 Comments

I guess what I'm hung up on is how to negate the pattern.
@ben890 What do you mean by negate the pattern?
0

How about just looking for all the matches in the string and concatenating them together?

>>> import re
>>> ''.join(re.findall(r'[0-9]{1,2}-[0-9]{1,2}', 'dkas;6-17abc19-10'))
'6-1719-10'

>>> ''.join(re.findall(r'[0-9]{1,2}-[0-9]{1,2}', 'dsajdl 10'))
''
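
As a rough sketch, the same idea can be wrapped in a small function and run against the three sample strings from the question. The names RANGE and keep_ranges are made up for this illustration.

```python
import re

# The question's pattern, compiled once as a raw string.
RANGE = re.compile(r'[0-9]{1,2}-[0-9]{1,2}')

def keep_ranges(text):
    # Hypothetical helper: join every non-overlapping match;
    # no match at all gives an empty string.
    return ''.join(RANGE.findall(text))

for s in ['dkas;6-17', 'dsajdl 10', 'dsjalkdj16-20']:
    print(repr(keep_ranges(s)))
# '6-17'
# ''
# '16-20'
```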

0

Consider matching

\d+-\d+|$

If the string were

dkas;6-17

the first match would be 6-17, the second would be the empty string at the end of the line.

If the string were

dsjalkdj16-20kl21-33mn

there would be three matches: 16-20, 21-33 and the empty string at the end of the line.

If the string were

dsajdl 10

the first (and only) match would be the empty string at the end of the line.

It follows that if there is only one match, it will be the empty string at the end of the string, which is what should be returned; otherwise, return the first match, or all matches but the last, depending on requirements.
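
One way this might look in Python is sketched below. The function names are hypothetical. The sketch relies on the fact that $ always matches at the end of the string, so re.search never returns None for this pattern, and it filters out empty matches instead of slicing off the last one so that the result does not depend on how a given Python version reports an empty match right after a non-empty one.

```python
import re

PATTERN = re.compile(r'\d+-\d+|$')

def first_range_or_empty(text):
    # The leftmost match is either the first range or, if there is
    # no range, the empty string at the end of the string.
    return PATTERN.search(text).group()

def all_ranges(text):
    # Keep every non-empty match, dropping the empty end-of-string one.
    return [m for m in PATTERN.findall(text) if m]

print(repr(first_range_or_empty('dkas;6-17')))   # '6-17'
print(repr(first_range_or_empty('dsajdl 10')))   # ''
print(all_ranges('dsjalkdj16-20kl21-33mn'))      # ['16-20', '21-33']
```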
