I'm currently struggling with regex. I'm trying to remove every website name ending in ".com" except one, "crypto.com", because it is not only a website but also the name of a cryptocurrency.

Let's take this sentence:

"Here are my favorite things: crypto.com, polo.com, cryp.com and google.com"

Inspired by this answer, this is my Python regex:

r"(\w+\.)?crypto\.com"

The problem, when I test it on https://regex101.com, is that it matches only crypto.com and not the others (matching the others is what I want).

Can anyone tell me how to proceed? Thank you!

Expected code:

import re

text = "Here are my favorite things: crypto.com, polo.com, cryp.com and google.com"
text = re.sub(r"(\w+\.)?crypto\.com", '', text)

Expected output:

"Here are my favorite things: crypto.com,, and "

  • Please add the exact output you expect here. Commented Dec 1, 2021 at 12:59
  • Try re.sub(r'\s*\b(?!crypto\.)\w+\.com\b', '', text) Commented Dec 1, 2021 at 13:00
  • @TimBiegeleisen Sorry, I added the desired output. Commented Dec 1, 2021 at 13:03
  • @WiktorStribiżew Using your regex, the only thing that's being replaced is the polo.com Commented Dec 1, 2021 at 13:05
  • Why? See regex101.com/r/BWtX6c/2. I tried to make it a bit more comprehensive, see regex101.com/r/BWtX6c/1. You may add rstrip(',') in Python code. Commented Dec 1, 2021 at 13:12

2 Answers

You can use

\s*\b(?!crypto\.)\w+\.com\b

See the regex demo. Details:

  • \s* - zero or more whitespaces
  • \b - a word boundary
  • (?!crypto\.) - a negative lookahead that fails the match if the string crypto. appears immediately to the right of the current location
  • \w+ - one or more word chars
  • \.com - .com
  • \b - a word boundary.

See the Python demo:

import re

text = "Here are my favorite things: crypto.com, polo.com, cryp.com and google.com"
# Remove every *.com name except crypto.com, together with any leading whitespace
print(re.sub(r'\s*\b(?!crypto\.)\w+\.com\b', '', text))
# => Here are my favorite things: crypto.com,, and
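
To see how the word boundary and the lookahead interact on near-miss names, here is a small sketch. The helper name strip_dotcom and the extra sample names (decrypto.com, www.crypto.com) are assumptions made up for illustration; they are not part of the question.

```python
import re

# Pattern from the answer: optional leading whitespace, then a word + ".com"
# that does not start with "crypto." at a word boundary.
PATTERN = re.compile(r'\s*\b(?!crypto\.)\w+\.com\b')

def strip_dotcom(text):
    """Remove every *.com name except crypto.com (hypothetical helper)."""
    return PATTERN.sub('', text)

# crypto.com is protected by the lookahead
print(repr(strip_dotcom("crypto.com")))
# decrypto.com is a different word, so it is removed
print(repr(strip_dotcom("decrypto.com")))
# cryp.com only shares a prefix with crypto, so it is removed with its leading space
print(repr(strip_dotcom("see cryp.com now")))
# a subdomain in front of crypto.com is left alone
print(repr(strip_dotcom("www.crypto.com")))
```

The lookahead is checked right after \b, so it only protects names whose first word is exactly crypto.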

A more comprehensive regex can also be used to remove commas and the word and:

(?:\s*(?:,|and\s*)?)\b(?!crypto\.)\w+\.com,?

See this regex demo.
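
As a rough sketch of how the more comprehensive pattern could be used from Python, combined with the rstrip(',') suggested in the comments (the variable names here are illustrative assumptions):

```python
import re

text = "Here are my favorite things: crypto.com, polo.com, cryp.com and google.com"

# Besides the name itself, this pattern consumes an optional leading comma
# or "and", and an optional trailing comma.
pattern = re.compile(r'(?:\s*(?:,|and\s*)?)\b(?!crypto\.)\w+\.com,?')

cleaned = pattern.sub('', text)
# The comma right after crypto.com is not part of any match, so it is left
# dangling at the end; strip it from the right.
result = cleaned.rstrip(',')
print(result)
# => Here are my favorite things: crypto.com
```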

2 Comments

I copied and pasted yours and, for an unknown reason (maybe there was a typo I didn't spot...), it didn't work. However, your demo works perfectly! Thank you very much for your help!
@Jauhnax How strange, I also copied the code from my comment and pasted into ideone.com. :)

Use a negative lookbehind:

(\w+)?(?<!crypto)\.com

Edit: The question changed slightly. I removed a \. that was incorrect; now it should work!

3 Comments

Your regex will fail to match crypto.com in decrypto.com.
As I understand, OP would not want to match that. Or is decrypto.com another cryptocurrency? In this case, they do want that.
crypto and decrypto are different words. The requirement is to only fail one word, crypto.
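
The difference discussed in these comments can be checked with a short sketch. The sample string and the name decrypto.com are made up for illustration, and the lookahead pattern is the one from the other answer without its leading \s*.

```python
import re

# Lookbehind pattern from this answer: rejects any ".com" directly preceded by "crypto"
lookbehind = re.compile(r'(\w+)?(?<!crypto)\.com')
# Lookahead pattern anchored at a word boundary: rejects only the whole word "crypto"
lookahead = re.compile(r'\b(?!crypto\.)\w+\.com\b')

sample = "crypto.com decrypto.com polo.com"

behind_matches = [m.group(0) for m in lookbehind.finditer(sample)]
ahead_matches = [m.group(0) for m in lookahead.finditer(sample)]

print(behind_matches)  # => ['polo.com']
print(ahead_matches)   # => ['decrypto.com', 'polo.com']
```

The lookbehind version skips anything ending in crypto.com, while the lookahead version skips only the exact word crypto.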
