1
import regex
frase = "text https://www.gamivo.com/product/sea-of-thieves-pc-xbox-one other text https://www.gamivo.com/product/fifa-21-origin-eng-pl-cz-tr"
x = regex.findall(r"/((http[s]?:\/\/)?(www\.)?(gamivo\.com\S*){1})", frase) 
print(x)

Result:

[('www.gamivo.com/product/sea-of-thieves-pc-xbox-one', '', 'www.', 'gamivo.com/product/sea-of-thieves-pc-xbox-one'), ('www.gamivo.com/product/fifa-21-origin-eng-pl-cz-tr', '', 'www.', 'gamivo.com/product/fifa-21-origin-eng-pl-cz-tr')]

I want something like:

[('https://www.gamivo.com/product/sea-of-thieves-pc-xbox-one', 'https://gamivo.com/product/fifa-21-origin-eng-pl-cz-tr')]

How can I do this?

3
  • Remove the first / and use non-capturing groups. r'(?:https?://)?(?:www\.)?gamivo\.com\S*', see this demo. Commented Jul 23, 2021 at 9:16
  • Do you really need a regex for this? Split on spaces and take the items that contain https from the resulting array. Commented Jul 23, 2021 at 9:17
  • @leoOrion Yes, it's for a bigger project that needs a regex. In the final project I will use str.replace() to swap each match for a shortened link. Commented Jul 23, 2021 at 9:22

2 Answers

1

You need to

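  1. Remove the initial / from the pattern. It forces every match to start at a slash, and since no / comes directly before https:// or http:// in the input, the optional scheme group can never match
  2. Remove the unnecessary capturing groups and the redundant {1} quantifier
  3. Convert the optional capturing groups into non-capturing ones, so that re.findall returns the whole match instead of a tuple of groups.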

See this Python demo:

import re
frase = "text https://www.gamivo.com/product/sea-of-thieves-pc-xbox-one other text https://www.gamivo.com/product/fifa-21-origin-eng-pl-cz-tr"
print( re.findall(r"(?:https?://)?(?:www\.)?gamivo\.com\S*", frase) )
# => ['https://www.gamivo.com/product/sea-of-thieves-pc-xbox-one', 'https://www.gamivo.com/product/fifa-21-origin-eng-pl-cz-tr']

See the regex demo, too. Also, see the related post "re.findall behaves weird".
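
To illustrate point 3, here is a minimal sketch of how re.findall changes its return value depending on whether the pattern contains capturing groups. The sample string and the variable names are made up for this illustration and are not from the question.

```python
import re

# Hypothetical sample input: one link with scheme and www, one bare domain link.
text = "see https://www.gamivo.com/product/a and gamivo.com/product/b"

# With capturing groups, findall returns one tuple of group values per match;
# groups that did not take part in the match come back as empty strings.
with_groups = re.findall(r"(https?://)?(www\.)?gamivo\.com\S*", text)

# With non-capturing groups, findall returns the full text of each match.
without_groups = re.findall(r"(?:https?://)?(?:www\.)?gamivo\.com\S*", text)

print(with_groups)     # [('https://', 'www.'), ('', '')]
print(without_groups)  # ['https://www.gamivo.com/product/a', 'gamivo.com/product/b']
```

This is why the original pattern produced tuples: it had four capturing groups, so each result held four strings.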


0

Try this. It matches each URL from http:// or https:// up to the next whitespace character.

import re
frase = "text https://www.gamivo.com/product/sea-of-thieves-pc-xbox-one other text https://www.gamivo.com/product/fifa-21-origin-eng-pl-cz-tr"
x = re.findall(r'https?://\S*', frase)  # raw string avoids the invalid "\s" escape warning; no groups are needed
print(x)
# ['https://www.gamivo.com/product/sea-of-thieves-pc-xbox-one', 'https://www.gamivo.com/product/fifa-21-origin-eng-pl-cz-tr']
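
One thing to keep in mind is that this pattern is not tied to a domain. The sketch below compares it with a gamivo-specific pattern on an input that also contains an unrelated link. The sample string, the example.com URL and the variable names are assumptions made for this illustration.

```python
import re

# Hypothetical input mixing a gamivo link with an unrelated one.
sample = "a https://www.gamivo.com/product/x b https://example.com/page c"

# Generic pattern: any http(s) URL up to the next whitespace character.
any_url = re.findall(r"https?://\S*", sample)

# Domain-specific pattern: only links on gamivo.com.
gamivo_only = re.findall(r"(?:https?://)?(?:www\.)?gamivo\.com\S*", sample)

print(any_url)      # ['https://www.gamivo.com/product/x', 'https://example.com/page']
print(gamivo_only)  # ['https://www.gamivo.com/product/x']
```

If the text can contain links to other sites, the domain-specific pattern is probably the safer choice.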
