
I am new to Python and still learning its basic concepts. Is there an easy way to get all the words in a sentence that contain my search string as a substring?

sentence = "Hi this is just an example.com sentence"
search = ".com"

Expected output:

example.com

I tried to use the re module but got stuck at the point where I need to fetch the whole word. This is what I have tried, but it returns the whole sentence as output:

re.findall(f".*{search}*", sentence)

I would like to know whether I am using the right approach or whether there is another way to solve this.

Additional question:

Using John's suggestion, [w for w in sentence.split() if '.com' in w], I get the expected output. As an additional requirement, if

sentence = "Hi this is just an Example.COM sentence"

I still want it to match .com and return Example.COM.

  • re could work but seems like overkill. You could just use [w for w in sentence.split() if '.com' in w] Commented Jan 5, 2023 at 17:35
  • @JohnColeman Thanks for this answer. I was so focused on re that I forgot the basics. Additional question: if the sentence is "Hi this is just an example.COM sentence", I still want it to match .com and return example.COM Commented Jan 6, 2023 at 10:14
  • @TestLearner That would be an important detail to mention in your question. Commented Jan 6, 2023 at 10:18
  • Updated the question with an additional question section. Commented Jan 6, 2023 at 10:42
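For the additional case-insensitive requirement, the split-based comprehension from the comments can stay as it is and simply compare case-folded strings. A minimal sketch, assuming whitespace-separated words are good enough:

```python
sentence = "Hi this is just an Example.COM sentence"
search = ".com"

# casefold() is a stronger lower() intended for caseless comparison;
# only the comparison ignores case, the original word is kept in the result
matches = [w for w in sentence.split() if search.casefold() in w.casefold()]
print(matches)  # ['Example.COM']
```

Because str.split() only splits on whitespace, punctuation stays attached, so a word written as "example.com," would be returned with its trailing comma.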

3 Answers


Consider what pattern you have actually created:

sentence = "Hi this is just an example.com sentence"
search = ".com"
print(f".*{search}*")

prints

.*.com*

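That pattern matches any character (.) repeated zero or more times (*), followed by any single character (.), followed by the characters c and o, followed by the character m repeated zero or more times. Because the leading .* is greedy, re.findall returns everything from the start of the sentence up to and including .com, not just the word.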
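A small check makes this concrete: it shows what the question's pattern actually returns and why the dot needs re.escape. The second input string, "examplexcom", is made up for illustration:

```python
import re

sentence = "Hi this is just an example.com sentence"
search = ".com"

# The pattern built in the question: ".*" + ".com" + "*" gives ".*.com*"
pattern = f".*{search}*"
print(pattern)  # .*.com*

# The greedy ".*" consumes everything up to the only ".co" in the sentence
print(re.findall(pattern, sentence))  # ['Hi this is just an example.com']

# Without re.escape, the "." in ".com" is a wildcard, so "examplexcom" matches too
print(re.findall(rf"\w*{search}\w*", "examplexcom"))             # ['examplexcom']
print(re.findall(rf"\w*{re.escape(search)}\w*", "examplexcom"))  # []
```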

I would use re the following way in this case:

import re
sentence = "Hi this is just an example.com or example.COM sentence"
search = ".com"
found = re.findall(f"\\w*{re.escape(search)}\\w*",sentence,re.IGNORECASE)
print(found)

prints

['example.com', 'example.COM']

Explanation: re.escape makes the . in search a literal dot rather than "any character". The pattern then looks for search preceded and followed by zero or more (*) word characters (\w), and the re.IGNORECASE flag makes the match case-insensitive.
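The same approach can be wrapped in a small reusable helper. This is a sketch: the function name find_words_containing is hypothetical, and it uses a raw f-string (rf"") so that \w does not need a doubled backslash while {...} is still interpolated:

```python
import re


def find_words_containing(sentence: str, search: str) -> list:
    """Return every run of word characters that contains `search`, ignoring case."""
    # rf"" keeps \w as a regex escape; re.escape turns "." into a literal dot
    pattern = rf"\w*{re.escape(search)}\w*"
    return re.findall(pattern, sentence, flags=re.IGNORECASE)


print(find_words_containing("Hi this is just an Example.COM sentence", ".com"))
# ['Example.COM']
```

Note that "word" here means word characters (\w) around the search string, so for "my-site.com" the helper returns 'site.com', whereas the split-based comprehension would return 'my-site.com'.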


1 Comment

Good catch! As in Diego's answer, one could avoid doubling the backslashes by using a raw string (rf"").

If you want to use regexes (perhaps you are planning more complex searches):

#! /usr/bin/env python3

import re

sentence = "Hi this is just an example.com sentence, don't match dotcom"
search = "\\.com"

print(re.findall(f"\\b(\\w*{search})\\b", sentence))

This anchors the match at word boundaries (\b). Result:

['example.com']
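The trailing \b is more than decoration: compared with a trailing \w*, it rejects words where .com continues into further word characters. A small comparison, using a made-up input string:

```python
import re

text = "visit example.com or example.comics"
search = re.escape(".com")  # "\.com": the dot is matched literally

# Trailing \b: ".com" must end at a word boundary, so "example.comics" is skipped
bounded = re.findall(rf"\b\w*{search}\b", text)
print(bounded)  # ['example.com']

# Trailing \w*: the match extends into the rest of "example.comics"
extended = re.findall(rf"\w*{search}\w*", text)
print(extended)  # ['example.com', 'example.comics']
```

Which behavior is right depends on whether a word like "example.comics" should count as containing .com.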

1 Comment

You could avoid doubling the backslashes with a raw string, r"".

Using the re module:

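import re
sentence = "Hi this is just an example.com sentence"
search = ".com"
# re.escape makes "." a literal dot instead of "any character"
c = re.compile(f"[A-Za-z]+{re.escape(search)}")
print(c.findall(sentence))  # ['example.com']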
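To cover the additional case-insensitive requirement with a precompiled pattern, the flag can be passed to re.compile. This variant also swaps [A-Za-z]+ for \w+, which is an assumption about what should count as part of a word (digits and underscores are then accepted). The name word_pattern and the sample strings are illustrative:

```python
import re

sentence = "Hi this is just an Example.COM sentence"
search = ".com"

# Compile once with re.IGNORECASE so ".com" also matches ".COM";
# \w+ (instead of [A-Za-z]+) also accepts digits, e.g. "web2.com"
word_pattern = re.compile(rf"\w+{re.escape(search)}", re.IGNORECASE)
print(word_pattern.findall(sentence))            # ['Example.COM']
print(word_pattern.findall("try web2.com too"))  # ['web2.com']
```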
