I am trying to understand how to get the repeated words in a string by matching on their last three characters.

var1 = "we have hotel in Singapore and we have motel as well in Singapore, please let us know about your plan of visit and we will tell you more about venue and locations around us."

Expected something like the below:

Words whose last three characters are the same should be returned, like:

(hotel, motel, singapore, about, have)

Trial:

When I test this on regex101.com with (\w[a-zA-Z]tel), it gives me the words hotel and motel. Similarly:

(\w*[a-zA-Z]ore)  <-- this gives me `Singapore`
(\w*[a-zA-Z]out)  <-- this gives me `about`
(\w*[a-zA-Z]ve)   <-- this gives me  `have`
(we\s)            <-- this gives me  `we`

Now, when I combine them all together as (\w[a-zA-Z]tel)(\w*[a-zA-Z]ore)(\w*[a-zA-Z]out)(\w*[a-zA-Z]ve)(we\s), it doesn't match anything.

I have been trying hard to get this working but cannot find the right solution.

EDIT:

As I am hard-coding the last three characters, is it possible to achieve this without providing them and still get the same result?

  • It is not clear what is intended with we: are you extracting a known word? Right now, you could just as well use re.findall(r'\b(\w*[a-zA-Z](?:ore|out|ve)|we)\b', text) Commented Aug 12, 2021 at 13:00
  • Are you looking for a separator? It's the character '|'. This will match one regex or the other: (\w[a-zA-Z]tel)|(\w*[a-zA-Z]ore)|(\w*[a-zA-Z]out)|(\w*[a-zA-Z]ve)|(we\s) Commented Aug 12, 2021 at 13:15
  • What does "As i am giving the last three characters hard coded, Is it possible to achieve this without providing these and evaluate the same." mean? Do you mean you want to extract all words with at least 3 letters? set(re.findall(r'\w*[^\W\d_]{3}\b', text))? Commented Aug 12, 2021 at 15:10
  • Ok, do you want set([x.group() for x in re.finditer(r'\b[a-zA-Z]*([a-zA-Z]{3})\b(?=.*\1\b)', text)])? Commented Aug 12, 2021 at 15:23
  • It seems like you want to extract any repeating 3+ letter words and remove dupes, see ideone.com/SFz7RQ Commented Aug 12, 2021 at 15:36
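
The alternation suggested in the comments can be sketched as follows. Concatenated groups must all match back to back at one position, whereas `|` lets any one of the hard-coded endings match on its own. This is only a minimal sketch: the names `concatenated`, `alternated` and `matches` are illustrative, and the surrounding `\b` word boundaries are an addition not present in the question's patterns.

```python
import re

var1 = ("we have hotel in Singapore and we have motel as well in Singapore, "
        "please let us know about your plan of visit and we will tell you "
        "more about venue and locations around us.")

# Concatenated groups have to match one directly after another,
# so this pattern finds nothing in the sentence.
concatenated = r'(\w[a-zA-Z]tel)(\w*[a-zA-Z]ore)(\w*[a-zA-Z]out)(\w*[a-zA-Z]ve)(we\s)'
print(re.search(concatenated, var1))

# With alternation, a word qualifies if it ends in any one of the endings.
alternated = r'\b\w*[a-zA-Z](?:tel|ore|out|ve)\b'
matches = re.findall(alternated, var1)
print(matches)
```

Note that the `ore` ending also picks up more, not just Singapore, because the pattern only checks how a word ends.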

1 Answer

You can use

\b([a-zA-Z]{3,})\b(?=.*\b\1\b)

See the regex demo. Details:

  • \b([a-zA-Z]{3,})\b - a whole word consisting of three or more ASCII letters
  • (?=.*\b\1\b) - a positive lookahead that requires the same word to appear again, as a whole word, later on the same line; .* matches any zero or more chars other than line break chars, as many as possible.

See the Python demo:

import re
var1 = "we have hotel in Singapore and we have motel as well in Singapore, please let us know about your plan of visit and we will tell you more about venue and locations around us."
print(set([x.group() for x in re.finditer(r'\b([a-zA-Z]{3,})\b(?=.*\b\1\b)', var1)]))
# => {'Singapore', 'have', 'about', 'and'} (set order may vary)

Here, set(...) removes any duplicate matches returned by re.finditer.
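
The pattern above reports whole words that repeat, so hotel and motel, which only share an ending, are not returned. If the goal is instead to collect words by a shared ending without hard-coding it, one possible sketch is to group words by their last three letters. The helper name `words_sharing_suffix`, its parameter `n`, and the choice to lowercase the words are all assumptions made here for illustration, not part of the answer.

```python
import re
from collections import defaultdict

def words_sharing_suffix(text, n=3):
    """Group words by their last n letters (hypothetical helper)."""
    # Map each n-letter ending to every occurrence of a word carrying it.
    by_suffix = defaultdict(list)
    for word in re.findall(r'[a-zA-Z]+', text):
        if len(word) >= n:
            by_suffix[word[-n:].lower()].append(word.lower())
    # Keep endings seen at least twice; drop duplicate words within a group.
    return {suffix: sorted(set(words))
            for suffix, words in by_suffix.items() if len(words) > 1}

var1 = ("we have hotel in Singapore and we have motel as well in Singapore, "
        "please let us know about your plan of visit and we will tell you "
        "more about venue and locations around us.")
groups = words_sharing_suffix(var1)
print(groups)
```

Because this looks only at endings, on the sample sentence it also pairs well with tell and more with Singapore, in addition to hotel with motel and the repeated words have, about and and.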

1 Comment

Thanks a lot Wiktor, this is exactly what I was looking for.
