I am trying to match exact words with a regex, but it is not working as I expect. Here is a small code example and the data I am trying it on. I want to match the words c and java in a string and return True if either is found.
I am using the regex \\bc\\b|\\bjava\\b, but this also matches c#, which is not what I am looking for. It should only match the exact word. How can I achieve this?
import re

def match(x):
    # True if the string starts with the word c or java
    if re.match('\\bc\\b|\\bjava\\b', x) is not None:
        return True
    else:
        return False
print(df)
0 c++ c
1 c# silverlight data-binding
2 c# silverlight data-binding columns
3 jsp jstl
4 java jdbc
Name: tags, dtype: object
df.tags.apply(match)
0 True
1 True
2 True
3 False
4 True
Name: tags, dtype: bool
Expected Output:
0 True
1 False
2 False
3 False
4 True
Name: tags, dtype: bool
\b "matches empty string at word boundary (between \w and \W)", and since # is not \w, \bc\b matches c#.
\b considers alphanumeric characters (plus the underscore) to be word characters. Since # is not alphanumeric, it creates a word boundary, which is why c# matches \bc\b.
\sc\s|\sjava\s, right? I've tried that, but it's returning everything as False. If this is not what you meant, can you post it as an answer below?
\s requires a whitespace character, so it won't work at the start or the end of the string. So you would need to make those matches optional at the start or end of the string.
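
One way to sketch the whitespace-based idea from the last comment, assuming the tags are always separated by whitespace: rather than making \s optional, use the lookarounds (?<!\S) and (?!\S), which succeed next to whitespace and also at the start or end of the string. The names EXACT_TAG and match_exact are made up for illustration, and re.search is used instead of re.match so that a tag can be found anywhere in the string, not only at the beginning.

```python
import re

# Hypothetical names; the pattern assumes tags are whitespace-separated.
# (?<!\S): not preceded by a non-space character (also true at string start)
# (?!\S):  not followed by a non-space character (also true at string end)
EXACT_TAG = re.compile(r'(?<!\S)(?:c|java)(?!\S)')

def match_exact(x):
    """Return True if 'c' or 'java' appears as a whole whitespace-delimited tag."""
    return EXACT_TAG.search(x) is not None

# Sample rows copied from the question's data
tags = [
    'c++ c',
    'c# silverlight data-binding',
    'c# silverlight data-binding columns',
    'jsp jstl',
    'java jdbc',
]
results = [match_exact(t) for t in tags]
print(results)  # [True, False, False, False, True]
```

On the Series from the question this should be usable as df.tags.apply(match_exact). Note that c++ and c# no longer count as c, because + and # are non-space characters and so fail the (?!\S) check.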