
I have code that looks like this:

import re

s = "farmer’s boy of s...=--ixpence."
b = "farmer's boy of s...=--ixpence."
s_replaced = re.sub("[^a-zA-Z' ]+", '', s)
b_replaced = re.sub("[^a-zA-Z' ]+", '', b)
print(s_replaced)
print(b_replaced)

Output:

farmers boy of sixpence
farmer's boy of sixpence

I was trying to write code that removes all punctuation except apostrophes, and I don't understand why the regex returns different results for what looks like the same string. Why is this happening?

  • Those strings are not the same. Commented Nov 30, 2017 at 0:59
  • Look carefully at those strings. ’ is not the same as '. Commented Nov 30, 2017 at 0:59
  • >>> "farmer’s boy of s...=--ixpence." == "farmer's boy of s...=--ixpence." False Commented Nov 30, 2017 at 1:00
  • Oh wow, this is stupid... I was struggling over this for two hours. It wasn't apparent in my PyCharm font! Thank you, though. Commented Nov 30, 2017 at 1:02
  • @EricKim You should probably check s and b beforehand, with something like if s == b:, to ensure both strings are the same. Commented Nov 30, 2017 at 1:04

3 Answers


The strings are not the same.

s contains a ’ while b contains a '. The class [^a-zA-Z' ] matches any character that is not a-z, A-Z, ', or a space. That includes ’, which appears only in s, so re.sub removes it from s but leaves the ' in b untouched.
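One way to see this is to ask the regex which characters it actually matches. This sketch uses re.findall with the question's class, minus the + so that each matched character is listed on its own; the ’ shows up only for s.

```python
import re

s = "farmer’s boy of s...=--ixpence."  # contains U+2019 RIGHT SINGLE QUOTATION MARK
b = "farmer's boy of s...=--ixpence."  # contains U+0027 APOSTROPHE

# Same negated class as the question, without '+', so each match is one character.
pattern = "[^a-zA-Z' ]"

print(re.findall(pattern, s))  # ['’', '.', '.', '.', '=', '-', '-', '.']
print(re.findall(pattern, b))  # ['.', '.', '.', '=', '-', '-', '.']
```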


As others have said, s and b are not the same, since they contain different apostrophes: ’ and '. This is easy to check:

>>> s = "farmer’s boy of s...=--ixpence."
>>> b = "farmer's boy of s...=--ixpence."
>>> s == b
False
>>> print([x for x in s if x not in b])
['’']

This shows that s contains a '’' character (U+2019, RIGHT SINGLE QUOTATION MARK) that b does not. To make sure you are working with identical strings, do a preliminary == check first:

s = "farmer’s boy of s...=--ixpence."
b = "farmer's boy of s...=--ixpence."

if s == b:
    print("Both strings are equal")
    # Rest of code here

This checks whether s and b are equal before doing anything else.
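When the == check fails, it helps to know where the strings differ. Below is a minimal sketch using a hypothetical helper, first_differences (not a library function). It compares the strings position by position and prints each mismatch with its Unicode code point. It assumes both strings have the same length, so trailing characters in a longer string are not reported.

```python
def first_differences(a: str, b: str) -> list[tuple[int, str, str]]:
    """Hypothetical helper: return (index, char_in_a, char_in_b) where a and b differ.

    zip() stops at the shorter string, so extra trailing characters are ignored.
    """
    return [(i, ca, cb) for i, (ca, cb) in enumerate(zip(a, b)) if ca != cb]


s = "farmer’s boy of s...=--ixpence."
b = "farmer's boy of s...=--ixpence."

for i, ca, cb in first_differences(s, b):
    # U+XXXX notation makes look-alike characters easy to tell apart.
    print(f"index {i}: {ca!r} (U+{ord(ca):04X}) vs {cb!r} (U+{ord(cb):04X})")
# index 6: '’' (U+2019) vs "'" (U+0027)
```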


The first string contains a right single quotation mark (’, U+2019); the second contains an apostrophe (', U+0027). You can check each character's value here.
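The same check can be done without leaving Python, because the standard-library unicodedata module reports each character's official name. The sketch also shows one possible fix, which goes beyond what this answer states: map the right single quotation mark to a plain apostrophe before running the question's regex, so both inputs give the same result. The helper name strip_punctuation is hypothetical.

```python
import re
import unicodedata

s = "farmer’s boy of s...=--ixpence."
b = "farmer's boy of s...=--ixpence."

# Name the two look-alike characters.
for ch in ("\u2019", "'"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# U+2019 RIGHT SINGLE QUOTATION MARK
# U+0027 APOSTROPHE


def strip_punctuation(text: str) -> str:
    """Hypothetical helper: keep letters, apostrophes and spaces.

    Treats U+2019 as an apostrophe by replacing it with ASCII ' first.
    """
    normalized = text.replace("\u2019", "'")
    return re.sub("[^a-zA-Z' ]+", "", normalized)


print(strip_punctuation(s))  # farmer's boy of sixpence
print(strip_punctuation(b))  # farmer's boy of sixpence
```

Other typographic quotes (for example U+2018) would need the same treatment if they can appear in the input.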
