What is the meaning of string locator ', \s*([^\.]*)\s*\.' =?

I have a DataFrame identical to the one in "Extract sub-string between 2 special characters from one column of Pandas DataFrame" and want to extract the substring located between "," and ".". Following the answer to that post, one way is:

In [157]: df['Title'] = df.Name.str.extract(r',\s*([^\.]*)\s*\.', expand=False)

In [158]: df
Out[158]:
                   Name   Title
0        Jim, Mr. Jones      Mr
1     Sara, Miss. Baker    Miss
2     Leila, Mrs. Jacob     Mrs
3  Ramu, Master. Kuttan  Master

The outcome is correct, but what does ',\s*([^\.]*)\s*\.' mean? In particular, what do '*' and '\' mean?

2 Comments

  • @JustBaron. The first = symbol was part of the question mark, not the expression =) Commented Sep 8, 2018 at 14:39
  • Possible duplicate of Reference - What does this regex mean? Commented Sep 8, 2018 at 14:49

1 Answer

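The pattern matches, in order:

  • a , (comma)
  • followed by \s*, zero or more whitespace characters (tabs, spaces, etc.)
  • followed by ([^\.]*), a capture group of zero or more characters that are not a . (dot)
  • followed by \s*, zero or more whitespace characters
  • followed by \. (a literal dot)

Here * means "zero or more of the preceding item", and \ either starts a special sequence such as \s or escapes a metacharacter such as the dot. str.extract returns only the text matched inside the parentheses.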
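
To see the pieces in action, here is a minimal sketch using Python's standard re module rather than pandas, on the assumption that a plain list of the four names from the question is enough to illustrate the pattern. The pattern is the one from the question, spread over several lines with re.VERBOSE so that each part can carry a comment; the variable names are only illustrative.

```python
import re

# The pattern from the question, written in verbose mode so each part is annotated.
pattern = re.compile(
    r"""
    ,          # a literal comma
    \s*        # zero or more whitespace characters
    ([^\.]*)   # capture group: zero or more characters that are not a dot
    \s*        # zero or more whitespace characters
    \.         # a literal dot
    """,
    re.VERBOSE,
)

# The four names from the question's DataFrame.
names = ["Jim, Mr. Jones", "Sara, Miss. Baker", "Leila, Mrs. Jacob", "Ramu, Master. Kuttan"]

# group(1) is the text matched by the parentheses, i.e. the title.
titles = [pattern.search(name).group(1) for name in names]
print(titles)  # ['Mr', 'Miss', 'Mrs', 'Master']
```

The whitespace after the comma is consumed by the first \s* and so stays outside the capture group, which is why the titles come back without a leading space.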

You can find more about regex here.

UPDATE

As @UnbearableLightness mentioned, the \ inside the character set is redundant, because a . (dot) does not need escaping there. A character set is anything defined between [ and ].
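
A quick way to check this is to compare the two spellings side by side. The sketch below uses the standard re module; the third sample string is made up for illustration and contains a literal backslash.

```python
import re

# Pattern from the question, with the dot escaped inside the character set.
escaped = re.compile(r",\s*([^\.]*)\s*\.")
# Same pattern without the redundant backslash inside [].
unescaped = re.compile(r",\s*([^.]*)\s*\.")

# The last sample is hypothetical and contains a literal backslash.
samples = ["Jim, Mr. Jones", "Ramu, Master. Kuttan", r"Odd, Dr\x. Name"]

results = [(escaped.search(s).group(1), unescaped.search(s).group(1)) for s in samples]
for with_escape, without_escape in results:
    # Both spellings capture exactly the same text.
    assert with_escape == without_escape

# A backslash in the input is not excluded by [^\.], so it is still captured.
print(results[-1][0])  # Dr\x
```

The final \. outside the brackets is a different matter: there the backslash is required, since an unescaped dot would match any character.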

2 Comments

The \ in the character set is escaping the dot, which is of course redundant. A \ will still be matched by the expression.
Thanks a lot for the help! :)
