I have a pandas column like this:

LOD-NY-EP-ADM
LOD-NY-EC-RUL
LOD-NY-EC-WFL
LOD-NY-LSM-SER
LOD-NY-PM-MOB
LOD-NY-PM-MOB
LOD-NY-RMK
LOD-NY-EC-TIM

I want the output in a new column:

EP
EC
EC
LSM
PM
PM
RMK
EC

I tried this:

pattern=df.column[0:10].str.extract(r"\w*-NY-(.*?)-\w*",expand=False)

This works for every row except LOD-NY-RMK, where it returns NaN. Nothing follows RMK in that row, but since \w* matches zero or more characters, I expected the pattern to match there as well.

Any idea what's going wrong?

A plain array of these strings with a regular expression is also fine, if the pandas syntax is unfamiliar.

2 Answers

Could you just use regular Python? Let df be your DataFrame and row be the name of your column.

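import pandas as pd
series = df.row
# Split each value on '-' and keep the third field (index 2)
new_list = [i.split('-')[2] for i in series]
new_series = pd.Series(new_list, index=series.index)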
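
The same split can also be done with pandas' vectorized string methods instead of a list comprehension. This is only a sketch: the column names code and group are made up for illustration, and it assumes every value has at least three hyphen-separated fields.

```python
import pandas as pd

# Hypothetical sample frame; "code" and "group" are illustrative names.
df = pd.DataFrame({"code": ["LOD-NY-EP-ADM", "LOD-NY-LSM-SER", "LOD-NY-RMK"]})

# .str.split returns a list per row; .str[2] picks the third field of each list.
df["group"] = df["code"].str.split("-").str[2]
```

Because the result is assigned straight back to the frame, it keeps the original index and lines up with the source column.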

1 Comment

Can we try it with a regular expression? I could also slice by index, but I wanted to check whether a regular expression makes sense here.

pattern=df.column[0:10].str.extract(r"\w*-NY-(\w+)",expand=False)

See https://regex101.com/r/3uDpam/3

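Your regex required matching strings to contain three - characters, because the hyphen before the final \w* is mandatory even though \w* itself can match nothing. I changed it so the last -XX could occur 0 or 1 times.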

UPDATE: Changed so 2nd group is non-capturing (added ?:)

UPDATE: Thanks to Casimir, removed useless group at end of pattern
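
To see the difference without pandas, here is a small sketch using the standard re module on a few of the sample values. The helper first_group is hypothetical and only mirrors the idea of taking the first capture group, or nothing when there is no match.

```python
import re

values = ["LOD-NY-EP-ADM", "LOD-NY-EC-RUL", "LOD-NY-LSM-SER", "LOD-NY-RMK"]

# Pattern from the question: the literal "-" after the group is mandatory.
original = re.compile(r"\w*-NY-(.*?)-\w*")
# Pattern from this answer: capture word characters and stop at the next "-".
fixed = re.compile(r"\w*-NY-(\w+)")

def first_group(pattern, text):
    """Return the first capture group, or None when the pattern does not match."""
    m = pattern.search(text)
    return m.group(1) if m else None

original_result = [first_group(original, v) for v in values]
fixed_result = [first_group(fixed, v) for v in values]

print(original_result)  # ['EP', 'EC', 'LSM', None]
print(fixed_result)     # ['EP', 'EC', 'LSM', 'RMK']
```

The None for LOD-NY-RMK corresponds to the NaN seen in pandas: there is no third hyphen in that string, so the original pattern cannot match at all.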

6 Comments

Something is weird. It works if I try this regex on a plain string, but when I apply it to the pandas column as above, the result includes the hyphen along with the extracted string, and it still fails for RMK. The RMK case works fine on a plain string, though.
Maybe pandas is using the last matching group. Try this: \w*-NY-(\w+)(?:-\w+)? (so the 2nd group is non-capturing).
OK, I found the issue. It also adds a column for the third group (extracting the last letters too). I don't want that; I only want the first group to be extracted. How do we do group(0) in pandas extract?
Is there any documentation on this? How do you know that adding ?: makes it work?
Since it is optional, writing (?:-\w+)? at the end of a pattern is useless. (writing something optional at the end of a pattern is always useless, except if you have to capture something inside this optional part).
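
The extra column discussed in these comments can be reproduced with a short sketch. The two-group pattern below is an assumption, since the intermediate regex is not shown in the thread; it only illustrates that str.extract returns one column per capturing group, while (?:...) groups without capturing.

```python
import pandas as pd

s = pd.Series(["LOD-NY-EP-ADM", "LOD-NY-RMK"])

# Two capturing groups: even with expand=False the result is a DataFrame,
# and the second column holds the trailing "-ADM" (hyphen included).
two_groups = s.str.extract(r"\w*-NY-(\w+)(-\w+)?", expand=False)

# One capturing group plus a non-capturing (?:...) group: the result is a Series.
one_group = s.str.extract(r"\w*-NY-(\w+)(?:-\w+)?", expand=False)

print(type(two_groups).__name__)  # DataFrame
print(one_group.tolist())         # ['EP', 'RMK']
```

As the last comment notes, the optional non-capturing tail can be dropped entirely, which gives the final pattern \w*-NY-(\w+).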