0

I have DataFrame in Python Pandas like below ("col1" is as string data type):

col1
-------
ACC_P:Indiv|ACC_K:3886|GROUP:gr1|COK:P1K
ACC_P:Group|ACC_K:42|GROUP:gr2|COK:P1C
ACC_P:Indiv|ACC_K:455|GROUP:gpk22|COK:AD2
...

And I need to create new column "col2" where will be only value between "GROUP:" and "|" from values in "col1", so as a result I need something like below:

col1                                        | col2
--------------------------------------------|-------
ACC_P:Indiv|ACC_K:3886|GROUP:gr1|COK:P1K    | gr1
ACC_P:Group|ACC_K:42|GROUP:gr2|COK:P1C      | gr2
ACC_P:Indiv|ACC_K:455|GROUP:gpk22|COK:AD2   | gpk22
...                                         | ...

How can I do that in Python Pandas ?

2 Answers 2

1

You could use regular expressions to extract it. Something like capture everything after GROUP: up until the |

df['col2'] = df.col1.str.extract('GROUP:(.*)\|')

Another less elegant, but perhaps more intuitive option if you aren't familiar w/regex.

Split on the pipe, take the second from last item, split on : and take the last element from that. This of course works only if the structure of your col1 values always follows the same format.

df.col1.str.split('|').str[-2].str.split(':').str[-1]

Output

                                        col1   col2
0   ACC_P:Indiv|ACC_K:3886|GROUP:gr1|COK:P1K    gr1
1     ACC_P:Group|ACC_K:42|GROUP:gr2|COK:P1C    gr2
2  ACC_P:Indiv|ACC_K:455|GROUP:gpk22|COK:AD2  gpk22
Sign up to request clarification or add additional context in comments.

Comments

0

The more elegant way to do it would be through regex but here is my two cents:

d.col1.str.split('GROUP:').apply(lambda x: x[1].split('|')[0])

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.