
Let's say I have a dataframe df looking like this:

|ColA     |
|---------|
|B=7      |
|(no data)|
|C=5      |
|B=3,C=6  |
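To reproduce this, the example frame can be built as below. This is a minimal sketch that assumes "(no data)" in the table means a missing value (NaN):

```python
import numpy as np
import pandas as pd

# "(no data)" is assumed to be a missing value (NaN)
df = pd.DataFrame({'ColA': ['B=7', np.nan, 'C=5', 'B=3,C=6']})
print(df)
```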

How do I extract the data into new columns so that it looks like this:

|ColA  | B | C |
|------|---|---|
|True  | 7 |   |
|False |   |   |
|True  |   | 5 |
|True  | 3 | 6 |

To fill the columns, I know I can use a regex with str.extract, as shown in this solution.
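For reference, str.extract takes the output column name from a named group, (?P<name>...). The sketch below assumes the keys (here 'B' and 'C') are known in advance, which the extractall approach in the answer does not require:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'ColA': ['B=7', np.nan, 'C=5', 'B=3,C=6']})

# Assumption: the keys are known up front. Each named group (?P<B>...) becomes
# a column called 'B'; (?:^|,) anchors the key to the start or after a comma.
parts = [
    df['ColA'].str.extract(rf'(?:^|,){key}=(?P<{key}>[^,]*)')
    for key in ['B', 'C']
]
result = pd.concat(parts, axis=1)
print(result)
```

Rows without a match (including the NaN row) get NaN; a .fillna('') afterwards gives the empty cells shown in the desired output.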

But how do I set the column name at the same time? So far I loop over df.ColA.loc[df["ColA"].isna()].iteritems(), but that does not seem like the best option for large amounts of data.


You could use str.extractall to get the data, then reshape the output and join it to a modified version of the original dataframe:

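import numpy as np
import pandas as pd

df = pd.DataFrame({'ColA': ['B=7', np.nan, 'C=5', 'B=3,C=6']})

# create the B/C columns: one row per key=value match, keys pivoted into columns
df2 = (df['ColA'].str.extractall('([^=]+)=([^=,]+),?')
                 .set_index(0, append=True)   # move the key (group 0) into the index
                 .droplevel('match')[1]        # drop the match counter, keep the values
                 .unstack(0, fill_value='')    # 0 resolves to the level named 0, i.e. the keys
       )

# rework ColA (True where it has data) and join the previous output
df.notnull().join(df2).fillna('')

# or, if df has several columns, convert only ColA:
df.assign(ColA=df['ColA'].notnull()).join(df2).fillna('')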

output:

    ColA  B  C
0   True  7   
1  False      
2   True     5
3   True  3  6
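To see what each step contributes, the pipeline can be split up and the intermediate results printed. This sketch uses unstack(level=-1) (the last index level, i.e. the keys) instead of unstack(0); that is a swap chosen for readability, so the pivot no longer relies on the level-name lookup:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'ColA': ['B=7', np.nan, 'C=5', 'B=3,C=6']})

# One row per key=value match; the index is (original row, 'match').
# Column 0 holds the key, column 1 the value. The NaN row produces no matches.
matches = df['ColA'].str.extractall('([^=]+)=([^=,]+),?')
print(matches)

# Move the key into the index, drop the match counter, keep only the values.
values = matches.set_index(0, append=True).droplevel('match')[1]

# Pivot the last index level (the keys) into columns; missing pairs become ''.
df2 = values.unstack(level=-1, fill_value='')
print(df2)

# Turn ColA into a has-data flag, then left-join so row 1 (no matches) is kept.
out = df.assign(ColA=df['ColA'].notnull()).join(df2).fillna('')
print(out)
```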

Comments:

Hi, this looks good already. But if I have multiple columns in df, it sets all column values to True. Could you also explain why you use (~df.isnull())? Thanks!
Then you can use df.assign(ColA=~df['ColA'].isnull()).join(…). The ~ inverts the boolean result of isnull, but notnull does the same thing directly ;)
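A small sketch of the multiple-column case, using a hypothetical extra column 'Other': df.notnull() converts every column to booleans, while assign replaces only ColA, and ~isnull() and notnull() give the same mask:

```python
import numpy as np
import pandas as pd

# 'Other' is a hypothetical extra column, added only for illustration
df = pd.DataFrame({
    'ColA': ['B=7', np.nan, 'C=5', 'B=3,C=6'],
    'Other': [10, 20, 30, 40],
})

# df.notnull() turns every column into booleans, including 'Other'
print(df.notnull())

# ~isnull() and notnull() produce the same mask
mask = df['ColA'].notnull()
assert mask.equals(~df['ColA'].isnull())

# assign replaces only ColA and leaves the other columns untouched
fixed = df.assign(ColA=mask)
print(fixed)
```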
