I have a pandas dataframe as follows.

            df 

                          Scenario    Savings             PC1    PC2
                     0     HI        Total_FFC_base0      0.12    0.13
                     1     HI        Total_FFC_savings1   0.15    0.12
                     2     HI        Total_FFC_savings2   0.12    0.14
                     3     HI        Total_FFC_savings3   0.17    0.15
                     4     HI        Total_site_base0     0.12    0.15
                     5     HI        Total_site_savings1  0.15    0.15

I want to replace df.Savings and create another column, df['EL'], by extracting parts of the strings in the 'Savings' column, so that df looks like this:

            df 

                          Scenario    Savings    EL         PC1    PC2
                     0     HI          FFC       0         0.12    0.13
                     1     HI          FFC       1         0.15    0.12
                     2     HI          FFC       2         0.12    0.14
                     3     HI          FFC       3         0.17    0.15
                     4     HI          site      0         0.12    0.15
                     5     HI          site      1         0.15    0.15

I used the following code to replace df['Savings'].

       df['Saving']=df['Savings'].str.split('_')[1]

However, I got the following error message.

"Can only use .str accessor with string values, which use np.object_ dtype in pandas"

Thank you for your help.

  • It should be df['Savings'].str.split('_').str[1] Commented Feb 26, 2020 at 22:17
  • Thank you! It worked. Commented Feb 26, 2020 at 22:21
  • Please share the entire error message. The formatting of your post is broken, by the way. Commented Feb 27, 2020 at 1:54
  • Does this answer your question? AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas Commented Feb 27, 2020 at 1:55

3 Answers

You can try with the following:

import pandas as pd
df = pd.DataFrame({'Scenario':['HI','HI','HI','HI','HI','HI'],
                   'Savings':['Total_FFC_base0','Total_FFC_savings1','Total_FFC_savings2',
                              'Total_FFC_savings3','Total_site_base0','Total_site_savings1'],
                    'PC1':[0.12,0.15,0.12,0.17,0.12,0.15],
                    'PC2':[0.13,0.12,0.14,0.15,0.15,0.15]})
# Take the second piece of each underscore-separated string
df['Saving'] = df['Savings'].str.split('_').apply(lambda x: x[1])
# Raw string avoids an invalid-escape warning; extracts only the digits
df['EL'] = df['Savings'].str.extract(r'(\d+)')
df = df.drop(columns='Savings')
print(df)

Output:

  Scenario   PC1   PC2 Saving EL
0       HI  0.12  0.13    FFC  0
1       HI  0.15  0.12    FFC  1
2       HI  0.12  0.14    FFC  2
3       HI  0.17  0.15    FFC  3
4       HI  0.12  0.15   site  0
5       HI  0.15  0.15   site  1

This is a perfect use case for named groups in a regex: we can extract the data and name the resulting columns at the same time:

df[['Savings', 'EL']] = df['Savings'].str.extract(r'_(?P<Savings>.*)_.*(?P<EL>\d+)')

  Scenario Savings   PC1   PC2 EL
0       HI     FFC  0.12  0.13  0
1       HI     FFC  0.15  0.12  1
2       HI     FFC  0.12  0.14  2
3       HI     FFC  0.17  0.15  3
4       HI    site  0.12  0.15  0
5       HI    site  0.15  0.15  1
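
To make the named-group mechanics concrete, here is a minimal, self-contained sketch. It assumes a slightly different pattern from the one above (lazy `.*?` plus `\D*`) so that a multi-digit suffix would be kept whole; the `savings12` row is hypothetical and not part of the original data.

```python
import pandas as pd

df = pd.DataFrame({
    'Scenario': ['HI', 'HI', 'HI'],
    # 'Total_FFC_savings12' is a made-up row to illustrate multi-digit suffixes
    'Savings': ['Total_FFC_base0', 'Total_FFC_savings12', 'Total_site_savings1'],
})

# str.extract returns a DataFrame whose columns are named after the groups.
# The lazy .*? stops at the next underscore; \D* skips the non-digit label.
parts = df['Savings'].str.extract(r'_(?P<Savings>.*?)_\D*(?P<EL>\d+)$')

df['Savings'] = parts['Savings']
df['EL'] = parts['EL']  # still strings; use .astype(int) if numbers are needed
print(df)
```

Assigning the two extracted columns one at a time is a stylistic choice here; it keeps the example independent of how list-of-columns assignment aligns values.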

Python's split() takes a separator argument; here it is the underscore ('_'). Splitting 'Total_FFC_base0' on '_' gives ['Total', 'FFC', 'base0'], so the element at index 1 is 'FFC'. In pandas, strings are stored in columns of object dtype, which is why the error message mentions np.object_.
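
The difference between plain `[1]` and `.str[1]` after a split can be sketched as follows. This is a small illustration on a two-row Series built for the purpose, not the asker's full DataFrame.

```python
import pandas as pd

s = pd.Series(['Total_FFC_base0', 'Total_site_savings1'])

pieces = s.str.split('_')   # each element is now a list of parts

# Plain indexing selects by row label: this is the whole list from row 1.
second_row = pieces[1]

# .str indexing is applied element-wise: item 1 of every list.
middle = pieces.str[1]

print(second_row)
print(middle.tolist())
```

So `df['Savings'].str.split('_')[1]` picks a single row, whereas `df['Savings'].str.split('_').str[1]` picks the second piece from each row, which is what the question needs.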
