str error when replacing values in pandas dataframe

Question

My code scrapes information from the website and puts it into a dataframe. But i'm not certain why the order of the code will give rise to the error: AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas

Basically, the data scraped has over 20 rows and 10 columns.

Some values are within brackets ie: (2,333) and I want to change it to: -2333.
Some values have words n.a and I want to change it to numpy.nan
some values are - and I want to change them to numpy.nan too.

Doesn't Work

for final_df, engine_name in zip((df_foo, df_bar, df_far), (['engine_foo', 'engine_bar', 'engine_far'])):

# Replacing necessary items for final clean up

    final_df.replace('-', numpy.nan, inplace=True)
    final_df.replace('n.a.', numpy.nan, inplace=True)

    for i in final_df.columns:
        final_df[i] = final_df[i].str.replace(')', '')
        final_df[i] = final_df[i].str.replace(',', '')
        final_df[i] = final_df[i].str.replace('(', '-')

    # Appending Code to dataframe
    final_df = final_df.T
    final_df.insert(loc=0, column='Code', value=some_code)

# This produces the error - AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas

Works

for final_df, engine_name in zip((df_foo, df_bar, df_far), (['engine_foo', 'engine_bar', 'engine_far'])):

# Replacing necessary items for final clean up

    for i in final_df.columns:
        final_df[i] = final_df[i].str.replace(')', '')
        final_df[i] = final_df[i].str.replace(',', '')
        final_df[i] = final_df[i].str.replace('(', '-')

    final_df.replace('-', numpy.nan, inplace=True)
    final_df.replace('n.a.', numpy.nan, inplace=True)

    # Appending Code to dataframe
    final_df = final_df.T
    final_df.insert(loc=0, column='Code', value=some_code)

# This doesn't give me any errors and returns me what I want.

Any thoughts on why this happens?

this is not reproducible with any dataframe, could you give a data example ? — PRMoureu
– PRMoureu, Commented Jul 9, 2017 at 9:20

jezrael · Accepted Answer · 2017-07-09 10:29:09Z

For me works double replace - first with regex=True for replace substrings and second for all values:

np.random.seed(23)
df = pd.DataFrame(np.random.choice(['(2,333)','n.a.','-',2.34], size=(3,3)), 
                  columns=list('ABC'))
print (df)
      A     B        C
0  2.34     -  (2,333)
1  n.a.     -  (2,333)
2  2.34  n.a.  (2,333)

df1 = df.replace(['\(','\)','\,'], ['-','',''], regex=True).replace(['-','n.a.'], np.nan)
print(df1)
      A   B      C
0  2.34 NaN  -2333
1   NaN NaN  -2333
2  2.34 NaN  -2333

df1 = df.replace(['-','n.a.'], np.nan).replace(['\(','\)','\,'], ['-','',''], regex=True)
print(df1)  
      A   B      C
0  2.34 NaN  -2333
1   NaN NaN  -2333
2  2.34 NaN  -2333

EDIT:

Your error means you want replace some non string column (e.g. all columns are NaNs in column B) by str.replace:

df1 = df.apply(lambda x: x.str.replace('\(','-').str.replace('\)','')
                           .str.replace(',','')).replace(['-','n.a.'], np.nan)
print(df1)
      A   B      C
0  2.34 NaN  -2333
1   NaN NaN  -2333
2  2.34 NaN  -2333

df1 = df.replace(['-','n.a.'], np.nan)
       .apply(lambda x: x.str.replace('\(','-')
                         .str.replace('\)','')
                         .str.replace(',',''))
print(df1)

AttributeError: ('Can only use .str accessor with string values, which use np.object_ dtype in pandas', 'occurred at index B')

dtype of column B is float64:

df1 = df.replace(['-','n.a.'], np.nan)
print(df1)
      A   B        C
0  2.34 NaN  (2,333)
1   NaN NaN  (2,333)
2  2.34 NaN  (2,333)

print (df1.dtypes)
A     object
B    float64
C     object
dtype: object

Collectives™ on Stack Overflow

str error when replacing values in pandas dataframe

1 Answer 1

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related