
My code scrapes information from a website and puts it into a dataframe. But I'm not certain why the order of the operations gives rise to this error: AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas

Basically, the data scraped has over 20 rows and 10 columns.

  • Some values are in brackets, e.g. (2,333), and I want to change them to -2333.
  • Some values contain the text n.a. and I want to change them to numpy.nan.
  • Some values are - and I want to change them to numpy.nan too.
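The three rules above can be written down as a single per-cell function, which makes the intended result explicit regardless of the order in which the replacements run. This is only a sketch: clean_cell and MISSING_MARKERS are hypothetical names, and the set of "missing" spellings is an assumption based on the examples in the list.

```python
import math

# Assumed spellings of "no value" in the scraped data (hypothetical set).
MISSING_MARKERS = {"-", "n.a.", "n.a"}


def clean_cell(value):
    """Convert one scraped cell to a float, or NaN for a missing marker."""
    if not isinstance(value, str):
        return value  # already numeric or NaN: leave it untouched
    text = value.strip()
    if text in MISSING_MARKERS:
        return math.nan
    # Accounting style: a value wrapped in brackets is negative.
    negative = text.startswith("(") and text.endswith(")")
    if negative:
        text = text[1:-1]
    # Drop thousands separators; float() raises ValueError on anything else.
    number = float(text.replace(",", ""))
    return -number if negative else number
```

With pandas, a function like this could be applied element-wise through DataFrame.applymap (renamed DataFrame.map in pandas 2.1).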

Doesn't Work

for final_df, engine_name in zip((df_foo, df_bar, df_far), (['engine_foo', 'engine_bar', 'engine_far'])):

    # Replacing necessary items for final clean up

    final_df.replace('-', numpy.nan, inplace=True)
    final_df.replace('n.a.', numpy.nan, inplace=True)

    for i in final_df.columns:
        final_df[i] = final_df[i].str.replace(')', '')
        final_df[i] = final_df[i].str.replace(',', '')
        final_df[i] = final_df[i].str.replace('(', '-')

    # Appending Code to dataframe
    final_df = final_df.T
    final_df.insert(loc=0, column='Code', value=some_code)

# This produces the error - AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas

Works

for final_df, engine_name in zip((df_foo, df_bar, df_far), (['engine_foo', 'engine_bar', 'engine_far'])):

    # Replacing necessary items for final clean up

    for i in final_df.columns:
        final_df[i] = final_df[i].str.replace(')', '')
        final_df[i] = final_df[i].str.replace(',', '')
        final_df[i] = final_df[i].str.replace('(', '-')

    final_df.replace('-', numpy.nan, inplace=True)
    final_df.replace('n.a.', numpy.nan, inplace=True)

    # Appending Code to dataframe
    final_df = final_df.T
    final_df.insert(loc=0, column='Code', value=some_code)

# This doesn't give me any errors and returns me what I want. 

Any thoughts on why this happens?

  • This is not reproducible with an arbitrary dataframe; could you give a data example? Commented Jul 9, 2017 at 9:20

1 Answer


A double replace works for me: first with regex=True to replace substrings, then a second replace that matches whole values:

import numpy as np
import pandas as pd

np.random.seed(23)
df = pd.DataFrame(np.random.choice(['(2,333)','n.a.','-',2.34], size=(3,3)), 
                  columns=list('ABC'))
print (df)
      A     B        C
0  2.34     -  (2,333)
1  n.a.     -  (2,333)
2  2.34  n.a.  (2,333)

df1 = df.replace([r'\(', r'\)', ','], ['-', '', ''], regex=True).replace(['-', 'n.a.'], np.nan)
print(df1)
      A   B      C
0  2.34 NaN  -2333
1   NaN NaN  -2333
2  2.34 NaN  -2333

df1 = df.replace(['-', 'n.a.'], np.nan).replace([r'\(', r'\)', ','], ['-', '', ''], regex=True)
print(df1)  
      A   B      C
0  2.34 NaN  -2333
1   NaN NaN  -2333
2  2.34 NaN  -2333

EDIT:

Your error means you are calling str.replace on a non-string column (e.g. column B, where all values are NaN). The first order below works; the second raises the error:

df1 = df.apply(lambda x: x.str.replace('(', '-', regex=False)
                           .str.replace(')', '', regex=False)
                           .str.replace(',', '', regex=False)).replace(['-', 'n.a.'], np.nan)
print(df1)
      A   B      C
0  2.34 NaN  -2333
1   NaN NaN  -2333
2  2.34 NaN  -2333 

df1 = (df.replace(['-', 'n.a.'], np.nan)
         .apply(lambda x: x.str.replace('(', '-', regex=False)
                           .str.replace(')', '', regex=False)
                           .str.replace(',', '', regex=False)))
print(df1)

AttributeError: ('Can only use .str accessor with string values, which use np.object_ dtype in pandas', 'occurred at index B')

This happens because, after the first replace, column B contains only NaN and its dtype is float64, which has no .str accessor:

df1 = df.replace(['-','n.a.'], np.nan)
print(df1)
      A   B        C
0  2.34 NaN  (2,333)
1   NaN NaN  (2,333)
2  2.34 NaN  (2,333)

print (df1.dtypes)
A     object
B    float64
C     object
dtype: object
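
If you want the string clean-up to be safe in either order, one option is to skip columns that are already numeric before touching .str. The following is a minimal sketch, not the only way to do it: strip_accounting_format is a hypothetical helper name, and the frame is built by hand with an explicit float64 column B so the example does not depend on how replace infers dtypes in a particular pandas version.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype


def strip_accounting_format(frame):
    """Return a copy where '(1,234)' style strings become '-1234'."""
    cleaned = frame.copy()
    for name in cleaned.columns:
        column = cleaned[name]
        if is_numeric_dtype(column):
            continue  # e.g. an all-NaN float64 column: it has no .str accessor
        cleaned[name] = (
            column.str.replace("(", "-", regex=False)
                  .str.replace(")", "", regex=False)
                  .str.replace(",", "", regex=False)
        )
    return cleaned


# Hand-built stand-in for the frame after '-' and 'n.a.' became NaN.
df = pd.DataFrame({
    "A": ["2.34", np.nan, "2.34"],
    "B": np.array([np.nan, np.nan, np.nan], dtype="float64"),
    "C": ["(2,333)", "(2,333)", "(2,333)"],
})

# Using .str directly on the float64 column reproduces the error.
try:
    df["B"].str.replace(",", "", regex=False)
    error_raised = False
except AttributeError as exc:
    error_raised = True
    print(exc)

result = strip_accounting_format(df)

# Optional last step: turn the cleaned strings into real numbers.
numeric = result.apply(pd.to_numeric, errors="coerce")
print(numeric)
```

The dtype check is the only difference from the failing version: columns with mixed strings and NaN still go through .str (NaN passes through unchanged), while purely numeric columns are left alone.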