I'm working with some data where I'm trying to convert an entire column to a different dtype (i.e. from object to datetime or from object to numeric) using methods rather than resetting individual values. Each line of code below raises a 'SettingWithCopyWarning':

#converting euro values column 'value' to numeric values:

df['value'] = pd.to_numeric(df.value, errors='coerce')

#converting object to datetime in order to extract year:

df['date'] = pd.to_datetime(df['date'])

df['date'] = df['date'].dt.year

If I leave any of the above lines in, it causes an error. If I take all of them out, the code doesn't raise any warnings.

After some research, I learned the 'SettingWithCopyWarning' crops up when chained assignment is used and the object being assigned to may be a copy of the dataframe as opposed to the dataframe itself (ref: https://www.dataquest.io/blog/settingwithcopywarning/).

I also learned that the general form to avoid chained assignments is df.loc[<mask or index label values>, <optional column>] = <new scalar value or array-like> (ref: python pandas: how to avoid chained assignment).

I tried to wrangle something together like this just to test out the form:

df.loc[df['value']] = pd.to_numeric(df.value, errors='coerce')

but it returns an error like:

KeyError: "['$3.40m' '$3.90m' '$12.60m' '$13.80m' '$123.80m' '$171.20m'\n '$205.2m' '$214.40m' '$221.03m'] not in index"

which is making me think the general form I tried to stuff it in is confusing it for a dictionary and raising a KeyError.

After looking around, I'm not sure how to apply this to entire columns (like my code) that are using methods (dot functions) without using chained assignments.

Is there a way around this?

Edit:

Lines above the given code:

parent_df = pd.DataFrame.from_records(data, columns = ['date', 'value'])

df = parent_df[parent_df.date.str.contains('.*201[4-9]')]
  • The root of the warning isn't those lines, but some line above them where you created a slice and it didn't return a new object. Did you do something like df = df.drop_duplicates() or df = df[some_boolean_mask]? The TL;DR is to just do df = df.copy() above those lines (or after the slice that doesn't return a new object): stackoverflow.com/questions/20625582/… Commented Aug 28, 2019 at 21:23
  • Just added this edit: "If I leave any of the above lines in, it causes an error. If I take all of them out, the code doesn't raise any warnings." I also don't think I have anything above it that should cause a problem, the first line is creating a df from the output and the second is creating another df as a subset from the first. Would these raise the warning? I added these lines in an edit at the end. Commented Aug 28, 2019 at 21:38
  • Yes, the line that is creating the subset is returning a view of the original DataFrame, not a new object. In later lines when you then try to define new columns, you're adding new columns to this view, so you get the warning there. Tack a .copy() onto the end of the operation that creates the subset. This forces pandas to create a new object, and the warnings will disappear. Commented Aug 28, 2019 at 21:43
  • I see, thank you for pointing that out. For future understanding, the line that was originally meant to make a copy doesn't have any chaining and I've seen this structure in tutorials for pandas before. What is it about the syntax that is returning a view, not a copy? Commented Aug 28, 2019 at 22:52
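
The fix suggested in the comments can be sketched roughly as follows. This is only an illustration: the sample records are hypothetical stand-ins for the question's `data`, whose real values and date format are not shown, and the values are plain numeric strings so that the conversion itself is not the point. The only change to the question's code is the `.copy()` on the line that builds the subset.

```python
import pandas as pd

# Hypothetical sample records standing in for the question's `data`.
data = [
    ("2013-05-01", "3.40"),
    ("2014-03-15", "12.60"),
    ("2016-11-30", "123.80"),
    ("2019-07-04", "221.03"),
]

parent_df = pd.DataFrame.from_records(data, columns=["date", "value"])

# .copy() makes df an independent DataFrame, so the assignments below
# are no longer made on a subset derived from parent_df.
df = parent_df[parent_df.date.str.contains(".*201[4-9]")].copy()

# Same conversions as in the question, unchanged.
df["value"] = pd.to_numeric(df.value, errors="coerce")
df["date"] = pd.to_datetime(df["date"])
df["date"] = df["date"].dt.year
```

With the copy in place, `parent_df` keeps its original string columns and only `df` holds the converted values.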
