
I have a DataFrame that I took from basketball-reference with player names. The code below is how I built the DataFrame. It has 5 columns of player names, but each name also has the player's position.

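import pandas as pd

url = "http://www.basketball-reference.com/awards/all_league.html"
# read_html returns a list of DataFrames, one per table on the page
dframe_list = pd.read_html(url)
df = dframe_list[0]
# drop the first three columns, leaving the five name columns
df.drop(df.columns[[0,1,2]], inplace=True, axis=1)
column_names = ['name1', 'name2', 'name3', 'name4', 'name5']
df.columns = column_names
# keep only rows that have a value in the first name column
df = df[df.name1.notnull()]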
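Because the scrape depends on the live page (whose table layout may have changed), a small offline stand-in can make the rest of the thread reproducible. This is a minimal sketch assuming each cell holds "First Last P", a name followed by a space and a one-letter position; the names come from the answer's output, and the position letters and the missing row are hypothetical placeholders. It also takes an explicit .copy() after the boolean filter, so later column assignments act on an independent DataFrame instead of something pandas may treat as a copy of a slice.

```python
import numpy as np
import pandas as pd

# Hypothetical sample mimicking the scraped table: "name position" in each cell.
# The position letters and the NaN row are placeholders, not real data.
raw = pd.DataFrame({
    "name1": ["Marc Gasol C", "Pau Gasol C", np.nan, "Dwight Howard C"],
    "name2": ["Lebron James F", "Kevin Durant F", np.nan, "Kyrie Irving G"],
})

# Keep rows that have a first name, then copy explicitly so that assigning
# to columns later does not raise SettingWithCopyWarning.
df = raw[raw.name1.notnull()].copy()
df = df.reset_index(drop=True)
print(df)
```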

I am trying to split off the position. To do so I had planned to make a DataFrame for each name column:

name1 = pd.DataFrame(df.name1.str.split().tolist()).iloc[:, 0:2]
name1[0] = name1[0] + " " + name1[1]
name1.drop(name1.columns[[1]], inplace=True, axis=1)

Since I have five columns, I thought I would do this with a loop:

column_names = ['name1', 'name2', 'name3', 'name4', 'name5']
for column in column_names:
    column = pd.DataFrame(df.column.str.split().tolist()).iloc[:, 0:2]
    column[0] = column[0] + " " + column[1]
    column.drop(column.columns[[1]], inplace=True, axis=1)
    column.columns = column

And then I'd join all these DataFrames back together.

df_NBA = [name1, name2, name3, name4, name5]
df_NBA = pd.concat(df_NBA, axis=1)

I'm new to Python, so I'm sure I'm doing this in a pretty cumbersome fashion and would love suggestions for doing it faster. But my main question is this: when I run the code on individual columns it works fine, but when I run the loop I get the error:

AttributeError: 'DataFrame' object has no attribute 'column'

It seems that the df.column.str part of the loop is causing the problem. I've fiddled around with the list, with bracketing column (I still don't understand why I sometimes use brackets for a DataFrame column and sometimes .column, but that's a bigger issue), and with other random things.

When I try @BrenBarn's suggestion:

df.apply(lambda c: c.str[:-2])

The following pops up in the Jupyter notebook:

SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation:    http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  if __name__ == '__main__':

Looking at the DataFrame, nothing has actually changed. If I understand the documentation correctly, this method creates a copy of the DataFrame with the edits, but that copy is temporary and gets thrown out afterward, so the actual DataFrame doesn't change.

  • Are you saying it also doesn't work if you do df[column]? Commented Jul 29, 2016 at 19:23
  • df.column corresponds to df['column'], not df[column]. So when column is a variable, you cannot use it like that. Commented Jul 29, 2016 at 19:28
  • @BrenBarn, yep, it doesn't work if I use df[column] or df['column']; I guess that's what @ayhan is saying. So is there an answer? Commented Jul 29, 2016 at 19:44
  • I can't reproduce your problem. If I run your code with df[column], I do get errors, but they're related to resetting the column names (your column.columns = column). Also, your data sample has only two columns but your code still tries to iterate over 5, leading to additional errors. Commented Jul 29, 2016 at 20:04
  • Edited the question to reflect the actual DataFrame I am working with. Thanks @BrenBarn for the suggestions. Commented Jul 31, 2016 at 7:10

1 Answer


If the position labels are always only one character, the simple solution is this:

>>> df.apply(lambda c: c.str[:-2])
           name1         name2
0     Marc Gasol  Lebron James
1      Pau Gasol  Kevin Durant
2  Dwight Howard  Kyrie Irving

The str attribute of a Series lets you do string operations, including indexing, so this just trims the last two characters (the space and the one-letter position) off each value.
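One detail worth spelling out: df.apply(...) returns a new DataFrame and never modifies df in place, so the result has to be assigned back. The sketch below, on a hypothetical sample with placeholder positions, shows that, plus an alternative that does not assume one-character positions: Series.str.rsplit(n=1) splits once on the last run of whitespace, and .str[0] keeps everything before it.

```python
import pandas as pd

# Hypothetical sample; the position letters are placeholders.
df = pd.DataFrame({
    "name1": ["Marc Gasol C", "Pau Gasol C", "Dwight Howard C"],
    "name2": ["Lebron James F", "Kevin Durant F", "Kyrie Irving G"],
})

# apply() builds a NEW DataFrame; df itself is untouched until reassigned.
trimmed = df.apply(lambda c: c.str[:-2])

# Alternative that works for positions of any length: split once from the
# right on whitespace and keep the part before the last token.
by_token = df.apply(lambda c: c.str.rsplit(n=1).str[0])

df = by_token  # assign back so the change actually sticks
print(df)
```

Both versions give the same names here; the rsplit version just relies on the position being the last whitespace-separated token rather than exactly one character.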

As for your question about df.column, this issue is more general than pandas. These two things are not the same:

# works
obj.attr

# doesn't work
attrName = 'attr'
obj.attrName

You can't use the dot notation when you want to access an attribute whose name is stored in a variable. In general, you can use the getattr function instead. However, pandas provides the bracket notation for accessing a column by specifying the name as a string (rather than a source-code identifier). So these two are equivalent:

df.some_column

columnName = "some_column"
df[columnName]
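A minimal sketch of the difference, on a hypothetical one-row frame: bracket notation and getattr both resolve a name stored in a variable, while df.column looks for something literally called "column". Brackets also reach columns that attribute access cannot, for example a column named "count", which is shadowed by the DataFrame.count method.

```python
import pandas as pd

# Hypothetical one-row frame for illustration.
df = pd.DataFrame({"name1": ["Marc Gasol C"], "count": [1]})

column = "name1"          # the column name lives in a variable
a = df[column]            # bracket notation: the column named "name1"
b = getattr(df, column)   # generic Python attribute lookup, same Series here
# df.column would look for an attribute literally named "column" and raise
# AttributeError, which is the error from the question.

# df.count is the DataFrame.count method, not the "count" column;
# brackets always mean "column by name".
c = df["count"]
print(callable(df.count), c.tolist())
```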

In your example, changing your reference to df.column to df[column] should resolve that issue. However, as I mentioned in a comment, your code has other problems too. As far as solving the task at hand, the string-indexing approach I showed at the beginning of my answer is much simpler.
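For completeness, here is a sketch of the loop with df[column] and the other problems addressed, on a hypothetical sample with placeholder positions. Instead of reassigning the loop variable and renaming columns, it collects each cleaned Series in a dict and builds the result once; like the original per-column code, it keeps only the first two tokens, so a name with more than two words would be truncated.

```python
import pandas as pd

# Hypothetical sample; the position letters are placeholders.
df = pd.DataFrame({
    "name1": ["Marc Gasol C", "Pau Gasol C", "Dwight Howard C"],
    "name2": ["Lebron James F", "Kevin Durant F", "Kyrie Irving G"],
})

cleaned = {}
for column in df.columns:
    # df[column] (not df.column) looks up the column whose name is in `column`
    tokens = df[column].str.split()
    # keep the first two tokens, as the original per-column code did
    cleaned[column] = tokens.str[0] + " " + tokens.str[1]

# build the result in one step instead of concatenating separate frames
df_NBA = pd.DataFrame(cleaned)
print(df_NBA)
```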


2 Comments

Thank you for the solution. Unfortunately, when I try this on the actual DataFrame it doesn't change anything. In the Jupyter notebook a red box appears stating that 'a value is trying to be set on a copy of a slice from a DataFrame' and raises a SettingWithCopyWarning. The documentation states that this method may return a copy of a temporary view of the DataFrame that gets thrown out afterward, so it won't work.
@vino88: Then please edit your question to include a self-contained example demonstrating the problem. (Or ask a separate question, if your new question is really about this new way of doing it and is unrelated to the code you have posted here.)
