I have a DataFrame that I took from basketball-reference with player names. The code below is how I built the DataFrame. It has 5 columns of player names, but each name also has the player's position.
import pandas as pd

url = "http://www.basketball-reference.com/awards/all_league.html"
dframe_list = pd.read_html(url)
df = dframe_list[0]
df.drop(df.columns[[0,1,2]], inplace=True, axis=1)
column_names = ['name1', 'name2', 'name3', 'name4', 'name5']
df.columns = column_names
df = df[df.name1.notnull()]
I am trying to split off the position. To do so I had planned to make a DataFrame for each name column:
name1 = pd.DataFrame(df.name1.str.split().tolist()).ix[:,0:1]
name1[0] = name1[0] + " " + name1[1]
name1.drop(name1.columns[[1]], inplace=True, axis=1)
Since I have five columns, I thought I would do this with a loop:
column_names = ['name1', 'name2', 'name3', 'name4', 'name5']
for column in column_names:
    column = pd.DataFrame(df.column.str.split().tolist()).ix[:,0:1]
    column[0] = column[0] + " " + column[1]
    column.drop(column.columns[[1]], inplace=True, axis=1)
    column.columns = column
And then I'd join all these DataFrames back together.
df_NBA = [name1, name2, name3, name4, name5]
df_NBA = pd.concat(df_NBA, axis=1)
I'm new to Python, so I'm sure I'm doing this in a pretty cumbersome fashion and would love suggestions as to how I might do this faster. But my main question is this: when I run the code on individual columns it works fine, but when I run the loop I get the error:
AttributeError: 'DataFrame' object has no attribute 'column'
It seems that the part of the loop df.column.str is causing some problem? I've fiddled around with the list, with bracketing column (I still don't understand why sometimes I bracket a DataFrame column and sometimes it's .column, but that's a bigger issue) and other random things.
When I try @BrenBarn's suggestion
df.apply(lambda c: c.str[:-2])
The following pops up in the Jupyter notebook:
SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
if __name__ == '__main__':
Looking at the DataFrame, nothing has actually changed. If I understand the documentation correctly, this method creates a copy of the DataFrame with the edits, but that copy is temporary and gets thrown out afterward, so the actual DataFrame doesn't change.
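The point about copies can be checked on a small made-up frame rather than the scraped table. This is only a sketch: the sample names, and the assumption that every entry ends in a space plus a one-letter position, are hypothetical. It shows that `apply` returns a new DataFrame and leaves the original as it was.

```python
import pandas as pd

# Hypothetical sample; assumes each entry ends with a space and a one-letter position.
df = pd.DataFrame({
    "name1": ["Stephen Curry G", "LeBron James F"],
    "name2": ["Chris Paul G", "Kevin Durant F"],
})

# apply() builds and returns a new DataFrame; df itself is not modified.
stripped = df.apply(lambda c: c.str[:-2])

print(df.loc[0, "name1"])        # original entry, position still attached
print(stripped.loc[0, "name1"])  # entry from the new frame, last two characters dropped
```

To keep the result, it would have to be assigned back, for example `df = df.apply(lambda c: c.str[:-2])`.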
df[column]?
df.column corresponds to df['column'], not df[column]. So when column is a variable, you cannot use it like that.
df[column], df['column'], I guess that's what @ayhan is saying. So is there an answer?
df[column], I do get errors, but they're related to resetting the column names (your column.columns = column). Also, your data sample has only two columns but your code still tries to iterate over 5, leading to additional errors.
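The distinction between `df.column` and `df[column]` raised in the comments can be sketched as follows. The sample frame is hypothetical, and `rsplit(n=1)` is swapped in here as one possible way to drop a trailing position; it is not the approach from the question or from @BrenBarn's suggestion.

```python
import pandas as pd

# Hypothetical sample standing in for the scraped table.
df = pd.DataFrame({
    "name1": ["Stephen Curry G", "LeBron James F"],
    "name2": ["Chris Paul G", "Kevin Durant F"],
})

# Attribute access takes the name literally: df.name1 is shorthand for df["name1"].
same = df.name1.equals(df["name1"])

cleaned = {}
for column in df.columns:
    # `column` is a variable holding a string, so it goes in brackets;
    # df.column would look for a column literally named "column".
    # rsplit(n=1) splits once from the right, separating the trailing position.
    cleaned[column] = df[column].str.rsplit(n=1).str[0]

# Build one DataFrame from the cleaned columns instead of concatenating five.
df_names = pd.DataFrame(cleaned)
print(df_names)
```

Working on the columns in place like this avoids building a separate DataFrame per column and joining them afterward.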