I am new to python pandas. I have one dataframe like below:
df = pd.DataFrame({'Name': ['football', 'ramesh','suresh','pankaj','cricket','rakesh','mohit','mahesh'],
'age': ['25', '22','21','32','37','26','24','30']})
print df
Name age
0 football 25
1 ramesh 22
2 suresh 21
3 pankaj 32
4 cricket 37
5 rakesh 26
6 mohit 24
7 mahesh 30
"Name" column contains "sports name" and "sport person name" also. I want to split it into two different columns like below:
Expected Output:
sports_name sport_person_name age
football ramesh 25
suresh 22
pankaj 32
cricket rakesh 26
mohit 24
mahesh 30
If I make groupby on "Name" column I'm not getting expected output and it is obviously straight-forward output because no duplicates in "Name" column. What I need to use so that I can get expected output?
Edit : If don't want to hardcode the sports names
df = pd.DataFrame({'Name': ['football', 'ramesh','suresh','pankaj','cricket','rakesh','mohit','mahesh'],
'age': ['', '22','21','32','','26','24','30']})
df = df.replace('', np.nan, regex=True)
nan_rows = df[df.isnull().T.any().T]
sports = nan_rows['Name'].tolist()
df['sports_name'] = df['Name'].where(df['Name'].isin(sports)).ffill()
d = {'Name':'sport_person_name'}
df = df[df['sports_name'] != df['Name']].reset_index(drop=True).rename(columns=d)
df = df[['sports_name','sport_person_name','age']]
print (df)
I Just Checked for except "Name" column which rows contains NAN values in all rest of the columns and It will be definitely sports names. I created list of that sports names and make use of below solutions to create sports_name and sports_person_name columns.