I have a data frame similar to the one listed below. For some reason, each team is listed twice, one listing corresponding to each column.
import pandas as pd
import numpy as np
d = {'Team': ['1', '2', '3', '1', '2', '3'], 'Points for': [5, 10, 15, np.nan,np.nan,np.nan], 'Points against' : [np.nan,np.nan,np.nan, 3, 6, 9]}
df = pd.DataFrame(data=d)
Team Points for Points against
0 1 5 Nan
1 2 10 Nan
2 3 15 Nan
3 1 Nan 3
4 2 Nan 6
5 3 Nan 9
How can I just combine rows of duplicate team names so that there are no missing values? This is what I would like:
Team Points for Points against
0 1 5 3
1 2 10 6
2 3 15 9
I have been trying to figure it out with pandas, but can't seem to get it. Thanks!
d = {'Team': ['1', '2', '3'], 'Points for': [5, 10, 15], 'Points against' : [3, 6, 9]}. Or are you saying the data comes to you in this dirty format and you want help cleaning it? Ideally you'd fix whatever code produces this dirty data.