
I have a list of lists like this.

sports = [['Sport', 'Country(s)'], ['Foot_ball', 'brazil'], ['Volleyball', 'Argentina', 'India'], ['Rugger', 'New_zealand', 'South_africa'], ['Cricket', 'India'], ['Carrom', 'Uk', 'Usa'], ['Chess', 'Uk']]

I want to create a pandas DataFrame from the list above, with one row per sport/country pair, as follows:

Sport          Country(s)
Foot_ball      brazil
Volleyball     Argentina
Volleyball     India
Rugger         New_zealand
Rugger         South_africa
Cricket        India
Carrom         Uk
Carrom         Usa
Chess          Uk

This is what I tried:

import pandas as pd

sport_x = []
for x in sports[1:]:
    sport_x.append(x[0])
print(sport_x)

country = []
for y in sports[1:]:
    country.append(y[1:])

header = sports[0]

df = pd.DataFrame([sport_x,country], columns = header)

But I get this error:

AssertionError: 2 columns passed, passed data had 6 columns

Any suggestions on how to do this?

2 Answers


Something like this: first "expand" the irregularly shaped rows into (sport, country) pairs, then build the DataFrame from them.

>>> import pandas as pd
>>> sports = [
...     ["Sport", "Country(s)"],
...     ["Foot_ball", "brazil"],
...     ["Volleyball", "Argentina", "India"],
...     ["Rugger", "New_zealand", "South_africa"],
...     ["Cricket", "India"],
...     ["Carrom", "Uk", "Usa"],
...     ["Chess", "Uk"],
... ]
>>> expanded_sports = []
>>> for row in sports:
...   for country in row[1:]:
...     expanded_sports.append((row[0], country))
...
>>> pd.DataFrame(expanded_sports[1:], columns=expanded_sports[0])
        Sport    Country(s)
0   Foot_ball        brazil
1  Volleyball     Argentina
2  Volleyball         India
3      Rugger   New_zealand
4      Rugger  South_africa
5     Cricket         India
6      Carrom            Uk
7      Carrom           Usa
8       Chess            Uk
>>>
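
The same expansion can also be written as a list comprehension in a standalone script. This is only a sketch of the idea above; the names `header`, `body`, and `rows` are mine, not from the question.

```python
import pandas as pd

sports = [
    ["Sport", "Country(s)"],
    ["Foot_ball", "brazil"],
    ["Volleyball", "Argentina", "India"],
    ["Rugger", "New_zealand", "South_africa"],
    ["Cricket", "India"],
    ["Carrom", "Uk", "Usa"],
    ["Chess", "Uk"],
]

# Split off the header row, then emit one (sport, country) pair per country.
header, *body = sports
rows = [(sport, country) for sport, *countries in body for country in countries]

df = pd.DataFrame(rows, columns=header)
print(df)
```

Unpacking each row as `sport, *countries` handles any number of countries per sport, so the rows do not need to be padded to a common length first.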

EDIT: Another solution uses .melt(), but it looks uglier to me, and the rows come out in a different order.

>>> pd.DataFrame(sports[1:]).melt(0, value_name='country').dropna().drop('variable', axis=1).rename({0: 'sport'}, axis=1)
         sport       country
0    Foot_ball        brazil
1   Volleyball     Argentina
2       Rugger   New_zealand
3      Cricket         India
4       Carrom            Uk
5        Chess            Uk
7   Volleyball         India
8       Rugger  South_africa
10      Carrom           Usa
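
If the original row order matters with the .melt() approach, one option is to carry the original index through the melt and sort on it afterwards. A sketch, assuming pandas 1.1 or later (for the `ignore_index` parameter of `melt`); the name `tidy` is mine.

```python
import pandas as pd

sports = [
    ["Sport", "Country(s)"],
    ["Foot_ball", "brazil"],
    ["Volleyball", "Argentina", "India"],
    ["Rugger", "New_zealand", "South_africa"],
    ["Cricket", "India"],
    ["Carrom", "Uk", "Usa"],
    ["Chess", "Uk"],
]

# Ragged rows are padded with missing values when the frame is built.
wide = pd.DataFrame(sports[1:])

tidy = (
    wide.melt(id_vars=0, value_name="country", ignore_index=False)
    .dropna(subset=["country"])      # drop the padding
    .drop(columns="variable")
    .rename(columns={0: "sport"})
    .sort_index(kind="mergesort")    # stable sort keeps countries in listed order
    .reset_index(drop=True)
)
print(tidy)
```

The stable sort is what keeps, for example, Argentina ahead of India for Volleyball, since both rows share the same original index.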

Or, the pandas way, using explode and a list comprehension:

df = pd.DataFrame([[i[0], ','.join(i[1:])] if len(i) > 2 else i for i in sports[1:]],
                  columns=sports[0])
df['Country(s)'] = df['Country(s)'].str.split(',')
final = df.explode('Country(s)').reset_index(drop=True)

        Sport    Country(s)
0   Foot_ball        brazil
1  Volleyball     Argentina
2  Volleyball         India
3      Rugger   New_zealand
4      Rugger  South_africa
5     Cricket         India
6      Carrom            Uk
7      Carrom           Usa
8       Chess            Uk

2 Comments

(So long as none of the values contain a comma!)
I mean that if one of the country values happened to contain a comma, it would be exploded into multiple rows since it's used as the join/split character. A newline, a tab, ... might be better.
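
To sidestep the separator issue raised in these comments, the countries can be kept as Python lists and handed to explode directly, with no join/split round trip. A sketch, assuming pandas 0.25 or later (where `DataFrame.explode` is available):

```python
import pandas as pd

sports = [
    ["Sport", "Country(s)"],
    ["Foot_ball", "brazil"],
    ["Volleyball", "Argentina", "India"],
    ["Rugger", "New_zealand", "South_africa"],
    ["Cricket", "India"],
    ["Carrom", "Uk", "Usa"],
    ["Chess", "Uk"],
]

# One row per sport, with all of its countries held in a single list cell.
df = pd.DataFrame(
    [[sport, countries] for sport, *countries in sports[1:]],
    columns=sports[0],
)

# explode() gives each list element its own row; no string splitting is involved.
final = df.explode("Country(s)").reset_index(drop=True)
print(final)
```

Because nothing is ever joined into a string, a value that happens to contain a comma (or a tab, or a newline) would stay in one piece.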
