Creating a DataFrame in pandas using a list of lists

Question

I have a list of lists like this:

 sports = [['Sport', 'Country(s)'], ['Foot_ball', 'brazil'], ['Volleyball', 'Argentina', 'India'], ['Rugger', 'New_zealand', 'South_africa'], ['Cricket', 'India'], ['Carrom', 'Uk', 'Usa'], ['Chess', 'Uk']]

I want to create a pandas DataFrame from the above list, with one row per sport/country pair, as follows:

Sport          Country(s)
Foot_ball      brazil
Volleyball     Argentina
Volleyball     India
Rugger         New_zealand
Rugger         South_africa
Cricket        India
Carrom         Uk
Carrom         Usa
Chess          Uk

This is what I tried:

import pandas as pd

sport_x = []
for x in sports[1:]:
    sport_x.append(x[0])
print(sport_x)

country = []
for y in sports[1:]:
    country.append(y[1:])

header = sports[0]

df = pd.DataFrame([sport_x,country], columns = header)

But I am getting this error:

AssertionError: 2 columns passed, passed data had 6 columns

Any suggestions on how to do this?

AKX · Accepted Answer · 2019-12-22 17:07:14Z

Something like this: first "expand" the irregularly shaped rows into (sport, country) pairs, then dataframefy them.

>>> import pandas as pd
>>> sports = [
...     ["Sport", "Country(s)"],
...     ["Foot_ball", "brazil"],
...     ["Volleyball", "Argentina", "India"],
...     ["Rugger", "New_zealand", "South_africa"],
...     ["Cricket", "India"],
...     ["Carrom", "Uk", "Usa"],
...     ["Chess", "Uk"],
... ]
>>> expanded_sports = []
>>> for row in sports:
...     for country in row[1:]:
...         expanded_sports.append((row[0], country))
...
>>> pd.DataFrame(expanded_sports[1:], columns=expanded_sports[0])
        Sport    Country(s)
0   Foot_ball        brazil
1  Volleyball     Argentina
2  Volleyball         India
3      Rugger   New_zealand
4      Rugger  South_africa
5     Cricket         India
6      Carrom            Uk
7      Carrom           Usa
8       Chess            Uk

EDIT: Another solution using .melt(), but this looks uglier to me, and the order isn't the same.

>>> pd.DataFrame(sports[1:]).melt(0, value_name='country').dropna().drop('variable', axis=1).rename({0: 'sport'}, axis=1)
         sport       country
0    Foot_ball        brazil
1   Volleyball     Argentina
2       Rugger   New_zealand
3      Cricket         India
4       Carrom            Uk
5        Chess            Uk
7   Volleyball         India
8       Rugger  South_africa
10      Carrom           Usa
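
If the original row order matters, one possible variation on the .melt() approach is sketched below. It assumes pandas 1.1 or later, where melt accepts ignore_index=False and so keeps each source row's index; a stable sort on that index then puts the rows back in input order. The name long_df is only illustrative.

```python
import pandas as pd

sports = [
    ["Sport", "Country(s)"],
    ["Foot_ball", "brazil"],
    ["Volleyball", "Argentina", "India"],
    ["Rugger", "New_zealand", "South_africa"],
    ["Cricket", "India"],
    ["Carrom", "Uk", "Usa"],
    ["Chess", "Uk"],
]

long_df = (
    pd.DataFrame(sports[1:])
    # Assumption: pandas >= 1.1, so ignore_index=False keeps the source row index
    .melt(id_vars=0, value_name="country", ignore_index=False)
    # Rows padded with None (sports with a single country) are dropped
    .dropna(subset=["country"])
    # mergesort is stable, so countries of the same sport keep their column order
    .sort_index(kind="mergesort")
    .drop(columns="variable")
    .rename(columns={0: "sport"})
    .reset_index(drop=True)
)
print(long_df)
```

The stable sort is the important part here: melt emits all first countries before all second countries, and sorting on the preserved index regroups them by source row.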

anky · Accepted Answer · 2019-12-22 17:02:15Z

Or, the pandas way, using explode and a list comprehension:

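# Join each sport's countries into one comma-separated string
df = pd.DataFrame([[i[0], ','.join(i[1:])] if len(i) > 2 else i for i in sports[1:]],
                  columns=sports[0])
# Split the string back into a list, then give each country its own row
df['Country(s)'] = df['Country(s)'].str.split(',')
final = df.explode('Country(s)').reset_index(drop=True)
print(final)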

        Sport    Country(s)
0   Foot_ball        brazil
1  Volleyball     Argentina
2  Volleyball         India
3      Rugger   New_zealand
4      Rugger  South_africa
5     Cricket         India
6      Carrom            Uk
7      Carrom           Usa
8       Chess            Uk

2 Comments

AKX Over a year ago

(So long as none of the values contain a comma!)

AKX Over a year ago

I mean that if one of the country values happened to contain a comma, it would be exploded into multiple rows since it's used as the join/split character. A newline, a tab, ... might be better.
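
One way to sidestep the delimiter concern altogether is to skip the join/split round trip and keep each sport's countries as a Python list, which explode can expand directly. This is a minimal sketch, assuming pandas 0.25 or later (where explode is available); the names header and rows are illustrative.

```python
import pandas as pd

sports = [
    ["Sport", "Country(s)"],
    ["Foot_ball", "brazil"],
    ["Volleyball", "Argentina", "India"],
    ["Rugger", "New_zealand", "South_africa"],
    ["Cricket", "India"],
    ["Carrom", "Uk", "Usa"],
    ["Chess", "Uk"],
]

header, *rows = sports

# One row per sport; the second column holds the list of that sport's countries
df = pd.DataFrame(
    {
        header[0]: [row[0] for row in rows],
        header[1]: [row[1:] for row in rows],
    }
)

# explode turns each list element into its own row; no separator is involved,
# so a country name containing a comma would stay intact
final = df.explode(header[1]).reset_index(drop=True)
print(final)
```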
