3

So I have a pandas dataframe from csv file looks like this:

year,month,day,list
2017,09,01,"[('United States of America', 12345), (u'Germany', 54321), (u'Switzerland', 13524), (u'Netherlands', 24135), ... ]
2017,09,02,"[('United States of America', 6789), (u'Germany', 9876), (u'Switzerland', 6879), (u'Netherlands', 7968), ... ]

The number of country-count pairs in the 4th column of each row is not identical.
I want to expand the list in the 4th column, and transform the dataframe into something like this:

year,month,day,country,count
2017,09,01,'United States of America',12345
2017,09,01,'Germany',54321
2017,09,01,'Switzerland',13524
2017,09,01,'Netherlands',24135
...
2017,09,02,'United States of America',6789
2017,09,02,'Germany',9876
2017,09,02,'Switzerland',6879
2017,09,02,'Netherlands',7968
...

My thought was to generate 2 independent columns, then join them to the origin dataframe. Maybe Something like this:

country = df.apply(lambda x:[x['list'][0]]).stack().reset_index(level=1, drop=True)
count  = df.apply(lambda x:[x['list'][1]]).stack().reset_index(level=1, drop=True)
df.drop('list', axis=1).join(country).join(count)

The code above is definitely not working (I just hope it can help express my thought), and I have no idea how to expand the date columns as well.
Any help or suggestion is much appreciated.

3 Answers 3

0

Probably, the easiest way to solve you problem is to iterate over the tuples contained in the dataframe and to create a new one. You can do that with two nested for loops.

df_new = []
for i in df.itertuples():
    for l in i.list:
        df_new.append([i.year, i.month, i.day, l[0], l[1]])

df_new = pd.DataFrame(df_new, columns=['year', 'month', 'day', 'country', 'count'])

If the fourth field of the list is not a list but a string (the double quotes in the dataframe example leave me some doubts), you can use the literal_eval function from the ast library: Converting a string representation of a list into an actual list object

Sign up to request clarification or add additional context in comments.

2 Comments

Thank you very much! I'll try this way and see if it works.
You're right - the 4th column was not an actual list but a string, and your method sure solved the problem with date. Thank you!
0

Use:

import ast
#convert strings to lists of tuples
df['list'] = df['list'].apply(ast.literal_eval)
#create reshaped df from column list
df1 =pd.DataFrame([dict(x) for x in df['list'].values.tolist()]).stack().reset_index(level=1)
df1.columns = ['country','count']
#join to original
df = df.drop('list', 1).join(df1).reset_index(drop=True)
print (df)
   year  month  day                   country  count
0  2017      9    1                   Germany  54321
1  2017      9    1               Netherlands  24135
2  2017      9    1               Switzerland  13524
3  2017      9    1  United States of America  12345
4  2017      9    2                   Germany   9876
5  2017      9    2               Netherlands   7968
6  2017      9    2               Switzerland   6879
7  2017      9    2  United States of America   6789

2 Comments

Thank you! I tried and this is exactly what I need.
BTW, I found something's wrong with the date, the problem is probably with the rejoin part. I'll update if I find out how to correct it.
0

So, what you need is to cconvert a column with a list of values into multiple rows. One solution is to create a new dataframe and do a left join:

df = pd.DataFrame({'A':['a','b'],'B':['x','y'],
                   'C':[['a1', 'a2'],['b1', 'b2', 'b3']]})

df
#    A  B               C
# 0  a  x      [[a1, a2]]
# 1  b  y  [[b1, b2, b3]]

dfr=df['C'].apply(lambda k: pd.Series(k)).stack().reset_index(level=1, drop=True).to_frame('C')

dfr
#     C
# 0  a1
# 0  a2
# 1  b1
# 1  b2
# 1  b3

df[['A','B']].join(dfr, how='left')
#    A  B   C
# 0  a  x  a1
# 0  a  x  a2
# 1  b  y  b1
# 1  b  y  b2
# 1  b  y  b3

Finally, use reset_index()

df[['A','B']].join(dfr, how='left').reset_index(drop=1)
#    A  B   C
# 0  a  x  a1
# 1  a  x  a2
# 2  b  y  b1
# 3  b  y  b2
# 4  b  y  b3

Credit: https://stackoverflow.com/a/39955283/2314737

1 Comment

Thank you! I'll try this way too.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.