Pandas DataFrame: Converting Column of String into Column of Lists

Question

I currently have a DataFrame with several columns, one of which looks like this:

print(df.WIN_COUNTRY_CODE[180:200])

           WIN_COUNTRY_CODE
180                        IT
181                        IT
182                        ES
183    DE---UK---UK---UK---UK
184         UK---UK---UK---UK
185         DE---UK---UK---UK
186    UK---UK---DE---UK---UK
187                        SI
188                        UK
189                        FR

Each cell of the column contains one or more country codes. I would like to convert the 2-letter country codes into 3-letter ISO codes and also count how often each country appears, so I applied the following steps:

1. I split each string on the triple dash ("---") that separates the country codes, converting it from a string into a list:

df['WIN_COUNTRY_CODE_2'] = df['WIN_COUNTRY_CODE'].str.split("---")

This results in the following column:

print(df.WIN_COUNTRY_CODE_2[180:200])

           WIN_COUNTRY_CODE_2
180                            ['IT']
181                            ['IT']
182                            ['ES']
183    ['DE', 'UK', 'UK', 'UK', 'UK']
184          ['UK', 'UK', 'UK', 'UK']
185          ['DE', 'UK', 'UK', 'UK']
186    ['UK', 'UK', 'DE', 'UK', 'UK']
187                            ['SI']
188                            ['UK']
189                            ['FR']
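To make the behaviour of step 1 concrete, here is a minimal reproducible sketch. The sample values are copied from the output above; the one missing value (`np.nan`) is a hypothetical addition for illustration. `str.split("---")` turns every string cell into a real list and passes the missing cell through unchanged as `NaN`.

```python
import numpy as np
import pandas as pd

# Sample values from the question, plus one hypothetical missing cell
df = pd.DataFrame(
    {"WIN_COUNTRY_CODE": ["IT", "DE---UK---UK---UK---UK", np.nan, "SI"]}
)

# Split each string on the triple dash; the missing cell is passed through as NaN
df["WIN_COUNTRY_CODE_2"] = df["WIN_COUNTRY_CODE"].str.split("---")
print(df)

# Indexing into a real list returns the first code, not a single character
print(df.WIN_COUNTRY_CODE_2[1][0])  # prints: DE
```

Any later step that loops over each row, such as the list comprehension in step 2, will therefore meet a float `NaN` in the row that was missing.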

2. I map the 2-letter codes to 3-letter country codes using a conversion table (cattable), which I first turn into a dictionary (catdict):

catdict = dict(zip(cattable['iso_2_codes'], cattable['iso_3_codes']))
# assign returns a new DataFrame, so keep the result
df = df.assign(mapped=[[catdict[k] for k in row if catdict.get(k)] for row in df.WIN_COUNTRY_CODE_2])

However, whenever I apply the mapping, it returns this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-13-df7aad8ca868> in <module>
      1 cattable = pd.ExcelFile('D:/ROBERT LIBRARIES/Documents/ISD - LKPP Project/vardesc2.xlsx').parse('WIN_COUNTRY_CODE')
      2 catdict= dict([(catnum,catdesc) for catnum,catdesc in zip(cattable['WIN_COUNTRY_CODE'], cattable['Description'])])
----> 3 df.assign(mapped=[[catdict[k] for k in row if catdict.get(k)] for row in df.WIN_COUNTRY_CODE])

<ipython-input-13-df7aad8ca868> in <listcomp>(.0)
      1 cattable = pd.ExcelFile('D:/ROBERT LIBRARIES/Documents/ISD - LKPP Project/vardesc2.xlsx').parse('WIN_COUNTRY_CODE')
      2 catdict= dict([(catnum,catdesc) for catnum,catdesc in zip(cattable['WIN_COUNTRY_CODE'], cattable['Description'])])
----> 3 df.assign(mapped=[[catdict[k] for k in row if catdict.get(k)] for row in df.WIN_COUNTRY_CODE])

TypeError: 'float' object is not iterable

I suspect the error occurs because the entries in the WIN_COUNTRY_CODE column are still strings rather than lists of strings. I came to this conclusion after inspecting an element of the new column with:

df.WIN_COUNTRY_CODE_2[183][0]

It always returns a single character instead of the 2-letter code as a string:

'['

whereas I expect it to return 'DE'.
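One possible explanation, offered as an assumption rather than a diagnosis: if indexing a cell with `[0]` returns `'['`, the cell holds the *text* of a list, such as `"['DE', 'UK']"`, not a list object. That typically happens when a column of lists is saved to CSV or Excel and read back in. A minimal sketch that converts such strings back into lists with the standard library's `ast.literal_eval`, leaving missing values untouched (the sample data is hypothetical):

```python
import ast

import numpy as np
import pandas as pd

# Hypothetical cells holding the *text* of lists, plus one missing value
s = pd.Series(["['IT']", "['DE', 'UK', 'UK']", np.nan])
print(s[1][0])  # prints: [

def parse_list(cell):
    """Turn "['DE', 'UK']" into ['DE', 'UK']; leave non-strings such as NaN unchanged."""
    if isinstance(cell, str):
        # literal_eval only evaluates Python literals, unlike eval
        return ast.literal_eval(cell)
    return cell

parsed = s.apply(parse_list)
print(parsed[1][0])  # prints: DE
```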

Question:

How do I convert the WIN_COUNTRY_CODE column from a column of strings into a column of lists? And how can I find the most frequent country in the entire column? Thank you.
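For the second question, one common approach (a sketch, not the only option) is to split the strings, flatten the lists with `Series.explode` (available since pandas 0.25), and count with `value_counts`. The data below is a hypothetical subset of the rows shown above plus one missing value:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "WIN_COUNTRY_CODE": ["IT", "IT", "ES", "DE---UK---UK---UK---UK",
                         "UK---UK---UK---UK", np.nan, "SI"]
})

# One row per individual country code; a missing row stays a single NaN
codes = df["WIN_COUNTRY_CODE"].str.split("---").explode()

# value_counts drops NaN by default and sorts from most to least frequent
counts = codes.value_counts()
print(counts)
print(counts.idxmax())  # prints: UK
```

Because `value_counts` ignores `NaN` by default, the missing rows do not have to be removed before counting.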

"from a column of list into a column of list" are you sure this is what you meant to write? :) – ignoring_gravity, Jan 3, 2020 at 13:31
"df.WIN_COUNTRY_CODE[183][0]" shouldn't you be looking at "df.WIN_COUNTRY_CODE_2[183][0]", as that's what you named your new column? – ignoring_gravity, Jan 3, 2020 at 13:42
@ignoring_gravity thanks for the correction. Do you have any suggestion for the term besides "column of string into column of lists"? – freudslipper, Jan 4, 2020 at 21:58
I couldn't reproduce your error. I copied your dataframe, ran df['WIN_COUNTRY_CODE_2'] = df['WIN_COUNTRY_CODE'].str.split("---"), then ran df.WIN_COUNTRY_CODE_2[183][0] and got 'DE'. What version of pandas are you using? – ignoring_gravity, Jan 5, 2020 at 8:15

vrana95 · Accepted Answer · 2020-01-03 13:50:34Z

1

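df1 = df.copy()
# split each "---"-separated string into a list of codes
df1["WIN_COUNTRY_CODE"] = df['WIN_COUNTRY_CODE'].str.split('---')
# most frequent code per row (raises TypeError on rows where the cell is NaN)
df1["Max_code"] = df1["WIN_COUNTRY_CODE"].apply(lambda x: max(set(x), key=x.count))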
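The asker's follow-up comments indicate that this code raises `'float' object is not iterable` until missing values are dropped. As an alternative sketch (a swapped-in technique, not the answer's code), missing cells can be skipped inside the function, and `collections.Counter.most_common` can pick the top code. One behavioural difference: among tied codes, `most_common` keeps the one seen first in the row, whereas `max(set(x), key=x.count)` depends on set iteration order. The sample data is hypothetical:

```python
from collections import Counter

import numpy as np
import pandas as pd

df = pd.DataFrame({"WIN_COUNTRY_CODE": ["DE---UK---UK", "UK---DE", np.nan, "SI"]})
df1 = df.copy()
df1["WIN_COUNTRY_CODE"] = df1["WIN_COUNTRY_CODE"].str.split("---")

def most_common_code(codes):
    """Return the most frequent code in a list, or None for a missing cell."""
    if not isinstance(codes, list):
        return None
    # most_common keeps first-seen order among equal counts
    return Counter(codes).most_common(1)[0][0]

df1["Max_code"] = df1["WIN_COUNTRY_CODE"].apply(most_common_code)
print(df1)
```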


answered Jan 3, 2020 at 13:50


3 Comments

freudslipper Over a year ago

this still returns "float object is not iterable"

freudslipper Over a year ago

Hey, it turns out I had to call .dropna() on the DataFrame because there are missing values. It works now, thanks.

vrana95 Over a year ago

So could you also upvote the solution as useful?

Hayat · Accepted Answer · 2020-01-03 13:41:44Z

0

This might help.

df['new_WIN_COUNTRY_CODE']=df['WIN_COUNTRY_CODE'].map(lambda x: x.split("---") if "---" in x else [x])

print(df)
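`Series.map` (unlike the built-in `map`) takes just the function; pandas passes each cell of the Series to it in turn. As with the other answer, a missing cell (`NaN`, a float) makes `"---" in x` raise a `TypeError`. A minimal NaN-tolerant variant, assuming an empty list is an acceptable placeholder for missing rows (the sample data is hypothetical). Since `"IT".split("---")` already returns `['IT']`, the `if "---" in x else [x]` branch is not needed:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"WIN_COUNTRY_CODE": ["IT", "DE---UK---UK", np.nan]})

# Series.map calls the function once per cell; the isinstance check keeps
# NaN (a float) from reaching x.split
df["new_WIN_COUNTRY_CODE"] = df["WIN_COUNTRY_CODE"].map(
    lambda x: x.split("---") if isinstance(x, str) else []
)
print(df)
```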

answered Jan 3, 2020 at 13:41


1 Comment

freudslipper Over a year ago

The function didn't work because the map function requires an iterable parameter. Which one is the iterable in this case? Thanks
