
I have a dataframe with several columns; one of them looks like this:

print(df.WIN_COUNTRY_CODE[180:200])

           WIN_COUNTRY_CODE
180                        IT
181                        IT
182                        ES
183    DE---UK---UK---UK---UK
184         UK---UK---UK---UK
185         DE---UK---UK---UK
186    UK---UK---DE---UK---UK
187                        SI
188                        UK
189                        FR

Each cell of the column contains one or more country codes per record. I would like to convert the country codes from 2-letter to 3-letter ISO codes and also count how often each country appears, so I applied the following code:

1. I split each string on the three-dash separator between the country codes to convert it from a string to a list:

df['WIN_COUNTRY_CODE_2'] = df['WIN_COUNTRY_CODE'].str.split("---")

This results in a column like this:

print(df.WIN_COUNTRY_CODE[180:200])

           WIN_COUNTRY_CODE
180                            ['IT']
181                            ['IT']
182                            ['ES']
183    ['DE', 'UK', 'UK', 'UK', 'UK']
184          ['UK', 'UK', 'UK', 'UK']
185          ['DE', 'UK', 'UK', 'UK']
186    ['UK', 'UK', 'DE', 'UK', 'UK']
187                            ['SI']
188                            ['UK']
189                            ['FR']

2. I build a dictionary (catdict) from the conversion table (cattable) and use it to map the 2-letter country codes to 3-letter codes:

catdict = dict(zip(cattable['iso_2_codes'], cattable['iso_3_codes']))
df.assign(mapped=[[catdict[k] for k in row if catdict.get(k)] for row in df.WIN_COUNTRY_CODE_2])

However, whenever I apply the mapping, it raises this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-13-df7aad8ca868> in <module>
      1 cattable = pd.ExcelFile('D:/ROBERT LIBRARIES/Documents/ISD - LKPP Project/vardesc2.xlsx').parse('WIN_COUNTRY_CODE')
      2 catdict= dict([(catnum,catdesc) for catnum,catdesc in zip(cattable['WIN_COUNTRY_CODE'], cattable['Description'])])
----> 3 df.assign(mapped=[[catdict[k] for k in row if catdict.get(k)] for row in df.WIN_COUNTRY_CODE])

<ipython-input-13-df7aad8ca868> in <listcomp>(.0)
      1 cattable = pd.ExcelFile('D:/ROBERT LIBRARIES/Documents/ISD - LKPP Project/vardesc2.xlsx').parse('WIN_COUNTRY_CODE')
      2 catdict= dict([(catnum,catdesc) for catnum,catdesc in zip(cattable['WIN_COUNTRY_CODE'], cattable['Description'])])
----> 3 df.assign(mapped=[[catdict[k] for k in row if catdict.get(k)] for row in df.WIN_COUNTRY_CODE])

TypeError: 'float' object is not iterable


It seems likely that the code raises the error because the entries in the WIN_COUNTRY_CODE column are still strings rather than lists of strings. I learned this after inspecting the objects within the list with this code:

df.WIN_COUNTRY_CODE_2[183][0]

It always returns a single character instead of the 2-letter code as a string:

'['

whereas I expect it to return 'DE'.
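
One way to check what each cell really holds is to count the Python types stored in the column. The sketch below uses a small made-up frame, not the real data, and assumes the column contains at least one missing value; that assumption comes from the 'float' in the traceback, since NaN is a float and `.str.split` leaves it as NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical sample mirroring the column in the question, with one missing value.
df = pd.DataFrame({"WIN_COUNTRY_CODE": ["IT", "DE---UK---UK", np.nan, "FR"]})
df["WIN_COUNTRY_CODE_2"] = df["WIN_COUNTRY_CODE"].str.split("---")

# Count the type stored in each cell: list for split strings, float for NaN.
type_counts = df["WIN_COUNTRY_CODE_2"].map(lambda v: type(v).__name__).value_counts()
print(type_counts)

# Number of missing cells; a list comprehension cannot iterate over these.
n_missing = df["WIN_COUNTRY_CODE_2"].isna().sum()
print(n_missing)
```

If any cell is reported as float, iterating over that row is what raises "'float' object is not iterable".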

Question:

How to convert the WIN_COUNTRY_CODE column from a column of list into a column of list? And how can I find the most frequent country in the entire column? Thank you.

  • "from a column of list into a column of list" are you sure this is what you meant to write? :) Commented Jan 3, 2020 at 13:31
  • "df.WIN_COUNTRY_CODE[183][0]" shouldn't you be looking at "df.WIN_COUNTRY_CODE_2[183][0]", as that's what you named your new column? Commented Jan 3, 2020 at 13:42
  • @ignoring_gravity thanks for the correction. Do you have any suggestion for the term besides "column of string into column of lists"? Commented Jan 4, 2020 at 21:58
  • I couldn't reproduce your error. I copied your dataframe, ran df['WIN_COUNTRY_CODE_2'] = df['WIN_COUNTRY_CODE'].str.split("---"), then ran df.WIN_COUNTRY_CODE_2[183][0] and got 'DE'. What version of pandas are you using? Commented Jan 5, 2020 at 8:15

2 Answers

1
df1=df.copy()
df1["WIN_COUNTRY_CODE"]=df['WIN_COUNTRY_CODE'].str.split('---')
df1["Max_code"]=df1["WIN_COUNTRY_CODE"].apply(lambda x: max(set(x), key = x.count))


3 Comments

this still returns "float object is not iterable"
Hey, it turns out I have to call .dropna() on the dataframe because there are missing values. It works now, thanks.
So can you mark the solution as useful with an upvote too?
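
Putting the comments above together, a minimal sketch of the whole pipeline could look like the following. The sample data and the small `catdict` are hypothetical stand-ins for the real dataframe and conversion table, mapping "UK" to "GBR" is an assumption about how this dataset's "UK" code should be treated, and `Series.explode` needs pandas 0.25 or later.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a slice like the question's column plus one missing value.
df = pd.DataFrame(
    {"WIN_COUNTRY_CODE": ["IT", "ES", "DE---UK---UK", np.nan, "UK---UK---DE", "FR"]}
)
# Hypothetical stand-in for the conversion table (catdict in the question).
# "UK" -> "GBR" is an assumption, not taken from the original table.
catdict = {"IT": "ITA", "ES": "ESP", "DE": "DEU", "UK": "GBR", "FR": "FRA"}

# Drop missing rows first, then split each string into a list of 2-letter codes.
codes = df["WIN_COUNTRY_CODE"].dropna().str.split("---")

# Map each list to 3-letter codes, skipping codes absent from the table.
mapped = codes.apply(lambda row: [catdict[k] for k in row if k in catdict])

# One row per country occurrence, then count occurrences over the whole column.
counts = mapped.explode().value_counts()
most_frequent = counts.idxmax()
print(counts)
print(most_frequent)
```

Here `counts` holds the frequency of every country across the column and `most_frequent` is the single most common one, as opposed to the per-row mode computed by `Max_code` in the answer.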
0

This might help.

df['new_WIN_COUNTRY_CODE']=df['WIN_COUNTRY_CODE'].map(lambda x: x.split("---") if "---" in x else [x])

print(df)

1 Comment

The function didn't work because the map function requires an iterable parameter. Which one is the iterable in this case? Thanks
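
For what it's worth, `Series.map` takes only the function and applies it to each element of the Series, so there is no separate iterable argument as with the built-in `map`. One possible cause of the failure, assuming the column has missing values as in the other answer's comments, is that `"---" in x` fails when `x` is NaN. A hedged sketch with made-up data and a hypothetical helper `split_codes`:

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one missing value.
df = pd.DataFrame({"WIN_COUNTRY_CODE": ["IT", "DE---UK---UK", np.nan]})

def split_codes(value):
    """Split a '---'-separated string into a list; return [] for non-strings such as NaN."""
    if isinstance(value, str):
        # str.split already returns [value] when the separator is absent.
        return value.split("---")
    return []

# Series.map calls split_codes once per element of the column.
df["new_WIN_COUNTRY_CODE"] = df["WIN_COUNTRY_CODE"].map(split_codes)
result = df["new_WIN_COUNTRY_CODE"].tolist()
print(result)
```

Returning an empty list for missing values is a design choice; dropping those rows with `.dropna()` beforehand would work as well.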
