0

My dataframe looks like this: Name

I want to add the league the club plays in, and the country that league is based in, as new columns, for every row.

I initially tried this using dictionaries with the clubs/countries, and returning the key for a value:

club_country_dict = {'La Liga':['Real Madrid','FC Barcelona'],'France Ligue 1':['Paris Saint-Germain']}

key_list=list(club_country_dict.keys())
val_list=list(club_country_dict.values())

But this ran into issues since each of my keys values is actually a list, rather than a single value.

I then tried some IF THEN logic, by just having standalone variables for each league, and checking if the club value was in each variable:

la_Liga = ['Real Madrid','FC Barcelona']

for row in data:
      if data['Club'] in la_Liga:
            data['League'] = 'La Liga'

Apologies for the messy question. Basically I'm looking to add a two new columns to my dataset, 'League' and 'Country', based on the 'Club' column value. I'm not sure what's the easiest way to do this but I've hit walls trying to different ways. Thanks in advance.

0

4 Answers 4

3

You could convert the dictionary to a data frame and then merge:

df = pd.DataFrame({"Name": ["CR", "Messi", "neymar"], "Club": ["Real Madrid", "FC Barcelona", "Paris Saint-Germain"]})

df.merge(pd.DataFrame(club_country_dict.items(), columns=['League', 'Club']).explode('Club'),
         on = 'Club', how='left')

enter image description here

Sign up to request clarification or add additional context in comments.

1 Comment

This is the answer I ended up using, as having one key with multiple clubs in list format like so - {'La Liga':['Real Madrid','FC Barcelona'] } made things much easier when I want to add hundreds of clubs without individually having a new key each time. Thanks.
1

Here is one of the simple way to solve it. Use Pandas apply function on rows https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html

import pandas as pd

df = pd.DataFrame({"name": ["CR", "Messi", "neymar"], "club": ["RM", "BR", "PSG"]})

country = {"BR": "Spain", "PSG": "France", "RM": "Spain"}

df["country"] = df.apply(lambda row: country[row.club], axis=1)

print(df)

Output:

     name club country
0      CR   RM   Spain
1   Messi   BR   Spain
2  neymar  PSG  France

Comments

1

Try pandas replace feature for Series.

df = pd.DataFrame({"Name" : ['Cristiano', 'L. Messi', "Neymar"], 'Club' : ["Real Madrid", "FC Barcelona", "Paris Saint-Germain"]})

df:

    Name        Club
0   Cristiano   Real Madrid
1   L. Messi    FC Barcelona
2   Neymar      Paris Saint-Germain

Now add new column:

club_country_dict = {'Real Madrid': 'La Liga',
                     'FC Barcelona' : "La Liga",
                     'Paris Saint-Germain': 'France Ligue 1'}
df['League'] = df.Club.replace(club_country_dict)

df:

    Name        Club                   League
0   Cristiano   Real Madrid            La Liga
1   L. Messi    FC Barcelona           La Liga
2   Neymar      Paris Saint-Germain    France Ligue 1

Comments

1

To cope with the "list problem" in club_country_dict, convert it to the following Series:

league_club = pd.Series(club_country_dict.values(), index=club_country_dict.keys(),
    name='Club').explode()

The result is:

La Liga                   Real Madrid
La Liga                  FC Barcelona
France Ligue 1    Paris Saint-Germain
Name: Club, dtype: object

You should also have a "connection" between the league name and its country (another Series):

league_country = pd.Series({'La Liga': 'Spain', 'France Ligue 1': 'France'}, name='Country')

Of course, add here other leagues of interest with their countries.

The next step is to join them into club_details DataFrame, with Club as the index:

club_details = league_club.to_frame().join(league_country).reset_index()\
    .rename(columns={'index':'League'}).set_index('Club')

The result is:

                             League Country
Club                                       
Paris Saint-Germain  France Ligue 1  France
Real Madrid                 La Liga   Spain
FC Barcelona                La Liga   Spain

Then, assuming that your first DataFrame is named player, generate the final result:

result = player.join(club_details, on='Club')

The result is:

                Name                 Club          League Country
0  Cristiano Ronaldo          Real Madrid         La Liga   Spain
1           L. Messi         FC Barcelona         La Liga   Spain
2             Neymar  Paris Saint-Germain  France Ligue 1  France

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.