I'm trying to find a way to merge in multiple columns at the same time with Pandas. I get the output I want by doing five separate merges, but it feels like there should be a more Pythonic way to do it.

Essentially, I have a dataframe called df_striking with five keyword columns, and I'm trying to merge search volume data from another dataframe (called df_keyword_vol) into the adjacent "Vol" columns.

Minimum Reproducible Example:

import pandas as pd

striking_data = {
    "KW1": ["nectarine", "apricot", "plum"],
    "KW1 Vol": ["", "", ""],
    "KW2": ["apple", "orange", "pear"],
    "KW2 Vol": ["", "", ""],
    "KW3": ["banana", "grapefruit", "cherry"],
    "KW3 Vol": ["", "", ""],
    "KW4": ["kiwi", "lemon", "peach"],
    "KW4 Vol": ["", "", ""],
    "KW5": ["raspberry", "blueberry", "berries"],
    "KW5 Vol": ["", "", ""],
}

df_striking = pd.DataFrame(striking_data)

keyword_vol_data = {
    "Keyword": [
        "nectarine",
        "apricot",
        "plum",
        "apple",
        "orange",
        "pear",
        "banana",
        "grapefruit",
        "cherry",
        "kiwi",
        "lemon",
        "peach",
        "raspberry",
        "blueberry",
        "berries",
    ],
    "Volume": [
        1000,
        500,
        200,
        600,
        800,
        1000,
        450,
        10,
        900,
        1200,
        150,
        700,
        400,
        850,
        1000,
    ],
}

df_keyword_vol = pd.DataFrame(keyword_vol_data)

Desired Output

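(Image in the original post: df_striking with each KWx Vol column filled in with the volume of the keyword in the adjacent KWx column.)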

What I've tried: I've made two functions to merge in the keyword data one column at a time, but it's just not very Pythonic!

# two functions to merge in the keyword volume data for KWs 1 - 5
def merger(col1, col2):
    dx = df_striking.merge(df_keyword_vol, how='left', left_on=col1, right_on=col2)
    return dx

def volume(vol1, vol2):
    vol = df_striking[vol1] = df_striking[vol2]
    df_striking.drop(['Keyword', 'Volume'], axis=1, inplace=True)
    return vol

df_striking = merger("KW1", "Keyword")
volume("KW1 Vol", "Volume")
df_striking = merger("KW2", "Keyword")
volume("KW2 Vol", "Volume")
df_striking = merger("KW3", "Keyword")
volume("KW3 Vol", "Volume")
df_striking = merger("KW4", "Keyword")
volume("KW4 Vol", "Volume")
df_striking = merger("KW5", "Keyword")
volume("KW5 Vol", "Volume")
3 Comments
  • do you already have the empty column KWx Vol in your dataframe? Commented Sep 28, 2021 at 18:49
  • I do! They're ready to be populated. Commented Sep 28, 2021 at 19:02
  • @LeeRoy then you can use a simple replace on the odd columns, see my answer below Commented Sep 28, 2021 at 19:06

3 Answers

If you already have the empty columns, you can use:

mapping = df_keyword_vol.set_index('Keyword')['Volume']

df_striking.iloc[:, 1::2] = df_striking.iloc[:, ::2].replace(mapping)
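
The positional slicing above relies on the columns strictly alternating keyword, volume, keyword, volume. As a hedged alternative, here is a minimal sketch that pairs the columns by name instead and uses Series.map for the lookup. The small frames and the "durian" keyword are made up for illustration; the only assumption carried over from the question is that every volume column is named "<keyword column> Vol".

```python
import pandas as pd

# Small stand-in frames with the same layout as the question's data.
df_striking = pd.DataFrame({
    "KW1": ["nectarine", "apricot"],
    "KW1 Vol": ["", ""],
    "KW2": ["apple", "durian"],  # "durian" has no volume entry on purpose
    "KW2 Vol": ["", ""],
})
df_keyword_vol = pd.DataFrame({
    "Keyword": ["nectarine", "apricot", "apple"],
    "Volume": [1000, 500, 600],
})

# Keyword -> Volume lookup, built the same way as in the answer above.
mapping = df_keyword_vol.set_index("Keyword")["Volume"]

# Pair each keyword column with its "<name> Vol" column by name, not position.
kw_cols = [c for c in df_striking.columns if not c.endswith(" Vol")]
for col in kw_cols:
    # Series.map gives NaN for keywords missing from the lookup,
    # whereas replace would leave the keyword string in place.
    df_striking[f"{col} Vol"] = df_striking[col].map(mapping)

print(df_striking)
```

Which behaviour is preferable for unmatched keywords (NaN from map, or the untouched keyword from replace) depends on how the volumes are used afterwards; map also requires the keywords in the lookup to be unique.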

Else, if you only have the KWx columns (in a dataframe called df here):

df2 = (pd.concat([df, df.replace(mapping)], axis=1)
         .sort_index(axis=1)
       )

output:

         KW1   KW1     KW2   KW2         KW3  KW3    KW4   KW4        KW5   KW5
0  nectarine  1000   apple   600      banana  450   kiwi  1200  raspberry   400
1    apricot   500  orange   800  grapefruit   10  lemon   150  blueberry   850
2       plum   200    pear  1000      cherry  900  peach   700    berries  1000
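
Note that the concat variant leaves each volume column with the same name as its keyword column, and that sort_index orders labels lexicographically, so with ten or more keyword columns KW10 would sort ahead of KW2. A possible sketch that gives the volume columns their own names and interleaves them explicitly is shown here; df and mapping are small made-up stand-ins, and the " Vol" suffix is taken from the question's column names.

```python
import pandas as pd

# Stand-in for a frame that only has the keyword columns.
df = pd.DataFrame({
    "KW1": ["nectarine", "apricot"],
    "KW2": ["apple", "orange"],
})
# Stand-in for df_keyword_vol.set_index('Keyword')['Volume'].
mapping = pd.Series({"nectarine": 1000, "apricot": 500, "apple": 600, "orange": 800})

# Look up every keyword column and label the results "<name> Vol".
volumes = df.apply(lambda s: s.map(mapping)).add_suffix(" Vol")

# Build the interleaved order explicitly: KW1, KW1 Vol, KW2, KW2 Vol, ...
order = [name for col in df.columns for name in (col, f"{col} Vol")]
df2 = pd.concat([df, volumes], axis=1)[order]

print(df2)
```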

It’s easier if you transform it all to a long format:

>>> striking = df_striking.filter(regex='KW[0-9]*$').stack().rename('Keyword').reset_index()
>>> joined = striking.merge(df_keyword_vol)
>>> joined
  level_0 level_1     Keyword  Volume
0       0     KW1   nectarine    1000
1       0     KW2       apple     600
2       0     KW3      banana     450
3       0     KW4        kiwi    1200
4       0     KW5   raspberry     400
5       1     KW1     apricot     500
6       1     KW2      orange     800
7       1     KW3  grapefruit      10
8       1     KW4       lemon     150
9       1     KW5   blueberry     850
10      2     KW1        plum     200
11      2     KW2        pear    1000
12      2     KW3      cherry     900
13      2     KW4       peach     700
14      2     KW5     berries    1000

Then you can get the original format with .pivot, but with a multi-index as columns:

>>> joined.pivot(index='level_0', columns='level_1', values=['Keyword', 'Volume'])
           Keyword                                       Volume                       
level_1        KW1     KW2         KW3    KW4        KW5    KW1   KW2  KW3   KW4   KW5
level_0                                                                               
0        nectarine   apple      banana   kiwi  raspberry   1000   600  450  1200   400
1          apricot  orange  grapefruit  lemon  blueberry    500   800   10   150   850
2             plum    pear      cherry  peach    berries    200  1000  900   700  1000

We can get around that weird format with a pd.concat:

>>> pd.concat([
...     joined.pivot(index='level_0', columns='level_1', values='Keyword'),
...     joined.pivot(index='level_0', columns='level_1', values='Volume').add_suffix(' Vol')
... ], axis='columns').sort_index(axis='columns')
level_1        KW1  KW1 Vol     KW2  KW2 Vol         KW3  KW3 Vol    KW4  KW4 Vol        KW5  KW5 Vol
level_0                                                                                              
0        nectarine     1000   apple      600      banana      450   kiwi     1200  raspberry      400
1          apricot      500  orange      800  grapefruit       10  lemon      150  blueberry      850
2             plum      200    pear     1000      cherry      900  peach      700    berries     1000
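
For reference, a self-contained variation on the long-format idea might look like the sketch below. It is not the exact code from this answer: it swaps stack for melt, uses a left join so unmatched keywords are kept, and passes validate to merge so duplicated keywords in the lookup raise instead of silently multiplying rows. The frames, the "durian" keyword, and the names row, slot, long_df and wide are all made up for illustration.

```python
import pandas as pd

df_striking = pd.DataFrame({
    "KW1": ["nectarine", "apricot"],
    "KW2": ["apple", "durian"],  # "durian" is deliberately absent from the volumes
})
df_keyword_vol = pd.DataFrame({
    "Keyword": ["nectarine", "apricot", "apple"],
    "Volume": [1000, 500, 600],
})

# Wide -> long: one row per (original row, keyword slot).
long_df = (
    df_striking.rename_axis("row").reset_index()
    .melt(id_vars="row", var_name="slot", value_name="Keyword")
)

# Left join keeps unmatched keywords; validate raises if a keyword
# appears more than once in the lookup table.
joined = long_df.merge(df_keyword_vol, on="Keyword", how="left", validate="many_to_one")

# Long -> wide, then flatten the (value, slot) column MultiIndex into plain names.
wide = joined.pivot(index="row", columns="slot", values=["Keyword", "Volume"])
wide.columns = [slot if kind == "Keyword" else f"{slot} Vol" for kind, slot in wide.columns]

# Restore the keyword / volume interleaving explicitly.
order = [name for col in df_striking.columns for name in (col, f"{col} Vol")]
wide = wide[order]

print(wide)
```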

1 Comment

not sure if we have the same definition of easier :p That said, I was about to use this approach before realizing that one could only use the columns' parity +1
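
If df_keyword_vol is already ordered column by column (all the KW1 keywords, then all the KW2 keywords, and so on, three rows per column here), you can reshape it directly without any lookup:

# index // 3 puts each run of 3 consecutive rows (one KWx column) in its own group;
# this relies on df_keyword_vol having the default 0..n-1 index.
pd.concat(
    [v.reset_index(drop=True).drop('col1', axis=1)
     for k, v in df_keyword_vol.assign(col1=df_keyword_vol.index // 3).groupby('col1')],
    axis=1,
).set_axis(df_striking.columns, axis=1)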


         KW1  KW1 Vol     KW2  KW2 Vol         KW3  KW3 Vol    KW4  KW4 Vol        KW5  KW5 Vol
0  nectarine     1000   apple      600      banana      450   kiwi     1200  raspberry      400
1    apricot      500  orange      800  grapefruit       10  lemon      150  blueberry      850
2       plum      200    pear     1000      cherry      900  peach      700    berries     1000
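
Because this reshaping never looks at the keywords themselves, it only gives the right answer when the rows of df_keyword_vol follow the keyword columns of df_striking in column-major order. A quick, hedged way to check that before relying on it is sketched here, using small made-up frames; kw_cols, expected and same_order are illustrative names.

```python
import pandas as pd

df_striking = pd.DataFrame({
    "KW1": ["nectarine", "apricot"], "KW1 Vol": ["", ""],
    "KW2": ["apple", "orange"], "KW2 Vol": ["", ""],
})
df_keyword_vol = pd.DataFrame({
    "Keyword": ["nectarine", "apricot", "apple", "orange"],
    "Volume": [1000, 500, 600, 800],
})

kw_cols = [c for c in df_striking.columns if not c.endswith(" Vol")]

# Flatten the keyword columns column by column (Fortran order) ...
expected = df_striking[kw_cols].to_numpy().ravel(order="F")

# ... and compare with the row order of the lookup table.
same_order = (
    len(expected) == len(df_keyword_vol)
    and (expected == df_keyword_vol["Keyword"].to_numpy()).all()
)
print(same_order)
```

If the check fails, a keyword-based lookup such as the ones in the other answers is the safer choice.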
