2

I have two dataframes: df1 and df2. If the Display Name from df2 is in df1's Display Name column, I want to assign df1s Type, Format, Behavior, Datatype values to df2s values.

I've tried merge every way I could think of. I think loc is my best hope but I can't seem to get the assignment syntax right. Also, I'm looking for a succinct answer - preferably a one-liner.

Something like this:

df2.loc[df2['Display Name'].isin(df1['Display Name']), /
        ['Type', 'Format', 'Behavior', 'Datatype']] = ???

My Code:

import pandas as pd
import numpy as np

df1 = pd.DataFrame(
  {'Behavior': ['Attribute', 'Attribute', 'Attribute', 'Attribute', 'Attribute',
              'Attribute', 'Attribute', 'Metric', 'Metric', 'Metric', 'Metric',
              'Metric', 'Metric', 'Metric', 'Metric'],
   'Datatype': ['object', 'object', 'object', 'object', 'object', 'object',
              'object', 'int64', 'int64', 'int64', 'int64', 'float64',
              'float64', 'float64', 'float64'],
   'Display Name': ['Campaign', 'Campaign ID', 'Campaign ID', 'Campaign state',
                  'Campaign state', 'Currency', 'Currency', 'Impressions',
                  'Impressions', 'Clicks', 'Clicks', 'CTR', 'CTR', 'Avg. CPC',
                  'Avg. CPC'],
   'Format': ['{}', '{}', '{}', '{}', '{}', '{}', '{}', '{:,.0f}', '{:,.0f}',
            '{:,.0f}', '{:,.0f}', '{:.2f}%', '{:.2f}%', '${:,.2f}', '${:,.2f}'],
   'Type': ['String', 'String', 'String', 'String', 'String', 'String', 'String',
          'Integer', 'Integer', 'Integer', 'Integer', 'Percent', 'Percent',
          'Currency', 'Currency']},
  columns=['Display Name', 'Type', 'Format', 'Behavior', 'Datatype'])

df2 = pd.DataFrame(
  { 'Behavior': [ 'Attribute', 'Metric', 'Metric', 'Metric', 'Attribute',
                'Metric', 'Metric', 'Attribute', 'Metric', 'Metric', 'Metric'],
  'Datatype': [ 'object', 'float64', 'float64', 'float64', 'object', 'int64',
                'int64', 'object', 'float64', 'float64', 'float64'],
  'Display Name': [ 'Match type', 'Destination URL', 'Final URL',
                    'Mobile final URL', 'Labels', 'Impressions', 'Clicks',
                    'CTR', 'Avg. CPC', 'Cost', 'Avg. position'],
  'Format': [ '{}', '{:.2f}', '{:.2f}', '{:.2f}', '{}', '{:,.0f}', '{:,.0f}',
              '{}', '{:.2f}', '{:.2f}', '{:.2f}'],
  'Type': [ 'String', 'Float', 'Float', 'Float', 'String', 'Integer',
            'Integer', 'String', 'Float', 'Float', 'Float']},
  columns=['Display Name', 'Type', 'Format', 'Behavior', 'Datatype'])

df2_vals_in_df1 = df2.loc[df2['Display Name'].isin(df1['Display Name']), df2.columns[:]]
df1_vals_in_df2 = df1.loc[df1['Display Name'].isin(df2['Display Name']), df1.columns[:]]

What it looks like:

>>> df1
      Display Name      Type    Format   Behavior Datatype
0         Campaign    String        {}  Attribute   object
1      Campaign ID    String        {}  Attribute   object
2      Campaign ID    String        {}  Attribute   object
3   Campaign state    String        {}  Attribute   object
4   Campaign state    String        {}  Attribute   object
5         Currency    String        {}  Attribute   object
6         Currency    String        {}  Attribute   object
7      Impressions   Integer   {:,.0f}     Metric    int64
8      Impressions   Integer   {:,.0f}     Metric    int64
9           Clicks   Integer   {:,.0f}     Metric    int64
10          Clicks   Integer   {:,.0f}     Metric    int64
11             CTR   Percent   {:.2f}%     Metric  float64
12             CTR   Percent   {:.2f}%     Metric  float64
13        Avg. CPC  Currency  ${:,.2f}     Metric  float64
14        Avg. CPC  Currency  ${:,.2f}     Metric  float64

>>> df2
        Display Name     Type   Format   Behavior Datatype
0         Match type   String       {}  Attribute   object
1    Destination URL    Float   {:.2f}     Metric  float64
2          Final URL    Float   {:.2f}     Metric  float64
3   Mobile final URL    Float   {:.2f}     Metric  float64
4             Labels   String       {}  Attribute   object
5        Impressions  Integer  {:,.0f}     Metric    int64
6             Clicks  Integer  {:,.0f}     Metric    int64
7                CTR   String       {}  Attribute   object
8           Avg. CPC    Float   {:.2f}     Metric  float64
9               Cost    Float   {:.2f}     Metric  float64
10     Avg. position    Float   {:.2f}     Metric  float64

>>> df2_vals_in_df1
  Display Name     Type   Format   Behavior Datatype
5  Impressions  Integer  {:,.0f}     Metric    int64
6       Clicks  Integer  {:,.0f}     Metric    int64
7          CTR   String       {}  Attribute   object
8     Avg. CPC    Float   {:.2f}     Metric  float64

>>> df1_vals_in_df2
   Display Name      Type    Format Behavior Datatype
7   Impressions   Integer   {:,.0f}   Metric    int64
8   Impressions   Integer   {:,.0f}   Metric    int64
9        Clicks   Integer   {:,.0f}   Metric    int64
10       Clicks   Integer   {:,.0f}   Metric    int64
11          CTR   Percent   {:.2f}%   Metric  float64
12          CTR   Percent   {:.2f}%   Metric  float64
13     Avg. CPC  Currency  ${:,.2f}   Metric  float64
14     Avg. CPC  Currency  ${:,.2f}   Metric  float64

Note how df1_vals_in_df2 Display Name might have the same name multiple times. Their Type, Format, Behavior, Datatype values will always be the same values in both rows.

Expected Output of df2:

>>> df2
        Display Name     Type   Format   Behavior Datatype
0         Match type   String       {}  Attribute   object
1    Destination URL    Float   {:.2f}     Metric  float64
2          Final URL    Float   {:.2f}     Metric  float64
3   Mobile final URL    Float   {:.2f}     Metric  float64
4             Labels   String       {}  Attribute   object
5        Impressions  Integer  {:,.0f}     Metric    int64 <-- same
6             Clicks  Integer  {:,.0f}     Metric    int64 <-- same
7                CTR  Percent  {:.2f}%     Metric  float64 <-- changed
8           Avg. CPC Currency ${:,.2f}     Metric  float64 <-- changed
9               Cost    Float   {:.2f}     Metric  float64
10     Avg. position    Float   {:.2f}     Metric  float64

Takeaway #1: rows 5, 6 are the same because they are the same in both df1 and df2.

Takeaway #2: row 7, changed from String, {}, Attribute, object to Percent, {:.2f}%, Metric, float64 - the row values from df1 because Display Name from df2 was found in Display Name in df1.

Takeaway #3: row 8, changed for the same reasons noted in takeaway #2.

Tried:

Q1:Python Pandas: Merge or Filter DataFrame by Another. Is there a Better Way?

Doesn't address this question because I'm not trying to create a new dataframe; I'm trying to replace values in an existing dataframe from another.

Q2:Replace column values based on another dataframe python pandas - better way?

Doesn't address this question because that example contains one df with correct values whereas my situation is a df with correct and incorrect values.

Apologies that this is such a long question. I just wanted to provide enough context.

2
  • 2
    Well, you could make this much easier to read by shrinking df1 and df2 substantially. 5 rows and 2 columns would get the point across just as easily. All that said, are you just looking for combine_first()? Commented Mar 18, 2016 at 21:22
  • Thanks for the feedback. It probably was TMI (too much info). I did find a solution with your suggestion combine_first() which I'll post. However, I'm not in love with it because because it appends rows which then require deduplication. Second, in order to get the expected output above, I need to do a wizardy custom mapping (gotta be an easier way). Commented Mar 19, 2016 at 3:15

2 Answers 2

2

I think that combine_first will be an elegant solution, as per JohnE, provided you set Display Name as an index. This brings me to another point. I think that your task is well-defined only if 'Display Name' corresponds to exactly one set of attributes within each table. Assuming that, you can drop duplicates, set index and use .update like so:

df1 = df1.drop_duplicates()

df1 = df1.set_index('Display Name')
df2 = df2.set_index('Display Name')

df2_c = df2.copy()

df2.update(df1)
df1.update(df2_c)

del df2_c

You can reset the dimensions of df1 with an auxiliary index if you like.

Sign up to request clarification or add additional context in comments.

3 Comments

In the documentation for update(), it say 'Aligns on indices'. Is that why you set the index to be 'Display Name'?
Yes, exactly that.
Accepting this answer as correct. However, this answer is very similar to my Q1 I referenced above. And, nobody else (except myself) provided an answer. So... best answer.
0

Not ideal but I was able to recreate my expected output. Problem is, I wanted to avoid making df3 and want to do the replacement all within df2 so this isn't ideal.

df2 before:

        Display Name     Type   Format   Behavior Datatype
0         Match type   String       {}  Attribute   object
1    Destination URL    Float   {:.2f}     Metric  float64
2          Final URL    Float   {:.2f}     Metric  float64
3   Mobile final URL    Float   {:.2f}     Metric  float64
4             Labels   String       {}  Attribute   object
5        Impressions  Integer  {:,.0f}     Metric    int64
6             Clicks  Integer  {:,.0f}     Metric    int64
7                CTR   String       {}  Attribute   object
8           Avg. CPC    Float   {:.2f}     Metric  float64
9               Cost    Float   {:.2f}     Metric  float64
10     Avg. position    Float   {:.2f}     Metric  float64

df3 after:

df3 = df2.combine_first(df1).drop_duplicates('Display Name', keep='last')
df3 = df3.set_index(df3['Display Name'].map(dict(zip(df2['Display Name'], df2.index)))).sort_index().reset_index(drop=True)

        Display Name      Type    Format   Behavior Datatype
0         Match type    String        {}  Attribute   object
1    Destination URL     Float    {:.2f}     Metric  float64
2          Final URL     Float    {:.2f}     Metric  float64
3   Mobile final URL     Float    {:.2f}     Metric  float64
4             Labels    String        {}  Attribute   object
5        Impressions   Integer   {:,.0f}     Metric    int64
6             Clicks   Integer   {:,.0f}     Metric    int64
7                CTR   Percent   {:.2f}%     Metric  float64
8           Avg. CPC  Currency  ${:,.2f}     Metric  float64
9               Cost     Float    {:.2f}     Metric  float64
10     Avg. position     Float    {:.2f}     Metric  float64

df2 before and after comparison:

   Display Name   Type Format Behavior Datatype
0          True   True   True     True     True
1          True   True   True     True     True
2          True   True   True     True     True
3          True   True   True     True     True
4          True   True   True     True     True
5          True   True   True     True     True
6          True   True   True     True     True
7          True  False  False    False    False
8          True  False  False     True     True
9          True   True   True     True     True
10         True   True   True     True     True

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.