1

I am working with two dataframes. One containing the pair of unique keys and the other having values of those unique keys. There are almost 5.8million pairs.

Dataframe 1 - pair_df

key1 key2
a b
a c
b c
e f

Dataframe 2 - key_value_df

key value
a 432
b 654
c 874
e 014
f 421

I want a dataframe in such a way that in the dataframe 1, the respective values for both keys are present as additional columns.

Required Dataframe

key1 key2 value1 value2
a b 432 654
a c 432 874
b c 654 874
e f 014 421

I tried with the following code:

def find_vect(key_value_df, pair_df, key_name, vect_name):
    pair_df[vect_name]=''
    count=0
    for idx1, pic1 in enumerate(pair_df[key_name]):  # key_name='key1
        for idx2, pic2 in enumerate(key_value_df['key']):
            if pic1==pic2:
                vect = list(key_value_df.iloc[idx2, 0:49])
                pair_df.loc[idx1, vect_name] = vect
                count+=1
                if count%10000==0:
                    print(count)

This is a simplified version of the actual code and might contain some error. The logic is working but due to huge number of data, it is taking a lot of time for the process. Also same code has to be rerun for the other key in dataframe 1. This is making the process very time consuming.

Is there any other efficient way to solve the problem?

1
  • 2
    There are ready to use operations like df.merge or df.join. Don't reinvent the wheel Commented Jul 6, 2021 at 7:22

1 Answer 1

1

Use Series.map by Series created from key_value_df by set key to index:

s = key_value_df.set_index('key')['value']

pair_df['value1'] = pair_df['key1'].map(s)
pair_df['value2'] = pair_df['key2'].map(s)
Sign up to request clarification or add additional context in comments.

3 Comments

Hello @Jezrael, at first it seams that the solution works but actually it is creating list of list for the value. It is creating nested lists of same value.
@RajRajeshwariPrasad - So input data looks different like in question?
Hello @Jezrael, I reworked on the solution you provided and found that the problem was in the input. The input itself was a nested list. I figured out the problem and now the your solution works absolutely fine. Thanks a lot.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.