I have two dataframes, df_1 and df_2.

df_1 has 30k+ rows and looks like this:

Col_1_1    Col_1_2    CA_CB
a          c          CA
a          c          CB
a          d          CA
b          c          CA
b          d          CB
b          d          CB
b          c          CA

I'd like to create two columns in df_1, filled with data from df_2 for the rows where column CA_CB == "CB".

df_2 has 1k rows and looks like this (Col_2_1 has unique values):

Col_2_1    Col_2_2
a          data on a
b          data on b
c          data on c
d          data on d

My output should look like this:

Col_1_1    Col_1_2    CA_CB    Col_target_1    Col_target_2
a          c          CA       "X"             "X"
a          c          CB       data on a       data on c
a          d          CA       "X"             "X"
b          c          CA       "X"             "X"
b          d          CB       data on b       data on d
b          d          CB       data on b       data on d
b          c          CA       "X"             "X"

Currently I create Col_target_1 and Col_target_2 with a default value and then fill them in with a nested loop:

df_1["Col_target_1"] = "X"
df_1["Col_target_2"] = "X"

for i in range(len(df_1)):
    if df_1["CA_CB"][i] == "CB":
        for j in range(len(df_2)):
            if df_1["Col_1_1"][i] == df_2["Col_2_1"][j]:
                df_1["Col_target_1"][i] = df_2["Col_2_2"][j]
            if df_1["Col_1_2"][i] == df_2["Col_2_1"][j]:
                df_1["Col_target_2"][i] = df_2["Col_2_2"][j]

This does the job, but it takes 20+ minutes, and I was wondering whether another method could run faster.

Thank you in advance.

1 Answer

First create a series mapping from df_2:

s = df_2.set_index('Col_2_1')['Col_2_2']
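
As a minimal sketch of what this lookup Series does, here it is built from the sample df_2 in the question. The three-element test Series and the key "z" are made up for illustration; "z" is there only to show what happens to a key that df_2 does not contain.

```python
import pandas as pd

# Sample df_2 from the question.
df_2 = pd.DataFrame({
    "Col_2_1": ["a", "b", "c", "d"],
    "Col_2_2": ["data on a", "data on b", "data on c", "data on d"],
})

# Index by the unique key column so each key points at its data value.
s = df_2.set_index("Col_2_1")["Col_2_2"]

# Series.map looks every value up in s in one vectorised pass;
# keys that are absent from s (here the hypothetical "z") become NaN.
looked_up = pd.Series(["a", "d", "z"]).map(s)
print(looked_up.tolist())
```

This replaces the inner loop over df_2: each lookup goes through the index of s instead of scanning all 1k rows.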

Then map conditionally to df_1 using numpy.where:

import numpy as np

# True for the rows that should receive data from df_2
mask = df_1['CA_CB'] == 'CB'

df_1['Col_target_1'] = np.where(mask, df_1['Col_1_1'].map(s), 'X')
df_1['Col_target_2'] = np.where(mask, df_1['Col_1_2'].map(s), 'X')

mask is a Boolean Series, which np.where uses to choose element-wise between its second argument (where mask is True) and its third argument (where mask is False).
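
For reference, the whole approach can be sketched end to end on the sample data from the question. The .fillna("X") call is an addition that is not part of the answer above: it assumes you want rows whose key is missing from df_2 to keep "X", as the original loop would, rather than end up as NaN.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df_1 = pd.DataFrame({
    "Col_1_1": ["a", "a", "a", "b", "b", "b", "b"],
    "Col_1_2": ["c", "c", "d", "c", "d", "d", "c"],
    "CA_CB":   ["CA", "CB", "CA", "CA", "CB", "CB", "CA"],
})
df_2 = pd.DataFrame({
    "Col_2_1": ["a", "b", "c", "d"],
    "Col_2_2": ["data on a", "data on b", "data on c", "data on d"],
})

# Lookup Series: key -> data value.
s = df_2.set_index("Col_2_1")["Col_2_2"]

# True only for the rows that should receive data from df_2.
mask = df_1["CA_CB"] == "CB"

# Assumption: unmatched keys fall back to "X" (fillna), mirroring the loop.
df_1["Col_target_1"] = np.where(mask, df_1["Col_1_1"].map(s).fillna("X"), "X")
df_1["Col_target_2"] = np.where(mask, df_1["Col_1_2"].map(s).fillna("X"), "X")

print(df_1)
```

On this sample every key exists in df_2, so the fillna step changes nothing; it only matters when df_1 contains keys that df_2 lacks.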

1 Comment

Works perfectly! Thank you very much! Takes less than 0.5 seconds!
