I want to add a new column a3 to my dataframe df: if a string in column "b" of df contains a string from column "b2" of dataframe df2, the new column a3 should take the corresponding value from a2 of df2.

first dataframe df:


import pandas as pd
d = {'a': [100, 300], 'b': ["abc", "dfg"]}
df = pd.DataFrame(data=d, index=[1, 2])

print(df)
     a    b
1  100  abc
2  300  dfg

second dataframe df2:


d2 = {'a2': ["L1", "L2", "L3"], 'b2': ["bc", "op", "fg"]}
df2 = pd.DataFrame(data=d2, index=[1, 2, 3])

print(df2)
   a2  b2
1  L1  bc
2  L2  op
3  L3  fg

The output should look like this:

print(df)
     a    b   a3
1  100  abc   L1
2  300  dfg   L3

I tried a nested for loop, which did not work.

for i in df.b:
   for ii in df2.b2:
       for iii in df2.a3:
           if ii in i:
              df["a3"]=iii
  • What have you tried so far? Commented Mar 18, 2022 at 9:46
  • I think you wanted d2 to be something like :{'a2': ["L1", "L2", "L3"], 'b2': ["bc", "op", "fg"]}, right? Commented Mar 18, 2022 at 9:53

3 Answers

You need to test all combinations of rows in df against rows in df2. You can still take advantage of pandas' vectorized str.contains:

common = (pd.DataFrame({x: df['b'].str.contains(x) for x in df2['b2']})
   .replace({False: pd.NA})
   .stack()
   .reset_index(level=1, name='b2')['level_1'].rename('b2')
)
# 1    bc
# 2    fg
# Name: b2, dtype: object

df.join(common).merge(df2, on='b2')

output:

     a    b  b2  a2
0  100  abc  bc  L1
1  300  dfg  fg  L3
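
As an alternative sketch (not the method used in this answer), the substrings in df2['b2'] can be combined into one regular expression, so that str.extract pulls out the matching piece and map looks up the corresponding a2. It assumes the b2 values are unique and that one match per row is enough. With alternation, the match that starts earliest in the string is returned, which is not necessarily the first row of df2.

```python
import re
import pandas as pd

df = pd.DataFrame({"a": [100, 300], "b": ["abc", "dfg"]}, index=[1, 2])
df2 = pd.DataFrame({"a2": ["L1", "L2", "L3"], "b2": ["bc", "op", "fg"]}, index=[1, 2, 3])

# Escape each substring so regex metacharacters are treated literally.
pattern = "(" + "|".join(re.escape(s) for s in df2["b2"]) + ")"

# Extract the matched substring per row (missing where nothing matches) ...
matched = df["b"].str.extract(pattern, expand=False)

# ... and translate it to the corresponding a2 value (assumes unique b2).
lookup = df2.set_index("b2")["a2"]
df["a3"] = matched.map(lookup)

print(df)
```

Unlike the join and merge above, this keeps the original index of df and leaves a missing value in a3 for rows without a match.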

You can partially fix your logic as follows:

for i in df.b:
    for ii, iii in zip(df2.b2, df2.a2):
        if ii in i:
            df["a3"]=iii

However the final line df["a3"] = iii is assigning iii to every row so you just get the last value for iii in the loop for every row:

    a       b       a3
1   100     abc     L3
2   300     dfg     L3

You will get many 'correct' options, but one that is closest to your attempt is perhaps:

new_column = [None] * len(df) # create list of Nones same 'height' as df

for i, b in enumerate(df.b):
    for a2, b2 in zip(df2.a2, df2.b2):
        if b2 in b:
            new_column[i] = a2
            break # stop at the first match and move on to the next 'row' in df

df["a3"] = new_column

A difference from your attempt is that this builds 'new_column' separately and then adds it to your dataframe afterwards. Where there is no match, you will be left with None. Where there are multiple matches, you will get the first (top) match. You could remove the break line to get the last (bottom) match instead.
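
The first-match and last-match behaviour can be sketched as one small helper. The function name match_column, its keep parameter, the extra "xyz" row and the extra "ab"/"L4" row are all hypothetical, added only to exercise the no-match and multiple-match cases.

```python
import pandas as pd

# Question data plus one hypothetical row in each frame for the edge cases.
df = pd.DataFrame({"a": [100, 300, 500], "b": ["abc", "dfg", "xyz"]}, index=[1, 2, 3])
df2 = pd.DataFrame({"a2": ["L1", "L2", "L3", "L4"], "b2": ["bc", "op", "fg", "ab"]})


def match_column(texts, keys, values, keep="first"):
    """For each text, return the value whose key is a substring of it.

    keep="first" stops at the first matching key, keep="last" keeps scanning.
    Texts without any match get None.
    """
    result = []
    for text in texts:
        found = None
        for key, value in zip(keys, values):
            if key in text:
                found = value
                if keep == "first":
                    break  # leave the inner loop, go on to the next text
        result.append(found)
    return result


first = match_column(df["b"], df2["b2"], df2["a2"], keep="first")
last = match_column(df["b"], df2["b2"], df2["a2"], keep="last")
df["a3"] = first
```

Here "abc" contains both "bc" and "ab", so the two modes differ for that row only.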

2 Comments

Thanks a lot, your answer and the explanation, it worked!
awesome (this was my first ever answer on gh) - worth saying that the pandas based answers and list comprehension would likely be faster. But if speed isn't an issue then it's nice to create something understandable. Plus easier to control edge cases like no or multiple matches.

Among a lot of approaches, you can use a list comprehension:

df["a2"] = [df2.iloc[i]["a2"] for y in df.b for i,x in enumerate(df2.b2) if x in y]
df

Output

     a    b  a2
1  100  abc  L1
2  300  dfg  L3

Also note that d2 shouldn't be d2 = {'a2': [10, 30, 25], 'b2': ["bc", "op", "fg"]}; rather, it should be d2 = {'a2': ["L1", "L2", "L3"], 'b2': ["bc", "op", "fg"]}.
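
The comprehension above produces one element per match rather than one per row of df, so its length only equals len(df) when every row matches exactly once. A minimal sketch of a variant that always yields one value per row uses next() with a default. The third row "xyz" is a hypothetical non-matching row added for illustration, and the target column is named a3 as in the question.

```python
import pandas as pd

df = pd.DataFrame({"a": [100, 300, 500], "b": ["abc", "dfg", "xyz"]}, index=[1, 2, 3])
df2 = pd.DataFrame({"a2": ["L1", "L2", "L3"], "b2": ["bc", "op", "fg"]}, index=[1, 2, 3])

# For each string in df["b"], take the a2 of the first b2 it contains,
# or None if no b2 is a substring of it.
a3_values = [
    next((a2 for a2, b2 in zip(df2["a2"], df2["b2"]) if b2 in b), None)
    for b in df["b"]
]

df["a3"] = a3_values
print(df)
```

Because the outer loop runs once per row of df, the list always has the same length as the index, whatever the number of matches.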

3 Comments

You are right about d2, sorry about that. However, using your code on my actual dataset gives me the following error: ValueError: Length of values (8) does not match length of index (44)
@Limmick Yes, I forgot to add an else part to the list comprehension, and I edited the answer as per your comment. However mozway's answer seems fine and fit to what you needed.
Thanks a lot for your answer! However, I get an invalid syntax error at else.
