Append column value if string is contained in another string

Question

I want to add a new column a3 to my dataframe df. If a string in column "b" of df contains a string from column "b2" of dataframe df2, the new column a3 should take the corresponding value from column a2 of df2.

first dataframe df:


import pandas as pd

d = {'a': [100, 300], 'b': ["abc", "dfg"]}
df = pd.DataFrame(data=d, index=[1, 2])

print(df)
     a    b
1  100  abc
2  300  dfg

second dataframe df2:


d2 = {'a2': ["L1", "L2", "L3"], 'b2': ["bc", "op", "fg"]}
df2 = pd.DataFrame(data=d2, index=[1, 2, 3])

print(df2)
   a2  b2
1  L1  bc
2  L2  op
3  L3  fg

The output should look like this:

print(df)
     a    b   a3
1  100  abc   L1
2  300  dfg   L3

I tried a nested for loop, which did not work.

for i in df.b:
   for ii in df2.b2:
       for iii in df2.a3:
           if ii in i:
              df["a3"]=iii

I think you wanted d2 to be something like {'a2': ["L1", "L2", "L3"], 'b2': ["bc", "op", "fg"]}, right?
– TheFaultInOurStars, Commented Mar 18, 2022 at 9:53

mozway · Accepted Answer · 2022-03-18 09:59:12Z

2

You need to test all combinations. You can still take advantage of pandas' vectorized str.contains:

common = (pd.DataFrame({x: df['b'].str.contains(x) for x in df2['b2']})
   .replace({False: pd.NA})
   .stack()
   .reset_index(level=1, name='b2')['level_1'].rename('b2')
)
# 1    bc
# 2    fg
# Name: b2, dtype: object

df.join(common).merge(df2, on='b2')

output:

     a    b  b2  a2
0  100  abc  bc  L1
1  300  dfg  fg  L3
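As a hedged alternative sketch (not the answerer's code), the same "test all combinations" idea can be written with a cross join, which keeps the original index of df and leaves NaN where nothing matches. It assumes pandas 1.2 or later for how='cross'; the names pairs, mask, matches and result are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': [100, 300], 'b': ["abc", "dfg"]}, index=[1, 2])
df2 = pd.DataFrame({'a2': ["L1", "L2", "L3"], 'b2': ["bc", "op", "fg"]}, index=[1, 2, 3])

# Pair every row of df with every row of df2 (how='cross' needs pandas >= 1.2).
# reset_index() keeps df's original index in a column named 'index'.
pairs = df.reset_index().merge(df2, how='cross')

# Keep only the pairs where the b2 string occurs inside the b string.
mask = [b2 in b for b, b2 in zip(pairs['b'], pairs['b2'])]

# If a row of df matches several b2 values, keep the first one only.
matches = pairs[mask].drop_duplicates('index').set_index('index')

# Align on df's original index; rows without any match would get NaN.
result = df.assign(a3=matches['a2'])
print(result)
```

The cross join grows as len(df) * len(df2), so this is only a reasonable choice when that product fits comfortably in memory.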


Fred Shone · Accepted Answer · 2022-03-18 10:33:21Z

2

You can half fix your logic as follows:

for i in df.b:
    for ii, iii in zip(df2.b2, df2.a2):
        if ii in i:
            df["a3"]=iii

However, the final line df["a3"] = iii assigns iii to every row, so every row ends up with the last value assigned in the loop:

     a    b  a3
1  100  abc  L3
2  300  dfg  L3

You will get many 'correct' options, but one that is closest to your attempt is perhaps:

new_column = [None] * len(df) # create list of Nones same 'height' as df

for i, b in enumerate(df.b):
    for a2, b2 in zip(df2.a2, df2.b2):
        if b2 in b:
            new_column[i] = a2
            break # stop at the first match and move on to the next 'row' in df

df["a3"] = new_column

A difference from your attempt is that this builds new_column separately and then adds it to your dataframe afterwards. If there is no match, you are left with None. If there are multiple matches, you get the first (top) match. You could remove the break line to get the last (bottom) match instead.
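To make the no-match and multiple-match cases concrete, here is a small sketch on made-up data. The helper lookup and its keep parameter are hypothetical names introduced for illustration, and the two-row df is not the asker's data.

```python
import pandas as pd

# Hypothetical data: "abcfg" contains both "bc" and "fg"; "xyz" contains neither.
df = pd.DataFrame({'b': ["abcfg", "xyz"]})
df2 = pd.DataFrame({'a2': ["L1", "L2", "L3"], 'b2': ["bc", "op", "fg"]})

def lookup(text, keep="first"):
    """Return the a2 value of the first or last b2 found in text, else None."""
    found = None
    for a2, b2 in zip(df2.a2, df2.b2):
        if b2 in text:
            found = a2
            if keep == "first":
                break  # stop scanning df2 at the first hit
    return found

df["first"] = [lookup(b, "first") for b in df.b]
df["last"] = [lookup(b, "last") for b in df.b]
print(df)
```

Stopping the inner loop early is what selects the first match; letting it run to the end keeps overwriting the result, so the last match wins.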


2 Comments

Limmi Over a year ago

Thanks a lot, your answer and the explanation, it worked!

Fred Shone Over a year ago

awesome (this was my first ever answer on gh) - worth saying that the pandas based answers and list comprehension would likely be faster. But if speed isn't an issue then it's nice to create something understandable. Plus easier to control edge cases like no or multiple matches.

TheFaultInOurStars · Accepted Answer · 2022-03-18 11:06:28Z

1

Among a lot of approaches, you can use a list comprehension:

df["a2"] = [df2.iloc[i]["a2"] for y in df.b for i,x in enumerate(df2.b2) if x in y]
df

Output

     a    b  a2
1  100  abc  L1
2  300  dfg  L3

Note that d2 should not be {'a2': [10, 30, 25], 'b2': ["bc", "op", "fg"]}; it should be {'a2': ["L1", "L2", "L3"], 'b2': ["bc", "op", "fg"]}.
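A flat list comprehension like the one above yields one item per match, so its length only equals len(df) when every row has exactly one match. One possible way to always get exactly one value per row is to wrap the inner search in next() with a default. This is a sketch, not the answerer's edited code; the extra "xyz" row is added here purely to show the no-match case.

```python
import pandas as pd

# The third row ("xyz") is a made-up addition that matches nothing in df2.
df = pd.DataFrame({'a': [100, 300, 500], 'b': ["abc", "dfg", "xyz"]}, index=[1, 2, 3])
df2 = pd.DataFrame({'a2': ["L1", "L2", "L3"], 'b2': ["bc", "op", "fg"]}, index=[1, 2, 3])

# For each b, take the a2 of the first b2 contained in it, or None if there is none.
df["a3"] = [
    next((a2 for a2, b2 in zip(df2["a2"], df2["b2"]) if b2 in b), None)
    for b in df["b"]
]
print(df)
```

Because next() returns a single item per row, the resulting list always has the same length as df.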

edited Mar 18, 2022 at 11:06

answered Mar 18, 2022 at 9:51


3 Comments

Limmi Over a year ago

You are right about d2, sorry about that. However, using your code on my actual dataset gives me the following error: ValueError: Length of values (8) does not match length of index (44)

TheFaultInOurStars Over a year ago

@Limmick Yes, I forgot to add an else part to the list comprehension, and I edited the answer as per your comment. However mozway's answer seems fine and fit to what you needed.

Limmi Over a year ago

Thanks a lot for your answer! However, I get an invalid syntax error at else
