I am trying to merge specific strings in a pandas df. The df below is just an example; the values in my real df will differ, but the same basic rules apply. Within each row, I want to merge the values of consecutive columns until I reach a 4-letter string.
Whilst the 4-letter string in this df is always Excl, my real df will contain numerous different 4-letter strings.
```python
import pandas as pd

d = {
    'A': ['Include', 'Inclu', 'Incl', 'Inc'],
    'B': ['Excl', 'de', 'ude', 'l'],
    'C': ['X', 'Excl', 'Excl', 'ude'],
    'D': ['', 'Y', 'ABC', 'Excl'],
}
df = pd.DataFrame(data=d)
```
Out:
```
         A     B     C     D
0  Include  Excl     X
1    Inclu    de  Excl     Y
2     Incl   ude  Excl   ABC
3      Inc     l   ude  Excl
```
Intended output:
```
         A     B     C     D
0  Include  Excl     X
1  Include  Excl     Y
2  Include  Excl   ABC
3  Include  Excl
```
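So row 0 stays the same because column B already holds a 4-letter string. Row 1 merges columns A and B because column C holds the 4-letter string. Row 2 is handled the same way as row 1. Row 3 merges columns A, B and C because column D holds the 4-letter string. In each case the values from the 4-letter string onward shift left, leaving empty cells at the end of the row.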
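A minimal sketch of the per-row rule, in plain Python so it can be checked without pandas. It assumes that "4-letter string" means exactly four alphabetic characters, and that the search starts at the second cell: in row 2, column A holds 'Incl', which is itself 4 letters long, yet the intended output still merges it. The function name merge_before_marker is hypothetical.

```python
def merge_before_marker(row, marker_len=4):
    """Merge the leading cells of a row up to the first marker string.

    Assumptions (not from the original question): a marker is a value of
    exactly `marker_len` alphabetic characters, and the first cell is never
    treated as a marker, because it can be a 4-letter fragment like 'Incl'.
    """
    cells = list(row)
    for i in range(1, len(cells)):
        if len(cells[i]) == marker_len and cells[i].isalpha():
            # Join everything before the marker; keep the marker and the rest.
            merged = [''.join(cells[:i])] + cells[i:]
            break
    else:
        # No marker in this row: merge every cell.
        merged = [''.join(cells)]
    # Pad with empty strings so the row keeps its original number of cells.
    return merged + [''] * (len(cells) - len(merged))


print(merge_before_marker(['Inc', 'l', 'ude', 'Excl']))
# ['Include', 'Excl', '', '']
```

To apply it to the whole df, one option is to rebuild the frame row by row, e.g. pd.DataFrame([merge_before_marker(r) for r in df.itertuples(index=False)], columns=df.columns).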
I have tried to do this manually by merging all the columns and then going back to remove the unwanted values:
```python
df["Com"] = df["A"].map(str) + df["B"] + df["C"]
```
But then I would have to go through each row by hand and remove a different number of letters from each one.
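Instead of concatenating everything and trimming afterwards, one possible approach is to blank out the unwanted cells before concatenating. The sketch below builds a boolean mask of "cells before the first 4-letter string" and joins only those cells into the Com column. As with any reading of the rule, it assumes a marker is exactly four alphabetic characters and that column A never counts as the marker; the names is_marker and before_marker are hypothetical.

```python
import pandas as pd

d = {
    'A': ['Include', 'Inclu', 'Incl', 'Inc'],
    'B': ['Excl', 'de', 'ude', 'l'],
    'C': ['X', 'Excl', 'Excl', 'ude'],
    'D': ['', 'Y', 'ABC', 'Excl'],
}
df = pd.DataFrame(data=d)

# True where a cell is exactly 4 alphabetic characters (assumed definition
# of the "4-letter string").
is_marker = df.apply(lambda col: (col.str.len() == 4) & col.str.isalpha())

# Assumption: column A is always part of the merged prefix, even when it is
# itself 4 letters long ('Incl' in row 2).
is_marker.iloc[:, 0] = False

# A running count of markers along each row is 0 only before the first one.
before_marker = is_marker.astype(int).cumsum(axis=1) == 0

# Replace cells at or after the marker with '' and join what is left.
df['Com'] = df.where(before_marker, '').apply(''.join, axis=1)

print(df['Com'].tolist())
# ['Include', 'Include', 'Include', 'Include']
```

A row with no 4-letter string at all would have every cell merged into Com. This only produces the merged prefix; shifting the remaining values left, as in the intended output, still needs a row-wise step.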
The df above is just an example. The common rule is that I need to merge everything before the 4-letter string.