Hello I have a dataframe such as
>>> tab
COL1 COL2 COL3 COL4 COL5
0 G1 S_-__1Canis_lupus A B SEQ1
1 G1 S_+__2Elpah_bis C D SEQ4.1
2 G1 S_-__3Felis_cattus NaN NaN SEQA.B
3 G1 S_-__4Felis_cattus NaN NaN SEQA.B
4 G1 S-BICs_-__5Felis_cattus E F SEQA.A
5 G1 S_+__6Felis_cattus NaN NaN SEQA.A
6 G1 S_-__7Felis_cattus NaN NaN SEQA.A
7 G1 S-BICs_-__8Felis_cattus L P SEQA.B
8 G1 S_-__9Felis_cattus K L SEQA.A
9 G2 S_+__10Felis_cattus M N SEQA.A
10 G2 S_-__11Lupus_lupus NaN NaN SEQ3
and the idea is within each COL1 groups to focus on value in COL2 that contain the pattern : -BICs
then fill all COL3 and COL4 values that are NaN with the same COL5 value as the one that contain the -BICs pattern
exemple :
in line4 S-BICs_-__5Felis_cattus has a -BICspattern, its COL5 = SEQA.A
Within G1
S_-__3Felis_cattus and S_-__4Felis_cattus have NaN values in COL3 and COL4 and have the same COL5 value. Then I put the COL3 and COL4 values of S-BICs_-__5Felis_cattus
:
>>> tab
COL1 COL2 COL3 COL4 COL5
0 G1 S_-__1Canis_lupus A B SEQ1
1 G1 S_+__2Elpah_bis C D SEQ4.1
2 G1 S_-__3Felis_cattus NaN NaN SEQA.B
3 G1 S_-__4Felis_cattus NaN NaN SEQA.B
4 G1 S-BICs_-__5Felis_cattus E F SEQA.A
5 G1 S_+__6Felis_cattus E F SEQA.A
6 G1 S_-__7Felis_cattus E F SEQA.A
7 G1 S-BICs_-__8Felis_cattus L P SEQA.B
8 G1 S_-__9Felis_cattus K L SEQA.A
9 G2 S_+__10Felis_cattus M N SEQA.A
10 G2 S_-__11Lupus_lupus NaN NaN SEQ3
and same for the S-BICs_-__8Felis_cattus where NaN are transformed toLandP`
>>> tab
COL1 COL2 COL3 COL4 COL5
0 G1 S_-__1Canis_lupus A B SEQ1
1 G1 S_+__2Elpah_bis C D SEQ4.1
2 G1 S_-__3Felis_cattus L P SEQA.B
3 G1 S_-__4Felis_cattus L P SEQA.B
4 G1 S-BICs_-__5Felis_cattus E F SEQA.A
5 G1 S_+__6Felis_cattus E F SEQA.A
6 G1 S_-__7Felis_cattus E F SEQA.A
7 G1 S-BICs_-__8Felis_cattus L P SEQA.B
8 G1 S_-__9Felis_cattus K L SEQA.A
9 G2 S_+__10Felis_cattus M N SEQA.A
10 G2 S_-__11Lupus_lupus NaN NaN SEQ3