I have two DataFrames, as follows:
dfa = pd.DataFrame(['AA', 'BB', 'CC'], columns=list('A'))
dfb = pd.DataFrame(['AC', 'BC', 'CC'], columns=list('B'))
I want to generate a new DataFrame that contains column B from dfb plus one column per element of A, holding the distance from each element of B to that element of A (e.g. the Hamming distance from AC to AA is 1), like this:
    B  disB  disB  disB
0  AC     1     2     1
1  BC     2     1     1
2  CC     2     2     0
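To be explicit about the distance I mean: the Hamming distance is the number of positions at which two equal-length strings differ. A minimal sketch follows; the helper name `hamming` is my own for illustration and does not come from any library.

```python
def hamming(s1, s2):
    """Count the positions at which two equal-length strings differ."""
    if len(s1) != len(s2):
        # Hamming distance is only defined for strings of the same length.
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(c1 != c2 for c1, c2 in zip(s1, s2))


print(hamming('AC', 'AA'))  # 1: only the second character differs
print(hamming('AC', 'BB'))  # 2: both characters differ
```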
The code I have tried looks like this (courtesy of other posts):
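import pandas as pd

dfa = pd.DataFrame(['AA', 'BB', 'CC'], columns=list('A'))
dfb = pd.DataFrame(['AC', 'BC', 'CC'], columns=list('B'))
df_summary = dfb.copy()
# Assumed: `column` was left undefined in my snippet; 'B' reproduces the dis_B header shown below.
column = 'B'
for seq1 in dfa.A:
    df__ = []
    for seq2 in dfb.B:
        # Hamming distance: number of positions where the two strings differ
        hd = sum(c1 != c2 for c1, c2 in zip(seq1, seq2))
        df__.append(hd)
    # The same column name is written on every pass of the outer loop
    df_summary['dis_{}'.format(column)] = pd.DataFrame({'dis_' + column: df__}).values
    print(df_summary)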
This prints three separate outputs:
    B  dis_B
0  AC      1
1  BC      2
2  CC      2

    B  dis_B
0  AC      2
1  BC      1
2  CC      2

    B  dis_B
0  AC      1
1  BC      1
2  CC      0
But I need to combine them into a single DataFrame, like this:
    B  disB  disB  disB
0  AC     1     2     1
1  BC     2     1     1
2  CC     2     2     0
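One direction that might work is sketched here, under the assumption that each distance column may be named after the dfa value it was measured against (`dis_AA`, `dis_BB`, `dis_CC`) rather than sharing one repeated header. Assigning to the same column name on every pass would overwrite the previous result, so the sketch gives each pass its own name and prints only once, after the loop.

```python
import pandas as pd

dfa = pd.DataFrame(['AA', 'BB', 'CC'], columns=list('A'))
dfb = pd.DataFrame(['AC', 'BC', 'CC'], columns=list('B'))

df_summary = dfb.copy()
for seq1 in dfa.A:
    # One Hamming distance per row of dfb, all measured against the same dfa value.
    distances = [sum(c1 != c2 for c1, c2 in zip(seq1, seq2)) for seq2 in dfb.B]
    # Assumption: name the column after seq1 so each pass adds a new column
    # instead of overwriting the previous one.
    df_summary['dis_{}'.format(seq1)] = distances

# Print once, after the loop, so a single combined frame is shown.
print(df_summary)
```

With the sample data, the rows for AC, BC and CC should hold the distances 1 2 1, 2 1 1 and 2 2 0, which matches the values in the table I am after; only the column headers differ.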
Thanks for your help!