I have two data frames:
import pandas as pd
test1 = pd.DataFrame({'Gene': ['WASH7P', 'WASH7P', 'VCZ'], 'TPM': [10.034, 0.234000, 2.345]})
test2 = pd.DataFrame({'Gene': ['WASH7P', 'WASH7P', 'btt'], 'TPM': [1.12345, 2.300, 0.00000]})
I would like to merge them into a single data frame. I have tried:
df = pd.merge(test1, test2, on='Gene', how='outer')
resulting in:
     Gene   TPM_x    TPM_y
0  WASH7P  10.034  1.12345
1  WASH7P  10.034  2.30000
2  WASH7P   0.234  1.12345
3  WASH7P   0.234  2.30000
4     VCZ   2.345      NaN
5     btt     NaN  0.00000
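However, WASH7P is duplicated: each of its rows in test1 is paired with each of its rows in test2, giving four rows instead of two. I have tried drop_duplicates(), but it does not remove them. The real data frames are much larger, with more than 30,000 rows.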
The desired output:
  Gene   TPM_x    TPM_y
WASH7P  10.034  1.12345
WASH7P   0.234  2.30000
   VCZ   2.345      NaN
   btt     NaN  0.00000
Any help would be great.
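For reference, here is a minimal sketch of one way to get the desired shape. It assumes that repeated genes should be paired by their order of appearance in each frame (first WASH7P with first WASH7P, and so on), which is a guess based on the desired output above. The function name merge_by_occurrence and the helper column occurrence are hypothetical names chosen for illustration.

```python
import pandas as pd

test1 = pd.DataFrame({'Gene': ['WASH7P', 'WASH7P', 'VCZ'], 'TPM': [10.034, 0.234, 2.345]})
test2 = pd.DataFrame({'Gene': ['WASH7P', 'WASH7P', 'btt'], 'TPM': [1.12345, 2.300, 0.0]})


def merge_by_occurrence(left, right, key='Gene'):
    """Outer-merge two frames, pairing repeated keys by order of appearance.

    Assumption: the n-th row for a gene in `left` belongs with the n-th row
    for the same gene in `right`. 'occurrence' is a temporary helper column.
    """
    # Number each repeat of a key within its own frame: 0, 1, 2, ...
    left = left.assign(occurrence=left.groupby(key).cumcount())
    right = right.assign(occurrence=right.groupby(key).cumcount())
    # Merging on (key, occurrence) makes every pair unique, so the
    # many-to-many blow-up on repeated genes no longer happens.
    merged = left.merge(right, on=[key, 'occurrence'], how='outer')
    return merged.drop(columns='occurrence')


df = merge_by_occurrence(test1, test2)
print(df)
```

The row order of an outer merge may differ between pandas versions, so the result may need sorting if order matters. If repeated genes should be paired by some other rule, the occurrence column would have to be built differently.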