I have a list of CSV files in the same directory, and I am trying to combine two of them into one new CSV file that holds the contents of both input files. Here is an example of the 2 input files:
small_example1.csv
CodeClass,Name,Accession,Count
Endogenous,CCNO,NM_021147.4,18
Endogenous,MYC,NM_002467.3,1114
Endogenous,CD79A,NM_001783.3,178
Endogenous,FSTL3,NM_005860.2,529
small_example2.csv
CodeClass,Name,Accession,Count
Endogenous,CCNO,NM_021147.4,196
Endogenous,MYC,NM_002467.3,962
Endogenous,CD79A,NM_001783.3,390
Endogenous,FSTL3,NM_005860.2,67
Here is the expected output file (result.csv):
Probe_Name,Accession,Class_Name,small_example1,small_example2
CCNO,NM_021147.4,Endogenous,18,196
MYC,NM_002467.3,Endogenous,1114,962
CD79A,NM_001783.3,Endogenous,178,390
FSTL3,NM_005860.2,Endogenous,529,67
To do so, I wrote this function in Python 3:
import pandas as pd
filenames = ['small_example1.csv', 'small_example2.csv']
path = '/home/Joy'
def convert(filenames):
    for file in filenames:
        df1 = pd.read_csv(file, skiprows=26, skipfooter=5, sep=',')
        df = df1.merge(df2, on=['CodeClass', 'Name', 'Accession'])
        df = df.rename(columns={'Name': 'Probe_Name',
                                'CodeClass': 'Class_Name',
                                file: file})
    df.to_csv('result.csv')
The result looks like this; the last 2 columns are not as expected (both the headers and the numbers are wrong):
   Class_Name Probe_Name    Accession  Count_x  Count_y
0  Endogenous       CCNO  NM_021147.4       18       18
1  Endogenous        MYC  NM_002467.3     1114     1114
2  Endogenous      CD79A  NM_001783.3      178      178
3  Endogenous      FSTL3  NM_005860.2      529      529
Do you know how to fix the problem?
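For reference, here is a minimal sketch of one way the expected layout could be produced. pandas appends the suffixes _x and _y when both frames in a merge share a non-key column name (here Count), so renaming Count to the file's base name before merging avoids them. The identical numbers in both columns suggest a frame was merged with a copy of itself, though that is only a guess. The sketch assumes each file has exactly the four columns shown above with the header on the first line, so the skiprows=26 and skipfooter=5 arguments are left out and would need to be restored for files with extra leading or trailing lines. The names combine and KEYS are hypothetical and not part of the original code.

```python
import os
from functools import reduce

import pandas as pd

# Columns shared by every input file (assumed from the samples above).
KEYS = ['CodeClass', 'Name', 'Accession']


def combine(filenames):
    """Merge the Count column of each CSV into one frame, one column per file."""
    frames, samples = [], []
    for file in filenames:
        # Use the file name without directory or extension as the column name.
        sample = os.path.splitext(os.path.basename(file))[0]
        samples.append(sample)
        # Renaming Count before merging keeps pandas from adding _x/_y suffixes.
        frames.append(pd.read_csv(file).rename(columns={'Count': sample}))

    # Inner-join all frames pairwise on the shared key columns.
    merged = reduce(lambda left, right: left.merge(right, on=KEYS), frames)
    merged = merged.rename(columns={'Name': 'Probe_Name',
                                    'CodeClass': 'Class_Name'})
    # Put the columns in the order of the expected output.
    return merged[['Probe_Name', 'Accession', 'Class_Name'] + samples]
```

Calling combine(filenames).to_csv('result.csv', index=False) would then write the file without the extra index column seen in the output above.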