I have several CSV files in the same directory and I am trying to combine two of them into one new CSV file that has the contents of both input files. Here is an example of the two input files:

small_example1.csv

    CodeClass,Name,Accession,Count
    Endogenous,CCNO,NM_021147.4,18
    Endogenous,MYC,NM_002467.3,1114
    Endogenous,CD79A,NM_001783.3,178
    Endogenous,FSTL3,NM_005860.2,529

small_example2.csv

    CodeClass,Name,Accession,Count
    Endogenous,CCNO,NM_021147.4,196
    Endogenous,MYC,NM_002467.3,962
    Endogenous,CD79A,NM_001783.3,390
    Endogenous,FSTL3,NM_005860.2,67

and here is the expected output file (result.csv):

    Probe_Name,Accession,Class_Name,small_example1,small_example2
    CCNO,NM_021147.4,Endogenous,18,196
    MYC,NM_002467.3,Endogenous,1114,962
    CD79A,NM_001783.3,Endogenous,178,390
    FSTL3,NM_005860.2,Endogenous,529,67

To do so, I wrote this function in Python 3:

    import pandas as pd
    filenames = ['small_example1.csv', 'small_example2.csv']
    path = '/home/Joy'
    def convert(filenames):
        for file in filenames:
            df1 = pd.read_csv(file, skiprows=26, skipfooter=5, sep=',')
            df = df1.merge(df2, on=['CodeClass', 'Name', 'Accession'])
            df = df.rename(columns={'Name': 'Probe_Name',
                            'CodeClass': 'Class_Name',
                             file: file})
            df.to_csv('result.csv')

The result looks like this, and the last two columns are not what I expected (both the headers and the numbers):

        Class_Name  Probe_Name  Accession   Count_x Count_y
    0   Endogenous  CCNO    NM_021147.4 18  18
    1   Endogenous  MYC NM_002467.3 1114    1114
    2   Endogenous  CD79A   NM_001783.3 178 178
    3   Endogenous  FSTL3   NM_005860.2 529 529

Do you know how to fix the problem?

2 Answers

I suggest first loading your dataframes into a list and then merging them all together (with an inner or outer join, depending on your needs):

    import pandas as pd
    from functools import reduce

    filenames = ['small_example1.csv', 'small_example2.csv']
    path = '/home/Joy'

    def convert(filenames):
        dataframes = []

        # load all the dataframes in a list (dataframes)
        for filename in filenames:
            # skipfooter is only supported by the python parser engine
            df = pd.read_csv(filename, skiprows=26, skipfooter=5, sep=',', engine='python')
            # name the Count column after its file so the columns stay distinct
            df = df.rename(columns={'Count': filename})
            dataframes.append(df)

        # merge the dataframes
        df_merged = reduce(lambda x, y: pd.merge(x, y, on=['CodeClass', 'Name', 'Accession'], how='outer'), dataframes)

        # rename the columns as you want and export the result (without the index column)
        df_merged = df_merged.rename(columns={'Name': 'Probe_Name', 'CodeClass': 'Class_Name'})
        df_merged.to_csv('result.csv', index=False)
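
As a rough sketch of the same approach that can be run without the original files, the example below reads the question's sample rows from in-memory strings instead of from disk. The `SAMPLES` dictionary, the `merge_counts` name and the extension stripping via `os.path.splitext` are assumptions made for illustration and are not part of the answer above. An inner join is used here so that the rows keep the order of the first file.

```python
import io
import os
from functools import reduce

import pandas as pd

# Hypothetical in-memory stand-ins for the two files shown in the question.
SAMPLES = {
    'small_example1.csv': (
        "CodeClass,Name,Accession,Count\n"
        "Endogenous,CCNO,NM_021147.4,18\n"
        "Endogenous,MYC,NM_002467.3,1114\n"
        "Endogenous,CD79A,NM_001783.3,178\n"
        "Endogenous,FSTL3,NM_005860.2,529\n"
    ),
    'small_example2.csv': (
        "CodeClass,Name,Accession,Count\n"
        "Endogenous,CCNO,NM_021147.4,196\n"
        "Endogenous,MYC,NM_002467.3,962\n"
        "Endogenous,CD79A,NM_001783.3,390\n"
        "Endogenous,FSTL3,NM_005860.2,67\n"
    ),
}

KEYS = ['CodeClass', 'Name', 'Accession']


def merge_counts(sources):
    """Merge the Count columns of several CSV texts into one dataframe."""
    frames, labels = [], []
    for filename, text in sources.items():
        # 'small_example1.csv' -> 'small_example1', as in the expected header
        label = os.path.splitext(os.path.basename(filename))[0]
        labels.append(label)
        frames.append(pd.read_csv(io.StringIO(text)).rename(columns={'Count': label}))

    # Fold the list into one dataframe, joining on the shared key columns.
    merged = reduce(lambda left, right: left.merge(right, on=KEYS, how='inner'), frames)
    merged = merged.rename(columns={'Name': 'Probe_Name', 'CodeClass': 'Class_Name'})

    # Put the columns in the order of the expected result.csv.
    return merged[['Probe_Name', 'Accession', 'Class_Name'] + labels]


result = merge_counts(SAMPLES)
print(result.to_csv(index=False))
```

To use this with real files, the `io.StringIO(text)` argument would be replaced by the file path, together with whatever `skiprows` and `skipfooter` values the files need.
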

You have two problems here: the headers and the values.

If you get the same value twice, you have read the same file twice. You should rename the Count column at load time and merge the dataframes into a final one:

    import pandas as pd
    filenames = ['small_example1.csv', 'small_example2.csv']
    path = '/home/Joy'
    def convert(filenames):
        df = None               # initialize the merged dataframe to None
        for file in filenames:
            # load a new dataframe (same read options as in the question)
            # and rename its Count column after the file
            df1 = pd.read_csv(file, skiprows=26, skipfooter=5, engine='python'
                              ).rename(columns={'Count': file})
            # merge it into df
            if df is None:
                df = df1
            else:
                df = df.merge(df1, on=['CodeClass', 'Name', 'Accession'])
        # rename and reindex the columns
        result = df.rename(columns={'Name': 'Probe_Name', 'CodeClass': 'Class_Name'}
                           ).reindex(['Probe_Name','Accession','Class_Name']+filenames,
                                     axis=1)
        result.to_csv('result.csv', index=False)
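
The symptom from the question can be reproduced in isolation. The sketch below builds two small dataframes by hand from the first two sample rows (the names `first`, `second` and the three result variables are made up for illustration) and shows where the `Count_x`/`Count_y` headers and the duplicated values come from, and how renaming before the merge avoids both.

```python
import pandas as pd

KEYS = ['CodeClass', 'Name', 'Accession']

# Two rows taken from the sample files in the question.
first = pd.DataFrame({
    'CodeClass': ['Endogenous', 'Endogenous'],
    'Name': ['CCNO', 'MYC'],
    'Accession': ['NM_021147.4', 'NM_002467.3'],
    'Count': [18, 1114],
})
second = first.assign(Count=[196, 962])

# Merging a dataframe with itself: the non-key column clashes, so pandas
# appends its default suffixes, and both columns hold the same values.
same_file_twice = first.merge(first, on=KEYS)

# Merging two different dataframes without renaming: the values differ,
# but the headers are still Count_x and Count_y.
unrenamed = first.merge(second, on=KEYS)

# Renaming Count before the merge gives distinct, meaningful headers.
renamed = first.rename(columns={'Count': 'small_example1'}).merge(
    second.rename(columns={'Count': 'small_example2'}), on=KEYS)

print(same_file_twice)
print(unrenamed)
print(renamed)
```
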
