Merge multiple CSV files using Pandas to create final CSV file with dynamic header

Question

I have 4 CSV files that use a tab (\t) as the delimiter.

alok@alok-HP-Laptop-14s-cr1:~/tmp/krati$ for file in sample*.csv; do echo $file; cat $file; echo ; done
sample1.csv
ProbeID p_code  intensities
B1_1_3  6170    2
B2_1_3  6170    2.2
B3_1_4  6170    2.3
12345   6170    2.4
1234567 6170    2.5

sample2.csv
ProbeID p_code  intensities
B1_1_3  5320    3
B2_1_3  5320    3.2
B3_1_4  5320    3.3
12345   5320    3.4
1234567 5320    3.5

sample3.csv
ProbeID p_code  intensities
B1_1_3  1234    4
B2_1_3  1234    4.2
B3_1_4  1234    4.3
12345   1234    4.4
1234567 1234    4.5

sample4.csv
ProbeID p_code  intensities
B1_1_3  3120    5
B2_1_3  3120    5.2
B3_1_4  3120    5.3
12345   3120    5.4
1234567 3120    5.5

All 4 files have the same headers.

The ProbeID values are the same across all files, and they appear in the same order. Within each file, every row has the same p_code.

I have to merge all these CSV files into one file in this format:

alok@alok-HP-Laptop-14s-cr1:~/tmp/krati$ cat output1.csv 
ProbeID 6170    5320    1234    3120
B1_1_3  2       3       4       5
B2_1_3  2.2     3.2     4.2     5.2
B3_1_4  2.3     3.3     4.3     5.3
12345   2.4     3.4     4.4     5.4
1234567 2.5     3.5     4.5     5.5

In this output file, the columns are dynamic: they are named after the p_code values.

I can do this easily in plain Python using a dictionary. How can I produce this output using Pandas?

Erfan · Accepted Answer · 2020-09-09 17:24:04Z

We can achieve this using pandas.concat and DataFrame.pivot_table:

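import os
import pandas as pd

# Read every tab-separated sample*.csv in the current directory
# and stack the rows into one long DataFrame.
df = pd.concat(
    [pd.read_csv(f, sep="\t") for f in os.listdir() if f.endswith(".csv") and f.startswith("sample")],
    ignore_index=True
)

# Turn each p_code value into its own column, with one row per ProbeID.
df = df.pivot_table(index="ProbeID", columns="p_code", values="intensities", aggfunc="sum")
print(df)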
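
One thing to be aware of: pivot_table sorts both the row labels and the column labels, so the result will not come out in the 6170, 5320, 1234, 3120 order shown in the question. Below is a minimal sketch of the same concat and pivot approach that restores the first-seen order and writes tab-separated output. The in-memory `files` dictionary is a hypothetical, shortened stand-in for the sample files, and the `reindex` step and the `dtype` argument are additions that are not part of the answer above.

```python
import io
import pandas as pd

# Hypothetical in-memory stand-ins for two of the tab-separated sample files.
files = {
    "sample1.csv": "ProbeID\tp_code\tintensities\nB1_1_3\t6170\t2\nB2_1_3\t6170\t2.2\n12345\t6170\t2.4\n",
    "sample2.csv": "ProbeID\tp_code\tintensities\nB1_1_3\t5320\t3\nB2_1_3\t5320\t3.2\n12345\t5320\t3.4\n",
}

# Read ProbeID as text so that IDs such as "12345" are never parsed as numbers.
frames = [
    pd.read_csv(io.StringIO(text), sep="\t", dtype={"ProbeID": str})
    for text in files.values()
]
df = pd.concat(frames, ignore_index=True)

wide = df.pivot_table(index="ProbeID", columns="p_code", values="intensities", aggfunc="sum")

# pivot_table sorts rows and columns; reindex to get back the first-seen order.
wide = wide.reindex(index=df["ProbeID"].unique(), columns=df["p_code"].unique())

# index_label makes the header row start with "ProbeID".
out = wide.to_csv(sep="\t", index_label="ProbeID")
print(out)
```

To write a file instead of a string, pass a path as the first argument, for example wide.to_csv("output1.csv", sep="\t", index_label="ProbeID").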

edited Sep 9, 2020 at 17:24

answered Sep 9, 2020 at 17:06

Erfan


5 Comments

Erfan Over a year ago

It works on my laptop, so it is hard to debug like this. Can you print df after the line with pd.concat and check whether you have one dataframe with the three columns ProbeID, p_code, and intensities?

Erfan Over a year ago

Now it's confusing, please be concise: is it intensities or tintensities? It's not about Python or Linux, the name of the column is just wrong.

Erfan Over a year ago

That's an important detail you did not provide: your separator is a tab. See the edit in pd.read_csv.

Alok Over a year ago

Sorry, I have added that detail to the question.

Erfan Over a year ago

Let us continue this discussion in chat.
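
The separator issue raised in the comments can be reproduced with a small sketch. The two-row `text` sample below is hypothetical and only mirrors the layout of the question's files. It suggests, as an assumption rather than a confirmed diagnosis of the asker's error, why a pivot on "intensities" would not find that column when sep="\t" is left out.

```python
import io
import pandas as pd

# Hypothetical two-row, tab-separated sample mirroring the question's layout.
text = "ProbeID\tp_code\tintensities\nB1_1_3\t6170\t2\nB2_1_3\t6170\t2.2\n"

# The default separator is a comma, so each line is read as a single field
# and the whole header becomes one column name with tabs inside it.
wrong = pd.read_csv(io.StringIO(text))
print(list(wrong.columns))

# With sep="\t" the three expected columns appear.
right = pd.read_csv(io.StringIO(text), sep="\t")
print(list(right.columns))
```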
