I have 4 CSV files that use a tab (\t) as the delimiter.

alok@alok-HP-Laptop-14s-cr1:~/tmp/krati$ for file in sample*.csv; do echo $file; cat $file; echo ; done
sample1.csv
ProbeID p_code  intensities
B1_1_3  6170    2
B2_1_3  6170    2.2
B3_1_4  6170    2.3
12345   6170    2.4
1234567 6170    2.5

sample2.csv
ProbeID p_code  intensities
B1_1_3  5320    3
B2_1_3  5320    3.2
B3_1_4  5320    3.3
12345   5320    3.4
1234567 5320    3.5

sample3.csv
ProbeID p_code  intensities
B1_1_3  1234    4
B2_1_3  1234    4.2
B3_1_4  1234    4.3
12345   1234    4.4
1234567 1234    4.5

sample4.csv
ProbeID p_code  intensities
B1_1_3  3120    5
B2_1_3  3120    5.2
B3_1_4  3120    5.3
12345   3120    5.4
1234567 3120    5.5

All 4 files have the same headers.

ProbeID is the same across all files, and the row order is also the same. Each file has a single p_code value throughout.

I have to merge all these CSV files into one file in this format:

alok@alok-HP-Laptop-14s-cr1:~/tmp/krati$ cat output1.csv 
ProbeID 6170    5320    1234    3120
B1_1_3  2       3       4       5
B2_1_3  2.2     3.2     4.2     5.2
B3_1_4  2.3     3.3     4.3     5.3
12345   2.4     3.4     4.4     5.4
1234567 2.5     3.5     4.5     5.5

In this output file, the columns are dynamic: there is one column per p_code value.

I can do this easily in plain Python using a dictionary. How can I produce this output using Pandas?

1 Answer

We can achieve this using pandas.concat and DataFrame.pivot_table:

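import os
import pandas as pd

# Read every tab-separated sample*.csv in the current directory and stack them into one long frame.
df = pd.concat(
    [pd.read_csv(f, sep="\t") for f in sorted(os.listdir()) if f.endswith(".csv") and f.startswith("sample")],
    ignore_index=True
)

# Reshape from long to wide: one row per ProbeID, one column per p_code.
# Each (ProbeID, p_code) pair occurs only once, so "sum" simply passes the value through.
df = df.pivot_table(index="ProbeID", columns="p_code", values="intensities", aggfunc="sum")
print(df)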
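
Note that pivot_table sorts both the row labels and the column labels, so the result comes out with the columns ordered 1234, 3120, 5320, 6170 rather than in the file order shown in the question. If the original order matters, one possible alternative is to turn each file into a single named column and concatenate the columns side by side. This is only a minimal sketch, assuming (as the question states) that every file has the same ProbeID rows and a single p_code value; the helper name merge_samples is hypothetical and not part of pandas or the original answer.

```python
import pandas as pd


def merge_samples(sources):
    """Build one wide table: ProbeID rows, one intensity column per source.

    `sources` is a list of file paths (or file-like objects), in the
    column order wanted in the output.
    """
    columns = []
    for source in sources:
        # dtype=str keeps numeric-looking IDs such as "12345" as text.
        frame = pd.read_csv(source, sep="\t", dtype={"ProbeID": str})
        # Assumption: one p_code per file, so the first row names the column.
        p_code = frame["p_code"].iloc[0]
        columns.append(frame.set_index("ProbeID")["intensities"].rename(p_code))
    # axis=1 aligns the columns on ProbeID and keeps the sources in the order given.
    return pd.concat(columns, axis=1)
```

It could be called as merge_samples(["sample1.csv", "sample2.csv", "sample3.csv", "sample4.csv"]), and the result written with .to_csv("output1.csv", sep="\t") to get a tab-separated file with ProbeID as the first column.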

5 Comments

It works on my laptop, so it is hard to debug like this. Can you print df right after the pd.concat line and check that you have one dataframe with three columns: ProbeID, p_code, intensities?
Now it's confusing; please be concise. Is it intensities or tintensities? It's not about Python or Linux, the column name is just wrong.
That's an important detail you did not provide: your separator is a tab. See the edit in pd.read_csv.
Sorry, I have added that detail to the question.
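
The separator mix-up discussed in these comments can be reproduced in a few lines. This is only an illustrative sketch using an in-memory string in the question's layout rather than the real files: with the default comma separator, pandas treats the whole tab-separated header as one column name, which is likely why a column such as "intensities" could not be found.

```python
import io
import pandas as pd

# A header and one data row in the question's tab-separated layout.
raw = "ProbeID\tp_code\tintensities\nB1_1_3\t6170\t2\n"

# Default separator is a comma, so the whole header becomes one column name.
wrong = pd.read_csv(io.StringIO(raw))
print(list(wrong.columns))

# With sep="\t" the three expected columns appear.
right = pd.read_csv(io.StringIO(raw), sep="\t")
print(list(right.columns))
```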
