I have the following space-separated data (mydata.txt):
sample1 probe1 gene1 3.23
sample1 probe1 gene2 1.20
sample2 probe1 gene1 2.20
sample2 probe2 gene1 0.12
What I want to do is to create a data frame that looks like this:
probe gene sample1 sample2
probe1 gene1 3.23 2.20
probe1 gene2 1.20 NA
probe2 gene1 NA 0.12
However, instead of transforming the data right after reading the file (e.g. via pandas.DataFrame.from_csv), I'd like to construct that data frame from within a for-loop. I tried the following, but it does not produce the layout I want:
#!/usr/bin/env python
import pandas as pd
import csv
infile = "mydata.txt"
alltups = []
with open(infile, 'r') as tsvfile:
    tabreader = csv.reader(tsvfile, delimiter=' ')
    for row in tabreader:
        sample, probe, gene, foldchange = row
        tup = (sample, [probe,gene,foldchange])
        alltups.append(tup)
df = pd.DataFrame.from_items(alltups)
print df
Which produces:
  sample1 sample1 sample2 sample2
0  probe1  probe1  probe1  probe2
1   gene1   gene2   gene1   gene1
2    3.23    1.20    2.20    0.12
What's the right way to do it?
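For reference, here is a minimal sketch of one way the loop-based construction might go, assuming a reasonably recent pandas and Python 3. The idea is to collect one record per input row inside the loop, build a long-format frame from those records, and then move the sample values into columns with set_index and unstack. The names SAMPLE_TEXT and build_wide_frame are hypothetical and exist only for this illustration; the inline string stands in for mydata.txt so the sketch runs on its own.

```python
import csv
import io

import pandas as pd

# Hypothetical inline copy of mydata.txt so the sketch is self-contained.
SAMPLE_TEXT = """sample1 probe1 gene1 3.23
sample1 probe1 gene2 1.20
sample2 probe1 gene1 2.20
sample2 probe2 gene1 0.12
"""


def build_wide_frame(handle):
    """Read space-separated rows in a loop and return a probe/gene x sample frame."""
    records = []
    for row in csv.reader(handle, delimiter=' '):
        if not row:  # skip blank lines
            continue
        sample, probe, gene, foldchange = row
        records.append({
            'probe': probe,
            'gene': gene,
            'sample': sample,
            'foldchange': float(foldchange),
        })

    # Long format: one row per (probe, gene, sample) measurement.
    long_df = pd.DataFrame(records)

    # Move the 'sample' level into columns; combinations that were never
    # measured become NaN. This assumes each (probe, gene, sample) triple
    # appears at most once in the input.
    wide = (long_df
            .set_index(['probe', 'gene', 'sample'])['foldchange']
            .unstack('sample'))
    wide.columns.name = None  # drop the leftover 'sample' axis label
    return wide.reset_index()


wide = build_wide_frame(io.StringIO(SAMPLE_TEXT))
print(wide)
```

To read the real file instead, the same function could be called as build_wide_frame(open("mydata.txt")). If a (probe, gene, sample) triple can occur more than once, unstack will raise an error, and an aggregating step such as pivot_table would be needed instead.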