Using for-loop to create a Pandas DataFrame (non-dictionary based)

Question

I have the following space-separated data (mydata.txt):

sample1 probe1 gene1 3.23
sample1 probe1 gene2 1.20
sample2 probe1 gene1 2.20
sample2 probe2 gene1 0.12

What I want to do is to create a data frame that looks like this:

probe   gene    sample1 sample2
probe1  gene1   3.23     2.20
probe1  gene2   1.20     NA
probe2  gene1   NA       0.12

However, instead of transforming the data right after reading the CSV (e.g. via pandas.DataFrame.from_csv), I'd like to construct that data frame inside a for-loop. I tried the following, but it does not give the layout I want:

#!/usr/bin/env python
import pandas as pd
import csv

infile = "mydata.txt"

alltups = []
with open(infile, 'r') as tsvfile:
    tabreader = csv.reader(tsvfile, delimiter=' ')
    for row in tabreader:
        sample, probe, gene, foldchange = row 
        tup = (sample, [probe,gene,foldchange])
        alltups.append(tup)

df = pd.DataFrame.from_items(alltups)
print(df)

Which produces:

  sample1 sample1 sample2 sample2
0  probe1  probe1  probe1  probe2
1   gene1   gene2   gene1   gene1
2    3.23    1.20    2.20    0.12

What's the right way to do it?

fixxxer · Accepted Answer · 2015-04-30 14:41:30Z

1

You can create temp with a for loop:

import csv
import pandas as pd

infile = "mydata.txt"

alltups = []
with open(infile, 'r') as f:
    # The data in the question is space-separated, so split on ' ', not '\t'.
    tabreader = csv.reader(f, delimiter=' ')
    for row in tabreader:
        alltups.append(row)

In [1280]: temp = pd.DataFrame(alltups).rename(columns={0:'Sample',1:'Probe',2:'Gene',3:'Value'})

In [1281]: temp
Out[1281]: 
    Sample   Probe   Gene Value
0  sample1  probe1  gene1  3.23
1  sample1  probe1  gene2  1.20
2  sample2  probe1  gene1  2.20
3  sample2  probe2  gene1  0.12

# csv.reader yields strings, so convert the values to float before pivoting.
In [1287]: temp['Value'] = temp['Value'].astype(float)

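Alternatively, temp can be read directly with temp = pd.read_csv('mydata.txt', sep=' ', names=['Sample', 'Probe', 'Gene', 'Value']), which is what is used below. If you are OK with doing the reshaping outside the for-loop, the table you want comes from a simple pivot: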

In [1239]: temp.pivot_table(index=['Probe','Gene'], columns='Sample',values='Value')
Out[1239]: 
Sample        sample1  sample2
Probe  Gene                   
probe1 gene1     3.23     2.20
       gene2     1.20      NaN
probe2 gene1      NaN     0.12
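Putting the loop and the pivot together, here is a minimal sketch. It assumes the four-column, space-separated layout from the question; io.StringIO stands in for mydata.txt so the snippet runs without a file, and the names raw, rows, long_df and wide_df are only illustrative.

```python
import csv
import io

import pandas as pd

# Stand-in for mydata.txt (assumption: same four space-separated columns).
raw = io.StringIO(
    "sample1 probe1 gene1 3.23\n"
    "sample1 probe1 gene2 1.20\n"
    "sample2 probe1 gene1 2.20\n"
    "sample2 probe2 gene1 0.12\n"
)

rows = []
for sample, probe, gene, value in csv.reader(raw, delimiter=" "):
    # csv.reader yields strings, so convert the measurement while looping.
    rows.append((sample, probe, gene, float(value)))

# Long format: one row per (sample, probe, gene) measurement.
long_df = pd.DataFrame(rows, columns=["Sample", "Probe", "Gene", "Value"])

# Wide format: one row per (probe, gene), one column per sample.
wide_df = long_df.pivot_table(index=["Probe", "Gene"], columns="Sample", values="Value")
print(wide_df)
```

The loop only collects rows; the reshaping is still left to pivot_table, which fills missing (probe, gene, sample) combinations with NaN.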

edited Apr 30, 2015 at 14:41

answered Apr 30, 2015 at 14:16


2 Comments

neversaint Over a year ago

No, I mean temp in your code temp.pivot_table. And finally I'd like to have the CSV file written as in the OP, so the probe column needs to have no 'holes'. How can I achieve that?

fixxxer Over a year ago

The holes in the probe column don't appear in the csv when you do a to_csv; they are fleshed out properly.
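A small sketch of that point, assuming the pivoted frame is built as in the answer: the blanks in the Probe column are only how pandas prints a MultiIndex, and to_csv repeats the value on every row. The records list and the choices sep="\t" and na_rep="NA" are illustrative assumptions, picked to resemble the table in the question.

```python
import pandas as pd

# Hypothetical in-memory rows standing in for the parsed file.
records = [
    ("sample1", "probe1", "gene1", 3.23),
    ("sample1", "probe1", "gene2", 1.20),
    ("sample2", "probe1", "gene1", 2.20),
    ("sample2", "probe2", "gene1", 0.12),
]
temp = pd.DataFrame(records, columns=["Sample", "Probe", "Gene", "Value"])
wide = temp.pivot_table(index=["Probe", "Gene"], columns="Sample", values="Value")

# With no path argument, to_csv returns the CSV text as a string.
csv_text = wide.to_csv(sep="\t", na_rep="NA")
print(csv_text)
```

If a flat frame is needed in memory as well, wide.reset_index() turns the Probe and Gene index levels back into ordinary columns.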

Alexander · Accepted Answer · 2015-04-30 16:05:58Z

0

I have no idea why you'd like to use a for loop. Isn't this a much simpler solution?

import pandas as pd

# Use probe, gene and sample as the index, then move sample into the columns.
df = pd.read_csv('mydata.txt', 
                 sep=" ", 
                 index_col=[1, 2, 0], 
                 names=['sample', 'probe', 'gene', 'value']).unstack()

>>> df
               value        
sample       sample1 sample2
probe  gene                 
probe1 gene1    3.23    2.20
       gene2    1.20     NaN
probe2 gene1     NaN    0.12

answered Apr 30, 2015 at 16:05


2 Comments

neversaint Over a year ago

Because in my actual case, mydata.txt is a data structure that comes from another process.

Alexander Over a year ago

Can you first build a DataFrame and then transform/reshape it? If so, this approach will still work (you just don't need to read_csv).
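A sketch of what that could look like when the rows already exist in memory rather than in a file. The records list is a hypothetical stand-in for whatever the other process produces; selecting the value column before unstack is a choice made here to avoid the extra "value" level in the column header.

```python
import pandas as pd

# Hypothetical rows as they might arrive from another process.
records = [
    ("sample1", "probe1", "gene1", 3.23),
    ("sample1", "probe1", "gene2", 1.20),
    ("sample2", "probe1", "gene1", 2.20),
    ("sample2", "probe2", "gene1", 0.12),
]

df = pd.DataFrame(records, columns=["sample", "probe", "gene", "value"])

# Index by (probe, gene, sample), then move the innermost level (sample)
# into the columns. Missing combinations become NaN.
wide = df.set_index(["probe", "gene", "sample"])["value"].unstack()
print(wide)
```

Note that unstack raises an error if the same (probe, gene, sample) combination occurs more than once; pivot_table aggregates such duplicates instead (by mean, by default).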
