Python extract whole content of column from data frame using pandas

Question

I want to extract the whole content of a column from a multi-column DataFrame using pandas, but I only get part of the column.

The code I am using is:

import sys
import pandas

data = pandas.read_csv('data1.csv', usecols=['dbSNP RS ID'])

# Redirect standard output to a file, then print the DataFrame
sys.stdout = open("data2.csv", "w")
print(data)

What I get is something like this:

       dbSNP RS ID
0        rs4147951
1        rs2022235
2        rs6425720
3       rs12997193
4        rs9933410
5        rs7142489
...            ...
934963  rs10262938
934964   rs6140985
934965   rs2704067
934966   rs2239441
934967  rs10041689

[934968 rows x 1 columns]

The first 2 lines of the csv file are:

"Probe Set ID","dbSNP RS ID","Chromosome","Physical Position","Strand","ChrX pseudo-autosomal region 1","Cytoband","Flank","Allele A","Allele B","Associated Gene","Genetic Map","Microsatellite","Fragment Enzyme Type Length Start Stop","Allele Frequencies","Heterozygous Allele Frequencies","Number of individuals","In Hapmap","Strand Versus dbSNP","Copy Number Variation","Probe Count","ChrX pseudo-autosomal region 2","In Final List","Minor Allele","Minor Allele Frequency","% GC","OMIM"

"AFFX-SNP_10000979","rs4147951","17","66943738","+","0","q24.2","GGATAAGGATGGGCTA[A/G]ATTATCATTGCTGTTA","A","G","ENST00000269080 // intron // 0 // Hs.58351 // ABCA8 // 10351 // ATP-binding cassette, sub-family A (ABC1), member 8 /// ENST00000428549 // intron // 0 // Hs.58351 // ABCA8 // 10351 // ATP-binding cassette, sub-family A (ABC1), member 8 /// ENST00000541225 // intron // 0 // Hs.58351 // ABCA8 // 10351 // ATP-binding cassette, sub-family A (ABC1), member 8 /// ENST00000542396 // intron // 0 // Hs.58351 // ABCA8 // 10351 // ATP-binding cassette, sub-family A (ABC1), member 8 /// NM_007168 // intron // 0 // Hs.58351 // ABCA8 // 10351 // ATP-binding cassette, sub-family A (ABC1), member 8","99.8510 // D17S795 // D17S2182 // --- // --- // deCODE /// 90.7912 // D17S1870 // D17S840 // AFM323TB1 // AFM207VF4 // Marshfield /// 82.3131 // --- // D17S1786 // 147671 // --- // SLM1","D17S795 // downstream // 265562 /// D17S1474E // upstream // 113179","NspI // ACATGT_ACATGT // 536 // 66943408 // 66943943 /// StyI // CCTTGG_CCATGG // 2334 // 66941614 // 66943947","0.3917 // 0.6083 // CEU /// 0.6444 // 0.3556 // CHB /// 0.6000 // 0.4000 // JPT /// 0.5667 // 0.4333 // YRI","0.3833 // CEU /// 0.4889 // CHB /// 0.4444 // JPT /// 0.5667 // YRI","60 // CEU /// 45 // CHB /// 45 // JPT /// 60 // YRI","YES","reverse","---","6","0","YES","A // CEU /// G // CHB /// G // JPT /// G // YRI","0.3917 // CEU /// 0.3556 // CHB /// 0.4000 // JPT /// 0.4333 // YRI","---","---"

Any idea how to extract the whole 'dbSNP RS ID' column, with all 934968 rows? Thank you very much!

Hi @mirosval, I will edit the question and include the first two lines of the csv file. – Lucas, Commented Dec 10, 2015 at 14:10
Hi @Fabio, some column values are strings, others are integers. – Lucas, Commented Dec 10, 2015 at 14:14
Are you referring to the dots in the middle of your second code listing? If so, that is just how pandas displays data; you don't want to print all of the 900k rows. If you actually save them to a file, they should all be there. – mirosval, Commented Dec 10, 2015 at 14:16

Fabio Lamanna · Accepted Answer · 2015-12-10 14:30:25Z


IIUC (if I understand correctly), you should read the column and then write it to a new .csv file with to_csv, instead of printing it:

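import pandas
data = pandas.read_csv('data1.csv', usecols=['dbSNP RS ID'])
# index=False writes only the column values, without pandas' 0..n-1 row labels
data.to_csv('data2.csv', index=False)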
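A minimal, self-contained sketch of the same read/write round trip, assuming a hypothetical in-memory stand-in for data1.csv (100,000 generated rows; the names source, csv_text and out_text are illustrative only). It checks that to_csv emits every row rather than the truncated view that print shows:

```python
import io

import pandas

# Hypothetical stand-in for data1.csv: 100,000 rows with the same
# 'dbSNP RS ID' column name plus an extra column that usecols will drop.
n_rows = 100_000
source = pandas.DataFrame({
    "Probe Set ID": [f"AFFX-SNP_{i}" for i in range(n_rows)],
    "dbSNP RS ID": [f"rs{i}" for i in range(n_rows)],
})
csv_text = source.to_csv(index=False)

# Read only the column of interest, as in the answer above.
data = pandas.read_csv(io.StringIO(csv_text), usecols=["dbSNP RS ID"])

# With no path argument, to_csv returns the CSV text as a string.
out_text = data.to_csv(index=False)

# Every row is present: one header line plus one line per row.
lines = out_text.splitlines()
print(len(lines))                       # 100001
print(lines[0], lines[1], lines[-1])    # dbSNP RS ID rs0 rs99999
```

In real use you would pass the file names instead, i.e. read_csv('data1.csv', ...) and to_csv('data2.csv', index=False).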

The problem with your code is that print writes the DataFrame's display text, which is exactly what pandas shows in the terminal. When there are too many rows, pandas truncates that display: it keeps only the first and last few rows and puts ... in the middle, so only that part ends up in the file.
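To see that truncation directly, here is a small sketch using a hypothetical 1,000-row frame. It relies on pandas' display.max_rows option (default 60), which controls when print/repr elides rows; DataFrame.to_string() and pandas.option_context are shown as alternatives if you really do want the full printed text:

```python
import pandas

# Hypothetical 1,000-row frame; any frame longer than display.max_rows behaves the same.
df = pandas.DataFrame({"dbSNP RS ID": [f"rs{i}" for i in range(1000)]})

# print(df) outputs the same text as repr(df), which pandas truncates
# once the number of rows exceeds the display.max_rows option.
summary = repr(df)
print("..." in summary)      # True
print("rs500" in summary)    # False: the middle rows are elided

# to_string() renders every row regardless of the display options.
full = df.to_string()
print("rs500" in full)       # True

# Alternatively, lift the row limit temporarily for a single repr/print.
with pandas.option_context("display.max_rows", None):
    print("rs500" in repr(df))   # True
```

For writing data to disk, to_csv remains the simpler choice; the display options only affect how a DataFrame is rendered as text.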

edited Dec 10, 2015 at 14:30

answered Dec 10, 2015 at 14:21

