Storing each column in a separate dictionary using Python

Question

Is there an efficient way to store each column of a tab-delimited file in a separate dictionary using Python?

A sample input file: (The real input file contains thousands of lines and hundreds of columns. The number of columns is not fixed; it changes frequently.)

I need to print values in column A:

for cell in mydict["A"]:
    print(cell)

and to print values in the same row:

for i in range(1, numrows):
    for key in keysOfMydict:
        print(mydict[key][i])

Why don't you just store the rows and use a dictionary to map column names to their index?
– GWW, Commented Aug 26, 2014 at 4:58
If the number of columns is not fixed, what would you expect to print in a row where the column is missing?
– Nir Alfasi, Commented Aug 26, 2014 at 5:05
Depending on what else you're doing with your data, you might find the pandas library interesting: pandas.pydata.org/pandas-docs/stable/10min.html#getting
– mechanical_meat, Commented Aug 26, 2014 at 5:06
@GWW, the main computation is on columns. It may be inefficient to retrieve a whole row when only one cell in it will be used.
– Kadir, Commented Aug 26, 2014 at 5:19
@alfasin, the number of cells in each row is the same. I meant that I do not want solutions that hard-code the column count and column names, because such code is not manageable when the number of columns changes frequently.
– Kadir, Commented Aug 26, 2014 at 5:23
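
For reference, GWW's suggestion (keep the rows, and map column names to their index) could look roughly like the sketch below. The sample data, the `col_index` name and the `column()` helper are assumptions made for illustration; they are not from the question.

```python
import csv
import io

# Hypothetical sample standing in for the real tab-delimited file.
sample = "A\tB\tC\n1\t4\t7\n2\t5\t8\n3\t6\t9\n"

reader = csv.reader(io.StringIO(sample), delimiter="\t")
header = next(reader)   # first line holds the column names
rows = list(reader)     # each remaining line becomes a list of strings

# Map each column name to its position, so nothing is hard-coded.
col_index = {name: i for i, name in enumerate(header)}


def column(name):
    """Return every value in the named column, top to bottom."""
    i = col_index[name]
    return [row[i] for row in rows]


print(column("A"))  # ['1', '2', '3']
print(rows[0])      # ['1', '4', '7']
```

Column access here still walks every row, which is the cost Kadir is worried about; a whole row, on the other hand, is a single list lookup.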

Burhan Khalid · Accepted Answer · 2014-08-26 06:01:18Z

1

The simplest way is to use DictReader from the csv module:

import csv

with open('somefile.txt', 'r') as f:
   reader = csv.DictReader(f, delimiter='\t')
   rows = list(reader) # If your file is not large, you can
                       # consume it entirely

   # If your file is large, you might want to 
   # step over each row:
   #for row in reader:
   #    print(row['A'])

for row in rows:
   print(row['A'])

@Marius made a good point - that you might be looking to collect all columns separately by their header.

If that's the case, you'll have to adjust your reading logic a bit:

from collections import defaultdict
by_column = defaultdict(list)

for row in rows:
   # items() works on both Python 2 and 3; iteritems() is Python 2 only
   for k, v in row.items():
       by_column[k].append(v)
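
Putting the two steps together, a self-contained sketch might look like this. It builds the per-column lists in one pass instead of materialising `rows` first. The in-memory sample is an assumption standing in for the real file, and the code targets Python 3.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample; for a real file, replace io.StringIO(sample)
# with open('somefile.txt', newline='').
sample = "A\tB\tC\n1\t4\t7\n2\t5\t8\n3\t6\t9\n"

by_column = defaultdict(list)

with io.StringIO(sample) as f:
    # DictReader takes the column names from the header line,
    # so the number of columns never has to be hard-coded.
    for row in csv.DictReader(f, delimiter="\t"):
        for name, value in row.items():
            by_column[name].append(value)

# Print the values in column A, as in the question.
for cell in by_column["A"]:
    print(cell)
```

Note that the csv module returns every cell as a string; convert with `int()` or `float()` if the computation needs numbers.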

Another option is pandas:

>>> import pandas as pd
>>> i = pd.read_csv('foo.csv', sep='\t')
>>> i
   A  B  C
0  1  4  7
1  2  5  8
2  3  6  9
>>> i['A']
0    1
1    2
2    3
Name: A, dtype: int64
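
If a plain dictionary of lists is still wanted after loading with pandas, `DataFrame.to_dict(orient='list')` should produce one. The sketch below assumes a tab-separated sample held in memory rather than a file on disk.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the real tab-delimited file.
sample = "A\tB\tC\n1\t4\t7\n2\t5\t8\n3\t6\t9\n"

df = pd.read_csv(io.StringIO(sample), sep="\t")

# orient="list" maps each column name to a list of that column's values.
columns = df.to_dict(orient="list")

print(columns["A"])
```

Unlike the csv module, pandas infers numeric types, so the values here are integers rather than strings.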

edited Aug 26, 2014 at 6:01

answered Aug 26, 2014 at 5:16

Burhan Khalid


1 Comment

Marius Over a year ago

I think OP wants a dict that looks like {'A': [all vals in column A], 'B': [all vals in column B]}, not individual dicts for each row like DictReader provides.
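
One way to get the shape described in this comment with only the standard library is to transpose the rows with `zip`. This is a sketch under the assumption that every row has the same number of cells as the header (as the asker states); `zip` would silently truncate to the shortest row otherwise. The sample data is hypothetical.

```python
import csv
import io

# Hypothetical sample standing in for the real tab-delimited file.
sample = "A\tB\tC\n1\t4\t7\n2\t5\t8\n3\t6\t9\n"

reader = csv.reader(io.StringIO(sample), delimiter="\t")
header = next(reader)  # column names from the first line

# zip(*reader) turns the remaining rows into columns;
# pairing them with the header gives {'A': [...], 'B': [...], ...}.
columns = {name: list(values) for name, values in zip(header, zip(*reader))}

# Values in the same row: take the same index from every column.
second_row = [columns[name][1] for name in header]

print(columns)
print(second_row)
```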

thavan · Accepted Answer · 2014-08-26 06:11:35Z

0

Not sure this is relevant, but you can do this using rpy2.

from rpy2 import robjects
# sep='\t' because the input is tab-delimited
dframe = robjects.DataFrame.from_csvfile('/your/csv/file.csv', sep='\t')
d = {k: list(v) for k, v in dframe.items()}

output:

{'A': [1, 2, 3], 'C': [7, 8, 9], 'B': [4, 5, 6]}

answered Aug 26, 2014 at 6:11

thavan
