Parse a text file into an array in Python

Question

    A   C   G   T
A   2   -1  -1  -1  
C   -1  2   -1  -1
G   -1  -1  2   -1
T   -1  -1  -1  2

This is a tab-separated text file, and I want to map it into a structure like the following in Python:

{'A': {'A': 91, 'C': -114, 'G': -31, 'T': -123},
    'C': {'A': -114, 'C':  100, 'G': -125, 'T': -31},
    'G': {'A': -31, 'C': -125, 'G': 100, 'T':  -114},
    'T': {'A': -123, 'C': -31, 'G':  -114, 'T':  91}}

I have tried very hard, but I cannot figure out how to do this as I am new to Python.

Please help.

My code so far:

seq = flines[0]
    newseq = []
    j = 0
    while(l < 4):
        i = 2
        while(o < 4):
            newseq[i][j] = seqLine[i]
            i = i + 1;
            o = o + 1
        j = j + 1
        l = l + 1
    print (seq)
    print(seqLine)

I'm not entirely sure how your file maps to the values you've given. Can you post the code you've tried (even if it failed) as a starting point, along with any error messages or the output it produced?
– mhlester, Commented Feb 19, 2014 at 22:33
If I understand you correctly, the dict['A']['A'] value for your data should be 2 (where the sample result shows 91)?
– deinonychusaur, Commented Feb 19, 2014 at 22:39
@mhlester I want to be able to access the values the way I would in the second structure, e.g. ['A']['C'] giving -114.
– Vish, Commented Feb 19, 2014 at 22:40
Yes. The good result is just a sample of how the data should look.
– Vish, Commented Feb 19, 2014 at 22:40
@deinonychusaur The first block is the data as stored in the text file. I am trying to parse it into the format shown in the second code block.
– Vish, Commented Feb 19, 2014 at 22:41

deinonychusaur · Accepted Answer · 2014-02-19 23:31:51Z

1

I think this is what you want:

import csv

data = {}

with open('myfile.csv', 'rb') as csvfile:
    ntreader = csv.reader(csvfile, delimiter="\t", quotechar='"')
    for rowI, rowData in enumerate(ntreader):
        if rowI == 0:
            headers = rowData[1:]
        else:
            data[rowData[0]] = {k: int(v) for k, v in zip(headers, rowData[1:])}


print data

To make life easy I use the csv module with tab as the delimiter. I grab the column headers from the first row and use them to label the values in every other row.

This produces:

{'A ': {'A': 2, 'C': -1, 'T': -1, 'G': -1},
 'C': {'A': -1, 'C': 2, 'T': -1, 'G': -1},
 'T': {'A': -1, 'C': -1, 'T': 2, 'G': -1},
 'G': {'A': -1, 'C': -1, 'T': -1, 'G': 2}}
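
For anyone on Python 3, here is a rough sketch of the same csv.reader idea. It is not the code from this answer: the function name parse_matrix and the in-memory SAMPLE data are made up for illustration, and it additionally strips whitespace from the labels so that a stray space in the file does not end up in a key such as 'A '.

```python
import csv
import io

# Hypothetical sample standing in for the tab-separated file from the question.
# The "A" row deliberately ends with an extra tab.
SAMPLE = (
    "\tA\tC\tG\tT\n"
    "A\t2\t-1\t-1\t-1\t\n"
    "C\t-1\t2\t-1\t-1\n"
    "G\t-1\t-1\t2\t-1\n"
    "T\t-1\t-1\t-1\t2\n"
)


def parse_matrix(lines):
    """Build {row_label: {column_label: int}} from tab-separated lines."""
    reader = csv.reader(lines, delimiter="\t")
    # The first row holds the column labels; its first cell is the empty corner.
    headers = [h.strip() for h in next(reader)[1:]]
    data = {}
    for row in reader:
        if not row:
            continue  # csv.reader yields [] for blank lines
        label = row[0].strip()
        # zip() stops at the shorter sequence, so trailing empty cells are ignored.
        data[label] = {k: int(v) for k, v in zip(headers, row[1:])}
    return data


matrix = parse_matrix(io.StringIO(SAMPLE))
print(matrix["A"]["C"])
```

To read a real file instead of the sample, something like `with open('myfile.csv', newline='') as f: matrix = parse_matrix(f)` should work, since Python 3's csv module expects text mode rather than 'rb'.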

Edit:

For Python < 2.7, it should work if you replace the dictionary comprehension line above (data[rowData[0]] = ...) with a simple loop in the same place:

    rowDict = dict()
    for k, v in zip(headers, rowData[1:]):
        rowDict[k] = int(v)
    data[rowData[0]] = rowDict

edited Feb 19, 2014 at 23:31

answered Feb 19, 2014 at 22:59

deinonychusaur

6 Comments

Vish Over a year ago

data[rowData[0]] = {k: int(v) for k, v in zip(headers, rowData[1:])} ^ SyntaxError: invalid syntax

deinonychusaur Over a year ago

What version of Python do you have? It works just fine in 2.7.5.

Vish Over a year ago

Python 2.6.6 on CentOS 6.5

deinonychusaur Over a year ago

Ah, I don't know if dict comprehensions were implemented back then... you should have put that in your question... :) One moment...

Vish Over a year ago

Is there any alternative to this that is backward compatible?

mhlester · Accepted Answer · 2014-02-19 23:39:01Z

1

csv.DictReader gets you most of the way there on its own:

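import csv

# DictReader needs an open file object, not a filename ('rb' mode is for Python 2's csv module)
reader = csv.DictReader(open('file.csv', 'rb'), delimiter='\t')
#dictdata = {row['']: row for row in reader}       # <-- python 2.7+ only
dictdata = dict((row[''], row) for row in reader)  # <-- python 2.6 safe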

Outputs:

{'A': {None: [''], '': 'A', 'A': '2', 'C': '-1', 'G': '-1', 'T': '-1'},
 'C': {'': 'C', 'A': '-1', 'C': '2', 'G': '-1', 'T': '-1'},
 'G': {'': 'G', 'A': '-1', 'C': '-1', 'G': '2', 'T': '-1'},
 'T': {'': 'T', 'A': '-1', 'C': '-1', 'G': '-1', 'T': '2'}}
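
The None key in the first row comes from how DictReader handles a row with more cells than the header: the surplus cells are collected in a list under the restkey, which defaults to None. A trailing tab at the end of the A row is enough to trigger it. A small Python 3 sketch with made-up two-column data (the sample string is an assumption, not the asker's file):

```python
import csv
import io

# Hypothetical two-row sample; the "A" row ends with an extra tab.
sample = "\tA\tC\nA\t2\t-1\t\nC\t-1\t2\n"

rows = list(csv.DictReader(io.StringIO(sample), delimiter="\t"))

# The first row has one cell more than the header, so it gains a None key.
print(rows[0])
# The second row matches the header and has no None key.
print(rows[1])
```

The '' key is simply the empty top-left header cell, which is why it holds the row label.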

Cleaning up the extraneous keys got messy, since I needed to rebuild the inner dict, but you can replace the last line with this:

dictdata = {row['']: {key: value for key, value in row.iteritems() if key} for row in reader}

Outputs:

{'A': {'A': '2', 'C': '-1', 'G': '-1', 'T': '-1'},
 'C': {'A': '-1', 'C': '2', 'G': '-1', 'T': '-1'},
 'G': {'A': '-1', 'C': '-1', 'G': '2', 'T': '-1'},
 'T': {'A': '-1', 'C': '-1', 'G': '-1', 'T': '2'}}

Edit: for Python <2.7

Dictionary comprehensions were added in 2.7. For 2.6 and lower, use the dict constructor:

dictdata = dict((row[''], dict((key, value) for key, value in row.iteritems() if key)) for row in reader)
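
For reference, a Python 3 sketch of the same DictReader approach, where iteritems() becomes items(). This is an illustration rather than the answer's code: load_scores and SAMPLE are hypothetical names, the values are converted to int to match the structure asked for in the question, and it assumes the header's first cell is empty and every row is complete.

```python
import csv
import io

# Hypothetical sample standing in for the tab-separated file from the question.
SAMPLE = (
    "\tA\tC\tG\tT\n"
    "A\t2\t-1\t-1\t-1\t\n"
    "C\t-1\t2\t-1\t-1\n"
    "G\t-1\t-1\t2\t-1\n"
    "T\t-1\t-1\t-1\t2\n"
)


def load_scores(lines):
    """Return {row_label: {column_label: int}} using csv.DictReader."""
    reader = csv.DictReader(lines, delimiter="\t")
    scores = {}
    for row in reader:
        # The row label sits under the empty header cell ''.
        label = row[""].strip()
        # "if key" drops both the '' label column and the None restkey.
        scores[label] = {
            key.strip(): int(value) for key, value in row.items() if key
        }
    return scores


scores = load_scores(io.StringIO(SAMPLE))
print(scores["A"]["C"])
```

If the header's first cell is not empty (for example, it contains spaces), row[''] raises KeyError, so it is worth printing reader.fieldnames when that happens.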

edited Feb 19, 2014 at 23:39

answered Feb 19, 2014 at 23:10

mhlester

8 Comments

Vish Over a year ago

dictdata = {row['']: row for row in reader} ^ SyntaxError: invalid syntax

Vish Over a year ago

Could you please also put in the code for that? I cannot tell you how grateful I am for this. It has been doing my head in for hours.

Vish Over a year ago

Now it is giving me a syntax error for the dictdata = {row['']: row for row in reader} line.

mhlester Over a year ago

Sorry, I only updated the one that filters extraneous keys. I will update the earlier one as well.

Vish Over a year ago

dictdata = dict((row[''], row) for row in reader) # <-- python 2.6 safe KeyError: ''
