    A   C   G   T
A   2   -1  -1  -1  
C   -1  2   -1  -1
G   -1  -1  2   -1
T   -1  -1  -1  2

The file above is a tab-separated text file, and I want to read it into Python as a nested dictionary in a format like this:

{'A': {'A': 91, 'C': -114, 'G': -31, 'T': -123},
    'C': {'A': -114, 'C':  100, 'G': -125, 'T': -31},
    'G': {'A': -31, 'C': -125, 'G': 100, 'T':  -114},
    'T': {'A': -123, 'C': -31, 'G':  -114, 'T':  91}}

I have tried very hard, but I cannot figure out how to do this, as I am new to Python.

Please help.

My code so far:

seq = flines[0]
    newseq = []
    j = 0
    while(l < 4):
        i = 2
        while(o < 4):
            newseq[i][j] = seqLine[i]
            i = i + 1;
            o = o + 1
        j = j + 1
        l = l + 1
    print (seq)
    print(seqLine)
  • I'm not entirely sure how your file maps to the values you've given. Can you provide the code you've tried (even though it failed) as a starting point? Along with, of course, error messages and/or the output it did give you. Commented Feb 19, 2014 at 22:33
  • If I understand you correctly, the dict['A']['A'] value for your data should be 2 (where it says 91 in the good result)? Commented Feb 19, 2014 at 22:39
  • @mhlester I want to be able to access the values as I would be in the second array through ['A']['C'] = -114 Commented Feb 19, 2014 at 22:40
  • yes. The good result is just a sample of how the data should look. Commented Feb 19, 2014 at 22:40
  • @deinonychusaur The first data is as it is stored in a text file. I am trying to parse it in the format of the shown in the second code block. Commented Feb 19, 2014 at 22:41

2 Answers

I think this is what you want:

import csv

data = {}

with open('myfile.csv', 'rb') as csvfile:
    ntreader = csv.reader(csvfile, delimiter="\t", quotechar='"')
    for rowI, rowData in enumerate(ntreader):
        if rowI == 0:
            headers = rowData[1:]
        else:
            data[rowData[0]] = {k: int(v) for k, v in zip(headers, rowData[1:])}


print data

To make life easy, I use the csv module with tab as the delimiter. I grab the column headers from the first row and use them to label the values in every other row.

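This produces the following. The values are ints because of the int(v) conversion. The 'A ' key keeps a stray trailing space from the input file, so strip the cells if your file has them:

{'A ': {'A': 2, 'C': -1, 'T': -1, 'G': -1},
 'C': {'A': -1, 'C': 2, 'T': -1, 'G': -1},
 'T': {'A': -1, 'C': -1, 'T': 2, 'G': -1},
 'G': {'A': -1, 'C': -1, 'T': -1, 'G': 2}}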
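
For readers on Python 3, here is a minimal sketch of the same csv.reader idea. It is not the code from this answer: the SAMPLE string and the read_matrix name are made up for illustration, and the trailing tab on the A row is an assumption standing in for the stray whitespace in the original file. Every cell is stripped so that keys such as 'A ' do not appear.

```python
import csv
import io

# Hypothetical stand-in for the tab-separated file from the question.
# The trailing tab on the "A" row imitates stray whitespace in the real file.
SAMPLE = (
    "\tA\tC\tG\tT\n"
    "A\t2\t-1\t-1\t-1\t\n"
    "C\t-1\t2\t-1\t-1\n"
    "G\t-1\t-1\t2\t-1\n"
    "T\t-1\t-1\t-1\t2\n"
)


def read_matrix(handle):
    """Build {row_label: {column_label: int_score}} from a tab-separated table."""
    reader = csv.reader(handle, delimiter="\t")
    # First row: an empty corner cell followed by the column labels.
    headers = [cell.strip() for cell in next(reader)[1:]]
    matrix = {}
    for row in reader:
        cells = [cell.strip() for cell in row]
        if not any(cells):
            continue  # skip blank lines
        # zip stops at the shorter sequence, so surplus trailing cells are ignored.
        matrix[cells[0]] = dict((k, int(v)) for k, v in zip(headers, cells[1:]))
    return matrix


matrix = read_matrix(io.StringIO(SAMPLE))
print(matrix["A"]["C"])
```

With a real file, something like `with open('myfile.csv', newline='') as handle: matrix = read_matrix(handle)` should work in place of the io.StringIO call. The inner dict is built with the dict constructor rather than a comprehension, which is the same trick the edit below uses for older interpreters.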

Edit:

For Python < 2.7, replace the dictionary comprehension line (data[rowData[0]] = ...) above with a simple loop in the same place:

    rowDict = dict()
    for k, v in zip(headers, rowData[1:]):
        rowDict[k] = int(v)
    data[rowData[0]] = rowDict

6 Comments

data[rowData[0]] = {k: int(v) for k, v in zip(headers, rowData[1:])} ^ SyntaxError: invalid syntax
what version of python do you have because it works just fine in 2.7.5
Python 2.6.6 centos 6.5
ah don't know if dict comprehension was implemented back then... should have put that in your question... :) A moment...
is there any alternative to this that is backward compatible?

Using csv.DictReader gets you most of the way there by itself:

from csv import DictReader
# DictReader needs an open file (or any iterable of lines), not a filename
reader = DictReader(open('file.csv', 'rb'), delimiter='\t')
#dictdata = {row['']: row for row in reader}       # <-- python 2.7+ only
dictdata = dict((row[''], row) for row in reader)  # <-- python 2.6 safe

Outputs:

{'A': {None: [''], '': 'A', 'A': '2', 'C': '-1', 'G': '-1', 'T': '-1'},
 'C': {'': 'C', 'A': '-1', 'C': '2', 'G': '-1', 'T': '-1'},
 'G': {'': 'G', 'A': '-1', 'C': '-1', 'G': '2', 'T': '-1'},
 'T': {'': 'T', 'A': '-1', 'C': '-1', 'G': '-1', 'T': '2'}}

Cleaning up the extraneous keys got messy, and I needed to rebuild the inner dict. To do that, replace the last line with this:

dictdata = {row['']: {key: value for key, value in row.iteritems() if key} for row in reader}

Outputs:

{'A': {'A': '2', 'C': '-1', 'G': '-1', 'T': '-1'},
 'C': {'A': '-1', 'C': '2', 'G': '-1', 'T': '-1'},
 'G': {'A': '-1', 'C': '-1', 'G': '2', 'T': '-1'},
 'T': {'A': '-1', 'C': '-1', 'G': '-1', 'T': '2'}}

Edit: for Python <2.7

Dictionary comprehensions were added in 2.7. For 2.6 and lower, use the dict constructor:

dictdata = dict((row[''], dict((key, value) for key, value in row.iteritems() if key)) for row in reader)
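
The outputs above still hold strings, while the question asks for numbers. A rough Python 3 sketch of the DictReader route with int conversion might look like the following. The SAMPLE data and the names label_column and scores are assumptions for illustration only. The label column is looked up through reader.fieldnames[0] instead of being hard-coded as '', which avoids a KeyError when the header's corner cell is not exactly empty.

```python
import csv
import io

# Hypothetical stand-in for the tab-separated file from the question.
# The trailing tab on the "A" row produces the surplus cell that
# DictReader files under the key None.
SAMPLE = (
    "\tA\tC\tG\tT\n"
    "A\t2\t-1\t-1\t-1\t\n"
    "C\t-1\t2\t-1\t-1\n"
    "G\t-1\t-1\t2\t-1\n"
    "T\t-1\t-1\t-1\t2\n"
)

reader = csv.DictReader(io.StringIO(SAMPLE), delimiter="\t")
# Name of the first column; it is '' here because the corner cell is empty.
label_column = reader.fieldnames[0]

scores = {}
for row in reader:
    scores[row[label_column]] = dict(
        (key, int(value))
        for key, value in row.items()
        # Drop the label column itself and the None key used for surplus cells.
        if key and key != label_column
    )

print(scores["A"]["C"])
```

This assumes every row has a value for every column. A short row would leave None in the missing cells, and int(None) would raise a TypeError.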

8 Comments

dictdata = {row['']: row for row in reader} ^ SyntaxError: invalid syntax
could you please also put in the code for that. I cannot tell how grateful I am to you for this. It has been doing my head in for hours.
now it is giving me syntax error for dictdata = {row['']: row for row in reader} line
Sorry, I only updated the one that filters extraneous keys. will update the earlier one as well
dictdata = dict((row[''], row) for row in reader) # <-- python 2.6 safe KeyError: ''