I have the following text file- http://www.ncbi.nlm.nih.gov/Class/FieldGuide/BLOSUM62.txt

I need Python code that gives me specific entries of the matrix. I'm using multidimensional lists and would prefer to do this without the numpy library. My intent is to build lists within lists, where the outer (main) list contains the rows of the matrix and each inner list contains the cells of one row.

I'm using the following code-

handle=open(fname)
li=[]
matrix=[]
for line in handle:
      if not line.startswith('#'):
             a=line.split()
             for i in a:
                  li.append(i)
                  matrix.append(li)
print matrix

However, this just returns a one dimensional list with each element being one cell of the matrix. I'm lost regarding how to fix this. The output should be something of this form-

[['A', 'R', 'N', 'D', 'C', 'Q', 'E', 'G', 'H', 'I', 'L', 'K', 'M', 'F', 'P', 'S', 'T', 'W', 'Y', 'V', 'B', 'Z', 'X', '*'],
['A', '4', '-1', '-2', '-2', '0', '-1', '-1', '0', '-2', '-1', '-1', '-1', '-1', '-2', '-1', '1', '0', '-3', '-2', '0', '-2', '-1', '0', '-4']]

2 Answers

I think you want a matrix you can index, so that, for example, matrix[0][1] refers to a single value, right? See the following code.

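matrix = []
col = {}   # column label -> column index
row = {}   # row label -> row index
with open(fname) as handle:
    for line in handle:
        # skip comment lines and blank lines
        if line.startswith('#') or not line.strip():
            continue
        cells = line.split()
        if not col:
            # the first non-comment line is the column header
            for idx, name in enumerate(cells):
                col[name] = idx
        else:
            # the first cell of every other line is the row name
            row[cells.pop(0)] = len(matrix)
            matrix.append(cells)
print(matrix)
# rows are the outer list, so index by row first, then by column
print(matrix[row['A']][col['A']])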

See if this is what you want.
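If you also want numeric scores rather than strings, the same idea can be sketched as a small function. This is only a sketch under assumptions: load_scores is a hypothetical helper name, and the sample lines are a hand-written 3x3 excerpt in the same layout as the file, used so the example runs without downloading anything.

```python
def load_scores(lines):
    """Parse lines in the BLOSUM text layout into (matrix, row, col).

    matrix is a list of rows of ints; row and col map labels to indices.
    load_scores is a hypothetical helper name, not part of any library.
    """
    matrix, row, col = [], {}, {}
    header_seen = False
    for line in lines:
        # skip comment lines and blank lines
        if line.startswith('#') or not line.strip():
            continue
        cells = line.split()
        if not header_seen:
            # first data line: column labels
            col = {name: i for i, name in enumerate(cells)}
            header_seen = True
        else:
            # remaining lines: row label followed by the scores
            row[cells[0]] = len(matrix)
            matrix.append([int(c) for c in cells[1:]])
    return matrix, row, col


# Hand-written sample in the same layout as the file (assumption: 3x3 excerpt).
sample = [
    "# sample in the BLOSUM62 layout\n",
    "   A  R  N\n",
    "A  4 -1 -2\n",
    "R -1  5  0\n",
    "N -2  0  6\n",
]

matrix, row, col = load_scores(sample)
print(matrix[row['A']][col['R']])
```

To read the real file, an open file object can be passed in place of sample, since iterating over a file yields its lines.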

1 Comment

This is exactly what I wanted. Thanks a lot @Ronald

You aren't getting the results you want because you're putting all the values into the same li list, and appending that list to matrix once per cell. The simplest fix is to create li inside the outer loop and to append it to matrix only after the inner loop has finished:

handle=open(fname)
matrix=[]
for line in handle:
    if not line.startswith('#'):
        li=[]                   # move this line down!
        a=line.split()
        for i in a:
            li.append(i)
        matrix.append(li)       # append each row once, after the inner loop
print(matrix)

The inner loop there is a bit silly though. You're adding all the values from one list (a) to another list (li), then throwing away the first list. You should just use the list returned by str.split directly:

handle=open(fname)
matrix=[]
for line in handle:
    if not line.startswith('#'):
        matrix.append(line.split())
print(matrix)
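For what it's worth, the same thing can be written as a list comprehension. This is a minimal sketch, not the only way to do it: parse_matrix is a hypothetical helper name, and the sample lines are a hand-written excerpt in the file's layout so the example runs on its own.

```python
def parse_matrix(lines):
    """Return a list of rows, each row a list of whitespace-separated cells.

    parse_matrix is a hypothetical helper name used for illustration.
    """
    # Skip comment lines and blank lines; split() with no arguments
    # copes with the variable-width spacing between cells.
    return [line.split() for line in lines
            if line.strip() and not line.startswith('#')]


# Hand-written sample in the same layout as the file (assumption: 3x3 excerpt).
sample = [
    "# sample in the BLOSUM62 layout\n",
    "   A  R  N\n",
    "A  4 -1 -2\n",
    "R -1  5  0\n",
    "N -2  0  6\n",
]

matrix = parse_matrix(sample)
print(matrix)
```

The first inner list is the header row and every other inner list starts with its row label, which matches the shape shown in the question. For the real file, something like passing the object returned by open(fname) inside a with block should work the same way.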

4 Comments

This is just giving me a ton of garbage output. My intent of using the inner loop is that the first list should contain cell-wise elements and the second list should be a list of these cell-wise elements for each row.
I guess I don't understand what you want. Can you edit the question to show the desired output?
Does this make the question any clearer? @Blckknght
The second version of my code had a typo (missing () at the end of line.split). If you fix that, does it give you the output you expect?
