I'm trying to read a file that looks like:

Protein in water
5826
300LEU      N 2945   7.972  16.153  13.055 -0.0183  0.4861 -0.4376
300LEU      H 2946   8.006  16.194  13.139  1.5894  1.3176 -1.4422
300LEU     CA 2947   8.017  16.020  13.016  0.1247  0.7136 -0.1096
300LEU     CB 2948   8.157  15.990  13.077 -0.0499  0.0576  0.0414
300LEU     CG 2949   8.273  16.081  13.032 -0.3927 -0.5342  0.1311
300LEU    CD1 2950   8.271  16.143  12.895  0.2232  0.1271  0.2677
300LEU    CD2 2951   8.281  16.197  13.136  0.0409 -0.0097  0.0710
300LEU      C 2952   7.917  15.908  13.047  0.5031  0.0949  0.0620
300LEU      O 2953   7.955  15.799  13.093 -0.2261 -0.5800  0.0226

I have to strip the first 2 lines and read the different columns separately. I have tried this:

 with open('file.txt') as fa:
     for line_aa in fa.readlines()[3:11]:
         line_aa = line_aa.strip()
         print line_aa
         col1,col2,col3,col4,col5,col6,col7,col8,col9 = line_aa.split('\t',9)

but I get the following error:

300LEU      H 2946   8.110  15.548  13.027 -0.0632  0.8718 -0.8443
Traceback (most recent call last):
File "rmsd_cg_vs_aa.py", line 50, in <module>
col1,col2,col3,col4,col5,col6,col7,col8,col9 = line_aa.split('\t',9)
ValueError: need more than 1 value to unpack

What am I missing here?

3 Answers

4

You're splitting on tabs, but the columns in your file are separated by runs of spaces. Try splitting on whitespace instead by just using:

str.split()

then you should get what you want.
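A minimal sketch of the difference, using one of the data lines from the question. The variable names are only illustrative, and the print() form is used so the snippet runs on both Python 2 and Python 3:

```python
# One data line from the question; its columns are separated by runs of spaces, not tabs.
line = "300LEU      H 2946   8.006  16.194  13.139  1.5894  1.3176 -1.4422"

by_tab = line.split('\t')   # no tab in the line, so the whole line comes back as one item
by_space = line.split()     # no argument: split on any run of whitespace

print(len(by_tab))     # 1
print(len(by_space))   # 9

# Nine fields, so unpacking into nine names now succeeds.
residue, atom, number, x, y, z, vx, vy, vz = by_space
print(atom)
```

Every field is still a string at this point, so the coordinates need float() before any arithmetic.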

5 Comments

+1. You beat me by a second. However, for this kind of analysis the pandas package is a better fit, because the 'Protein in water' data may need grouping, sums, averages, etc. per column.
Thank you @VinayakKolagi, I did not know about that package. I will have to check it out.
Thanks! However, if I try it, it says: Traceback (most recent call last): File "rmsd_cg_vs_aa.py", line 50, in <module> col1,col2,col3,col4,col5,col6,col7,col8,col9 = line_aa.split() ValueError: need more than 8 values to unpack
Are you sure that the last line in the file is not a blank line?
@user1338219 you can try printing line_aa.split() to see exactly which items split returns.
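The checks suggested in these comments can be combined into one loop. In Python 2, "need more than 8 values to unpack" means the offending line produced 8 fields, so it helps to report which line that was instead of unpacking blindly. The sketch below is only an illustration: parse_atom_lines and its n_fields parameter are hypothetical names, and it assumes the first two lines are the title and the atom count, as in the question.

```python
def parse_atom_lines(lines, n_fields=9):
    """Return the split fields of every data line, skipping the two header lines."""
    rows = []
    # Line numbers start at 3 because the first two lines are skipped.
    for lineno, raw in enumerate(lines[2:], start=3):
        fields = raw.split()
        if not fields:
            # A blank line splits into an empty list; ignore it.
            continue
        if len(fields) != n_fields:
            # Show the offending line instead of failing during unpacking.
            raise ValueError("line %d has %d fields: %r" % (lineno, len(fields), fields))
        rows.append(fields)
    return rows


sample = [
    "Protein in water\n",
    "5826\n",
    "300LEU      N 2945   7.972  16.153  13.055 -0.0183  0.4861 -0.4376\n",
    "300LEU      H 2946   8.006  16.194  13.139  1.5894  1.3176 -1.4422\n",
    "\n",
]
rows = parse_atom_lines(sample)
print(len(rows))
```

With a real file, the list returned by fa.readlines() can be passed in place of sample.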
0

I think that in the line '300LEU H 2946 8.110 15.548 13.027 -0.0632 0.8718 -0.8443' the whitespace consists of ordinary spaces rather than tabs (\t). Try printing the character code (ord()) of each whitespace character to check whether it really is '\t'. If it is not, split the string on the proper character. Perhaps you can split on spaces and strip the result.
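One way to run that check, sketched on a line copied from the question (whitespace_codes is just an illustrative name):

```python
line = "300LEU      H 2946   8.006  16.194  13.139  1.5894  1.3176 -1.4422"

# repr() shows a tab as \t, while ordinary spaces stay as spaces.
print(repr(line))

# Collect the character codes of every whitespace character in the line.
whitespace_codes = sorted(set(ord(ch) for ch in line if ch.isspace()))
print(whitespace_codes)   # 32 is a space; 9 would be a tab
```

If only 32 shows up, the line contains no tabs, and split('\t') can never return more than one item.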

0

Splitting by \t returns only one value because the line contains no tabs, so the error is thrown when Python tries to unpack that single value into columns 1 to 9.

Try this:

print(len(line_aa.split('\t', 9)))

It prints 1, right?

I would suggest you just split by whitespace rather than tabs:

col1,col2,col3,col4,col5,col6,col7,col8,col9 = line_aa.split(maxsplit=9)

1 Comment

Thanks. However, it says: TypeError: split() takes no keyword arguments
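That TypeError suggests the script is running on Python 2, where str.split accepts its arguments only by position. A sketch that should behave the same on Python 2 and Python 3 passes None as the separator (meaning "any run of whitespace") and the split limit positionally. A limit of 8 is used here because 8 splits yield at most 9 fields, whereas a limit of 9 would allow 10:

```python
line_aa = "300LEU      H 2946   8.006  16.194  13.139  1.5894  1.3176 -1.4422"

# Positional arguments only: separator None, at most 8 splits (so at most 9 fields).
fields = line_aa.split(None, 8)
col1, col2, col3, col4, col5, col6, col7, col8, col9 = fields

print(len(fields))
print(col9)
```

For lines like these, a plain line_aa.split() gives the same nine fields, so the limit is optional.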
