I am a beginner in Python (I am a biologist). I have a file containing the results from a particular piece of software, and I would like to parse it using Python. From the following output, I would like to get just the score and split each sequence into its individual amino acids.
no. score Sequence
1 0.273778 FFHH-YYFLHRRRKKCCNNN-CCCK---HQQ---HHKKHV-FGGGE-EDDEDEEEEEEEE-EE--
2 0.394647 IIVVIVVVVIVVVVVVVVVV-CCCVA-IVVI--LIIIIIIIIYYYA-AVVVVVVVAAAAV-AST-
3 0.456667 FIVVIVVVVIXXXXIGGGGT-CCCCAV -------------IVBBB-AAAAAA--------AAAA-
4 0.407581 MMLMILLLLMVVAIILLIII-LLLIVLLAVVVVVAAAVAAVAIIII-ILIIIIIILVIMKKMLA-
5 0.331761 AANSRQSNAAQRRQCSNNNR-RALERGGMFFRRKQNNQKQKKHHHY-FYFYYSNNWWFFFFFFR-
6 0.452381 EEEEDEEEEEEEEEEEEEEE-EEEEESSTSTTTAEEEEEEEEEEEE-EEEEEEEEEEEEEEEEE-
7 0.460385 LLLLLLLLMMIIILLLIIII-IIILLVILMMEEFLLLLILIVLLLM-LLLLLLLLLLVILLLVL-
8 0.438680 ILILLVVVVILVVVLQLLMM-QKQLIVVLLVIIMLLLLMLLSIIIS-SMMMILFFLLILIIVVL-
9 0.393291 QQQDEEEQAAEEEDEKGSSD-QQEQDDQDEEAAAHQLESSATVVQR-QQQQQVVYTHSTVTTTE-
From the above table, I would like to get tables with the same number and score, but with the sequences split into individual amino acids (column-wise). The first table should look like this:
no. score amino acid (1st column)
1 0.273778 F
2 0.394647 I
3 0.456667 F
Another table would represent the second column of amino acids:
no. score amino acid (2nd column)
1 0.273778 F
2 0.394647 I
3 0.456667 I
A third table would represent the third column of amino acids, a fourth table the fourth column, and so on.
Thanks in advance for the help.
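The request above could be approached roughly as follows. This is only a minimal sketch, not a definitive solution: the function names `parse_rows` and `column_tables` are made up for illustration, the sample data is the first three rows abbreviated to their first five characters, and it assumes that whitespace inside a sequence (as in row 3) is not meaningful and can be dropped. Scores are kept as strings so that trailing zeros (e.g. 0.438680) are preserved.

```python
def parse_rows(lines):
    """Return (number, score, sequence) tuples, skipping the header line."""
    rows = []
    for line in lines:
        parts = line.split(None, 2)  # split into at most three fields
        if len(parts) < 3 or not parts[0].isdigit():
            continue  # header, blank, or malformed line
        number, score, sequence = parts
        # Assumption: stray whitespace inside a sequence is not meaningful.
        rows.append((number, score, "".join(sequence.split())))
    return rows


def column_tables(rows):
    """Build one table per sequence position: tables[0] is the 1st column."""
    width = max((len(seq) for _, _, seq in rows), default=0)
    tables = []
    for pos in range(width):
        # Rows shorter than the longest sequence are skipped at this position.
        tables.append([(n, s, seq[pos]) for n, s, seq in rows if pos < len(seq)])
    return tables


# Abbreviated sample: first five characters of sequences 1 to 3.
sample = """no. score Sequence
1 0.273778 FFHH-
2 0.394647 IIVVI
3 0.456667 FIVVI
"""

tables = column_tables(parse_rows(sample.splitlines()))

# Print the first two tables in the layout shown in the question.
for index, table in enumerate(tables[:2], start=1):
    print(f"no. score amino acid (column {index})")
    for number, score, residue in table:
        print(number, score, residue)
```

To run this on a real file, the lines could be read with something like `with open("results.txt") as handle: rows = parse_rows(handle)`, where `results.txt` is a placeholder file name.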
What do F, I and f stand for? Are these the first characters from the strings above? Why the f in the third line and not F? We are not beginners in Python, but we are not biologists either. We can help you with Python, but you have to explain what the individual amino acids are here.