So I have a text table which looks like the following:
BLOCK 1. MARKERS: 1 2
42 (0.500) |0.269 0.166 0.041 0.024|
21 (0.351) |0.069 0.119 0.079 0.084|
22 (0.149) |0.054 0.040 0.055 0.000|
Multiallelic Dprime: 0.295
BLOCK 2. MARKERS: 9 10 11 12
1123 (0.392) |0.351 0.037|
2341 (0.324) |0.277 0.043|
2121 (0.176) |0.016 0.164|
1121 (0.108) |0.073 0.036|
Multiallelic Dprime: 0.591
BLOCK 3. MARKERS: 13 14
13 (0.716)
34 (0.284)
For each block, I only need the following information:
BLOCK1:
42 0.500
21 0.351
22 0.149
I don't have any problem parsing individual lines and extracting what I need; a list of lists is probably what I should aim for. My problem is that I cannot read the exact number of lines for each block without getting an error at the end.
So I wrote this ugly code:
file = open('haplotypes_hetero.txt')
to_parse = []
for line in file:
    to_parse.append(line.strip())
to_parse_2=[]
for line in to_parse:
    line = line.split()
    to_parse_2.append(line)
for i in range(len(to_parse_2)):
    if to_parse_2[i][0]=='BLOCK':
        z=i
        if z < len(to_parse_2):
            z+=1
            while to_parse_2[z][0] != 'BLOCK':
                print to_parse_2[z][0]
                z+=1
                if z>len(to_parse_2):
                    z=0
file.close()
It kind of works and prints what it is supposed to. However, I am getting an error at the end:
42
21
22
Multiallelic
1123
2341
2121
1121
Multiallelic
13
34
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
How do I get rid of the index error?
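For reference, the error most likely comes from the inner while loop: after the last block, z reaches len(to_parse_2) and to_parse_2[z] is read before the bounds check runs (and that check uses > where >= would be needed). One way to sidestep it is to drop the index arithmetic and group lines in a single pass. The following is only a minimal sketch, assuming every data line starts with the haplotype followed by its frequency in parentheses, as in the sample above; parse_blocks and sample are hypothetical names introduced for illustration, and the print calls are written so they run under both Python 2 and Python 3.

```python
def parse_blocks(lines):
    """Group haplotype lines under their BLOCK header in a single pass."""
    blocks = []  # one list of [haplotype, frequency] pairs per block
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if fields[0] == 'BLOCK':
            blocks.append([])  # a BLOCK header starts a new group
        elif fields[0] != 'Multiallelic' and blocks:
            # fields[1] looks like '(0.500)'; strip the parentheses
            blocks[-1].append([fields[0], fields[1].strip('()')])
    return blocks


# The sample data from the question, used in place of the real file.
sample = """BLOCK 1. MARKERS: 1 2
42 (0.500) |0.269 0.166 0.041 0.024|
21 (0.351) |0.069 0.119 0.079 0.084|
22 (0.149) |0.054 0.040 0.055 0.000|
Multiallelic Dprime: 0.295
BLOCK 2. MARKERS: 9 10 11 12
1123 (0.392) |0.351 0.037|
2341 (0.324) |0.277 0.043|
2121 (0.176) |0.016 0.164|
1121 (0.108) |0.073 0.036|
Multiallelic Dprime: 0.591
BLOCK 3. MARKERS: 13 14
13 (0.716)
34 (0.284)
""".splitlines()

for number, block in enumerate(parse_blocks(sample), 1):
    print('BLOCK%d:' % number)
    for haplotype, frequency in block:
        print('%s %s' % (haplotype, frequency))
```

With the real file, the same function could be called on the open file object instead of sample, since iterating over a file yields its lines. Because the loop never looks ahead by index, reaching the end of the input simply ends the loop.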