I have 2 files called big and small like these examples:
big:
chr1 transcript 2481359 2483515 - RP3-395M20.8
chr1 transcript 2487078 2492123 + TNFRSF14
chr1 transcript 2497849 2501297 + RP3-395M20.7
chr1 transcript 2512999 2515942 + RP3-395M20.9
chr1 transcript 2517930 2521041 + FAM213B
chr1 transcript 2522078 2524087 - MMEL1
small:
chr1 2487088 2492113 17
chr1 100757323 100757324 19
chr1 2487099 2492023 21
chr1 100758316 100758317 41
chr1 2514000 2515742 14
I am trying to make a new file with 5 columns from the big file, based on the
following conditions:
1- if: the 1st column of small file == 1st column of big file
2- if: the 4th column of big file >= the 2nd column of small file >= the 3rd column of big file
3- if: the 4th column of big file >= the 3rd column of small file >= the 3rd column of big file
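Taken together, the three conditions amount to a containment test: the interval in the small file must lie inside the interval in the big file, on the same chromosome. A minimal sketch of that test is below. The helper name `is_contained` is hypothetical, and converting the coordinates to `int` is an assumption that they are meant to be compared as numbers rather than as text.

```python
def is_contained(big_cols, small_cols):
    """Return True if the small interval lies inside the big interval.

    big_cols and small_cols are the whitespace-split columns of one line
    from the big file and one line from the small file, respectively.
    """
    # Condition 1: same chromosome (1st column of both files).
    same_chrom = small_cols[0] == big_cols[0]
    # Convert to int so that comparisons are numeric, not lexicographic.
    big_start, big_end = int(big_cols[2]), int(big_cols[3])
    small_start, small_end = int(small_cols[1]), int(small_cols[2])
    # Conditions 2 and 3: both ends of the small interval fall in the big one.
    return (
        same_chrom
        and big_start <= small_start <= big_end
        and big_start <= small_end <= big_end
    )


big_cols = "chr1 transcript 2487078 2492123 + TNFRSF14".split()
print(is_contained(big_cols, "chr1 2487088 2492113 17".split()))      # True
print(is_contained(big_cols, "chr1 100757323 100757324 19".split()))  # False
```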
columns in output file:
1) 1st column of big file
2) 3rd column of big file
3) 4th column of big file
4) the number of lines in the small file that satisfy the conditions above
5) 6th column of big file
Here is the expected output for the above example:
chr1 2487078 2492123 2 TNFRSF14
chr1 2512999 2515942 1 RP3-395M20.9
I wrote the following code in Python, but it does not produce the file that
I want, even though every line seems logical to me. Would you help me
fix it?
def correspond(big, small, outfile):
    count = 0
    big = open(big, "r")
    small = open(small, "r")
    big_list = []
    small_list = []
    for m in big:
        big_list.append(m)
    for n in small:
        small_list.append(n)
    final = []
    for i in range(0, len(small_list)):
        for j in range(0, len(big_list)):
            small_row = small_list[i]
            big_row = big_list[j]
            small_columns = small_row.split()
            big_columns = big_row.split()
            small_symbol = small_columns[0]
            big_symbol = big_columns[0]
            name = big_columns[5]
            if small_symbol == big_symbol:
                small_second_col = small_columns[1]
                small_third_col = small_columns[2]
                min_range = big_columns[2]
                max_range = big_columns[3]
                if (small_second_col <= max_range and small_second_col >= min_range and small_third_col <= max_range and small_third_col >= min_range):
                    count+=1
                    new_line = small_row.rstrip("\n") + " " + big_symbol + " " + min_range + " " + max_range + str(count) + name
                    final.append(new_line)
    with open(outfile, "w") as f:
        for item in final:
            f.write("%s\n" % item)
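Three things in the code above appear to work against the expected output: the coordinates are compared as strings rather than numbers, `count` is never reset for each line of the big file, and one output line is built per matching pair (prefixed with the small row) rather than one per big line. The following is a minimal sketch of one possible fix, not a definitive implementation. It keeps the `correspond(big, small, outfile)` signature and assumes that both files are whitespace-separated, that the small file fits in memory, and that big lines with a count of zero should be omitted, as the expected output suggests. The temporary-file demo at the end is only there to exercise the function on the sample data.

```python
import os
import tempfile


def correspond(big, small, outfile):
    # Read the small file once; convert coordinates to int so that
    # comparisons are numeric rather than lexicographic.
    small_rows = []
    with open(small) as fh:
        for line in fh:
            cols = line.split()
            if len(cols) >= 3:
                small_rows.append((cols[0], int(cols[1]), int(cols[2])))

    with open(big) as fh, open(outfile, "w") as out:
        for line in fh:
            cols = line.split()
            if len(cols) < 6:
                continue  # skip blank or malformed lines
            chrom, start, end, name = cols[0], int(cols[2]), int(cols[3]), cols[5]
            # Count small intervals fully contained in this big interval;
            # the count starts from zero for every big line.
            count = sum(
                1
                for s_chrom, s_start, s_end in small_rows
                if s_chrom == chrom
                and start <= s_start <= end
                and start <= s_end <= end
            )
            # Assumption: lines with no matches are left out of the output.
            if count > 0:
                out.write(f"{chrom} {cols[2]} {cols[3]} {count} {name}\n")


# Demo on the sample data from the question, using temporary files.
BIG = """chr1 transcript 2481359 2483515 - RP3-395M20.8
chr1 transcript 2487078 2492123 + TNFRSF14
chr1 transcript 2497849 2501297 + RP3-395M20.7
chr1 transcript 2512999 2515942 + RP3-395M20.9
chr1 transcript 2517930 2521041 + FAM213B
chr1 transcript 2522078 2524087 - MMEL1
"""
SMALL = """chr1 2487088 2492113 17
chr1 100757323 100757324 19
chr1 2487099 2492023 21
chr1 100758316 100758317 41
chr1 2514000 2515742 14
"""

with tempfile.TemporaryDirectory() as tmp:
    big_path = os.path.join(tmp, "big")
    small_path = os.path.join(tmp, "small")
    out_path = os.path.join(tmp, "out")
    with open(big_path, "w") as fh:
        fh.write(BIG)
    with open(small_path, "w") as fh:
        fh.write(SMALL)
    correspond(big_path, small_path, out_path)
    with open(out_path) as fh:
        result = fh.read()

print(result, end="")
```

This sketch scans every small row for each big row, which is simple but quadratic; for large files, an interval-aware tool or sorting the inputs first would likely be a better choice.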