I have a very large text file that looks like this small example:

>g1
GAATTCCTTGAGGCCTAAATGCATCGGGGTGCTCTGGTTTTGTTGTTGTTATTTCTGAATGACATTTACTTTGGTGCTCTTTATTTTGCGTATTTAAAAC
>g2
TAAGTCCCTAAGCATATATATAATCATGAGTAGTTGTGGGGAAAATAACACCATTAAATGTACCAAAACAAAAGACCGATCACAAACACTGCCGATGTTTCTCTGGCTTAAATTAAATGTATATACAACTTATATGATAAAATACTGGGC

The file has many records, and every record has 2 lines. The 1st line starts with > and is the ID; the 2nd line is a sequence of characters. I want to build a dictionary from this file in Python. Each key is the 1st line of a record without the >, and each value is a list of tuples.

What are the numbers in the tuples? I divide the length of each sequence (the 2nd line of each record) by a fixed number and make ranges of that size. In this example the fixed number is 10. In the expected output, each key is an ID and every tuple in its list holds 2 numbers that are 10 apart, except that the 1st tuple starts at 1. So the 1st tuple is (1, 10), the 2nd is (10, 20), and so on until the end of the sequence. The number of tuples therefore depends on the length of the sequence in each record. Here is the expected output:

{'g1': [(1, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70), (70, 80), (80, 90), (90, 100)],
 'g2': [(1, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70), (70, 80), (80, 90), (90, 100), (100, 110), (110, 120), (120, 130), (130, 140), (140, 150)]}

I tried the following Python code, but it does not give the result I expect. Do you know how to fix it?

from itertools import groupby
with open('infile.txt') as f:
    groups = groupby(f, key=lambda x: not x.startswith(">"))
    d = {}
    for k,v in groups:
        if not k:
            key, val = list(v)[0].rstrip(), "".join(map(str.rstrip,next(groups)[1],""))
            d[key] = val


k = d.keys()
v = d.values()
val = [tuple(len(v)/10)]
  • { '..' : [ (a,a+9) for a in range(1,len(g2),10) ] } ? Commented Oct 22, 2018 at 15:16
  • Hello friend, and welcome to Stack Overflow! I know you're new, so don't feel bad - it's a common mistake - but what you have here is a "Wall of Text". Why is it called that? Walls are difficult to penetrate. Likewise your big block of text will make it difficult for anyone to help you because it looks like you put little effort into your question - even if you have. Try a bullet-formatted list with as little information as possible to explain your problem. Good Luck! Commented Oct 22, 2018 at 15:18
  • If you like my answer please mark it as accepted. Commented Oct 22, 2018 at 17:35

1 Answer

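This is not the prettiest solution, but it works. Note that it splits each sequence into 10-character chunks and stores the chunks themselves, not the (start, end) number pairs from the question.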

keyList = list()
valList = list()
with open('infile.txt') as f:
    for idx, line in enumerate(f):
        # Drop the trailing newline so it never ends up in a key or a chunk
        line = line.rstrip()
        if idx % 2 == 0:
            # ID line: drop the leading '>'
            keyList.append(line[1:])
        else:
            # Sequence line: collect 10-character chunks
            valTup = list()
            tempVal = ''
            for count, char in enumerate(line):
                if count % 10 == 0 and count > 0:
                    valTup.append(tempVal)
                    tempVal = char
                else:
                    tempVal += char
            # Keep the final chunk, which may be shorter than 10
            if tempVal:
                valTup.append(tempVal)

            valList.append(tuple(valTup))

myDict = dict()

for key, value in zip(keyList, valList):
    myDict[key] = value

Output:

{
    'g1': ('GAATTCCTTG', 'AGGCCTAAAT', 'GCATCGGGGT', 'GCTCTGGTTT', 'TGTTGTTGTT', 'ATTTCTGAAT', 'GACATTTACT', 'TTGGTGCTCT', 'TTATTTTGCG', 'TATTTAAAAC'),
    'g2': ('TAAGTCCCTA', 'AGCATATATA', 'TAATCATGAG', 'TAGTTGTGGG', 'GAAAATAACA', 'CCATTAAATG', 'TACCAAAACA', 'AAAGACCGAT', 'CACAAACACT', 'GCCGATGTTT', 'CTCTGGCTTA', 'AATTAAATGT', 'ATATACAACT', 'TATATGATAA', 'AATACTGGGC')
}
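
If you want the number pairs from the question instead of the sequence chunks, something like the following might do it. This is only a sketch: the function names (parse_records, index_ranges, ranges_by_id) are made up for illustration, and I am assuming that the 1st tuple starts at 1, every later tuple starts where the previous one ended, and a sequence whose length is not a multiple of 10 gets a shorter last tuple (the question does not say what should happen in that case).

```python
def parse_records(lines):
    """Yield (id, sequence) pairs from alternating '>id' / sequence lines."""
    current_id = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            current_id = line[1:]  # drop the leading '>'
        elif current_id is not None:
            yield current_id, line
            current_id = None


def index_ranges(length, step=10):
    """Return [(1, step), (step, 2 * step), ...] covering `length` positions.

    Assumption: the first tuple starts at 1 (as in the question's expected
    output) and a final partial window is capped at `length`.
    """
    ranges = []
    for end in range(step, length + step, step):
        start = max(end - step, 1)
        ranges.append((start, min(end, length)))
    return ranges


def ranges_by_id(lines, step=10):
    """Map each ID to the list of index ranges for its sequence."""
    return {
        seq_id: index_ranges(len(seq), step)
        for seq_id, seq in parse_records(lines)
    }


# Made-up 20-character sequence, just to show the shape of the result
sample = [">g1\n", "ACGT" * 5 + "\n"]
print(ranges_by_id(sample))  # {'g1': [(1, 10), (10, 20)]}
```

For the real file you would pass the open file object, e.g. ranges_by_id(f) inside a with open('infile.txt') as f: block. Iterating over the file line by line avoids loading a very large file into memory at once.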