The sample file looks like this:

 ['>1\n', 'TCCGGGGGTATC\n', '>2\n', 'TCCGTGGGTATC\n',
  '>3\n', 'TCCGTGGGTATC\n', '>4\n', 'TCCGGGGGTATC\n',
  '>5\n', 'TCCGTGGGTATC\n', '>6\n', 'TCCGTGGGTATC\n',
  '>7\n', 'TCCGTGGGTATC\n', '>8\n', 'TCCGGGGGTATC\n','\n',
  '$$$\n', '\n',
  '>B1\n', 'ATCGGGGGTATT\n', '>B2\n', 'TT-GTGGGAATC\n',
  '>3\n', 'TTCGTGGGAATC\n', '>B4\n', 'TT-GTGGGTATC\n',
  '>B5\n', 'TTCGTGGGTATT\n', '>B6\n','TTCGGGGGTATC\n',
  '>B7\n', 'TT-GTGGGTATC\n', '>B8\n', 'TTCGGGGGAATC\n',
  '>B9\n', 'TTCGGGGGTATC\n','>B10\n', 'TTCGGGGGTATC\n',
  '>B42\n', 'TT-GTGGGTATC\n']

The $$$ line separates the two sets. I need to use the .strip() method to remove the \n from each line, and I also need to drop all the ">" headers.

I need to build a single flat list (as below) and replace each "-" with "Z":

  [ 'TCCGGGGGTATC','TCCGTGGGTATC','TCCGTGGGTATC', 'TCCGGGGGTATC',
    'TCCGTGGGTATC','TCCGTGGGTATC','TCCGTGGGTATC', 'TCCGGGGGTATC',
    'ATCGGGGGTATT','TT-GTGGGAATC','TTCGTGGGAATC', 'TT-GTGGGTATC',
    'TTCGTGGGTATT','TTCGGGGGTATC','TT-GTGGGTATC', 'TTCGGGGGAATC',
    'TTCGGGGGTATC','TTCGGGGGTATC','TT-GTGGGTATC']

Here is a link to an answer (https://stackoverflow.com/a/39965048/6820344) to a similar question. I tried to modify that code to get the output shown above. However, I can't get rid of the "$$$" entry, and I need a single flat list, not a list of lists.

seq_list = []
for x in lst:
    if x.startswith('>'):
        seq_list.append([])
        continue
    x = x.strip()
    if x:
        seq_list[-1].append(x.replace("-", "Z"))
print(seq_list)
  • You say you need a list of lists "(as below)" but how is the below example a list of lists? Commented Oct 10, 2016 at 22:43
  • Your expected output is a single list, not a list of lists. Does $$$ separate the lists? Commented Oct 10, 2016 at 22:43
  • Yes, $$$ separates the list. But I just want a single list with all the elements. Commented Oct 10, 2016 at 22:44
  • @dkasak : Thanks for pointing out the error. I have corrected the same Commented Oct 10, 2016 at 22:45
  • Iterate over the strings in the original list. If a string starts with $ or >, or is empty after being stripped, continue without doing anything; otherwise append the stripped string to your final list. Commented Oct 10, 2016 at 22:49

1 Answer

input = ['>1\n', 'TCCGGGGGTATC\n', '>2\n', 'TCCGTGGGTATC\n',
        '>3\n', 'TCCGTGGGTATC\n', '>4\n', 'TCCGGGGGTATC\n',
        '>5\n', 'TCCGTGGGTATC\n', '>6\n', 'TCCGTGGGTATC\n',
        '>7\n', 'TCCGTGGGTATC\n', '>8\n', 'TCCGGGGGTATC\n', '\n',
        '$$$\n', '\n',
        '>B1\n', 'ATCGGGGGTATT\n', '>B2\n', 'TT-GTGGGAATC\n',
        '>3\n', 'TTCGTGGGAATC\n', '>B4\n', 'TT-GTGGGTATC\n',
        '>B5\n', 'TTCGTGGGTATT\n', '>B6\n', 'TTCGGGGGTATC\n',
        '>B7\n', 'TT-GTGGGTATC\n', '>B8\n', 'TTCGGGGGAATC\n',
        '>B9\n', 'TTCGGGGGTATC\n', '>B10\n', 'TTCGGGGGTATC\n',
        '>B42\n', 'TT-GTGGGTATC\n']

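from pprint import pprint

output = []

for elem in input:
    # Skip header lines ('>...'), the '$$$' separator and blank lines.
    if elem.startswith(('>', '$')) or elem.isspace():
        continue

    # Replace gaps with 'Z' and drop the trailing newline.
    output.append(elem.replace('-', 'Z').strip())

# compact=True requires Python 3.4 or later.
pprint(output, compact=True)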

When the preceding code is run, the following output is the result:

['TCCGGGGGTATC', 'TCCGTGGGTATC', 'TCCGTGGGTATC', 'TCCGGGGGTATC', 'TCCGTGGGTATC',
 'TCCGTGGGTATC', 'TCCGTGGGTATC', 'TCCGGGGGTATC', 'ATCGGGGGTATT', 'TTZGTGGGAATC',
 'TTCGTGGGAATC', 'TTZGTGGGTATC', 'TTCGTGGGTATT', 'TTCGGGGGTATC', 'TTZGTGGGTATC',
 'TTCGGGGGAATC', 'TTCGGGGGTATC', 'TTCGGGGGTATC', 'TTZGTGGGTATC']
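
If the same cleanup is needed in more than one place, the loop could be wrapped in a small function. This is only a sketch: the name clean_sequences and the shortened sample list are made up for illustration, and it assumes every unwanted line is either blank or starts with ">" or "$".

```python
def clean_sequences(lines):
    """Return the sequence lines with headers, separators and blanks removed."""
    cleaned = []
    for line in lines:
        text = line.strip()
        # Skip blank lines, '>' headers and the '$$$' separator.
        if not text or text.startswith(('>', '$')):
            continue
        cleaned.append(text.replace('-', 'Z'))
    return cleaned


# Shortened, hypothetical sample in the same shape as the question's data.
sample = ['>1\n', 'TCCGGGGGTATC\n', '\n', '$$$\n', '\n', '>B2\n', 'TT-GTGGGAATC\n']
print(clean_sequences(sample))
# ['TCCGGGGGTATC', 'TTZGTGGGAATC']
```

Because the function only iterates over its argument, it should also accept an open file object directly instead of a list produced by readlines().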

2 Comments

More succinctly, something like this works too: filter(None, [x.strip().replace('-', 'Z') for x in input if not x[0] in '$>']). On Python 3, wrap it in list() to get a list rather than a lazy filter object.
Indeed, that's a very nice succinct version. I've thought about using something similar but decided against it in the name of legibility and less magic.
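
For reference, the one-liner idea from the comment above can be written so that it yields a real list on Python 3 without filter(). This is a sketch on a shortened, hypothetical sample; the names lines, stripped and result are illustrative only.

```python
# Shortened, hypothetical sample in the same shape as the question's data.
lines = ['>1\n', 'TCCGGGGGTATC\n', '\n', '$$$\n', '\n', '>B2\n', 'TT-GTGGGAATC\n']

# Strip first, so blank lines become '' and are dropped by the truthiness check.
stripped = (x.strip() for x in lines)

# Keep non-empty strings that do not start with '$' or '>', replacing gaps with 'Z'.
result = [s.replace('-', 'Z') for s in stripped if s and s[0] not in '$>']
print(result)
# ['TCCGGGGGTATC', 'TTZGTGGGAATC']
```

Stripping before the test means an empty string never reaches s[0], so the comprehension does not depend on every input line ending in a newline.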
