
I am trying to write a function that takes a string of DNA and returns its complement. I have been trying to solve this for a while now and looked through the Python documentation, but couldn't work it out. I have written the docstring for the function so you can see what the answer should look like. I have seen a similar question asked on this forum, but I could not understand the answers. I would be grateful if someone could explain this using only string methods and loops / if statements, as I have not yet studied dictionaries/lists in detail.

I tried str.replace but could not get it to work for multiple characters; I tried nested if statements, and that didn't work either. I then tried writing four separate for loops, but to no avail.

def get_complementary_sequence(dna):
    """ (str) -> str

    Return the DNA sequence that is complementary
    to the given DNA sequence.

    >>> get_complementary_sequence('AT')
    'TA'
    >>> get_complementary_sequence('GCTTAA')
    'CGAATT'
    """
    for char in dna:
        if char == 'A':
            dna = dna.replace('A', 'T')
        elif char == 'T':
            dna = dna.replace('T', 'A')
        # ...and so on
  • What does this complement do? Commented Dec 26, 2014 at 16:51
  • It's supposed to find the complementary strand of a DNA sequence. There are 4 nucleotides on a DNA strand, so A on one strand pairs with T on the other strand: T with A, C with G, and G with C. Commented Dec 26, 2014 at 16:53
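The chained str.replace calls in the question's attempt undo each other: once every 'A' has become 'T', the next replace turns those new 'T's (and the original ones) back into 'A'. A minimal sketch using only a for loop, if/elif and string concatenation, as the question asks, avoids this by building a new string one character at a time instead of rewriting dna in place. It assumes the input contains only uppercase A, T, C and G; passing any other character through unchanged is an assumption of this sketch, not something the original docstring specifies.

```python
def get_complementary_sequence(dna):
    """ (str) -> str

    Return the DNA sequence that is complementary
    to the given DNA sequence.

    >>> get_complementary_sequence('AT')
    'TA'
    >>> get_complementary_sequence('GCTTAA')
    'CGAATT'
    """
    complement = ''  # the result is built up here; dna itself is never modified
    for char in dna:
        if char == 'A':
            complement = complement + 'T'
        elif char == 'T':
            complement = complement + 'A'
        elif char == 'C':
            complement = complement + 'G'
        elif char == 'G':
            complement = complement + 'C'
        else:
            # assumption: characters other than A/T/C/G are kept as-is
            complement = complement + char
    return complement


print(get_complementary_sequence('GCTTAA'))  # prints CGAATT
```

Because each character is looked at exactly once and its partner is appended to a separate string, earlier substitutions can never be overwritten by later ones.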

2 Answers


For a problem like this, you can use string.maketrans (str.maketrans in Python 3) combined with str.translate:

import string
table = string.maketrans('CGAT', 'GCTA')
print 'GCTTAA'.translate(table)
# outputs CGAATT
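The snippet above is Python 2 syntax. In Python 3, maketrans is a static method of str (so no import is needed) and print is a function; a minimal Python 3 equivalent of the same idea:

```python
# Python 3: build the translation table with str.maketrans (no import needed)
table = str.maketrans('CGAT', 'GCTA')
print('GCTTAA'.translate(table))  # prints CGAATT
```

Characters that do not appear in the table are left unchanged by translate.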

2 Comments

Will this work on any DNA strand, say something much longer with a random sequence?
@DataScienceAcademy: Yes. This just translates each character according to the translation table. See the documentation for more details.

You can map each letter to another letter.

You probably don't need to create a translation table with every possible combination.

>>> M = {'A':'T', 'T':'A', 'C':'G', 'G':'C'}
>>> STR = 'CGAATT'
>>> S = "".join([M.get(c,c) for c in STR])
>>> S
'GCTTAA'

How this works:

# this builds a list of characters, mapping each one through the dict M (characters not in M are kept as-is)
>>> L = [M.get(c,c) for c in STR]  
>>> L
['G', 'C', 'T', 'T', 'A', 'A']
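The list comprehension above is shorthand for an ordinary for loop. Written out longhand (with the same M and STR as above, repeated so the snippet runs on its own), it looks like this:

```python
M = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}
STR = 'CGAATT'

L = []
for c in STR:
    # M.get(c, c) looks c up in M, falling back to c itself if it is not a key
    L.append(M.get(c, c))

print(L)  # prints ['G', 'C', 'T', 'T', 'A', 'A']
```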

str.join() returns a string in which the elements of the given sequence have been joined together, with the string it is called on (the separator) placed between them.

>>> sep = "-"
>>> L = ['a','b','c']
>>> sep.join(L)
'a-b-c'
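With an empty string "" as the separator, join simply glues the elements together with nothing in between, which is how "".join turns the list of complemented characters back into a single string:

```python
L = ['G', 'C', 'T', 'T', 'A', 'A']
S = "".join(L)  # empty separator: the characters are concatenated directly
print(S)  # prints GCTTAA
```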

3 Comments

Thank you, this works. Just for my learning, though: why do you have "".join in line 3?
Using str.translate() is simpler and would be much faster at doing the replacements -- the creation of translation tables is trivial with string.maketrans() (or str.maketrans() in Python 3).
Since there are 4 nucleotides, one would only need to pass two 4-letter strings to string.maketrans() to create a translation table that could be used to complement any sequence -- see @nneonneo's answer.
