I have code that generates a random sequence:

import random
selection60 = {"A":20, "T":20, "G":30, "C":30}
sseq60=[]
for k in selection60:
    sseq60 = sseq60 + [k] * int(selection60[k])
    random.shuffle(sseq60)
sequence="".join(random.sample(sseq60, 100))

The output in this case is:

GACCCCTCTGTACTATTAAAAGGCGTCACCGCGCCGAAAGAGCTGCAAGGCAATAGTGGACCAGAATCAAACGAAGGATTGCTTAGGTAATGGAATACAA

However, I would also like to check that no run of more than 10 identical bases is created. For example:

GACCCCCCCCCCCTATTAAAAGGCGTCATCGCGCCGAAAGAGTTGCAAGGCAATAGTGGAGCAGAATTAAACGAAGGATTGCTTAGGTAATGGAATAAAA

This sequence contains a run of 11 Cs near the beginning, which should not be allowed. The distribution of the letters should also be uniform. Does random.sample take care of this by itself, or does it need to be implemented?

3 Answers

The easiest approach to code is to check your sample and toss it if it contains too many repeats:

from collections import Counter
from random import sample

pool = Counter({"A":20, "T":20, "G":30, "C":30})
too_many = [''.join([k]*11) for k in pool]
fn_select = lambda p: ''.join(sample(list(p.elements()), sum(p.values())))
selection = fn_select(pool)
while any(t in selection for t in too_many):
    selection = fn_select(pool)
print(selection)

Some detail:

too_many is set up as a list of 'illegal' sequences, i.e. ['AAAAAAAAAAA', 'TTTTTTTTTTT', 'GGGGGGGGGGG', 'CCCCCCCCCCC'].

any(t in selection for t in too_many) will be True if any of those 4 sequences are present in the selection, in which case we want to start fresh with a new sample.
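To see what that any(...) test evaluates to, here is a small check on two hand-written strings. The strings bad and good and the helper has_forbidden_run are made up for illustration and are not part of the code above.

```python
# Illustrative strings only; these are not output of the sampler above.
too_many = [base * 11 for base in "ATGC"]

bad = "GA" + "C" * 11 + "TATTA"    # contains a run of 11 Cs
good = "GA" + "C" * 10 + "TATTA"   # longest run is exactly 10 Cs

def has_forbidden_run(seq):
    # True as soon as one of the four 11-letter runs occurs as a substring.
    return any(t in seq for t in too_many)

print(has_forbidden_run(bad))   # True
print(has_forbidden_run(good))  # False
```

Note that any does not remove anything from the string; it only reports whether the sample has to be thrown away.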

Depending on your preference, you could rewrite the same code using a while True: loop:

from collections import Counter
from random import sample
pool = Counter({"A":20, "T":20, "G":30, "C":30})
too_many = [''.join([k]*11) for k in pool]
while True:
    selection = ''.join(sample(list(pool.elements()), sum(pool.values())))
    if not any(t in selection for t in too_many):
        break
print(selection)
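If you need this in more than one place, the same reject-and-retry idea could be wrapped in a function. This is only a sketch: the name sample_without_long_runs and the parameters max_run, rng and max_tries are my own assumptions, not something from the question.

```python
import random
from collections import Counter

def sample_without_long_runs(pool, max_run=10, rng=None, max_tries=10_000):
    """Shuffle the pool until no base repeats more than max_run times in a row."""
    if rng is None:
        rng = random.Random()
    bases = list(pool.elements())
    # One forbidden substring per base, each one letter longer than allowed.
    forbidden = [base * (max_run + 1) for base in pool]
    for _ in range(max_tries):
        rng.shuffle(bases)
        candidate = "".join(bases)
        if not any(f in candidate for f in forbidden):
            return candidate
    # Hypothetical safety net so that impossible constraints do not loop forever.
    raise RuntimeError("no acceptable sequence found; constraints may be too tight")

pool = Counter({"A": 20, "T": 20, "G": 30, "C": 30})
seq = sample_without_long_runs(pool, rng=random.Random(0))
print(len(seq), Counter(seq) == pool)
```

Passing a seeded random.Random makes a run reproducible, and every returned sequence keeps the exact letter counts of the pool.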

3 Comments

Hi, can you please explain what the any in the while loop does? Does it exclude the elements found in too_many?
Can you put it inside a nested while loop?
@PaoloLorenzini see edited response, hopefully answered your questions

Truly random sampling will sometimes generate long runs of repeats. However, in this case the shuffle is in the wrong place: it runs inside the loop, on a list that is only partially built. Do the shuffle once, after you have generated the whole list. Shuffle a couple of times if you want.

import random
selection60 = {"A":20, "T":20, "G":30, "C":30}
sseq60=[]
for k in selection60:
    sseq60 = sseq60 + [k] * int(selection60[k])
random.shuffle(sseq60)
sequence="".join(random.sample(sseq60, 100))
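A shuffle on its own does not rule out long runs; it only makes them unlikely. One way to inspect a generated sequence is to measure its longest run. This is a sketch, and the helper name longest_run is an assumption of mine, not part of the code above.

```python
import random
from itertools import groupby

def longest_run(seq):
    """Length of the longest stretch of identical consecutive characters."""
    # groupby yields one group per run of equal neighbours; count each group.
    return max((sum(1 for _ in group) for _, group in groupby(seq)), default=0)

bases = ["A"] * 20 + ["T"] * 20 + ["G"] * 30 + ["C"] * 30
random.shuffle(bases)
sequence = "".join(bases)

# The value varies from run to run because the shuffle is random.
print(longest_run(sequence))
```

If the printed value is above 10, the sequence would have to be regenerated to satisfy the constraint in the question.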

4 Comments

I got it, so is the shuffle sufficient to avoid repeats?
by the way, is there any way to check while a sequence is generated that it is less than 10% similar to the sequences already generated?
I would like to remove a sequence which has more than 10 percent similarity to any of the sequences that are created
Then that's not random. When you interfere with uniform randomness, then it's no longer uniform.

How about something like:

import random
import re

concatenated = ["A"]*20 + ["T"]*20 + ["G"]*30 + ["C"]*30
while True:
    random.shuffle(concatenated)
    sequence = "".join(concatenated)
    # exit the loop since we have found a sequence not containing more than 10 repeats of any letter
    if not re.search("A{11,}|T{11,}|G{11,}|C{11,}", sequence):
        break

This will run until you find a sequence not containing more than 10 repeats in a row of any letter.
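If the alphabet might change, the pattern could be written once with a backreference instead of listing every letter. This is a variant sketch, not the pattern used above, and the names LONG_RUN and has_long_run are assumptions.

```python
import re

# (.) captures any single character; \1{10,} then requires at least 10 more
# copies of that same character, i.e. a run of 11 or more in total.
LONG_RUN = re.compile(r"(.)\1{10,}")

def has_long_run(seq):
    return LONG_RUN.search(seq) is not None

print(has_long_run("GA" + "C" * 11 + "T"))  # True
print(has_long_run("GA" + "C" * 10 + "T"))  # False
```

The loop above would then break when has_long_run(sequence) is False.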

2 Comments

But in this case I get output sequences of different lengths. I want every sequence to be the same length, which in this example is set to 100.
@PaoloLorenzini this should always give 100 elements back. This is doing the same as Jamie Deith's answer, just much more succinctly
