Count the number of x occurrences of length n (it's not given) of sub-string in string [closed]

Question

Closed. This question needs details or clarity. It is not currently accepting answers.

Closed 9 years ago.

I'm not able to get the number of occurrences of the substrings of length n in a string. For example, if the string is

CCCATGGTtaGGTaTGCCCGAGGT

and n is 3

The output must be something like:

'CCC': 2, 'GGT': 3

The input is a list of lists, so I get every string of each list, but I'm not able to go any further. The output is a dict of all the strings.

Code:

def get_all_n_repeats(n, sq_list):
    reps = {}
    for i in sq_list:
        if not i:
            continue
        else:
            for j in i:
                # ........ here the code I want to write ........
                pass
    return reps

Your output and your input don't make sense. If you split your input string into three-letter strings, you get ['CCC', 'ATG', 'GTt', 'aGG', 'TaT', 'GCC', 'CGA', 'GGT'], so I don't know where you got GGT in your output.
– Burhan Khalid, Commented May 29, 2016 at 19:57
What is so unclear about this question? It makes perfect sense.
– Jivan, Commented May 29, 2016 at 20:03
@BurhanKhalid I think his candidates are ['CCC', 'CCA', 'CAT', 'ATG', 'TGG', 'GGT', 'GTt', 'Tta', 'taG', 'aGG', 'GGT', 'GTa', 'TaT', 'aTG', 'TGC', 'GCC', 'CCC', 'CCG', 'CGA', 'GAG', 'AGG', 'GGT'].
– totoro, Commented May 29, 2016 at 20:04
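
To make the two readings in the comments above concrete, here is a small sketch that builds both candidate lists from the question's string. The names `chunks` and `windows` are only illustrative and do not come from the question.

```python
s = "CCCATGGTtaGGTaTGCCCGAGGT"
n = 3

# Reading 1: non-overlapping chunks, stepping n characters at a time
chunks = [s[i:i + n] for i in range(0, len(s), n)]

# Reading 2: overlapping windows, stepping one character at a time
windows = [s[i:i + n] for i in range(len(s) - n + 1)]

print(chunks.count("GGT"))   # 1
print(windows.count("GGT"))  # 3
```

Only the overlapping reading yields the 'GGT': 3 from the question's expected output.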

DevLounge · Accepted Answer · 2016-05-29 20:38:28Z

2

A really simple solution:

from collections import Counter

st = "CCCATGGTtaGGTaTGCCCGAGGT"
n = 3

tokens = Counter(st[i:i+n] for i in range(len(st) - n + 1))
print(tokens.most_common(2))

After that, it is up to you to turn it into a helper function.
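
One possible shape for such a helper, as a rough sketch only: it reuses the `get_all_n_repeats(n, sq_list)` signature from the question and assumes `sq_list` is a list of lists of strings. Keeping only substrings seen more than once is also an assumption, inferred from the expected output in the question.

```python
from collections import Counter

def get_all_n_repeats(n, sq_list):
    """Count n-length substrings that occur more than once across all sequences.

    Assumes sq_list is a list of lists of strings; empty inner lists add nothing.
    """
    counts = Counter()
    for group in sq_list:
        for seq in group:
            # Overlapping windows, taken within one sequence at a time
            counts.update(seq[i:i + n] for i in range(len(seq) - n + 1))
    # Assumption: only repeated substrings are wanted, as in the question's output
    return {sub: c for sub, c in counts.items() if c > 1}

reps = get_all_n_repeats(3, [["CCCATGGTtaGGTaTGCCCGAGGT"], []])
print(reps)  # {'CCC': 2, 'GGT': 3}
```

Windows never span two sequences here; if substrings should be counted per sequence instead of across the whole input, the `Counter` would need to be created inside the inner loop.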

edited May 29, 2016 at 20:38

answered May 29, 2016 at 20:18


totoro · Accepted Answer · 2016-05-29 21:09:19Z

1

A very explicit solution:

s = 'CCCATGGTtaGGTaTGCCCGAGGT'
n = 3
# All possible n-length strings
l = [s[i:i + n] for i in range(len(s) - (n - 1))]
# Count their distribution
d = {}
for e in l:
    d[e] = d.get(e, 0) + 1
print(d)

edited May 29, 2016 at 21:09

answered May 29, 2016 at 20:13


Jivan · Accepted Answer · 2016-05-29 20:05:37Z

0

Use Counter

from collections import Counter

def count_occurrences(st, n):
    # Collect every overlapping n-length substring of st
    candidates = []
    for i in range(len(st) - n + 1):
        candidates.append(st[i:i + n])

    # Keep only the substrings that occur more than once
    output = {}
    for k, v in Counter(candidates).items():
        if v > 1:
            output[k] = v
    return output

st = "CCCATGGTtaGGTaTGCCCGAGGT"
n = 3

count_occurrences(st, n)
# {'CCC': 2, 'GGT': 3}

edited May 29, 2016 at 20:05

answered May 29, 2016 at 19:59


1 Comment

Burhan Khalid Over a year ago

Counter(candidates).most_common()
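
A brief sketch of what this suggestion might look like in practice, assuming `candidates` holds the overlapping 3-letter windows of the question's string. `Counter.most_common()` with no argument returns every (element, count) pair, highest count first. The `ranked` and `repeated` names are illustrative.

```python
from collections import Counter

st = "CCCATGGTtaGGTaTGCCCGAGGT"
n = 3
candidates = [st[i:i + n] for i in range(len(st) - n + 1)]

# most_common() returns (substring, count) pairs sorted by descending count
ranked = Counter(candidates).most_common()

# Keep only the substrings that occur more than once
repeated = [(sub, c) for sub, c in ranked if c > 1]
print(repeated)  # [('GGT', 3), ('CCC', 2)]
```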
