Splitting a string into substrings in Python

Question

Is there an efficient way to split a sequence like this without using [:] slicing?

GATAAG  G  ATAAG
        GA  TAAG
        GAT  AAG
        GATA  AG
        GATAA  G

I found something in itertools, but it doesn't do it right:

import itertools
import operator

def subslices(seq):
    "Return all contiguous non-empty subslices of a sequence"
    # subslices('ABCD') --> A AB ABC ABCD B BC BCD C CD D
    slices = itertools.starmap(slice, itertools.combinations(range(len(seq) + 1), 2))
    return map(operator.getitem, itertools.repeat(seq), slices)

list(subslices('GATAAG'))
['G', 'GA', 'GAT', 'GATA', 'GATAA', 'GATAAG', 'A', 'AT', 'ATA', 'ATAA', 'ATAAG', 'T', 'TA', 'TAA', 'TAAG', 'A', 'AA', 'AAG', 'A', 'AG', 'G']

It is also not very readable. Another solution:

def splitting_kmer(s):
    n = len(s)
    print(n)
    for i, _ in enumerate(s, 1):
        if i == n:
            break
        print(s[:n-i], s[n-i:])

Paulo

Just curious whether there is something different to learn. Thanks
– Paulo Sergio Schlogl, Commented Sep 8, 2022 at 22:36
There's always something different to learn, but doing so is pointless unless there is some use to it. Given how simple and elegant slicing is, that can hardly be it. And slicing is also fairly efficient, so what type of string splitting, or what application of it are you looking for? In what way could it be better - or in what way do you need it to be? (note that both 'solutions' you included still use slicing with slice and :)
– Grismar, Commented Sep 8, 2022 at 22:54
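For the sake of something different to learn, here is a minimal sketch of a version that really does avoid slice syntax on the string, assuming the goal is the prefix/suffix pairs shown in the question. The function name split_pairs_no_slicing is hypothetical, and nothing here suggests it is faster than slicing; it still builds every prefix and suffix as a new string.

```python
from itertools import accumulate, islice


def split_pairs_no_slicing(seq):
    """Return (prefix, suffix) pairs for every interior split point of a string.

    Hypothetical helper for illustration only; it avoids slice syntax on the
    string but is not expected to outperform plain slicing.
    """
    # accumulate() concatenates with + by default: 'G', 'GA', 'GAT', ...
    prefixes = list(accumulate(seq))
    # Build suffixes from the right by prepending each character: 'G', 'AG', 'AAG', ...
    suffixes = list(accumulate(reversed(seq), lambda acc, ch: ch + acc))
    suffixes.reverse()  # now longest first: 'GATAAG', 'ATAAG', ..., 'G'
    # Skip the full-length suffix so each prefix is paired with its remainder;
    # zip() stops at the shorter input, which drops the full-length prefix.
    return list(zip(prefixes, islice(suffixes, 1, None)))


for left, right in split_pairs_no_slicing("GATAAG"):
    print(left, right)
```

The pairs come out in the same order as the example in the question, starting with G ATAAG and ending with GATAA G.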
"I need a efficient way to split the words" - that's easy with slicing, and I seriously doubt the slicing of the word is anywhere near a performance bottleneck for a task like that. That's like optimising the walking route to your car before taking a cross-country roadtrip to save time.
– Grismar, Commented Sep 9, 2022 at 1:06
Seconding. I suspect slicing is the most efficient option here, unless you're using a scientific Python library like NumPy, where a slice gives you a view of the array rather than a new copy. Beyond that, consider making the function a generator (you may even find you can delegate with yield from map(...) instead of returning it) if your caller is just going to iterate over the results.
– ti7, Commented Sep 9, 2022 at 1:21
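The generator idea from the comment above could look roughly like this. It is a minimal sketch that assumes the wanted output is the prefix/suffix pairs from the question; the name split_pairs is hypothetical.

```python
def split_pairs(seq):
    """Lazily yield (prefix, suffix) for every interior split point of seq.

    Hypothetical helper for illustration; it relies on ordinary slicing.
    """
    # Split points 1 .. len(seq) - 1 keep both halves non-empty.
    for i in range(1, len(seq)):
        yield seq[:i], seq[i:]


for left, right in split_pairs("GATAAG"):
    print(left, right)
```

Because the pairs are produced one at a time, a caller that only iterates never holds the full list in memory, and wrapping the call in list(...) recovers the eager behaviour when it is needed.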

Grismar · Accepted Answer · 2022-09-09 01:10:22Z

A simple and efficient way to get all unique substrings of a string:

sample = 'GATAAG'

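# j runs up to len(sample) inclusive so substrings ending at the last character are kept
slices = set(sample[i:j] for i in range(len(sample)) for j in range(i+1, len(sample)+1))

print(slices)

Result:

{'AA', 'AT', 'GATA', 'A', 'GATAA', 'G', 'GA', 'TA', 'T', 'ATA', 'TAA', 'ATAA', 'GAT', 'GATAAG', 'ATAAG', 'TAAG', 'AAG', 'AG'}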

They are in random order because it's a set (which is unordered by definition), and they're in a set to ensure there are no duplicates. If you want duplicates and order:

sample = 'GATAAG'

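# j runs up to len(sample) inclusive so substrings ending at the last character are kept
slices = [sample[i:j] for i in range(len(sample)) for j in range(i+1, len(sample)+1)]

print(slices)

Result:

['G', 'GA', 'GAT', 'GATA', 'GATAA', 'GATAAG', 'A', 'AT', 'ATA', 'ATAA', 'ATAAG', 'T', 'TA', 'TAA', 'TAAG', 'A', 'AA', 'AAG', 'A', 'AG', 'G']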
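If you want neither duplicates nor a scrambled order, one possible middle ground is to deduplicate through a dict, which preserves insertion order in Python 3.7 and later. This is a minimal sketch rather than part of the approach above; the function name unique_substrings_in_order is hypothetical, and it enumerates every contiguous substring, including those ending at the last character.

```python
def unique_substrings_in_order(seq):
    """Return each distinct contiguous substring once, in first-seen order.

    Hypothetical helper for illustration; relies on dicts keeping
    insertion order (guaranteed since Python 3.7).
    """
    n = len(seq)
    # j goes up to n inclusive so substrings ending at the last character are included.
    all_slices = (seq[i:j] for i in range(n) for j in range(i + 1, n + 1))
    # dict.fromkeys keeps the first occurrence of each key and drops later repeats.
    return list(dict.fromkeys(all_slices))


print(unique_substrings_in_order("GATAAG"))
```

For 'GATAAG' this gives 18 substrings, starting with 'G', 'GA', 'GAT' and with the repeated 'A' and 'G' entries dropped.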
