Faster way to get substring in python?

Question

I am implementing the Skew algorithm to construct a suffix array, and I have a long string (length >= 4000). In the Skew algorithm, I have to construct the Triples Array and the Sub-strings Array.

For example, for the string s = 'abcddd':

Triples Array is: 'abc', 'bcd', 'cdd', 'ddd'
Sub-strings Array is: 'abcddd', 'bcddd', 'cddd', 'ddd', 'dd', 'd'
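
In the Skew (DC3) algorithm, the triples are mainly used to give sampled positions an integer name: the triples starting at positions i with i mod 3 != 0 are sorted, and equal triples get equal names. If all names are distinct, they already order those suffixes; otherwise the algorithm recurses on the string of names. The sketch below shows only this naming step, in simplified form and under stated assumptions: `name_sample_triples` is a hypothetical helper, it pads with '\0' (assumed not to occur in the input), and it uses Python's built-in sort where the real algorithm uses radix sort to stay linear. It sorts integer positions and only slices 3-character keys, so no long substrings are copied.

```python
def name_sample_triples(s):
    """Hypothetical helper: rank the triples that start at positions i % 3 != 0."""
    padded = s + '\0\0\0'  # assumption: '\0' does not occur in s and sorts first
    sample = [i for i in range(len(s)) if i % 3 != 0]
    # Sort start positions (ints), using the 3-character triple as the sort key.
    sample.sort(key=lambda i: padded[i:i + 3])
    names = {}
    rank = 0
    previous = None
    for i in sample:
        triple = padded[i:i + 3]
        if triple != previous:  # a new distinct triple gets the next name
            rank += 1
            previous = triple
        names[i] = rank
    return names


print(name_sample_triples('abcddd'))  # {1: 1, 2: 2, 5: 3, 4: 4}
```

Here all four names are distinct, so no recursion would be needed for this input.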

This is my solution:

import numpy as np

# example (placeholders: the real strings have length >= 4000)
string = 'abdcb.....'
temp = 'abdcb......###'
string_len = len(string)  # assumed: string_len was not defined in the snippet

triples_arr = np.array([])
sub_strings = np.array([])

for i in range(0, len(temp) - 3):
    triples_arr = np.append(triples_arr, temp[i:i + 3])
    sub_strings = np.append(sub_strings, string[i:string_len])

With a long string (length >= 4000), it takes a minute to complete.

Is there any way to decrease the processing time of this task?

I think when constructing a suffix array, you don't copy every suffix; instead, you use something more lightweight, like the starting position, to represent a suffix.
– Petar Petrovic, Commented Jan 30, 2018 at 7:02
In the Skew algorithm, I need the triples so that I can sort them and label each one with a character of the alphabet. The solution by @StephenRauch solved my problem.
– Tín Tr., Commented Jan 30, 2018 at 7:20
I think building the Triples Array is fine. But building the sub_strings array needs O(n^2) space and time, which defeats the purpose of building a suffix array in linear time.
– Petar Petrovic, Commented Jan 30, 2018 at 9:54
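
The quadratic cost can be checked with a little arithmetic: the suffixes of a string of length n have total length n + (n - 1) + ... + 1 = n(n + 1)/2, so for n = 4000 the sub_strings list holds 8,002,000 characters. Representing each suffix by its start position avoids that. As an illustration, here is a sketch of prefix doubling, a simpler O(n log^2 n) technique (not the Skew algorithm), that sorts start positions using only integer ranks; `suffix_array` is a hypothetical name.

```python
def suffix_array(s: str) -> list:
    """Prefix doubling: sort suffix start positions using integer ranks only."""
    n = len(s)
    if n == 0:
        return []
    # Initial rank of suffix i is the code point of its first character.
    rank = [ord(c) for c in s]
    sa = list(range(n))
    k = 1
    while True:
        # Key for suffix i: (rank of its first k chars, rank of the next k chars).
        # -1 marks "suffix ends here", so shorter suffixes sort first.
        key = [(rank[i], rank[i + k] if i + k < n else -1) for i in range(n)]
        sa.sort(key=lambda i: key[i])
        # Re-rank: suffixes with equal keys share a rank, others increase by 1.
        new_rank = [0] * n
        for j in range(1, n):
            prev, cur = sa[j - 1], sa[j]
            new_rank[cur] = new_rank[prev] + (key[cur] != key[prev])
        rank = new_rank
        if rank[sa[-1]] == n - 1:  # all ranks distinct: order is final
            return sa
        k *= 2


print(suffix_array('banana'))  # [5, 3, 1, 0, 4, 2]
```

A naive `sorted(range(len(s)), key=lambda i: s[i:])` gives the same order, but it builds every suffix as a sort key, which brings back the quadratic memory use.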

Stephen Rauch · Accepted Answer · 2018-01-30 06:55:15Z

Using list comprehensions, you can build these lists much faster than with a for loop that calls np.append, which returns a full copy of the array on every call:

Code:

triples_arr = [a_string[i:i+3] for i in range(0, len(a_string)-2)]
sub_strings = [a_string[i:] for i in range(len(a_string))]

Test Code:

a_string = 'abcdefghijklmnopqrstuvwxyz'

triples_arr = [a_string[i:i+3] for i in range(0, len(a_string)-2)]
print(triples_arr)

sub_strings = [a_string[i:] for i in range(len(a_string))]
print(sub_strings)

Results:

['abc', 'bcd', 'cde', 'def', 'efg', 'fgh', 'ghi', 'hij', 'ijk', 'jkl',
 'klm', 'lmn', 'mno', 'nop', 'opq', 'pqr', 'qrs', 'rst', 'stu', 'tuv',
 'uvw', 'vwx', 'wxy', 'xyz']
['abcdefghijklmnopqrstuvwxyz', 'bcdefghijklmnopqrstuvwxyz',
 'cdefghijklmnopqrstuvwxyz', 'defghijklmnopqrstuvwxyz',
 'efghijklmnopqrstuvwxyz', 'fghijklmnopqrstuvwxyz',
 'ghijklmnopqrstuvwxyz', 'hijklmnopqrstuvwxyz', 'ijklmnopqrstuvwxyz',
 'jklmnopqrstuvwxyz', 'klmnopqrstuvwxyz', 'lmnopqrstuvwxyz',
 'mnopqrstuvwxyz', 'nopqrstuvwxyz', 'opqrstuvwxyz', 'pqrstuvwxyz',
 'qrstuvwxyz', 'rstuvwxyz', 'stuvwxyz', 'tuvwxyz', 'uvwxyz',
 'vwxyz', 'wxyz', 'xyz', 'yz', 'z']

Ove · Answer · 2018-01-30 07:08:17Z


This may or may not work for you, but if you operate on bytes and memoryview objects instead of str objects, many optimizations become available. For example, slicing a memoryview is very cheap because it creates a new view of the same buffer without copying the data.
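
A minimal sketch of the memoryview idea, assuming ASCII input so that one byte corresponds to one character: slicing the view yields each suffix without copying, and only the short triples are converted to bytes. One caveat: memoryview objects support only == and != comparisons, so sorting suffix views directly raises TypeError, and a sort key such as bytes(view) would copy again.

```python
text = 'abcddd'
data = text.encode('ascii')  # assumption: ASCII input, one byte per character
view = memoryview(data)

# Each slice is a new view over the same buffer; no bytes are copied here.
suffix_views = [view[i:] for i in range(len(view))]

# Only the 3-byte triples are materialized as bytes objects.
triples = [bytes(view[i:i + 3]) for i in range(len(view) - 2)]

print(triples)                 # [b'abc', b'bcd', b'cdd', b'ddd']
print(bytes(suffix_views[2]))  # b'cddd'
```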


Aaditya Ura · Answer · 2018-01-30 17:37:08Z


What about a solution without any external library and without any loop?

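Triples_Array = []
Sub_strings = []

def hello(data):
    # base case: nothing left to split
    if not data:
        return 0
    triple = data[:3]
    Sub_strings.append(data)        # every remaining tail is a suffix
    if len(triple) == 3:            # keep only full-length triples
        Triples_Array.append(triple)
    return hello(data[1:])          # recurse on the string minus its first char

hello('abcddd')

print(Sub_strings)
print(Triples_Array)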

output:

['abcddd', 'bcddd', 'cddd', 'ddd', 'dd', 'd']
['abc', 'bcd', 'cdd', 'ddd']
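
A caveat for the question's input size: hello() makes one recursive call per character, and CPython's default recursion limit is 1000 (see sys.getrecursionlimit()), so for strings of length >= 4000 it raises RecursionError unless the limit is raised. Each data[1:] also copies the rest of the string. The sketch below checks this with two hypothetical helpers, `suffixes_recursive` (same pattern as hello()) and `suffixes_iterative` (a comprehension whose stack depth does not grow with the input).

```python
import sys


def suffixes_recursive(data, out):
    """Collects suffixes like hello() above: one stack frame per character."""
    if not data:
        return out
    out.append(data)
    return suffixes_recursive(data[1:], out)


def suffixes_iterative(data):
    """Same result with a comprehension; stack depth stays constant."""
    return [data[i:] for i in range(len(data))]


long_text = 'a' * 4000
print('recursion limit:', sys.getrecursionlimit())
try:
    suffixes_recursive(long_text, [])
    recursive_ok = True
except RecursionError:
    recursive_ok = False
print('recursive version finished:', recursive_ok)
print('iterative suffix count:', len(suffixes_iterative(long_text)))
```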
