1

I am coding Skew Algorithm to construct the Suffix Array and I have a long string (length >= 4000). In Skew Algorithm, I have to construct the Triples Array and Sub-strings Array.

For example : I have a string s = 'abcddd'.

  • Triples Array is : 'abc', 'bcd', 'cdd', 'ddd'
  • Sub-strings Array is : 'abcddd', 'bcddd', 'cddd', 'ddd', 'dd', 'd'

This is my solution :

import numpy as np

# example
string = 'abdcb.....' (length >= 4000)
temp = 'abdcb......###' (length >= 4000)

triples_arr = np.array([])
sub_strings = np.array([])

for i in range (0, len(temp) - 3):
    triples_arr = np.append(triples_arr, temp[i:i + 3])
    sub_strings = np.append(sub_strings, string[i:string_len])

With a long string (length >= 4000), it take a minute to complete.

Is there any solution that I can decrease the processing time of that task?

3
  • I think when constructing a suffix array, you don't copy all the suffix, instead you use something more lightweight like starting position to represent a suffix Commented Jan 30, 2018 at 7:02
  • In Skew Algorithm, I need the triple suffix to sort and label with a character in the alphabet. The solution of @StephenRauch was solved my problem. Commented Jan 30, 2018 at 7:20
  • I think building the Triples Array is fine. But building the sub_strings array need n^2 space and time which defeats the purpose of building suffix array in linear time. Commented Jan 30, 2018 at 9:54

3 Answers 3

2

Using comprehensions, you can construct these strings faster than using a for loop:

Code:

triples_arr = [a_string[i:i+3] for i in range(0, len(a_string)-1)]
sub_strings = [a_string[i:] for i in range(len(a_string))]

Test Code:

a_string = 'abcdefghijklmnopqrstuvwxyz'

triples_arr = [a_string[i:i+3] for i in range(0, len(a_string)-2)]
print(triples_arr)

sub_strings = [a_string[i:] for i in range(len(a_string))]
print(sub_strings)

Results:

['abc', 'bcd', 'cde', 'def', 'efg', 'fgh', 'ghi', 'hij', 'ijk', 'jkl',
 'klm', 'lmn', 'mno', 'nop', 'opq', 'pqr', 'qrs', 'rst', 'stu', 'tuv',
 'uvw', 'vwx', 'wxy', 'xyz']
['abcdefghijklmnopqrstuvwxyz', 'bcdefghijklmnopqrstuvwxyz',
 'cdefghijklmnopqrstuvwxyz', 'defghijklmnopqrstuvwxyz',
 'efghijklmnopqrstuvwxyz', 'fghijklmnopqrstuvwxyz',
 'ghijklmnopqrstuvwxyz', 'hijklmnopqrstuvwxyz', 'ijklmnopqrstuvwxyz',
 'jklmnopqrstuvwxyz', 'klmnopqrstuvwxyz', 'lmnopqrstuvwxyz',
 'mnopqrstuvwxyz', 'nopqrstuvwxyz', 'opqrstuvwxyz', 'pqrstuvwxyz',
 'qrstuvwxyz', 'rstuvwxyz', 'stuvwxyz', 'tuvwxyz', 'uvwxyz',
 'vwxyz', 'wxyz', 'xyz', 'yz', 'z']
Sign up to request clarification or add additional context in comments.

Comments

0

This may or may not work for you, but if you operate on bytes and memoryview objects instead of string objects, many optimizations become available. For example, it is very cheap to slice memoryviews.

Comments

0

What about without any external lib and without any loop ?

Triples_Array=[]
Sub_strings=[]

def hello(data):
    if not data:
        return 0
    triple=data[:3]
    Sub_strings.append(data)
    if len(triple)==3:
        Triples_Array.append(triple)



    return hello(data[1:])
print(hello('abcddd'))

print(Sub_strings)
print(Triples_Array)

output:

['abcddd', 'bcddd', 'cddd', 'ddd', 'dd', 'd']
['abc', 'bcd', 'cdd', 'ddd']

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.