1

If we arrange all the distinct sub-strings of a string lexicographically and we need the i'th substring

1.) Is it possible to find it using it's suffix array and LCP array?

2.) If Yes, how do we do it ? can it be done in O(Nlog^N) while creating the suffix array using Manber & Myers which has a time complexity of O(Nlog^2N) or while creating it's LCP array using kasai's algorithm which has a time complexity of O(N)?

0

1 Answer 1

2

Yes it can be done using Suffix array and LCP array.

Assuming you know how to calculate Suffix array and LCP array.

Let p[] denote suffix array lcp[] denote LCP array.

create a array which store the number of distinct sub string till i'th rank suffix. This can be calculated using this formula. For more details see Here

Let cum[] denote cumulative array which can be calculated as follow:

cum[0] = n - p[0];
for i = 1 to n do:
    cum[i] = cum[i-1] + (n - p[i] - lcp[i])

Now for finding i'th sub string just find the lower bound of i in cumulative array cum[] that will give you rank of suffix from where your sub string should start and print all character till length of

i - cum[pos-1] + lcp[pos] // i lies between cum[pos-1] and cum[pos] so for finding 
                          // length of sub string starting from cum[pos-1] we should 
                          // subtract cum[pos-1] from i and add lcp[pos] as it is 
                          // common string between current rank suffix and 
                          // previous rank suffix. 

where pos is value return by lower bound.

Whole above process can be summarized as follow:

string ithSubstring(int i){
    pos = lower_bound(cum , cum + n , i);
    return S.substr(arr[pos] , i - cum[pos-1] + lcp[pos]);// considering S as original character string 
}

For full implementation of Suffix array , LCP and above logic you can see Here

Sign up to request clarification or add additional context in comments.

7 Comments

Thank you for such a quick response, I have been figuring this out for days. I'll understand and implement this asap and accept this as an answer. :)
I have added link to full implementation of above logic you can check that if you get any problem understanding that.
Not yet , I have just recently learned the O(N) and O(log^N) algorithms to create the suffix arrays, it takes time to grasp all the implementations since I am new to this topic
Thank you! I understand it now
@sudoer How would I print all substrings from lcp?
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.