0

I found an implementation of suffix array in Ruby and changed it a bit. Here is what I have:

class SuffixArray
    def initialize(str)
        @string = str
        @suffix_array = []
        (0...str.length).each do |i|
            substring = @string[i...str.length]
            @suffix_array << {:suffix=>substring, :index => i}
        end

        @sorted_suffix_array = @suffix_array.sort {|x,y| x[:suffix] <=> y[:suffix]}
    end

    def print_sorted
      @sorted_suffix_array.each {|item| puts "#{item[:index]}=>#{item[:suffix]}"}
      puts "total=>#{@sorted_suffix_array.size()}"
    end

    def print_unsorted
      @suffix_array.each {|item| puts "#{item[:index]}=>#{item[:suffix]}"}
      puts "total=>#{@suffix_array.size()}"
    end

    def find_substring(substring)
        low = 0
        high = @sorted_suffix_array.length
        while(low <= high) do
            mid = (low + high) / 2
            comparison = @sorted_suffix_array[mid][:suffix]#[0..substring.length]
      if comparison > substring
        high = mid - 1
      elsif comparison < substring
        low = mid + 1
      else 
        return @sorted_suffix_array[mid][:index]
      end
        end
    end

end

It works good but it doesn't find all substrings I want. For example

a = SuffixArray.new("there is a man who likes dogs")
puts a.find_substring("man") #won't be found
puts a.find_substring("who likes dogs") #will be found
puts a.find_substring("o likes dogs") #will be found

How do I change the algorithm to make it find all the substrings I want?

3
  • I know. That's why I asked the question. Commented Nov 20, 2012 at 4:45
  • You could maintain the LCP of the suffix array . (Longest common Prefix) - If you search for prefixes of suffixes of the strings in the suffix array- you should find the substrings! - Commented Nov 20, 2012 at 4:52
  • You shouldn't call it "suffix" anymore if you're looking for arbitrary substrings instead of just suffixes.. Commented Nov 20, 2012 at 13:02

2 Answers 2

1

Your code was almost correct. I made some small modifications and it works.

def find_substring(substring)
  low = 0
  high = @sorted_suffix_array.length-1
  while(low <= high) do
    mid = (low + high) / 2
    comparison = @sorted_suffix_array[mid][:suffix][0...substring.length]
    if comparison > substring
      high = mid - 1
    elsif comparison < substring
      low = mid + 1
    else 
      return @sorted_suffix_array[mid][:index]
    end
  end
end
Sign up to request clarification or add additional context in comments.

2 Comments

I actually already had that. Why do you use @sorted_suffix_array.length-1 instead of @sorted_suffix_array.length?
When doing a binary search low and high must be valid indices. Check the code on the wikipedia page on the binary search algorithm.
1

For others; reference, Here's one without holding the sub-string in a hash

Gist: https://gist.github.com/bluetwin/5268722

class SuffixArray

  attr_reader :suf, :string

  def initialize(string)
    @string = string
    @suf = (0..string.size-1).sort_by{|i|@string[i..-1]}
  end

  def substring(idx)
    @string[@suf[idx][email protected]]
  end

  def bsearch(str)
    low = 0
    high = @suf.length-1
    found = nil
    while(low <= high) do
      mid = (low + high) / 2
      comp = substring(mid)
      if comp > str
        high = mid - 1
      elsif comp < str
        low = mid + 1
      else
       found = comp
       low = high + 1
      end
    end
    found
  end

end

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.