
I have an array of Strings, and I'd like to find out whether any element contains another element as a substring.

For example, consider the following simple array.

s[0]="Java"
s[1]="Java Programming"
s[2]="C Programming"
s[3]="C Programming is Cool"

In the end, I only want to keep

s[1]="Java Programming"
s[3]="C Programming is Cool"

because s[1] contains s[0] and s[3] contains s[2].

This is my code to detect whether one array element contains another, using the String.contains() method. It seems really basic and inefficient.

int startPtr = 0;
while (startPtr < s.length - 1) {
    int tempPtr = startPtr + 1;
    while (tempPtr <= s.length - 1) {
        if (s[tempPtr].contains(s[startPtr])) {
            // At this point, I know that I don't need s[startPtr] in the result.
            // Remove the item at startPtr, if this were an ArrayList or something.
            break;
        } else {
            tempPtr++;
        }
    }
    // Move on to the next candidate whether or not a match was found.
    startPtr++;
}

After startPtr reaches the end, I think I have to do the same thing in reverse order (start from the end and check towards the beginning of the array) to ensure that no string is part of another element.

Can someone help me with a better algorithm? Also, I believe this algorithm is O(N^2); am I correct?

6
  • Is it correct? It's O(N^2) * O(time for string comparison). Commented Oct 25, 2016 at 19:47
  • You will have to think of something very clever to get a better big-O performance. Basically you have to compare every string to every other string, that inherently takes a quadratic number of calls to contains(). Commented Oct 25, 2016 at 20:08
  • @Jay is it important to keep the result in the same array, and in the same positions/order? Commented Oct 25, 2016 at 22:45
  • @mapeters Not necessarily Commented Oct 25, 2016 at 23:28
  • By the way, I found that the String.contains method will not work as intended. For example, "Apple" and "Applepie" are two distinct strings, but contains() won't differentiate them, so I will have to use a regex Pattern and Matcher instead. Commented Oct 25, 2016 at 23:30
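
The whole-word check mentioned in the last comment could be sketched as follows. This is only one possible reading of the requirement: the class and method names (`WholeWordContains`, `containsWholeWords`) are hypothetical, and the `\b` word boundary only behaves as expected when the search term starts and ends with a letter, digit, or underscore.

```java
import java.util.regex.Pattern;

public class WholeWordContains {
    // Hypothetical helper: true if `needle` occurs in `haystack` bounded by word boundaries.
    static boolean containsWholeWords(String haystack, String needle) {
        // Pattern.quote makes the needle a literal, so regex metacharacters in it are harmless.
        Pattern p = Pattern.compile("\\b" + Pattern.quote(needle) + "\\b");
        return p.matcher(haystack).find();
    }

    public static void main(String[] args) {
        System.out.println(containsWholeWords("Applepie", "Apple"));  // false
        System.out.println(containsWholeWords("Apple pie", "Apple")); // true
    }
}
```

Swapping this helper in for `contains()` leaves the rest of the filtering logic unchanged.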

3 Answers

1

I would recommend sorting the strings in s in order of decreasing length first. After doing so, when iterating through s, each string cannot be contained within a later string in s, since later strings are shorter in length. As a result, you will only have to iterate through s once, and won't need to perform any backtracking.

List<String> finalStrs = new ArrayList<>();
// You will have to create decreasingLengthComparator
Arrays.sort(s, decreasingLengthComparator);
for (String str : s) {
    boolean addToFinal = true;
    for (String finalStr : finalStrs) {
        if (finalStr.contains(str)) {
            addToFinal = false;
            break;
        }
    }
    if (addToFinal) {
        finalStrs.add(str);
    }
}

The sort is O(n log n). Iterating through s and checking each string against finalStrs takes at most n^2 / 2 calls to contains(), which is O(n^2) * O(time for string comparison).

As a result, the overall complexity is O(n log n + n^2 * time for string comparison) = O(n^2 * time for string comparison). That is the same order as your algorithm, so the improvement is only a constant factor (about half the comparisons, and no second pass in reverse), but the algorithm is also easier to implement and follow in my opinion.
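
For completeness, here is a minimal runnable sketch of the approach above, applied to the array from the question. The comparator that the snippet leaves to the reader is assumed here to be `Comparator.comparingInt(String::length).reversed()`; the class and method names (`KeepMaximalStrings`, `keepMaximal`) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class KeepMaximalStrings {
    // Hypothetical helper: returns the strings that are not contained in any other string.
    static List<String> keepMaximal(String[] input) {
        String[] s = input.clone(); // leave the caller's array untouched
        // Longest first, so a string can only be contained in one that was visited earlier.
        Arrays.sort(s, Comparator.comparingInt(String::length).reversed());
        List<String> finalStrs = new ArrayList<>();
        for (String str : s) {
            boolean addToFinal = true;
            for (String finalStr : finalStrs) {
                if (finalStr.contains(str)) {
                    addToFinal = false;
                    break;
                }
            }
            if (addToFinal) {
                finalStrs.add(str);
            }
        }
        return finalStrs;
    }

    public static void main(String[] args) {
        String[] s = {"Java", "Java Programming", "C Programming", "C Programming is Cool"};
        // prints [C Programming is Cool, Java Programming]
        System.out.println(keepMaximal(s));
    }
}
```

Note that the result comes back ordered by decreasing length, not in the original array order; exact duplicates are collapsed to a single entry.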


4 Comments

The key here is that you are adding the items to a new list as opposed to removing them from the list, ensuring that a deleted item will not mess up pointer arithmetic and cause an out of bounds error. However, this can also be accomplished in place, by sorting them in ascending order and iterating through the array in reverse.
@mapeters Thanks for your idea.
@dberm22 You mentioned it can be done in place if the strings are sorted in ascending order and iterated from the end. Did you mean I wouldn't need a new list to add items to? If so, how do you achieve that?
@JayKong see my answer for further explanation.
0

I am posting this as an answer because the OP requested more information on my comment on mapeters' answer. To reiterate, the key to mapeters' solution is that it adds the items to a new list as opposed to removing them from the list, ensuring that a deleted item will not mess up pointer arithmetic and cause an out of bounds error. However, this can also be accomplished in place, by iterating through the list in reverse:

Collections.sort(s, new LengthCompare());
for (int i = s.size() - 1; i >= 1; i--)
{
    for (int j = i - 1; j >= 0; j--)
    {
        // s is a List<String>, so use get() rather than array indexing
        if (s.get(j).contains(s.get(i)))
        {
            s.remove(i);
            break;
        }
    }
}

private static class LengthCompare implements Comparator<String>
{
    public int compare(String s1, String s2)
    {
        return (s2.length() - s1.length());
    }
}

Of course, since plain arrays are fixed size, this only works for Lists (and, without seeing the rest of the code this goes into, I can't see why you couldn't use one).

Also, I have not tested to see if this actually compiles. This is merely pseudocode, and I may have mixed array and list types, but the form is still the same.
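
If the data really does start out as a `String[]`, one way to apply the in-place idea is to copy the array into a resizable list, filter it, and convert back. The sketch below assumes that round trip is acceptable; `InPlaceFilter` and `filter` are hypothetical names, and a `Comparator` built with `comparingInt` stands in for `LengthCompare`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class InPlaceFilter {
    // Hypothetical helper: array in, filtered array out.
    static String[] filter(String[] input) {
        // Arrays.asList alone is fixed-size, so copy it into a resizable ArrayList.
        List<String> s = new ArrayList<>(Arrays.asList(input));
        // Longest first, equivalent in effect to the LengthCompare comparator.
        s.sort(Comparator.comparingInt(String::length).reversed());
        // Walk from the shortest string backwards; removing index i never shifts indices below i.
        for (int i = s.size() - 1; i >= 1; i--) {
            for (int j = i - 1; j >= 0; j--) {
                if (s.get(j).contains(s.get(i))) {
                    s.remove(i);
                    break;
                }
            }
        }
        return s.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] s = {"Java", "Java Programming", "C Programming", "C Programming is Cool"};
        // prints [C Programming is Cool, Java Programming]
        System.out.println(Arrays.toString(filter(s)));
    }
}
```

Because `i` is an `int`, `s.remove(i)` resolves to removal by index rather than removal by value.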


0

There is another possibility for a large number of relatively short strings. Its computational complexity is O(n log n + n k^2 log(n k)), where n is the number of strings and k is the length of the longest string.

The idea is to build a lookup set of all possible substrings of the strings already included in the result set, and to check each new string for existence in this set.

In the worst case you'll have about n*k^2/2 different strings in the lookup set.

TreeSet<String> containedStrings = new TreeSet<>();
List<String> finalStrs = new ArrayList<>();
// You will have to create decreasingLengthComparator
Arrays.sort(s, decreasingLengthComparator);
for (String str : s) {
    if (!containedStrings.contains(str)) {
        finalStrs.add(str);
        // Record every substring of the kept string for later lookups
        for (int i = 0; i < str.length(); i++) {
            for (int j = i + 1; j <= str.length(); j++) {
                containedStrings.add(str.substring(i, j));
            }
        }
    }
}
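
As a runnable variant of the lookup-set idea, the sketch below swaps the `TreeSet` for a `HashSet`. That substitution is mine, not part of the answer above: it trades ordered storage for expected constant-time lookups, and memory use still grows with the number of distinct substrings. `SubstringLookup` and `keepMaximal` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SubstringLookup {
    // Hypothetical helper: keeps only strings that are not substrings of an already kept string.
    static List<String> keepMaximal(String[] input) {
        String[] s = input.clone();
        // Longest first, so containers are always processed before what they contain.
        Arrays.sort(s, Comparator.comparingInt(String::length).reversed());
        Set<String> containedStrings = new HashSet<>(); // assumption: HashSet instead of TreeSet
        List<String> finalStrs = new ArrayList<>();
        for (String str : s) {
            if (!containedStrings.contains(str)) {
                finalStrs.add(str);
                // Record every non-empty substring of the kept string.
                for (int i = 0; i < str.length(); i++) {
                    for (int j = i + 1; j <= str.length(); j++) {
                        containedStrings.add(str.substring(i, j));
                    }
                }
            }
        }
        return finalStrs;
    }

    public static void main(String[] args) {
        String[] s = {"Java", "Java Programming", "C Programming", "C Programming is Cool"};
        // prints [C Programming is Cool, Java Programming]
        System.out.println(keepMaximal(s));
    }
}
```

A string of length k contributes k(k+1)/2 substrings before deduplication, which is where the roughly n*k^2/2 worst-case size of the lookup set comes from.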
