Time complexity of this algorithm

Question

I've been doing some reading on time complexity, and I've got the basics covered. To reinforce the concept, I took a look at an answer I recently gave here on SO. The question is now closed, for reason, but that's not the point. I can't figure out what the complexity of my answer is, and the question was closed before I could get any useful feedback.

The task was to find the first unique character in a string. My answer was something simple:

public String firstLonelyChar(String input)
{
    while(input.length() > 0)
    {
        int curLength = input.length();
        String first = String.valueOf(input.charAt(0));
        input = input.replaceAll(first, "");
        if(input.length() == curLength - 1)
            return first;
    }
    return null;
}

Link to an ideone example

My first thought was that since it looks at each character, then looks at each again during replaceAll(), it would be O(n^2).

However, it got me to thinking about the way it actually works. For each character examined, it then removes all instances of that character in the string. So n is constantly shrinking. How does that factor into it? Does that make it O(log n), or is there something I'm not seeing?

What I'm asking:

What is the time complexity of the algorithm as written, and why?

What I'm not asking:

I'm not looking for suggestions to improve it. I get that there are probably better ways to do this. I'm trying to understand the concept of time complexity better, not find the best solution.

I think this depends on a number of things: 1) does length() have to count every character in the string each time, or is the length stored separately so it can be returned without a linear scan? 2) If length() is linear, do we at least remember the result for subsequent calls? 3) What kind of memory management is used by the class - is erasing a character a simple constant-time operation, or does it require a linear re-copy of the remainder of the string? 4) When, if ever, do cumulative erasures cause reallocation and copy? And so on... — twalberg
– twalberg, Commented Feb 20, 2013 at 16:21
@twalberg: I think a safe assumption is that replaceAll must scan the entire string, since that is the "standard" implementation, unless otherwise specified. — recursive
– recursive, Commented Feb 20, 2013 at 16:24
@recursive Right - replaceAll certainly must scan the entire string to look for matches, but the question was more for the actual erasure and how it's handled - I can think of three equally effective methods to write replaceAll off the top of my head, but they're not equivalent in efficiency (copy non-erased characters to temp string and copy back, or shift the string tail left on each erasure, or simply mark an erased character in some way to indicate "no character here")... — twalberg
– twalberg, Commented Feb 20, 2013 at 16:26
@twalberg: Given that it doesn't seem to be doing in-place erasure, (input = /* ... */) do any of your three methods have run-time complexity other linear? — recursive
– recursive, Commented Feb 20, 2013 at 16:28

Jason Aller · Accepted Answer · 2018-05-31 23:31:10Z

4

The worst time complexity you will have is for the string aabb... and so on each character repeated exactly twice. Now this depends on the size of your alphabet let's say that is S. Let's also annotate the length of your initial string with L. So for each letter you will have to iterate over the whole String. However first time you do that the String will be of size L, second time L-2 and so on. Overall you will have to perform in the order of L + (L-2) + ... + (L- S*2) operations and that is L*S - 2*S*(S+1), assuming L is more than 2*S.

By the way if the size of your alphabet is constant, and I suppose it is, the complexity of your code is O(L)(though with a big constant).

edited May 31, 2018 at 23:31

Jason Aller

3,66028 gold badges43 silver badges40 bronze badges

answered Feb 20, 2013 at 16:17

Ivaylo Strandjev

71.4k18 gold badges130 silver badges183 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

recursive · Accepted Answer · 2013-02-20 16:22:25Z

1

The worst case is O(n^2) where n is the length of the input string. Imagine the case where every character is doubled except the last one, like "aabbccddeeffg". Then there are n/2 loop iterations, and each call to replaceAll has to scan the entire remaining string, which is also proportional to n.

Edit: As Ivaylo points out, if the size of your alphabet is constant, it's technically O(n) since you never consider any character more than once.

edited Feb 20, 2013 at 16:22

answered Feb 20, 2013 at 16:16

recursive

86.5k36 gold badges155 silver badges246 bronze badges

2 Comments

Ivaylo Strandjev Over a year ago

I don't think it is N^2 the complexity depends on the size of the alphabet - he will never consider a character more than once so it is more of the order of L * S where L is the length and S is the size of the alphabet. In fact if we assume S is constant the complexity is only O(N).

recursive Over a year ago

@IvayloStrandjev: That's a good point. In the case of the entire unicode character set, N^2 still may be a tighter bound though. I'll make a note in my answer reflecting it.

Wasafa1 · Accepted Answer · 2013-02-20 16:32:15Z

1

Let's mark:

m = number of unique letters in the word
n = input length

This is the complexity calculation:
The main loop goes at most m times, because there are m different letters,
the .Replaceall checks at most O(n) comparisons in each cycle.

the total is: O(m*n)

an example for O(m*n) cycle is: input = aabbccdd,

m=4, n=8

the algorithm stages:

1. input = aabbccdd, complex - 8
2. input = bbccdd, complex = 6
3. input = ccdd, complex = 4
4. input = dd, complex = 2

total = 8+6+4+2 = 20 = O(m*n)

edited Feb 20, 2013 at 16:32

answered Feb 20, 2013 at 16:26

Wasafa1

8053 gold badges11 silver badges21 bronze badges

2 Comments

Geobits Over a year ago

It's looking for the first unique character, not the first non-unique. Your example string would stop at 'a'.

Wasafa1 Over a year ago

Thanks for the comment, I changed the example. The complexity calculation is still valid though.

zenpoy · Accepted Answer · 2013-02-20 17:02:23Z

Let m be the size of your alphabet, and let n be the length of your string. The worse case would be to uniformly distribute your string's characters between the alphabet letters, meaning you'll have n / m characters for each letter in your alphabet, let's mark this quantity with q. For example, the string aabbccddeeffgghh is the uniformly distribution of 16 characters between the letters a-h, so here n=16 and m=8 and you have q=2 characters for each letter.

Now, your algorithm is actually going over the letters of the alphabet (it just uses the order which they appear in the string), and for each iteration it has to go over the length of the string (n) and shrink it by q (n -= q). So over all the number of operation you do in the worst case are:

s = n + n-(1*q) + ... + n-((m-1)*q)

You can see that s is the sum of the first m elements of the arithmetic series:

s = (n + n-((m-1)*q) * m / 2 =  
    (n + n-((m-1)*(n/m)) * m / 2 ~ n * m / 2

Collectives™ on Stack Overflow

Time complexity of this algorithm

4 Answers 4

Comments

2 Comments

2 Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

4 Answers 4

Comments

2 Comments

2 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related