how I can further improve this code?
We see a lot of function calls there.
And since the input \$nums\$ is not
guaranteed to be sorted, you cannot
possibly accomplish the task in less
than \$O(n)\$ linear time.
Maintain a min-heap (a priority queue) of size \$k\$.
Linearly scan over \$nums\$, inserting into
the heap as you go.
Once the heap contains \$k\$ elements,
each insertion will evict the smallest element
(possibly the very element we're attempting to insert, if it is tiny).
The heap will therefore always contain the \$k\$ largest
values encountered thus far.
Once the scan completes, just read the answer from the
head of the heap. It is the smallest value retained
by the heap, i.e. the \$k\$-th largest overall.
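Here is a minimal sketch of that idea in Go, using the standard container/heap package. The type and function names (minHeap, findKthLargest) are my own invention for illustration, not the OP's code, and it assumes \$1 \le k \le\$ len(nums).

```go
package main

import (
	"container/heap"
	"fmt"
)

// minHeap is a min-heap of ints: the smallest retained value sits at index 0.
type minHeap []int

func (h minHeap) Len() int           { return len(h) }
func (h minHeap) Less(i, j int) bool { return h[i] < h[j] }
func (h minHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x any)        { *h = append(*h, x.(int)) }
func (h *minHeap) Pop() any {
	old := *h
	x := old[len(old)-1]
	*h = old[:len(old)-1]
	return x
}

// findKthLargest retains the k largest values seen so far in a size-k min-heap.
// After the scan, the head of the heap is the k-th largest element of nums.
func findKthLargest(nums []int, k int) int {
	h := &minHeap{}
	for _, v := range nums {
		if h.Len() < k {
			heap.Push(h, v) // still filling the heap
		} else if v > (*h)[0] {
			(*h)[0] = v    // evict the smallest retained value
			heap.Fix(h, 0) // restore the heap invariant in O(log k)
		}
		// else: v is too small to be retained; one comparison and we move on
	}
	return (*h)[0]
}

func main() {
	fmt.Println(findKthLargest([]int{3, 2, 1, 5, 6, 4}, 2)) // 5
}
```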
Did this take \$O(n \log n)\$ time overall?
No.
Why is that? Because each heap insert costs \$O(\log k)\$.
And given that \$k\$ is a constant, that makes
the cost \$O(1)\$ constant as well; it is independent of the size of the input vector.
Doing that \$n\$ times gives \$O(n)\$ linear cost for the task.
(Actually, for a non-worst-case input, such
as \$nums\$ in random order, almost all insertion
attempts will be rejected by a single comparison against
the heap's minimum, letting us discard the currently
scanned item at \$O(1)\$ cost.
Even something terrible, like Insertion Sort
on a bounded-size list, would perform adequately.
The worst case can be completely avoided with an \$O(n)\$
linear-cost preprocessing step: a
Fisher-Yates
shuffle.)
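(For the record, Go's standard library already ships a Fisher-Yates shuffle as rand.Shuffle; a sketch of that preprocessing step, on a hypothetical nums slice, could be as small as this.)

```go
package main

import (
	"fmt"
	"math/rand"
)

func main() {
	nums := []int{3, 2, 1, 5, 6, 4}
	// rand.Shuffle performs an in-place Fisher-Yates shuffle in O(n).
	rand.Shuffle(len(nums), func(i, j int) {
		nums[i], nums[j] = nums[j], nums[i]
	})
	fmt.Println(nums) // nums in some random order
}
```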
PR process
Submissions to Code Review resemble
a Pull Request process, where a development team
tries to answer two questions about the code
before merging a feature branch into main.
- Is it correct?
- Is it maintainable?
comments lie!
From the Review Context we have this,
which could essentially be a comment in source code:
I think my time complexity is \$O(\log n)\$.
I am delighted the OP tried to assess the big-O cost.
However the problem statement makes it apparent that,
absent any specified structure or ordering,
a solution will have to inspect \$n\$ elements,
giving at least \$O(n)\$ time complexity.
It's not as though \$nums\$ is a bit vector with
at most \$m\$ \$1\$'s, which would let a counting
approach exit early, or an int vector with some known ordering.
The clear error in the proposed cost makes me
skeptical about what other errors may be in the submission,
and less interested in working out its details.
This is a sad but true aspect of reviews:
a reviewer will tend to pick out the top
few items before even getting to subtler points.
When submitting code we strive to tidy up
such items in the hopes the reviewer will then
delve deeper to offer better advice.
citation
Martin R comments:
The question asks about finding the k-th largest element with Quickselect, which has an average complexity of O(n). Can you elaborate why you recommend to use a heap-based method instead?
It is always helpful to cite a reference when implementing
an algorithm, as then we will have a Specification to
evaluate the code against.
The OP declined to do so, but let's imagine that the
Wikipedia article on Quickselect was mentioned in the source code.
This implies a design decision, also not explained in the OP.
We could choose either of the {Hoare, Lomuto} partition
schemes. The cited source broke this out as
a partition() helper, but that is absent in the OP code
so it must have been inlined, at the randomized
pivIdx assignment.
Both of the cited schemes produce a deterministic pivot point.
In my naïve reading I fail to find either scheme in
the OP, and the author has declined to offer hints
in the source text.
The code may very well be correct, but I do not
find that it is obviously correct.
Which makes it harder to assess that it is
easily maintainable by dev team members.
I suggested a decomposition into subproblems which,
when coded up by the OP in the usual way,
was likely to give a solution that other
team members could easily test and reason about.
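For instance, here is what the Lomuto scheme reads like when broken out as its own partition helper. This is my sketch of the textbook scheme, not the OP's code, and the names (lomutoPartition, store) are invented for illustration.

```go
package main

import "fmt"

// lomutoPartition partitions nums[lo..hi] around the value at pivotIdx.
// On return the pivot value sits at the returned index, everything to its
// left is <= the pivot, and everything to its right is > the pivot.
func lomutoPartition(nums []int, lo, hi, pivotIdx int) int {
	pivot := nums[pivotIdx]
	nums[pivotIdx], nums[hi] = nums[hi], nums[pivotIdx] // park the pivot at the end
	store := lo
	for i := lo; i < hi; i++ {
		if nums[i] <= pivot {
			nums[i], nums[store] = nums[store], nums[i]
			store++
		}
	}
	nums[store], nums[hi] = nums[hi], nums[store] // move the pivot into place
	return store
}

func main() {
	nums := []int{3, 2, 1, 5, 6, 4}
	p := lomutoPartition(nums, 0, len(nums)-1, 2) // pivot value 1
	fmt.Println(p, nums)                          // 0 [1 2 4 5 6 3]
}
```

A helper with that shape is easy to test in isolation, which is exactly the maintainability point.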
helper
Never literally call a helper function helper().
You have an opportunity to explain what it does,
both via the name and via
Go Doc comments.
Don't waste that opportunity.
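As a hypothetical illustration (the name, signature, and sort-based body below are mine, chosen only to keep the example complete, and are not the OP's approach), a descriptive name plus a Go Doc comment can carry most of the explanation:

```go
package main

import (
	"fmt"
	"sort"
)

// kthLargest reports the k-th largest element of nums, counting duplicates.
// It assumes 1 <= k <= len(nums) and does not modify nums.
func kthLargest(nums []int, k int) int {
	sorted := append([]int(nil), nums...) // work on a copy
	sort.Ints(sorted)
	return sorted[len(sorted)-k]
}

func main() {
	fmt.Println(kthLargest([]int{3, 2, 3, 1, 2, 4, 5, 5, 6}, 4)) // 4
}
```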
recursion
We write recursive code in order to break a hard
problem into a collection of easier subproblems.
The base case is clearly identified:
if start == end { return }
Then we have a bunch of code which is apparently trying
to preserve some invariant, while also making progress on a loop
variant, such as squishing together bigger and
one of the end points. But there's no citation,
no Go Doc comments, no comments, and I didn't
find the {bigger, smaller} names as helpful as
they could be. I didn't find it obvious that the
code is correct.
And then another(?) base case is dealt with at once,
rather than after one more recursion:
if bigger == k {
    return
}
identifiers
We have smaller, which is literally always at least
one bigger than bigger.
Hmmm, ok, didn't want to call them i and j?
I suppose one must read them as indexes
pointing at values which are smaller or bigger
than one another.
But I didn't find it obvious how that is always enforced.
Where do you see “a lot of function calls”? There is a main function which calls a helper function, which calls itself (tail recursion). Is that a bad thing?
    helper(nums, start, bigger - 1, k)
} else {
    helper(nums, bigger + 1, end, k)
}
Thank you for pointing out the
TCO,
that's helpful.
Given that bigger will sometimes get bigger,
I wasn't quite seeing the loop variant which
guarantees we have a smaller problem to solve each time.
But yeah, I suppose we could readily code
this up as a loop.
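To make that concrete, here is a hedged sketch of the loop form: the two tail calls become re-assignments of lo and hi, and a Lomuto partition around a random pivot is inlined so the example stands alone. None of these names come from the OP's code.

```go
package main

import (
	"fmt"
	"math/rand"
)

// kthLargestIterative is randomized quickselect written as a loop: each
// iteration partitions the current window and then shrinks it toward the
// target index, so the loop variant hi-lo strictly decreases.
// It rearranges nums in place and assumes 1 <= k <= len(nums).
func kthLargestIterative(nums []int, k int) int {
	target := len(nums) - k // index the answer occupies in sorted order
	lo, hi := 0, len(nums)-1
	for lo < hi {
		// Lomuto partition around a randomly chosen pivot.
		pivotIdx := lo + rand.Intn(hi-lo+1)
		pivot := nums[pivotIdx]
		nums[pivotIdx], nums[hi] = nums[hi], nums[pivotIdx]
		store := lo
		for i := lo; i < hi; i++ {
			if nums[i] <= pivot {
				nums[i], nums[store] = nums[store], nums[i]
				store++
			}
		}
		nums[store], nums[hi] = nums[hi], nums[store]

		// Keep only the side of the window that contains the target index.
		switch {
		case store == target:
			return nums[store]
		case store < target:
			lo = store + 1
		default:
			hi = store - 1
		}
	}
	return nums[target]
}

func main() {
	fmt.Println(kthLargestIterative([]int{3, 2, 1, 5, 6, 4}, 2)) // 5
}
```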
Is recursion a bad thing?
No, certainly not, it can be a terrific way of
communicating a technical idea to another person,
of clearly demonstrating that the problem size
is continually being whittled down and at some
point we will terminate.
The trouble with the OP code is I didn't find
clear communication in it, so given the author's
current skill set I suggested an alternate
approach I felt could lead to better communication
and more maintainable code.
The review context said at least one set of inputs
made the code time out, and I was concerned we might
be making more than the minimum required number of
recursive calls.
speed
... average complexity of O(n). Can you elaborate why you recommend to use a heap-based method instead?
I proposed a method with worst-case complexity of \$O(n)\$, with invariants and loop variants that are easy to reason about.
A contest is an adversarial setting,
with the contest site trying to dream up crazy vectors
of \$10^5\$ integers which might trigger unfortunate
behavior for a proposed algorithm.
Worst case complexity of Quickselect is \$O(n^2)\$ quadratic,
though given the random pivot I don't believe the OP code
can fall into that trap.
I hope the contest organizers wouldn't mess
with rand.Intn() output.
radix sort
Here is a fresh perspective on the original problem.
The 16-bit input range is rather limited -- much more limited
than the vector size.
- Allocate \$2 \times 10^4 + 1\$ zeroed counters.
(The \$+ 1\$ accommodates the \$0\$ value,
and the factor of \$2\$ accounts for negative values.)
- Make a scan of \$n\$ input values, incrementing
the corresponding counters, with cost \$O(n)\$.
- Scan backward through the counters, summing the counts
as you go, until the running total reaches \$k\$;
the value corresponding to that counter is the answer.
That last step may seem "wasteful", in that it likely
examines a great many boring zero counts.
But it examines at most \$2 \times 10^4 + 1\$ counts, a constant,
so we're still at \$O(n)\$ time complexity overall,
at the expense of allocating a modest and
strictly bounded amount of memory.
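Here is a sketch of that counting approach, assuming the contest constraint that every value lies in \$[-10^4, 10^4]\$; the names (maxAbs, kthLargestByCounting) are mine.

```go
package main

import "fmt"

const maxAbs = 10000 // assumed value range: -10^4 .. 10^4

// kthLargestByCounting tallies occurrences of every possible value, then
// walks the counters from the largest value downward until k occurrences
// have been accounted for. It assumes 1 <= k <= len(nums).
func kthLargestByCounting(nums []int, k int) int {
	counts := make([]int, 2*maxAbs+1) // index v+maxAbs holds the count of value v
	for _, v := range nums {
		counts[v+maxAbs]++
	}
	remaining := k
	for v := maxAbs; v >= -maxAbs; v-- {
		remaining -= counts[v+maxAbs]
		if remaining <= 0 {
			return v
		}
	}
	panic("k exceeds len(nums)") // unreachable when 1 <= k <= len(nums)
}

func main() {
	fmt.Println(kthLargestByCounting([]int{3, 2, 3, 1, 2, 4, 5, 5, 6}, 4)) // 4
}
```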