Sum duplicates in array problem

Question

I get $n$ numbers, where every number is an integer between 1 and 1000. If within given numbers are duplicates, I sum all of them to one number. I repeat it until I have only distinct elements. As a result I return number of distinct elements at the end.

How do I do it in additional space complexity $O(1)$ and time complexity $O(n)$.

I know how to do it in space complexity $O(k)$ and time complexity $O(n+k\log k)$, where $k$ is number of distinct numbers.

Could you expand your answer with one example? E.g. can you describe one step starting with the array [1, 2, 2, 2, 3, 4, 4]? — orlp
– orlp, Commented Feb 2, 2021 at 12:45
And in what order do things get added as well? E.g. if you have [1, 1, 2, 2] you could end up with [6] or [2, 4] depending on the order... — orlp
– orlp, Commented Feb 2, 2021 at 13:09
@MichałPiotrStankiewicz. Following your comments, I have updated my answer (before I was processing all group of elements having the same value simultaneously). The idea I essentially the same. — Steven
– Steven, Commented Feb 2, 2021 at 13:50

Steven · Accepted Answer · 2021-02-02 14:34:30Z

Create an array $A$ of 1000 elements, indexed from $1$. Store in $A[i]$ the number of input elements equal to $i$. This can be done in linear time by scanning the input elements once.

Create a list $L$ which will contain at most $1000$ elements (and can hence be implemented using a fixed-size array). Each element of $L$ is a pair $(m_x, x)$ where $x$ is a number and $m_x > 0$ is the number of its occurrences. Initialize $L$ from $A[i]$ as follows: for each $i$ with $A[i]>0$ append to $L$ the pair $(A[i], i)$ (this can be done in constant time since $A$ contains only $1000$ elements).

From now on the algorithm is trivial, as it is just a simulation of the rules in your question.

Repat:

Search $L$ for the element $(m_x, x)$ with $m_x > 1$ that minimizes $x$.
If such an element does not exist, return the length of $L$.
Otherwise:
- Delete $(m_x, x)$ from $L$.
- If $L$ contains an element $(m_y, y)$ where $ y = x m_x$ increment $m_y$ by $1$; Otherwise add $(1, x m_x)$ to $L$.

An iteration can be carried out in constant time since all list operations take constant time as well ($L$ always has at most $1000$ elements).

Consider now the quantity $\eta$ defined as the number of elements in $L$ plus the number of elements in $L$ with multiplicity greater than $1$. Initially $\eta$ is at most $2000$. In all but the last iteration one of the following two things happens:

An element with $(m_x, x)$ with $m_x > 1$ is deleted from $L$ (causing $\eta$ to decrease by $2$) and some $m_y$ is incremented (causing $\eta$ to increase by at most $1$).
An element with $(m_x, x)$ with $m_x > 1$ is deleted from $L$ (causing $\eta$ to decrease by $2$) and a new element $(1, x m_x)$ is added to $L$ (causing $\eta$ to increase by $1$).

In any case, each but the last iteration decreases the value of $\eta$ by at least $1$. Since the final value of $\eta$ must be positive, it follows that the total number of iterations is at most $2000 - 1 + 1 = 2000 = O(1)$.

Finally, observe that above procedure only uses constant amount of memory and that any $O(n)$ time and $O(1)$ space algorithm is also a $O(n + k \log k)$ time and $O(k)$ space algorithm, thus also answering your second question.

orlp · Accepted Answer · 2021-02-02 14:43:04Z

1

Assuming that you only add together identical numbers to themselves (e.g. not 2 + 2 + 4 + 4), and all in one go (e.g. 2, 2, 2 becomes 6, not 4, 2), there is a fairly simple algorithm. Let array $A$ be your input.

Initialize an empty priority queue $Q$ (that pops in ascending order) and $C$ as a hash map with $O(1)$ access where missing values are reported to have value $0$ instead.
For $1 \leq i \leq n$: increment $C[A[i]]$, then if $C[A[i]] = 1$ add $i$ to $Q$.
While $Q$ is not empty, pop from $Q$, giving $i$. If $C[i] > 1$, increment $C[i \cdot C[i]]$, add $i \cdot C[i]$ to $Q$ and delete $C[i]$. Otherwise if $C[i] = 1$ this is a final distinct element, and you can save it (or increment a counter).

Step 1 takes constant time, step 2 takes $O(n)$ time (at most $1000$ appends to $Q$ are made so that's constant time).

For step 3, if we let potential $p = |Q| + |\{i : C[i] > 0\}| + |\{i : C[i] > 1\}|$, at each iteration we always reduce $p$ by $1$ because we pop from $Q$. Then, if $C[i] > 1$ we stay at worst neutral because we delete $C[i]$ (reducing $p$ by $2$), increment $C[i\cdot C[i]]$ (increasing $p$ by $1$ only if it was initially $0$) and append to $Q$ (increasing $p$ by $1$). Thus in each iteration $p$ decreases by at least $1$, and trivially initially $p \leq 3000$, meaning step $3$ finishes in a constant number of iterations at worst. Each iteration of 3 is also constant time, with the only question mark being put at appending to $Q$ - however $Q$ has at most $1000$ elements at any time, thus that is constant as well.

The total amount of space used is also constant, both $C$ and $Q$ have a $1000$ elements at worst.

edited Feb 2, 2021 at 14:43

answered Feb 2, 2021 at 13:57

orlp

14k1 gold badge27 silver badges41 bronze badges

$\begingroup$ The number of iterations is at most $2000$ since the sum of (i) the number of elements in the hash-map and (ii) the number of elements $x$ in the hash-map with $C[x]>1$ is potential. At each iteration the potential decreases by at least $1$ and it must stay positive by definition. Initially it is at most $2000$. $\endgroup$

Steven
– Steven

2021-02-02 14:15:56 +00:00
Commented Feb 2, 2021 at 14:15
$\begingroup$ Ugh, my definition is slightly imprecise since elements are not always deleted from the hashmap. Consider the slight variant where when $i$ is extracted, it is always deleted from the hashmap (even if $C[i]=1$). This does not make any different as far as the algorithm is concerned. Then if $C[i]=1$ the potential decreases by $1$ because (i) decreases. If $C[i]>1$ then either a new element is added in which case (i) stays the same and (ii) decreases by $1$, or some already existing value $C[j]$ is incremented, in which case (i) decreases by 1 and (ii) does not increase. $\endgroup$

Steven
– Steven

2021-02-02 14:32:11 +00:00
Commented Feb 2, 2021 at 14:32
1

$\begingroup$ @Steven I found a more precise potential without changing the algorithm and included it in my answer. $\endgroup$

orlp
– orlp

2021-02-02 14:38:01 +00:00
Commented Feb 2, 2021 at 14:38
$\begingroup$ @Steven I disgree with your proposed analysis in a subtle way. You might be overcounting deletes. Note that any particular $i$ might be contained multiple times in $Q$, and you would count that as two (or more) deletes, when in reality only one happened. You could account for that by adding another step to the algorithm skipping the extra copies in $Q$ though. $\endgroup$

orlp
– orlp

2021-02-02 14:45:42 +00:00
Commented Feb 2, 2021 at 14:45
$\begingroup$ I see. Is there a particular reason that prevents elements to be added to $Q$ only when $C[i C[i]] = 1$, similarly to what it is done in step 2? In this way, at any generic step of the algorithm you processed elements up to some value $x$ and $Q$ contains (one copy of) the elements $y>x$ with $C[y]>0$. Intuitively $Q$ is the set of values that we still need to process. $\endgroup$

Steven
– Steven

2021-02-02 14:58:18 +00:00
Commented Feb 2, 2021 at 14:58

| Show 1 more comment

Stack Exchange Network

Sum duplicates in array problem

2 Answers 2

Your Answer

Hot Network Questions

Sum duplicates in array problem

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Related

Hot Network Questions