Description
Steps to reproduce
# Contrived example; imagine a pipelined collection of a lot of large objects.
get-childitem c:\ -recurse -ea silentlycontinue | select-object Name -unique
Expected behavior
Items with a unique name are output immediately to the host.
Actual behavior
Items are buffered and require a large amount of memory and a ridiculous amount of time.
Reason
Passing -Unique creates a List<T> to store all items, and performance is O(n^2) based on this source. Instead, you could create a key based on the selected properties or some other heuristic. In fact, using that same ObjectCommandComparer, you could probably even use its GetHashCode (if implemented properly). I have been using my own Select-Unique for years quite successfully. It has O(n) performance and uses very little memory. Its key algorithm is roughly copied from what Group-Object does.
I had to write this when I was trying to filter unique items in a huge graph of objects. In that particular case, Select-Object -Unique ended up throwing an OutOfMemoryException. My version didn't and was much faster to use even on smaller data sets.
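To illustrate the key-based approach described above, here is a minimal sketch. It is not the Select-Unique mentioned in this issue and not how Select-Object is implemented. The function name Select-UniqueStreaming and the choice of a joined string as the key are assumptions made for this example only. It keeps just the keys in a HashSet, so each item costs roughly constant time, and unique items are emitted as soon as they arrive.

```powershell
# Hypothetical helper for illustration; not part of PowerShell.
function Select-UniqueStreaming {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        [psobject] $InputObject,

        # Names of the properties that together form the uniqueness key.
        [Parameter(Mandatory, Position = 0)]
        [string[]] $Property
    )
    begin {
        # Only keys are retained, never the objects themselves.
        # The default string comparer is ordinal, so keys are case-sensitive.
        $seen = [System.Collections.Generic.HashSet[string]]::new()
    }
    process {
        # Build the key from the string form of each selected property value.
        $values = foreach ($name in $Property) { [string] $InputObject.$name }
        # Assumption: the unit separator (0x1F) does not occur in the values.
        $key = $values -join [char]0x1F

        # HashSet.Add returns $true only the first time a key is seen,
        # so the first object for each key is written out immediately.
        if ($seen.Add($key)) {
            $InputObject
        }
    }
}
```

With this sketch, the repro above would become something like `get-childitem c:\ -recurse -ea silentlycontinue | Select-UniqueStreaming Name`. Note that it passes through the whole original object rather than a projection, and that converting values to strings is a simplification that can treat distinct values with the same string form as equal.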
Environment data
Name Value
---- -----
PSVersion 6.2.3
PSEdition Core
GitCommitId 6.2.3
OS Microsoft Windows 10.0.18362
Platform Win32NT
PSCompatibleVersions {1.0, 2.0, 3.0, 4.0…}
PSRemotingProtocolVersion 2.3
SerializationVersion 1.1.0.1
WSManStackVersion 3.0