I've got a knotty problem that I can't figure out how to solve.
I have a text file containing a few million lines of text. Basically I want to run uniq, but with a twist: If two lines are identical but for a :FOO suffix, drop the line that lacks the suffix. But only if the lines are otherwise identical. And only for :FOO, not any other possible suffix.
I do not want to drop /usr/bin/delta:FOO, because the line above it isn't identical. For example, given this input:
red.7
green.2
green.2:FOO
blue.6
yellow.9:FOO
I want to delete green.2, because the line below it is identical apart from the :FOO suffix. All other lines should be retained unchanged.
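One way to express this rule without relying on line order is a two-pass awk filter. awk is a swap-in here, not something the question uses. The first pass records the text of every line that ends in :FOO, with the suffix stripped, and the second pass prints every line that is not in that set. This is a minimal sketch: the function name drop_unsuffixed is hypothetical, and it assumes the input is a regular file, because the file is read twice.

```shell
# Sketch: drop any line X when the same file also contains the line X:FOO.
# drop_unsuffixed is a hypothetical helper name; its argument must be a
# regular file (not a pipe), because awk reads it twice.
drop_unsuffixed() {
  awk '
    # Pass 1 (first copy of the file): for each line ending in ":FOO",
    # strip the suffix and remember the remaining text as a key.
    NR == FNR { if (sub(/:FOO$/, "")) has_foo[$0] = 1; next }
    # Pass 2 (second copy): print the line unless a ":FOO" twin was seen.
    !($0 in has_foo)
  ' "$1" "$1"
}
```

Usage would be something like drop_unsuffixed input.txt > output.txt. Only the keys taken from :FOO lines are held in memory. The filter does not collapse repeated lines by itself, so if you also want uniq's usual behaviour, pipe the result through uniq.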
[Edit: I forgot to mention, the file is already in sorted order.]
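Since the file is sorted, a constant-memory alternative is a single streaming pass. It holds back one line and prints that line only if the next line is not the same text plus :FOO. This minimal sketch uses awk again, under the hypothetical name drop_before_foo. Like the example, it assumes that each X sits directly above its X:FOO twin. Sorting alone does not guarantee that. In the C locale, a line such as green.20 sorts between green.2 and green.2:FOO because 0 comes before : in ASCII. If that can happen, an approach that does not depend on adjacency, such as collecting the :FOO lines in a set first, is safer.

```shell
# Sketch: one streaming pass over stdin, keeping only one line in memory.
# drop_before_foo is a hypothetical helper name.
# Assumes each X is immediately followed by its X:FOO twin, if it has one.
drop_before_foo() {
  awk '
    # Now that the current line is visible, decide about the previous one:
    # print it unless the current line is exactly the previous line + ":FOO".
    NR > 1 && $0 != (prev ":FOO") { print prev }
    { prev = $0 }
    # The last line has no successor, so it is always kept.
    END { if (NR > 0) print prev }
  '
}
```

Usage might be uniq input.txt | drop_before_foo > output.txt. Running uniq first collapses repeated lines, so a run like green.2, green.2, green.2:FOO becomes a single adjacent pair before the filter sees it.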
My thoughts so far:
- Obviously, uniq is the tool to do this.
- You can make uniq ignore a prefix, but never a suffix. (This is extremely annoying!)
- I thought perhaps you could pretend that : is a field separator, and get cut (together with paste) to flip the field order. But no, it is apparently impossible to force cut to output a blank line if no separator is present.
- My next thought is to go through line by line and output a 1-character prefix depending on the presence or absence of the suffix... but I can't imagine scripting that as a Bash loop being performant.
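The 1-character-prefix idea from the last point can be run as a pipeline instead of a Bash loop. In this sketch (hypothetical name keep_foo_twin), sed rewrites X:FOO as 0X and every other line X as 1X. The two members of a pair then differ only in their first character, so uniq -s 1, which skips the first character when comparing, treats them as duplicates. uniq keeps the first line of each run, so the stream is first reversed with tac to put the :FOO member first. It is then reversed back and undecorated. tac comes from GNU coreutils, which is an assumption about your system. Like any uniq-based approach, this one assumes each pair is adjacent, and it also collapses repeated lines.

```shell
# Sketch of the "1-character prefix" idea as a pipeline, with no shell loop.
# keep_foo_twin is a hypothetical helper name; it reads stdin.
# Assumes X sits directly above X:FOO; tac is from GNU coreutils.
keep_foo_twin() {
  # 1. Decorate: "X:FOO" becomes "0X"; any other line "X" becomes "1X".
  sed -e 's/^\(.*\):FOO$/0\1/;t' -e 's/^/1/' |
  # 2. Reverse, so the ":FOO" member of each pair comes first.
  tac |
  # 3. Compare lines while skipping the 1-character prefix; uniq keeps
  #    the first line of each run, which is now the ":FOO" one.
  uniq -s 1 |
  # 4. Restore the original order.
  tac |
  # 5. Undecorate: "0X" goes back to "X:FOO", and "1X" back to "X".
  sed -e 's/^0\(.*\)$/\1:FOO/;t' -e 's/^1//'
}
```

Usage might be keep_foo_twin < input.txt > output.txt. Every stage streams, so memory stays small, although tac may need temporary space when its input is a pipe.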
Any hints?
I may end up just using a real programming language to fix this. It seems simple enough to fix in Bash, but I've already wasted quite a lot of time failing to get it to work...
:FOO, the one without it, or either? Can you have identical lines that don't have :FOO and, if so, what should be done with those?
:FOO suffix.
uniq has --check-chars, which enables you to ignore suffixes, too. --skip-chars enables you to ignore the first N characters.