Can't delete duplicate strings with shell commands

Question

I have a file called "1.txt" which contains the following:

I'm trying to delete duplicate strings from it. Both sort -u 1.txt and sort 1.txt | uniq return this:

Question:

Why does the string "777" still appear twice? How can I remove the duplicate?

I wasn't able to reproduce this issue. I copied and pasted the numbers above, ran the same commands, and got the desired output. You might have some invisible characters in your file...
– Chris J, Commented Dec 26, 2018 at 22:01
Try viewing the file with LC_ALL=C cat -vet 1.txt. That will show normally invisible and non-ASCII characters in a visible format, including newlines (line endings) as "$". I bet this will show a difference between the two "777" lines.
– Gordon Davisson, Commented Dec 26, 2018 at 22:27
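
To illustrate the suggestion above, here is a minimal sketch. The contents of the original 1.txt are not shown in the question, so the sample data is hypothetical, and it assumes the hidden character is a trailing carriage return (a common cause, but only a guess here).

```shell
# Hypothetical sample data: the second "777" ends with a carriage return (\r).
sample=$(mktemp)
printf '123\n777\n777\r\n456\n' > "$sample"

# -v prints control characters in caret notation (\r becomes ^M),
# -e marks the end of every line with "$", -t prints tabs as ^I.
visible=$(LC_ALL=C cat -vet "$sample")
printf '%s\n' "$visible"

rm -f "$sample"
```

With this sample, the line carrying the stray character is printed as 777^M$ while the clean one is printed as 777$, which is why sort -u treats them as different strings.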

alb3rtobr · Accepted Answer · 2018-12-27 00:29:14Z

2

Probably one of the "777" lines has a hidden character at the end. Try checking the length of each line of your file with:

$ awk '{ print length($0); }' 1.txt

Compare the lengths of the two "777" lines; they should differ in your file.
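
As a rough sketch of what that check might show, the following uses hypothetical sample data (the real 1.txt is not shown) in which one "777" carries a trailing carriage return. It also prints the line number next to each length, and sets LC_ALL=C so that awk counts bytes.

```shell
# Hypothetical sample data: the second "777" ends with a carriage return (\r).
sample=$(mktemp)
printf '123\n777\n777\r\n456\n' > "$sample"

# Print "line number: length" for every line; NR is awk's current line number.
lengths=$(LC_ALL=C awk '{ printf "%d: %d\n", NR, length($0) }' "$sample")
printf '%s\n' "$lengths"

rm -f "$sample"
```

Here line 3 reports a length of 4 while the other lines report 3, which points at the line with the hidden character.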

answered Dec 27, 2018 at 0:29

alb3rtobr


Lester_wu · Accepted Answer · 2018-12-27 01:54:59Z

0

Use sed to delete any non-digit characters at the end of each line, then use sort and uniq to remove the duplicate lines.

sed 's/[^0-9]\{0,\}$//' 1.txt | sort | uniq

where s        : substitutes the matched string (here, with nothing)
      [^0-9]   : matches any non-digit character
      \{0,\}   : matches zero or more of the preceding expression (same as *)
      $        : matches the end of the line
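
A small sketch of the pipeline in action, under the assumption that the hidden character is a trailing carriage return. The sample data is hypothetical, and LC_ALL=C is added to sort only to make the ordering predictable.

```shell
# Hypothetical sample data: the second "777" ends with a carriage return (\r).
sample=$(mktemp)
printf '123\n777\n777\r\n456\n' > "$sample"

# Strip trailing non-digit characters, then sort and drop adjacent duplicates.
deduped=$(sed 's/[^0-9]\{0,\}$//' "$sample" | LC_ALL=C sort | uniq)
printf '%s\n' "$deduped"

rm -f "$sample"
```

With this sample, "777" appears only once in the result. Note that the pattern removes every trailing non-digit character, so it only suits files whose lines are meant to be purely numeric. If the stray character turns out to be a carriage return, tr -d '\r' < 1.txt | sort -u would be a narrower fix.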

answered Dec 27, 2018 at 1:54

Lester_wu
