
I have a file called "1.txt" which contains the following:

111
111
222
777
1111
777

I'm trying to delete duplicate lines from it. Both sort -u 1.txt and sort 1.txt | uniq return this:

111
1111
222
777
777

Question:

Why does the string "777" still appear twice, and how do I remove the duplicate?

  • Check 1.txt for trailing spaces/tabs. Commented Dec 26, 2018 at 20:50
  • I wasn't able to reproduce this issue. I copied and pasted your numbers above, ran the same commands, and got the expected output. You might have some invisible characters in your file... Commented Dec 26, 2018 at 22:01
  • Try viewing the file with LC_ALL=C cat -vet 1.txt. That shows normally invisible and non-ASCII characters in a visible form, including line endings (newlines) as "$". I bet this will show a difference between the two "777" lines. Commented Dec 26, 2018 at 22:27
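A small reproduction of what cat -vet can reveal. It assumes the stray character is either a trailing space or a Windows-style carriage return (\r); the file name demo.txt is hypothetical and not from the question.

```shell
#!/bin/sh
# Hypothetical demo file: one clean "777", one with a trailing space,
# and one with a DOS-style carriage return before the newline.
printf '777\n777 \n777\r\n' > demo.txt

# -v shows non-printing characters (a carriage return appears as ^M),
# -e marks each line end with "$", -t shows tabs as ^I.
LC_ALL=C cat -vet demo.txt
# 777$
# 777 $
# 777^M$

# sort -u only merges byte-identical lines, so all three "777" variants survive.
LC_ALL=C sort -u demo.txt | wc -l
```

Because the three lines differ byte for byte, neither sort -u nor uniq treats them as duplicates, even though they look identical in a terminal.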


Probably one of the "777" lines has a hidden character at the end. Try checking the length of each line of your file with:

$ awk '{ print length($0); }' 1.txt

Compare the lengths of the two "777" lines; in your file they should differ.
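A self-contained way to try this, assuming the hidden character is a carriage return; the file name 1-demo.txt is made up for the example. Printing the record number NR next to each length makes it easy to see which line is the odd one out.

```shell
#!/bin/sh
# Hypothetical sample mirroring the question, with an assumed carriage
# return hidden after the second "777".
printf '111\n111\n222\n777\n1111\n777\r\n' > 1-demo.txt

# awk splits records on "\n" only, so a trailing "\r" stays in $0
# and is counted by length().
awk '{ print NR ": " length($0) }' 1-demo.txt
# 1: 3
# 2: 3
# 3: 3
# 4: 3
# 5: 4
# 6: 4
```

Line 4 ("777") has length 3 while line 6 reports 4, which points at one extra, invisible character on line 6. The line itself is not printed because a raw carriage return would move the cursor and garble the terminal output.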


Try using sed to delete any non-digit characters at the end of each line, then use sort and uniq to remove the duplicate lines.

sed 's/[^0-9]\{0,\}$//' 1.txt | sort | uniq

where s : replaces the matched string
      [^0-9] : matches a non-digit character
      \{0,\} : matches zero or more of the preceding pattern (equivalent to *)
      $ : matches the end of the line
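A runnable sketch of that pipeline on a hypothetical copy of the question's data, where the second "777" is assumed to carry both a trailing space and a carriage return. LC_ALL=C is added here (it is not in the original command) so the sort order is plain byte order regardless of locale.

```shell
#!/bin/sh
# Hypothetical input: the last "777" ends with an assumed " \r" pair.
printf '111\n111\n222\n777\n1111\n777 \r\n' > sed-demo.txt

# Strip any run of non-digit characters at the end of each line,
# then sort and collapse adjacent duplicate lines.
sed 's/[^0-9]\{0,\}$//' sed-demo.txt | LC_ALL=C sort | uniq
# 111
# 1111
# 222
# 777
```

Because the substitution removes every trailing non-digit character, this only suits files whose real content ends in a digit. sort -u can replace sort | uniq here with the same result.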
