I have a tab-delimited text file (acc.paired.txt) of Illumina sample names (output of head):
SRR10598163_R1.fastq.gz SRR8916417_R2.fastq.gz
SRR10598049_R1.fastq.gz SRR10598163_R2.fastq.gz SRR8916418_R1.fastq.gz
SRR10598049_R2.fastq.gz SRR10598164_R1.fastq.gz SRR8916418_R2.fastq.gz
SRR10598050_R1.fastq.gz SRR10598164_R2.fastq.gz SRR8916419_R1.fastq.gz
SRR10598050_R2.fastq.gz SRR10598165_R1.fastq.gz SRR8916419_R2.fastq.gz
SRR10598051_R1.fastq.gz SRR10598165_R2.fastq.gz SRR8916420_R1.fastq.gz
SRR10598051_R2.fastq.gz SRR10598166_R1.fastq.gz SRR8916420_R2.fastq.gz
SRR10598052_R1.fastq.gz SRR10598166_R2.fastq.gz SRR8916421_R1.fastq.gz
SRR10598052_R2.fastq.gz SRR10598167_R1.fastq.gz SRR8916421_R2.fastq.gz
SRR10598053_R1.fastq.gz SRR10598167_R2.fastq.gz SRR8916422_R1.fastq.gz
SRR10598053_R2.fastq.gz SRR10598168_R1.fastq.gz SRR8916422_R2.fastq.gz
SRR10598054_R1.fastq.gz SRR10598168_R2.fastq.gz SRR8916423_R1.fastq.gz
and I'd like to make two changes: 1) remove duplicate sample names and 2) remove all characters after the specific sample name. My goal output is a tab-delimited text file which contains just the SRR### numbers (no _R#.fastq.gz) with no duplicates. Example goal output:
SRR10598163
SRR8916417
SRR10598049
SRR8916418
SRR10598164
SRR10598050
SRR8916419
SRR10598165
SRR10598051
SRR8916420
SRR10598166
SRR10598052
SRR8916421
SRR10598167
SRR10598053
SRR8916422
SRR10598054
SRR10598168
SRR8916423
I turned to sed to remove character patterns:
`sed 's| _R1.fastq.gz||g' acc.paired.txt > out.txt`
But out.txt had no changes.
TIA.
`dir > acc.paired.txt`
`sed` command failed: You seem to have a whitespace before the `_R`, so it will not match.
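Building on that, here is a minimal sketch of one way to get the goal output. It assumes the columns are separated by tabs or spaces, that every name ends in `_R1.fastq.gz` or `_R2.fastq.gz`, and that duplicates should be dropped while keeping first-seen order. The function name `srr_ids` is made up for illustration; it is not an existing tool.

```shell
# Hypothetical helper: reads file names on stdin, prints unique accessions.
srr_ids() {
  # Put one file name per line; '\r' is included in case the file has
  # Windows line endings (an assumption, not something the post confirms).
  tr -s ' \t\r' '\n' |
    # Strip the read suffix. Note there is no space before _R, and the
    # dots are escaped so they match literal dots only.
    sed 's/_R[12]\.fastq\.gz$//' |
    # Skip empty lines and print each accession only the first time it appears.
    awk 'NF && !seen[$0]++'
}

# Small demonstration on two tab-separated rows taken from the question.
printf 'SRR10598163_R1.fastq.gz\tSRR8916417_R2.fastq.gz\nSRR10598049_R1.fastq.gz\tSRR10598163_R2.fastq.gz\tSRR8916418_R1.fastq.gz\n' | srr_ids
```

On the real file this would be run as `srr_ids < acc.paired.txt > out.txt`. If the order of the accessions does not matter, `sort -u` could replace the `awk` step.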