0

Apologies for a seemingly inane question. But I have spent the whole day trying to figure it out and it drives me up the walls. I'm trying to write a seemingly simple bash script that would take a list of files in the directory from ls, replace part of the file names using sed, get unique names from the list and pass them onto some command. Like so:

inputs=`ls *.ext`
echo $inputs
test1_R1.ext  test1_R2.ext  test2_R1.ext  test2_R2.ext

Now I would like to put it through sed to replace 1.ext and 2.ext with * to get test1_R* etc. Then I'd like to remove resulting duplicates by running sort -u to arrive to the following $outputs variable:

echo $outputs
test1_R* test2_R*

And pass this onto a command, like so

cat $outputs

I can do something like this in a command line:

ls *.ext | sed s/..ext/\*/g | sort -u

But if I try to assign the above to a variable in the script it just returns the output from the ls. I have tried several ways to do it: including the whole pipe in the script. Running each command separately and assigning it to a variable, then passing that variable to the next command and writing the outputs to files then passing the file to the next command. But so far none of this managed to achieve what I aimed to. I think my problem lies in (except general cluelessness aroung bash scripting) inability to run seq on a variable within script. There seems to be a lot of advice around in how to pass variables to pattern or replacement string in sed, but they all seem to take files as input. But I understand that it might not be the proper way of doing it anyway. Therefore I would really appreciate if someone could suggest an elegant way to achieve, what I'm trying to.

Many thanks!

Update 2/06/2014

Hi Barmar, thanks for your answer. Can't say it solved the problem, but it helped pin-pointing it. Seems like the problem is in me using the asterisk. I have to say, I'm very puzzled. The actual file names I've got are:

test1_R1.fastq.gz test1_R2.fastq.gz test2_R1.fastq.gz test2_R2.fastq.gz

If I'm using the code you suggested, which seems to me the right way do to it:

ins=$(ls *.fastq.gz | sed 's/..fastq.gz/\*/g' | sort -u)

Sed doesn't seem to do anything and I'm getting the output of ls:

test1_R1.fastq.gz test1_R2.fastq.gz test2_R1.fastq.gz test2_R2.fastq.gz

Now if I replace that backslash with anything else, the sed works, but it also returns whatever character I'm putting in front (or after) the asteriks:

ins=$(ls *.fastq.gz | sed 's/..fastq.gz/"*/g' | sort -u)
test1_R"* test2_R"*

That's odd enough, but surely I can just put an "R" in front of the asteriks and then replace R in the search pattern string, right? Wrong! If I do that whichever way: 's/R..fastq.gz/R*/g' 's/...fastq.gz/R*/g' 's/[A-Z]..fastq.gz/R*/g' I'm back to the original names! And even if I end up with something like test1_RR* test2_RR* and try to run it through sed again and replace "_R" for "_" or "RR" for "R", I'm having no luck and I'm back to the original names. And yet I can replace the rest of the file name no problem, just not to get me test1_R* I need.

I have a feeling I should be escaping that * in some very clever way, but nothing I've tried seems to work. Thanks again for your help!

1 Answer 1

1

This is how you capture the result of the whole pipeline in a variable:

var=$(ls *.ext | sed s/..ext/\*/g | sort -u)
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.