
I have a file and its name looks like:

12U12345._L001_R1_001.fastq.gz

I want to assign to a variable just the 12U12345 part.

So far I have:

variable=`basename $fastq | sed {s'/_S[0-9]*_L001_R1_001.fastq.gz//'}`

Note: $fastq is a variable with the full path to the file in it.

This currently returns the full file name. Any ideas how to get this right?

3 Answers


Just use the built-in parameter expansion provided by the shell instead of spawning a separate process:

fastq="12U12345._L001_R1_001.fastq.gz"
printf '%s\n' "${fastq%%.*}"
12U12345
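The %% operator removes the longest suffix that matches the pattern, so ".*" eats everything from the first dot onward. A single % removes only the shortest matching suffix. The small sketch below, using the filename from the question, contrasts the two so it is clear why %% is the one you want here:

```shell
fastq="12U12345._L001_R1_001.fastq.gz"

longest="${fastq%%.*}"   # %% strips the LONGEST suffix matching ".*": from the first dot on
shortest="${fastq%.*}"   # %  strips the SHORTEST suffix matching ".*": only the final ".gz"

printf '%s\n' "$longest"    # 12U12345
printf '%s\n' "$shortest"   # 12U12345._L001_R1_001.fastq
```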

Or use printf -v to store the result in a new variable in one shot:

printf -v numericPart '%s' "${fastq%%.*}"
printf '%s\n' "${numericPart}"
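If you need this for many files, printf -v also lets a function hand its result back through a variable name instead of through $(...), so no subshell is created. This is only a sketch: get_sample_id is a hypothetical helper name, and it assumes the caller's variable is not itself called file (the function's local would shadow it).

```shell
# Hypothetical helper: write the sample ID of the path in $1
# into the variable whose name is given in $2.
get_sample_id() {
    local file="${1##*/}"              # drop everything up to the last '/'
    printf -v "$2" '%s' "${file%%.*}"  # keep everything before the first '.'
}

get_sample_id "/path/to/12U12345._L001_R1_001.fastq.gz" sample
printf '%s\n' "$sample"   # 12U12345
```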

Bash also has a built-in regular-expression match operator, =~, which you could use like this:

fastq="12U12345._L001_R1_001.fastq.gz"
regex='^([[:alnum:]]+)\.(.*)'

if [[ $fastq =~ $regex ]]; then
    numericPart="${BASH_REMATCH[1]}"
    printf '%s\n' "${numericPart}"
fi
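The regex above assumes $fastq holds only the file name. Since the question says $fastq contains the full path, a variant can skip an optional directory prefix first. This is a sketch, assuming the sample ID never contains a '/' or a '.':

```shell
fastq="/path/to/12U12345._L001_R1_001.fastq.gz"
# Group 1: optional directory prefix ending in '/'.
# Group 2: the sample ID, i.e. characters that are neither '/' nor '.', up to the first dot.
regex='^(.*/)?([^/.]+)\.'

if [[ $fastq =~ $regex ]]; then
    sampleId="${BASH_REMATCH[2]}"
    printf '%s\n' "$sampleId"   # 12U12345
fi
```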



You could use cut:

$> fastq="/path/to/12U12345._L001_R1_001.fastq.gz"
$> variable=$(basename "$fastq" | cut -d '.' -f 1)
$> echo "$variable"
12U12345
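The sed in the question never matches because its pattern expects "_S" followed by digits, which this filename does not contain. If every file is assumed to end in exactly the same suffix, basename can strip it itself through its optional second argument, so no cut or sed is needed. A sketch under that assumption:

```shell
fastq="/path/to/12U12345._L001_R1_001.fastq.gz"
# Assumption: every file ends in this exact suffix; basename removes it if present.
suffix="._L001_R1_001.fastq.gz"
variable=$(basename "$fastq" "$suffix")
echo "$variable"   # 12U12345
```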

Also, please note that:

  • It's better to wrap your variable in quotes. Otherwise your command won't work with filenames that contain spaces.

  • You should use $() instead of backticks.
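To see what goes wrong without quotes, here is a small demonstration with a hypothetical directory name containing a space. Unquoted, the path is split into two words, and basename treats the second word as a suffix to strip, so it returns the wrong component:

```shell
# Hypothetical path whose directory name contains a space
fastq="/path/to/my run/12U12345._L001_R1_001.fastq.gz"

quoted=$(basename "$fastq" | cut -d '.' -f 1)   # one argument: the whole path
unquoted=$(basename $fastq | cut -d '.' -f 1)   # word-split into "/path/to/my" and "run/..."

printf '%s\n' "$quoted"     # 12U12345
printf '%s\n' "$unquoted"   # my
```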

1 Comment

One could use cut, but a command substitution or any external tool (basename included) imposes a huge, multiple-orders-of-magnitude performance penalty compared with bash's built-in string manipulation. Bash-the-interpreter is slow, to be sure, but its reputation as a killer of boot times has much more to do with code that spawns unnecessary new processes than with its innate performance.
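A rough way to check this on your own machine is to time both approaches in a loop. The iteration count is arbitrary and the timings will depend on your system, so treat this as a sketch rather than a benchmark; both loops should produce the same result.

```shell
fastq="/path/to/12U12345._L001_R1_001.fastq.gz"

# Built-ins only: no new processes are created inside the loop.
time for ((i = 0; i < 500; i++)); do
    file="${fastq##*/}"
    builtin_result="${file%%.*}"
done

# External tools: each iteration forks for the command substitution, basename and cut.
time for ((i = 0; i < 500; i++)); do
    external_result=$(basename "$fastq" | cut -d '.' -f 1)
done
```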

Use Bash parameter expansion to extract the basename, then extract the portion of the filename you want:

fastq="/path/to/12U12345._L001_R1_001.fastq.gz"
file="${fastq##*/}"  # gives 12U12345._L001_R1_001.fastq.gz
string="${file%%.*}" # gives 12U12345

Note that Bash doesn't allow nested parameter expansions; otherwise, we could have combined statements 2 and 3 above.
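Bash can't nest these expansions, but it does apply each one to every element when used on "${array[@]}". So with several paths in an array, the same two steps extract all IDs at once. In this sketch the second path is made up purely for illustration:

```shell
# Hypothetical list of input paths (the second entry is invented for illustration)
files=(
    "/path/to/12U12345._L001_R1_001.fastq.gz"
    "/path/to/34X67890._L001_R1_001.fastq.gz"
)

names=("${files[@]##*/}")   # strip the directory from each path
ids=("${names[@]%%.*}")     # keep the part before the first dot of each name

printf '%s\n' "${ids[@]}"
```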

