
I have many strings like this:

i=./M1/CustomersList/HTP/Boston/FCT/output_GetCaseList_abs.txt

I need to extract the M1 code and the FCT, but I am unable to do so, probably because of my regular expressions. I can get FCT with echo ${i:30:3}, but nothing seems to work for M1. My last try was grep -oP '.*\K(?<=.\/)\w+(?=\/Cus)' $i ;

The length of the string can vary (but the part I want always starts with /F), and /M1/ is always in the same position.

Hope somebody can help. Thanks!
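A likely culprit in the grep attempt in the question is the last argument rather than the regex: grep treats `$i` as the *name of a file* to open, not as text to search. A minimal sketch, assuming GNU grep built with `-P` (PCRE) support, passes the string on standard input with a here-string instead:

```shell
#!/usr/bin/env bash
i=./M1/CustomersList/HTP/Boston/FCT/output_GetCaseList_abs.txt

# Problem: here grep would try to open a file called ./M1/CustomersList/... and search its contents.
# grep -oP '.*\K(?<=.\/)\w+(?=\/Cus)' $i

# Feeding the string itself via a here-string lets the same regex run against the text.
grep -oP '.*\K(?<=.\/)\w+(?=\/Cus)' <<<"$i"    # prints M1
```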

  • You say the section you want to find always starts with /F, but you don't say that no other section can also start with /F, e.g. if your input were about a customer in Flagstaff instead of Boston. You imply that FCT can appear in different positions, but maybe it can't. One sample input line isn't enough for us to guess the general form of your input and test a potential solution. Please provide about 4 or 5 diverse input lines and the expected output for that input, so we can help you solve your real problem as opposed to just producing the expected output for that 1 line of sample input. Commented Feb 10, 2022 at 13:27

5 Answers

You could try the following awk programs.

To get FCT-like strings: since the position of the string is not fixed and only the leading /F is, I match from /F up to the next /, which captures whatever comes after /F and before the following slash.

echo "$i" | awk 'match($0,/\/F[^/]*/){print substr($0,RSTART+1,RLENGTH-1)}'
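One caveat: match() stops at the *first* /F, so a city such as Falls earlier in the path would be returned instead of FCT. A hedged alternative, assuming the wanted value is always the last *directory* (not the file name) whose name starts with F, walks the fields from right to left:

```shell
#!/usr/bin/env bash
# Hypothetical input where the city also starts with F.
i=./M1/CustomersList/HTP/Falls/FCT/output_GetCaseList_abs.txt

echo "$i" | awk -F'/' '{
  # $NF is the file name, so start at the last directory and move left
  for (k = NF - 1; k >= 1; k--)
    if ($k ~ /^F/) { print $k; exit }   # first hit from the right = last F-directory
}'
```

This prints FCT for both the Boston and the Falls path.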

To get M1, try the following awk program. Since the position of M1 is always fixed (as the OP states in the question), I use two sub() calls: the first removes the leading ./, and the second removes everything from the next / to the end of the line. The trailing 1 then prints the line, which leaves just M1.

echo "$i" | awk '{sub(/^\.\//,"");sub(/\/.*/,"")} 1'
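The same two-step strip can be sketched with bash parameter expansion alone, without starting awk: `#` removes a matching prefix and `%%` removes the longest matching suffix.

```shell
#!/usr/bin/env bash
i=./M1/CustomersList/HTP/Boston/FCT/output_GetCaseList_abs.txt

m1=${i#./}      # drop the leading "./"              -> M1/CustomersList/HTP/...
m1=${m1%%/*}    # drop everything from the first "/" -> M1
echo "$m1"
```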

4 Comments

Like the use of match() and sub().
The one to get M1 prints CustomersList instead (I don't know why). The one to get FCT works as long as the city is Boston and not Falls; how could this be fixed?
@Miguel, when I tested my code with the i=./M1/CustomersList/HTP/Boston/FCT/output_GetCaseList_abs.txt variable posted in the question, it worked fine for me. Is your variable exactly the same as shown? For FCT, you mentioned /F will be constant, so I wrote the regex based on that alone.
There was an error in the variable because I was in a different folder. Thanks! It works now!

Bash allows you to split a string into an array.

# starting value
str=./M1/CustomersList/HTP/Boston/FCT/output_GetCaseList_abs.txt

# split string on / delimiter into the split array
IFS=/ read -ra split <<<"$str"

# get M1 and FCT elements at their respective indexes
M1=${split[1]}
FCT=${split[5]}

# dump M1 and FCT variables for demo purpose
declare -p M1 FCT
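To see why M1 and FCT sit at indexes 1 and 5, it can help to dump every element: the leading `./` produces `.` at index 0. A small self-contained sketch repeating the same read:

```shell
#!/usr/bin/env bash
str=./M1/CustomersList/HTP/Boston/FCT/output_GetCaseList_abs.txt
IFS=/ read -ra split <<<"$str"

# print each index together with its element
for idx in "${!split[@]}"; do
  printf '%d=%s\n' "$idx" "${split[idx]}"
done
```

This should list 0=. through 6=output_GetCaseList_abs.txt, with 1=M1 and 5=FCT.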

1 Comment

Nicely done. Here-string and all.

Another option with awk is split(), which splits the path components into an array. The command below fills the array a[] and prints its 2nd and 6th elements ("M1" and "FCT").

awk '{split($1,a,"/"); print a[2]", "a[6]}'

Example Use/Output

$ i=./M1/CustomersList/HTP/Boston/FCT/output_GetCaseList_abs.txt; echo "$i" | 
awk '{split($1,a,"/"); print a[2]", "a[6]}'
M1, FCT
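Because the question mentions many strings, note that the same program handles one path per line, so a whole list can be processed in a single awk call. The second path below is made-up sample data:

```shell
#!/usr/bin/env bash
# The second path (M2 / Falls / FXY) is hypothetical sample data.
printf '%s\n' \
  ./M1/CustomersList/HTP/Boston/FCT/output_GetCaseList_abs.txt \
  ./M2/CustomersList/HTP/Falls/FXY/output_GetCaseList_abs.txt |
awk '{split($1,a,"/"); print a[2]", "a[6]}'
```

This prints M1, FCT on the first line and M2, FXY on the second. If the paths already live in a file, the file name can be given to awk directly instead of piping.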


If the positions of the strings are always after the same number of forward slashes, you can print the 2nd and the 6th field, setting the field separator to /.

echo "$i" | awk -F"/" '{print $2, $6}'

Output

M1 FCT
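The same field-based idea works in plain bash: with IFS set to / for a single read, each field can go straight into a named variable, using _ as a throwaway for the parts that aren't needed. A sketch assuming the same fixed layout as the question:

```shell
#!/usr/bin/env bash
i=./M1/CustomersList/HTP/Boston/FCT/output_GetCaseList_abs.txt

# fields: "."  M1  CustomersList  HTP  Boston  FCT  <rest of the line>
IFS=/ read -r _ m1 _ _ _ fct _ <<<"$i"
echo "$m1 $fct"    # M1 FCT
```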

You might also use GNU awk and a pattern with 2 capture groups: the first matches the component that is followed by /Cus, and the second matches a component starting with F.

The negated character class [^\/]* matches 0 or more characters other than /.

echo "$i" | awk 'match($0, /[^\/]*\/([^\/]*)\/Cus.*\/(F[^\/]*)/, a) {print a[1], a[2]}'
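The three-argument form of match() is a GNU awk extension, so it may not work in other awks such as mawk. A hedged POSIX-awk sketch of the same idea uses sub() plus the RLENGTH that two-argument match() sets. Because awk regexes take the longest match, `.*\/F[^\/]*` ends at the last /F component, so a city such as Falls (hypothetical here) does not get in the way:

```shell
#!/usr/bin/env bash
i=./M1/CustomersList/HTP/Falls/FCT/output_GetCaseList_abs.txt

echo "$i" | awk '{
  m = $0
  sub(/\/Cus.*/, "", m)             # "./M1": cut from "/Cus" to the end
  sub(/.*\//, "", m)                # "M1":   cut up to the last remaining "/"
  if (match($0, /.*\/F[^\/]*/)) {   # longest match ends at the last "/F..." component
    f = substr($0, 1, RLENGTH)      # "./M1/CustomersList/HTP/Falls/FCT"
    sub(/.*\//, "", f)              # "FCT"
    print m, f
  }
}'
```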


You have your awk answers, but I felt like contributing a bash idea just for fun.

[[ "$i" =~ ^\./([[:alnum:]]+)(/[[:alnum:]]+){3}/([[:alnum:]]+)/.* ]] \
    && echo "${BASH_REMATCH[1]} ${BASH_REMATCH[3]}"

The BASH_REMATCH array holds the capture groups from the match. Index 0 is the complete matched string.
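Printing every slot makes the indexes concrete. Index 2 holds only the *last* repetition of the `{3}` group, which is why the answer skips it and uses index 3. A sketch:

```shell
#!/usr/bin/env bash
i=./M1/CustomersList/HTP/Boston/FCT/output_GetCaseList_abs.txt

if [[ "$i" =~ ^\./([[:alnum:]]+)(/[[:alnum:]]+){3}/([[:alnum:]]+)/.* ]]; then
  # list every slot of BASH_REMATCH with its index
  for k in "${!BASH_REMATCH[@]}"; do
    printf '%d: %s\n' "$k" "${BASH_REMATCH[k]}"
  done
fi
```

This should print the full path at 0, M1 at 1, /Boston at 2 and FCT at 3.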


A slightly shorter version yielding the same output:

[[ "$i" =~ ^\./(.+)(/.+){4}/.* ]] \
    && echo "${BASH_REMATCH[1]} ${BASH_REMATCH[2]:1}"
