Input:

<e1 name="file1" id="id1" anotherId="id2">

Desired output:

file1

I can get what I need with this:

echo '<e1 name="file1" id="id1" anotherId="id2">' | sed 's/\(.*name="\)\(.*\)\(".*\)/\2/' | sed 's/".*//'

Output: file1

I would like to improve the set of commands and remove the last pipe to sed if possible. If I remove the last pipe to sed, I cannot get what I want:

echo '<e1 name="file1" id="id1" anotherId="id2">' | sed 's/\(.*name="\)\(.*\)\(".*\)/\2/'

Output:

file1" id="id1" anotherId="id2

As you can see, sed is matching up to the last quotation mark on the line rather than the first one after file1.

Can someone help improve this command?

  • I should have added that attribute order may vary. Commented Sep 14, 2013 at 11:26

2 Answers

echo '<e1 name="file1" id="id1" anotherId="id2">' |
  sed -n 's/.*name="\([^"]*\)".*/\1/p'

Or with GNU grep if built with PCRE support:

echo '<e1 name="file1" id="id1" anotherId="id2">' |
  grep -Po 'name="\K[^"]*'
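
To see why the negated class helps, here is a minimal sketch that runs a greedy pattern like the one in the question next to the `[^"]*` version on the same input. The function names `greedy` and `first_quote` are made up for this illustration and are not part of either command above.

```shell
# Hypothetical helper names, used only for this comparison.
greedy() {
  # \(.*\) is greedy, so it extends to the LAST double quote on the line.
  sed 's/.*name="\(.*\)".*/\1/'
}

first_quote() {
  # [^"]* cannot cross a double quote, so the capture ends at the
  # first quote that follows name=".
  sed -n 's/.*name="\([^"]*\)".*/\1/p'
}

tag='<e1 name="file1" id="id1" anotherId="id2">'

printf '%s\n' "$tag" | greedy        # file1" id="id1" anotherId="id2
printf '%s\n' "$tag" | first_quote   # file1
```

With `-n` and the `p` flag, lines that do not contain a name attribute print nothing instead of being passed through unchanged.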
  • Thank you; the sed example works perfectly even when the attributes are in a different order. Commented Sep 14, 2013 at 11:25

sed

You could simplify it a bit with this version:

$ echo '<e1 name="file1" id="id1" anotherId="id2">' | \
   sed 's/.*name="\(.*\)" id.*/\1/'

You don't need to wrap everything in parentheses, only the parts you want to save for later use, so the other groups can be removed.
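
Note that the pattern above is anchored on the literal `" id` that follows the value, so it depends on the attribute order. The sketch below, which assumes a reordered tag as test input and uses the made-up function names `anchored` and `by_quote`, shows the difference from a pattern that simply stops at the next quote.

```shell
# Hypothetical helper names for illustration.
anchored() { sed 's/.*name="\(.*\)" id.*/\1/'; }    # pattern from this answer
by_quote() { sed 's/.*name="\([^"]*\)".*/\1/'; }    # stops at the next quote

original='<e1 name="file1" id="id1" anotherId="id2">'
reordered='<e1 id="id1" name="file1" anotherId="id2">'

printf '%s\n' "$original"  | anchored   # file1
printf '%s\n' "$reordered" | anchored   # no match: the line is printed unchanged
printf '%s\n' "$reordered" | by_quote   # file1
```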

grep

You can also use grep's ability to use Perl's regular expression engine (PCRE):

$ echo '<e1 name="file1" id="id1" anotherId="id2">' | \
   grep -Po '(?<=name=")(\w+)(?=")'

This uses PCRE's lookbehind and lookahead assertions. The lookbehind requires the sequence name=" to appear immediately before the text we're looking for, without including it in the match. This bit does that:

(?<=name=")

It then looks for a series of word characters, which is the part we actually want:

(\w+)

The last bit that's doing the lookahead is this:

(?=")

It's looking for a quotation mark (") after what we're looking for.
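
One thing to keep in mind is that `\w+` only matches letters, digits and underscores. As a rough sketch, assuming GNU grep built with PCRE support and a made-up value `file-1.txt`, the two hypothetical helpers below compare `\w+` with `[^"]+` between the same lookarounds.

```shell
# Hypothetical helpers; both assume GNU grep with -P (PCRE) support.
name_word() { grep -Po '(?<=name=")\w+(?=")'; }     # word characters only
name_any()  { grep -Po '(?<=name=")[^"]+(?=")'; }   # anything up to the next quote

# Made-up input whose value contains '-' and '.', which \w does not match.
tag='<e1 name="file-1.txt" id="id1">'

printf '%s\n' "$tag" | name_any                                   # file-1.txt
printf '%s\n' "$tag" | name_word || printf '%s\n' 'no match'      # no match
```

The second call prints nothing from grep, because after `file` the next character is `-` rather than the quotation mark the lookahead requires.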

awk

$ echo '<e1 name="file1" id="id1" anotherId="id2">' | \
   awk '{gsub("\"","");split($2,a,"="); print a[2]}'

This variant strips the double quotes (") with a global substitution:

gsub("\"","")

The remaining string would be this:

<e1 name=file1 id=id1 anotherId=id2>

awk then splits this line on whitespace as it normally would, so the 2nd field, $2, holds the bit we're interested in (name=file1). We can take that field and split it on the equals sign (=):

split($2,a,"=");

This splits $2 and stores the results in an array, a. Afterwards we can print the 2nd element of the array, which is everything on the right side of the equals sign in $2.

file1
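
The awk command above relies on name being the second field. If the order can vary, one possible adaptation is to loop over the fields and pick the one whose left-hand side matches. This is only a sketch: `attr_awk` is a hypothetical helper name, and it assumes attribute values contain no spaces or equals signs.

```shell
# attr_awk ATTR: print the value of ATTR from tag lines on stdin.
# Hypothetical helper; assumes values contain no spaces or '=' characters.
attr_awk() {
  awk -v want="$1" '{
    gsub(/[<>"]/, "")               # drop angle brackets and quotes
    for (i = 2; i <= NF; i++) {     # field 1 is the element name
      split($i, a, "=")
      if (a[1] == want) print a[2]
    }
  }'
}

printf '%s\n' '<e1 name="file1" id="id1" anotherId="id2">' | attr_awk name   # file1
printf '%s\n' '<e1 id="id1" anotherId="id2" name="file1">' | attr_awk name   # file1
```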
  • Thank you for the effort, but the answer by @stephane-chazelas works better when the attributes are in a different order. Commented Sep 14, 2013 at 11:24
  • @phatypus - not a problem, showing other methods, they can be adapted depending on your ultimate need. Commented Sep 14, 2013 at 11:40
