Improve sed command to replace first instance of character and all following characters?

Question

Input:

<e1 name="file1" id="id1" anotherId="id2">

Desired output:

file1

I can get what I need with this:

echo '<e1 name="file1" id="id1" anotherId="id2">' | sed 's/\(.*name="\)\(.*\)\(".*\)/\2/' | sed 's/".*//'

Output: file1

I would like to improve the set of commands and remove the last pipe to sed if possible. If I remove the last pipe to sed, I cannot get what I want:

echo '<e1 name="file1" id="id1" anotherId="id2">' | sed 's/\(.*name="\)\(.*\)\(".*\)/\2/'

Output:

file1" id="id1" anotherId="id2

As you can see, sed is picking up the last quotation mark, not the first one after file1.

Can someone help improve this command?

I should have added that attribute order may vary.

– phatypus, Commented Sep 14, 2013 at 11:26

Stéphane Chazelas · Accepted Answer · 2013-09-14 07:55:27Z


echo '<e1 name="file1" id="id1" anotherId="id2">' |
  sed -n 's/.*name="\([^"]*\)".*/\1/p'

Or with GNU grep if built with PCRE support:

echo '<e1 name="file1" id="id1" anotherId="id2">' |
  grep -Po 'name="\K[^"]*'
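
As a rough sketch of why the sed version copes with varying attribute order: `[^"]*` cannot run past the first closing quote, and `-n` together with the `p` flag means only lines where the substitution succeeded are printed. The function name `extract_name` and the reordered sample lines are made up for illustration.

```shell
# Hypothetical helper: print the value of the name attribute, if one is present.
extract_name() {
  sed -n 's/.*name="\([^"]*\)".*/\1/p'
}

# name first, as in the question
echo '<e1 name="file1" id="id1" anotherId="id2">' | extract_name

# name last (made-up reordering of the same attributes)
echo '<e1 id="id1" anotherId="id2" name="file1">' | extract_name

# no name attribute: nothing is printed, because of -n and the p flag
echo '<e1 id="id1">' | extract_name
```

One caveat: the pattern looks for the literal text `name="`, so it would also match an attribute whose name merely ends in `name`.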

edited Sep 14, 2013 at 7:55

answered Sep 14, 2013 at 6:54


Thank you; the sed example works perfectly even when the attributes are in a different order.

– phatypus, Commented Sep 14, 2013 at 11:25


slm · Answer · 2013-09-14 12:19:27Z

sed

You could simplify it a bit with this version:

$ echo '<e1 name="file1" id="id1" anotherId="id2">' | \
   sed 's/.*name="\(.*\)" id.*/\1/'

You don't need to wrap everything in parentheses, only the part you want to save for later use, so you can remove the other groups. Note that this pattern assumes the id attribute immediately follows name.
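
To illustrate the point about parentheses: grouping only decides what can be referenced later, not what the pattern matches. The sketch below compares the three-group expression from the question with a one-group equivalent. The function names are made up for illustration. Both still capture up to the last quotation mark, because `.*` is greedy either way.

```shell
input='<e1 name="file1" id="id1" anotherId="id2">'

# Three groups, as in the question; only \2 is ever used.
three_groups() { sed 's/\(.*name="\)\(.*\)\(".*\)/\2/'; }

# One group, around the only part that is kept.
one_group() { sed 's/.*name="\(.*\)".*/\1/'; }

# Both commands print the same text.
printf '%s\n' "$input" | three_groups
printf '%s\n' "$input" | one_group
```

Dropping the unused groups makes the expression easier to read, but getting just file1 still requires an extra anchor after the capture or a negated class such as `[^"]*`.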

grep

You can also use grep's ability to use Perl's regular expression engine (PCRE):

$ echo '<e1 name="file1" id="id1" anotherId="id2">' | \
   grep -Po '(?<=name=")(\w+)(?=")'

This uses PCRE's lookahead and lookbehind assertions. A lookbehind requires that a given sequence of characters, here name=", appears immediately before what we're looking for. This bit does that:

(?<=name=")

It then matches a series of word characters, which is what we're actually looking for:

(\w+)

The last bit that's doing the lookahead is this:

(?=")

It's looking for a quotation mark (") after what we're looking for.
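
One thing to keep in mind is that `\w+` only matches letters, digits and underscores. The sketch below contrasts the lookaround expression with a variant that accepts anything up to the next quote. The variant, the function names and the sample value `file-1.txt` are assumptions for illustration and not part of the original example. Both need a grep built with PCRE support.

```shell
# Lookaround form from above: the value must consist of word characters only.
name_by_word_chars() { grep -Po '(?<=name=")\w+(?=")'; }

# Variant (an assumption): accept any characters up to the next quotation mark.
name_by_negated_class() { grep -Po '(?<=name=")[^"]*(?=")'; }
```

Fed a line such as `<e1 name="file-1.txt" id="id1">`, the first function prints nothing, because `-` and `.` are not word characters, while the second prints file-1.txt.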

awk

$ echo '<e1 name="file1" id="id1" anotherId="id2">' | \
   awk '{gsub("\"","");split($2,a,"="); print a[2]}'

This variant strips the double quotes (") with a global substitution:

gsub("\"","")

The remaining string would be this:

<e1 name=file1 id=id1 anotherId=id2>

So we can let awk split this on whitespace as it normally would, and the 2nd column is the bit we're interested in. That is $2 to awk, which assumes name is the first attribute. We can then take that field and split it on the equal sign (=).

split($2,a,"=");

This splits $2 and stores the pieces in an array, a. Afterwards we can print the 2nd element of the array, which is everything to the right of the equal sign in $2:

file1
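
If the attribute order can vary, the same idea can be adapted by checking every field instead of only $2. This is a minimal sketch, assuming attribute values contain no spaces or equal signs. The function name `name_from_any_field` is made up for illustration.

```shell
# Hypothetical helper: scan all whitespace-separated fields for name=...
name_from_any_field() {
  awk '{
    for (i = 1; i <= NF; i++) {
      if ($i ~ /^name=/) {
        split($i, a, "=")        # a[1] is "name", a[2] is the quoted value
        gsub(/[">]/, "", a[2])   # drop the quotes and a trailing >
        print a[2]
      }
    }
  }'
}

echo '<e1 id="id1" anotherId="id2" name="file1">' | name_from_any_field
```

Anchoring the test with `^name=` keeps fields such as anotherId from matching by accident.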

Thank you for the effort, but the answer by @stephane-chazelas works better when the attributes are in a different order.
– phatypus, Commented Sep 14, 2013 at 11:24
@phatypus - not a problem, showing other methods, they can be adapted depending on your ultimate need.
– slm ♦, Commented Sep 14, 2013 at 11:40
