1

I have a file that consists of a single, very long line. I want to extract the value that appears between "Undefined error code" and " id" on this line. The pattern occurs many times on the line, each time with a different value, but the following command only gives me the last instance.

cat bad_events_P2J3.xml | sed -n 's/.*Undefined error code (\(.*\))\" id.*/\1\n/p'

How can I get all instances of this?

1
  • Why not replace "id." with "\n"? Then each record is on a line. Commented Jul 30, 2010 at 1:32

2 Answers

1

You were on the right track:

sed -n 's/.*Undefined error code\(.*\)id.*/\1/p' bad_events_P2J3.xml

Note that cat is unnecessary and, unless you need an extra newline, sed will provide one for you.

I missed the fact that this appears multiple times in your file. This should work in that case:

grep -Po 'Undefined error code.*?id' bad_events_P2J3.xml | sed 's/^Undefined error code//;s/id$//'
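As a rough illustration of the difference between the two commands, here is a minimal sketch. The sample line is hypothetical, modelled on the "Undefined error code (code_here)" id="code_here" format described in the comments, and the variable names are made up for the example. It uses a negated bracket expression instead of the lazy .*? so that plain grep -oE is enough (-o is supported by GNU and BSD grep but is not required by POSIX):

```shell
# Hypothetical one-line sample; the real file's layout is an assumption.
line='<e msg="Undefined error code (101)" id="101"/><e msg="Undefined error code (202)" id="202"/>'

# Greedy sed: the leading .* swallows everything up to the LAST occurrence,
# so only one value is printed.
last_only=$(printf '%s\n' "$line" |
  sed -n 's/.*Undefined error code (\(.*\))" id.*/\1/p')

# grep -o prints every match on its own line; [^)]* cannot run past the
# closing parenthesis, so no lazy quantifier (and no -P) is needed.
# sed then strips everything except the text inside the parentheses.
codes=$(printf '%s\n' "$line" |
  grep -oE 'Undefined error code \([^)]*\)' |
  sed 's/.*(\([^)]*\))/\1/')

printf 'sed alone:\n%s\n' "$last_only"
printf 'grep -o + sed:\n%s\n' "$codes"
```

With the real file, the printf '%s\n' "$line" part would be replaced by passing bad_events_P2J3.xml to grep as an argument.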

4 Comments

so the cat caused the problem? I thought I should be able to do it with sed. I just wasn't seeing the wood for the trees. Thanks
@amadain: No, cat wasn't the problem. It just wasn't necessary since sed accepts a filename as an argument and you're not conCATenating multiple files. The problem was probably the extra set of parentheses. Without seeing a portion of the actual data, it's hard to be sure.
@OP, this works only if you are sure there is just one instance of that pair of words. If there are more, it will only get the last instance, because the leading .* in the sed pattern is greedy.
Thank you for this. Actually, the extra set of parentheses is needed: the phrase that appears multiple times is actually "Undefined error code(code_here)" id="code_here", so the number I was matching is in parentheses. The grep -Po was what I actually needed. The first solution had the same problem as mine, i.e. it only displayed one instance: the last one.
1
$ cat file
text1 text2 Undefined error code text3 text4 id text5 text6 Undefined error code txt7 txt8 id
$ awk -vRS="id" '{gsub(/.*Undefined error code/,"")}1' file
 text3 text4
 txt7 txt8
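
A multi-character RS is treated as a separator by gawk and mawk, but POSIX only guarantees the first character, so the one-liner above may behave differently in other awks. A possible alternative, sketched here under the assumption that the data looks like the hypothetical sample line below (parenthesised codes, as described in the comments on the other answer), is to loop with match() and consume the line piece by piece:

```shell
# Hypothetical one-line sample; the real file's layout is an assumption.
line='<e msg="Undefined error code (101)" id="101"/><e msg="Undefined error code (202)" id="202"/>'

codes=$(printf '%s\n' "$line" | awk '{
  # Repeatedly find the next occurrence, print the text inside the
  # parentheses, then drop everything up to the end of that match.
  while (match($0, /Undefined error code \([^)]*\)/)) {
    m = substr($0, RSTART, RLENGTH)
    sub(/^Undefined error code \(/, "", m)   # strip the leading phrase
    sub(/\)$/, "", m)                        # strip the closing parenthesis
    print m
    $0 = substr($0, RSTART + RLENGTH)
  }
}')

printf '%s\n' "$codes"
```

This relies only on match(), substr() and sub(), which are part of POSIX awk.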

1 Comment

Wonderful. I couldn't find how to do this anywhere. It seems like an easy thing that you could do with simple sed, but it's a lot harder when you actually tackle it. This code was perfect. Thanks
