Parsing multiple values from a file

Question

I have a file that consists of a single (huge) line. I want to extract the value that appears between "Undefined error code" and " id" on that line. The pattern occurs many times on the line, each time with a different value, but the following command prints only the last occurrence.

cat bad_events_P2J3.xml | sed -n 's/.*Undefined error code (\(.*\))\" id.*/\1\n/p'

How can I get all instances of this?

Why not replace "id." with "\n"? Then each record is on its own line.
– Demosthenex, commented Jul 30, 2010 at 1:32

Dennis Williamson · Accepted Answer · 2010-07-30 01:20:21Z

1

You were on the right track:

sed -n 's/.*Undefined error code\(.*\)id.*/\1/p' bad_events_P2J3.xml

Note that cat is unnecessary and, unless you need an extra newline, sed will provide one for you.

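Edit: I missed the fact that this appears multiple times in your file. Because the leading .* is greedy, the sed command above prints only the last occurrence. With a grep that supports -P (GNU grep), this should work in that case: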

grep -Po 'Undefined error code.*?id' bad_events_P2J3.xml | sed 's/^Undefined error code//;s/id$//'
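If your grep has no -P option, a negated character class can stand in for the non-greedy .*? and keep each match from running past the first closing parenthesis. The following is only a sketch: the sample line is made up (the real contents of bad_events_P2J3.xml were never posted), and it assumes the value sits in parentheses followed by `" id`, as in the sed pattern from the question.

```shell
# Hypothetical one-line sample, standing in for the real file.
sample='<event msg="Undefined error code (101)" id="a"/><event msg="Undefined error code (202)" id="b"/>'

# grep -o prints every match on its own line. In a basic regex the
# parentheses are literal, and [^)]* stops at the first ")" so one match
# cannot swallow the following records.
# sed then strips the fixed prefix and suffix from each match.
codes=$(printf '%s\n' "$sample" |
    grep -o 'Undefined error code ([^)]*)" id' |
    sed 's/^Undefined error code (//; s/)" id$//')

printf '%s\n' "$codes"
```

To run it against the file itself, drop the printf and pass bad_events_P2J3.xml as the last argument to grep.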

edited Jul 30, 2010 at 1:20

answered Jul 29, 2010 at 15:22


4 Comments

amadain Over a year ago

So the cat caused the problem? I thought I should be able to do it with sed. I just wasn't seeing the wood for the trees. Thanks.

Dennis Williamson Over a year ago

@amadain: No, cat wasn't the problem. It just wasn't necessary since sed accepts a filename as an argument and you're not conCATenating multiple files. The problem was probably the extra set of parentheses. Without seeing a portion of the actual data, it's hard to be sure.

ghostdog74 Over a year ago

@OP, this works only if you are sure there is just one instance of that pair of words. It will get only the last instance if there are more, because the .* in the sed pattern is greedy.

amadain Over a year ago

Thank you for this. Actually, the double parentheses are needed, because the phrase that appears multiple times is actually "Undefined error code(code_here)" id="code_here", so the number I was matching was in parentheses. The grep -Po was what I actually needed. The first solution had the same problem as mine, i.e. it only displayed one instance: the last one.

ghostdog74 · Accepted Answer · 2010-07-29 13:56:32Z

1

$ cat file
text1 text2 Undefined error code text3 text4 id text5 text6 Undefined error code txt7 txt8 id
$ awk -vRS="id" '{gsub(/.*Undefined error code/,"")}1' file
 text3 text4
 txt7 txt8

answered Jul 29, 2010 at 13:56


1 Comment

amadain Over a year ago

Wonderful. I couldn't find how to do this anywhere. It seems like it would be an easy thing that you could do with simple sed, but it's a lot harder when you tackle it. This code was perfect. Thanks.
