I have a string of data, all on one line, which looks like this:

<record xmlns:f="http://abc.com/">
<f:Table><f:Row><f:Cell>#1</f:Cell></f:Row><f:Row><f:Cell>Data 222</f:Cell></f:Row><f:Row>           <f:Cell>Version: v3</f:Cell></f:Row><f:Row><f:Cell>Serial Number: 000000000</f:Cell></f:Row> <f:Row><f:Cell>Signature: 123</f:Cell></f:Row><f:Row><f:Cell>Issuer:</f:Cell></f:Row><f:Row> <f:Cell>C=EE,</f:Cell></f:Row><f:Row><f:Cell>ST=ABC,</f:Cell></f:Row><f:Row><f:Cell>L=avavv,</f:Cell></f:Row><f:Row><f:Cell><f:HexDump><f:Line seq=""0x0000"" hex=""09 09 4f 3d 5a 65 72 6f 54 75 72 6e 61 72 6f 75"">..O=ABC</f:Line><f:Line seq=""0x0010"" hex=""6e 64 20 4f c3 9c 2c"">nd OÇ.,</f:Line></f:HexDump></f:Cell></f:Row><f:Row><f:Cell>OU=abc,</f:Cell></f:Row><f:Row><f:Cell>CN=trtrtrtr,</f:Cell></f:Row><f:Row><f:Cell>E=null,</f:Cell></f:Row><f:Row><f:Cell>Create: 03/03/2010 14:58</f:Cell></f:Row><f:Row><f:Cell>Expire: 04/02/2010 14:58</f:Cell></f:Row><f:Row><f:Cell>Subject:</f:Cell></f:Row><f:Row><f:Cell>C=EE,</f:Cell></f:Row><f:Row><f:Cell>ST=SS,</f:Cell></f:Row><f:Row><f:Cell>L=Tartu,</f:Cell></f:Row><f:Row><f:Cell><f:HexDump><f:Line seq=""0x0000"" hex=""09 09 4f 3d 5a 65 72 6f 54 75 72 6e 61 72 6f 75"">..O=ZeroTurnarou</f:Line><f:Line seq=""0x0010"" hex=""6e 64 20 4f c3 9c 2c"">nd OÇ.,</f:Line></f:HexDump></f:Cell></f:Row><f:Row><f:Cell>OU=KKK,</f:Cell></f:Row></f:Table>

My Ruby regular expression looks like this:

<f:HexDump>[\s\S]*,<\/f:Line><\/f:HexDump>

So I'm trying to remove each <f:HexDump> ... </f:HexDump> block (including the tags themselves) while leaving the content that sits between the two blocks in place.

The problem is that my regex selects everything from the first <f:HexDump> tag right up to the second </f:HexDump> tag:

<f:HexDump><f:Line seq=""0x0000"" hex=""09 09 4f 3d 5a 65 72 6f 54 75 72 6e 61 72 6f 75"">..O=ABC</f:Line><f:Line seq=""0x0010"" hex=""6e 64 20 4f c3 9c 2c"">nd OÇ.,</f:Line></f:HexDump></f:Cell></f:Row><f:Row><f:Cell>OU=abc,</f:Cell></f:Row><f:Row><f:Cell>CN=trtrtrtr,</f:Cell></f:Row><f:Row><f:Cell>E=null,</f:Cell></f:Row><f:Row><f:Cell>Create: 03/03/2010 14:58</f:Cell></f:Row><f:Row><f:Cell>Expire: 04/02/2010 14:58</f:Cell></f:Row><f:Row><f:Cell>Subject:</f:Cell></f:Row><f:Row><f:Cell>C=EE,</f:Cell></f:Row><f:Row><f:Cell>ST=SS,</f:Cell></f:Row><f:Row><f:Cell>L=Tartu,</f:Cell></f:Row><f:Row><f:Cell><f:HexDump><f:Line seq=""0x0000"" hex=""09 09 4f 3d 5a 65 72 6f 54 75 72 6e 61 72 6f 75"">..O=ZeroTurnarou</f:Line><f:Line seq=""0x0010"" hex=""6e 64 20 4f c3 9c 2c"">nd OÇ.,</f:Line></f:HexDump>

Can this be done using Ruby regular expressions?

  • That's XML. You should parse XML as XML using an XML parser, not as text using a regular expression. Especially when you want to match nested things. Commented Sep 13, 2012 at 0:53
  • Yeah I know, I tried nokogiri but I've not had any success with getting to the elements I need to remove. Probably because it's badly formed. Commented Sep 13, 2012 at 0:56
  • You mean the ""? Luckily, that can be fixed with a string replacement :) (If it's not that, then what errors does nokogiri give you?) Commented Sep 13, 2012 at 0:58
  • I don't really get an error as such. If I use this code: doc = Nokogiri::XML(doc2); doc.remove_namespaces!; node_values = doc.search('//HexDump/*') do |n| n.text end; puts node_values; I can't get it to return anything. I did a string replacement and removed all of the "". Commented Sep 13, 2012 at 1:12
  • If your XML is malformed you're almost certainly more likely to get problems parsing it with regex than a proper XML parser. Malformed content is the #1 reason to use a parser instead of regex. Commented Sep 13, 2012 at 1:30

1 Answer

You can use the regex:

/<f:HexDump>.*?<\/f:HexDump>/

The key here is making the part between the HexDump tags non-greedy by adding ? after .*, so the match stops at the first closing </f:HexDump> tag instead of the last one.

Assuming your string is stored in str, you can get rid of all the HexDump blocks with:

str.gsub(/<f:HexDump>.*?<\/f:HexDump>/, '')
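To see the difference between the greedy and non-greedy forms, here is a minimal sketch. The string is a shortened, made-up stand-in for the record in the question (the Line attributes are dropped for brevity), so treat it as an illustration rather than the real data:

```ruby
# Shortened, hypothetical stand-in for the one-line record in the question.
str = '<f:Cell>L=avavv,</f:Cell><f:Cell><f:HexDump><f:Line>..O=ABC</f:Line></f:HexDump></f:Cell>' \
      '<f:Cell>OU=abc,</f:Cell><f:Cell><f:HexDump><f:Line>nd O,</f:Line></f:HexDump></f:Cell>'

# Greedy: .* runs to the LAST </f:HexDump>, so the OU=abc cell is swallowed too.
greedy = str.gsub(/<f:HexDump>.*<\/f:HexDump>/, '')

# Non-greedy: .*? stops at the FIRST </f:HexDump>, so each block is removed on its own.
lazy = str.gsub(/<f:HexDump>.*?<\/f:HexDump>/, '')

puts greedy
puts lazy
```

Note that gsub returns a new string and leaves str unchanged; use gsub! if you want to modify str in place.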

Note that if there can be line breaks between the tags, you will want to enable multi-line mode (adding m after the closing slash of the regex), because in Ruby . does not match a newline unless that flag is set.
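As a small sketch of what the m flag changes, assume a hypothetical variant of the data where the HexDump block is spread over several lines (the variable names here are made up for the example):

```ruby
# Hypothetical input with line breaks inside the HexDump block.
multiline = "<f:Cell><f:HexDump>\n<f:Line>..O=ABC</f:Line>\n</f:HexDump></f:Cell>"

# Without m, "." cannot cross the newlines, so nothing matches and nothing is removed.
without_m = multiline.gsub(/<f:HexDump>.*?<\/f:HexDump>/, '')

# With m, "." also matches newlines, so the whole block is removed.
with_m = multiline.gsub(/<f:HexDump>.*?<\/f:HexDump>/m, '')

puts without_m == multiline
puts with_m
```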
