Ruby Regex matching string before and after certain characters

Question

I've got a string like this:

&lt;block trace="true" name="AssignResources: Append Resources"&gt;

I need to get the word (or the characters up to the next whitespace) after < (in this case block) and the words before = (here trace and name).

I tried several regex patterns, but all my attempts return the word with the delimiter characters included, like <block.

I'm sure it's not that hard, but I've not found the solution yet.

Does anybody have a hint?
Thanks.

By the way: I want to replace the matches using gsub.

EDIT:

Solved it with following regexes:

1) /\s(\w+)="(.*?)"/ matches all attributes and captures their names and values in $1 and $2.

2) // matches comments

3) /<([\/|!|\?]?)([A-Za-z0-9]+)[^\s|>|\/]*/ matches all tag names, whether they're in a closing tag, a self-closing tag, an <?xml> tag or a DTD tag. $1 contains the optional prefix (/, ! or ?, or nothing) and $2 contains the tag name.
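As a quick sanity check, here is a sketch that runs regexes 1 and 3 over a small made-up XML string with String#scan, then uses the tag regex in a gsub block. One assumption to flag: the tag regex below drops the | characters. Inside a character class, | is a literal pipe rather than alternation, so [\/|!|\?] would also accept <|. Regex 2 is left out because it did not survive in the text above.

```ruby
xml = '<?xml version="1.0"?><block trace="true" name="AssignResources: Append Resources"></block>'

# Regex 1: $1 = attribute name, $2 = attribute value.
attr_re = /\s(\w+)="(.*?)"/

# Regex 3 without the "|" characters (inside [...] a "|" is a literal pipe).
# $1 = optional "/", "!" or "?" prefix, $2 = tag name.
tag_re = /<([\/!?]?)([A-Za-z0-9]+)[^\s>\/]*/

attrs = xml.scan(attr_re)  # one [name, value] pair per attribute
tags  = xml.scan(tag_re)   # one [prefix, name] pair per tag

# In a gsub block, $1 and $2 refer to the groups of the current match.
renamed = xml.gsub(tag_re) { "<#{$1}#{$2.upcase}" }

p attrs
p tags
puts renamed
```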

Community · Accepted Answer · 2017-05-23 11:48:42Z

2

It looks a lot like parsing HTML with regex to me.

Ruby has a very good HTML parser called Nokogiri.

Here is how to do it with Nokogiri:

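require 'nokogiri'

html = Nokogiri::HTML('<block trace="true" name="AssignResources: Append Resources">')

# Nokogiri::HTML wraps the fragment in <html><body>, so "//*" would also
# yield those two elements; select the <block> element explicitly instead.
html.xpath("//block").each do |s|
  puts s.node_name # block
  puts s.keys      # trace, name
  puts s.values    # true, AssignResources: Append Resources
end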

edited May 23, 2017 at 11:48

CommunityBot


answered Feb 24, 2010 at 15:19

YOU


9 Comments

Sebastian Over a year ago

Hey S.Mark, I already use Nokogiri for that (XML parsing) and it's great. I'll think about my application flow again; maybe I can do that replacement earlier, with Nokogiri. By the time I do the replacement, it's no longer XML: it has been converted to one huge string. That's necessary because it is presented as text, with the values of the former XML tag attributes turned into HTML <a> tags that link to other HTML pages defined by the attribute value. The gsub replacements wrap parts of each XML tag in different <span> tags.

Sebastian Over a year ago

And no: doing the syntax highlighting in JavaScript is not an option in this case. I'm currently using "prettify", but with documents of more than 2,000 lines and many times more tags, it's no fun to use. That's why I want to prepare the output in my parsing app.

YOU Over a year ago

Syntax highlighting? Have you considered using an existing library like shjs? shjs.sourceforge.net

Sebastian Over a year ago

Yes, as I said, I tried that approach using Prettify (code.google.com/p/google-code-prettify). I think the problems are the same: with that much content to highlight, the site becomes unusable (30+ seconds). Huge content means 7000+ lines of XML. Sometimes weird requirements call for weird solutions ;)

YOU Over a year ago

I don't think regex will be fast on 7000+ lines of data either, though.

codaddict · Accepted Answer · 2010-02-24 14:43:09Z

1

You can try:

<([^ ]*)\s([^=]*)=
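In Ruby, the pattern above can be tried with String#match. This sketch (variable names are just for illustration) shows that it only reaches the tag name and the first attribute, because every match has to start at a <. The scan uses an extra pattern added here for illustration, not part of the answer, to pick up every attribute name.

```ruby
s = '<block trace="true" name="AssignResources: Append Resources">'

# The answer's pattern: group 1 = text after "<" up to a space,
# group 2 = everything after that whitespace up to the first "=".
m = s.match(/<([^ ]*)\s([^=]*)=/)
tag        = m[1]
first_attr = m[2]

# Extra pattern (not from the answer): each whitespace-preceded word followed by "=".
attr_names = s.scan(/\s([^\s=]+)=/).flatten

p tag, first_attr, attr_names
```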

answered Feb 24, 2010 at 14:43

codaddict


sepp2k · Accepted Answer · 2010-02-24 14:58:52Z

0

'&lt;block trace="true" name="AssignResources: Append Resources"&gt;'[/&lt;(\w+)/, 1]
#=> "block"

If you pass a regex and an index i to String#[], it'll return the value of the ith capturing group.
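A few more String#[] forms, as a sketch against the question's string: a numeric capture index, a named group (Ruby 1.9 and later), and the nil you get back when the regex does not match. Note that String#[] only looks at the first match.

```ruby
s = '<block trace="true" name="AssignResources: Append Resources">'

tag       = s[/<(\w+)/, 1]            # capture group 1
first_val = s[/(\w+)="([^"]*)"/, 2]   # group 2 of the FIRST match only
named     = s[/<(?<tag>\w+)/, "tag"]  # named group (Ruby 1.9+)
missing   = s[/<(\d+)/, 1]            # no match, so nil

p tag, first_val, named, missing
```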

Edit:

In Ruby 1.9 you can use /(?<=<)\w+/ to require the presence of the < without matching it. In 1.8 there is no way to do that. The best you can do is to put the part you don't want to replace in a capturing group and access that group in the replacement, like this:

"lo&lt;la li".gsub(/(&lt;)(\w+)/, '\1 --\2--')
 #=> "lo&lt; --la-- li"

edited Feb 24, 2010 at 14:58

answered Feb 24, 2010 at 14:43

sepp2k


1 Comment

Sebastian Over a year ago

Thanks for that hint, but I need the regex pattern as a parameter to the gsub method, to replace all of these matches with another string. I'm thinking about how to make it work with gsub.

Amarghosh · Accepted Answer · 2010-02-24 15:32:02Z

0

&lt;block trace="true" name="AssignResources: Append Resources"&gt;

&lt;([^\s]+)\s+([^=]+)="([^"]*)"\s+([^=]+)="([^"]*)"\s*&gt;

#result:

$1 block
$2 trace
$3 true
$4 name
$5 AssignResources: Append Resources

Update: I don't know Ruby, but based on the description of gsub, I believe something like the following should do the trick:

str = '<block trace="true" name="AssignResources: Append Resources">'
repl = str.gsub(/<([^\s]+)\s+([^=]+)="([^"]*)"\s+([^=]+)="([^"]*)"\s*>/,
    "tag name: \\1\n\\2 is \\3 and \\4 is \\5\n")
print repl

edited Feb 24, 2010 at 15:32

answered Feb 24, 2010 at 14:49

Amarghosh


1 Comment

Sebastian Over a year ago

Thanks, Amarghosh, very nice solution, but I forgot to mention that I need it as the pattern parameter for gsub... Thanks anyway.

Jonas Elfström · Accepted Answer · 2010-02-24 15:47:24Z

0

You should most probably go with Nokogiri or something similar. I couldn't fit it into one gsub, but I could with two:

>> s = '<block trace="true" name="AssignResources: Append Resources">'
>> m,r=0,["<blockie ", " tracie=", " namie="]
>> s.gsub(/<.*?([^\s]+)\s/, r[0]).gsub(/\s([^=]+)=/) {|ma| m+=1; r[m]}
=> "<blockie tracie="true" namie="AssignResources: Append Resources">"

answered Feb 24, 2010 at 15:47

Jonas Elfström
