I have a string I want to parse that looks a bit like GitHub Markdown, but I really don't want the full implementation. The string will be a mixture of "code" blocks and "text" blocks. A code block is three backticks followed by an optional "language", then some code, and finally three more backticks. Non-code is pretty much everything else. I don't (but possibly should) care if the user can't input three backticks in the "text" blocks. Here's an example:
    This is some text followed by a code block
    ```ruby
    def function
      "hello"
    end
    ```
    Some more text
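A format this simple can possibly be handled with plain String#split: when the pattern contains a capture group, Ruby keeps the captured separators in the result, so with exactly one group the text pieces land at even indices and the code blocks at odd ones, in their original order. This is a minimal sketch, assuming fences never nest and that a language, if present, is a single word written directly after the opening backticks; split_blocks and Block are hypothetical names for illustration.

```ruby
# Hypothetical result type for this sketch: :text or :code, plus an optional language.
Block = Struct.new(:type, :language, :content)

# Assumptions: fences are exactly three backticks, they never nest, and a
# language (if any) is one word written directly after the opening fence.
def split_blocks(input)
  # One capture group => split keeps each matched code block in the result,
  # so pieces alternate text (even index) / code (odd index), in order.
  pieces = input.split(/(`{3}.*?`{3})/m)

  pieces.each_with_index.filter_map do |piece, index|
    if index.odd?
      body = piece[3...-3]             # drop the opening and closing fences
      language = body[/\A\w+/]         # nil when no word follows the fence
      code = language ? body[language.length..] : body
      Block.new(:code, language, code.strip)
    elsif !piece.strip.empty?
      Block.new(:text, nil, piece.strip)
    end
  end
end

fence = "`" * 3
input = "Intro text\n#{fence}ruby\ndef function\n  \"hello\"\nend\n#{fence}\nSome more text"
split_blocks(input).each { |block| p block }
```

Building the fence as "`" * 3 in the usage lines just avoids writing three literal backticks inside the example.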
Of course, there may be more code and text blocks interspersed. I've tried writing a regex for this, and it seemed to work, but I couldn't get the groups (in parens) to give me all of the matches, and scan() loses the ordering. I've looked at using a couple of Ruby parsers (Treetop, Parslet), but they look a bit big for what I want, though I am willing to go that route if that's my best option.
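Before reaching for Treetop or Parslet, a hand-written scanner built on Ruby's standard-library StringScanner may be enough for a format this small. The sketch below walks the string once, left to right, so tokens come out in document order. It assumes an unclosed fence turns the rest of the input into code; tokenize and Token are hypothetical names, not part of any library.

```ruby
require "strscan"

# Hypothetical token type for this sketch: :text or :code, plus an optional language.
Token = Struct.new(:type, :language, :content)

def tokenize(input)
  scanner = StringScanner.new(input)
  tokens = []

  until scanner.eos?
    if scanner.scan(/`{3}(\w*)/)
      # Opening fence; group 1 is the (possibly empty) language word.
      language = scanner[1].empty? ? nil : scanner[1]
      code = scanner.scan_until(/`{3}/)
      if code
        code = code[0...-3]            # drop the closing fence
      else
        # Assumption: an unclosed fence makes the rest of the input code.
        code = scanner.rest
        scanner.terminate
      end
      tokens << Token.new(:code, language, code.strip)
    else
      # Text runs up to (not including) the next fence, or to the end.
      text = scanner.scan_until(/(?=`{3})/)
      if text.nil?
        text = scanner.rest
        scanner.terminate
      end
      tokens << Token.new(:text, nil, text.strip) unless text.strip.empty?
    end
  end

  tokens
end
```

Every branch either consumes input or terminates the scanner, so the loop always finishes; adding new token kinds later means adding another scan branch rather than growing one large regex.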
Thoughts?
A couple of people have asked for the regex I was trying (many variations of the one below):
    re =
      /
        ```\s*\w+\s*   # 3 backticks followed by the language
        (?!```).*?     # The code: everything that's not 3 backticks
        ```            # 3 more backticks
        |              # OR
        (?!```).*      # Some text that doesn't include 3 backticks
      /x               # Ignore white space in RE
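For comparison, here is a sketch of the regex above with a few changes, which are my assumptions about the intent rather than the only possible fix. In the original, (?!```) is checked only once, at the start of a text run, so the following .* can run straight past a later fence on the same line. The version below uses /m so "." also matches newlines, \w* so the language is optional, a "tempered" token that refuses to step onto a fence for text, and named groups so each match records which alternative it came from. Calling String#scan with a block and reading Regexp.last_match inside it visits the matches left to right, so the ordering survives. BLOCK_RE and blocks are hypothetical names.

```ruby
# Assumed changes to the original regex: /m so "." crosses newlines, \w* so
# the language is optional, and a tempered token for text so a text run
# stops right before the next fence. Named groups say which branch matched.
BLOCK_RE = /
  `{3} (?<lang>\w*) (?<code>.*?) `{3}   # a fenced code block
  |
  (?<text>(?:(?!`{3}).)+)               # text that contains no fence
/mx

def blocks(input)
  matches = []
  # In block form, String#scan sets Regexp.last_match for each match,
  # visiting them left to right, so the order is preserved.
  input.scan(BLOCK_RE) { matches << Regexp.last_match }

  matches.filter_map do |m|
    if m[:code]
      [:code, (m[:lang] unless m[:lang].empty?), m[:code].strip]
    elsif !m[:text].strip.empty?
      [:text, m[:text].strip]
    end
  end
end
```

Keeping the full MatchData objects, rather than the arrays scan returns, is what makes it easy to tell a code match from a text match while preserving their sequence.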
It seems, though, that even in simple cases, for example

    md = /(a|b)*/.match("abaaabaa")

I'm not able to get all of the a's and b's from, say, md[3], which doesn't exist. Hope that makes more sense; that's why I don't think a regex will work in my case, but I wouldn't mind being proven wrong.
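The md[3] behaviour comes from how Ruby's MatchData treats a repeated group: (a|b)* is still a single capture group, so after the match it holds only the text of its last repetition, and asking for a group number that doesn't exist returns nil rather than raising. String#scan, by contrast, reapplies the pattern after each match and returns every match in order. A small demonstration:

```ruby
# (a|b)* is ONE capture group that repeats; MatchData keeps only the text
# captured by its last repetition.
md = /(a|b)*/.match("abaaabaa")
puts md[0]   # whole match, prints abaaabaa
puts md[1]   # last repetition of group 1 only, prints a
p md[3]      # there is no group 3, prints nil

# String#scan reapplies the pattern after each match and returns every
# match in left-to-right order.
letters = "abaaabaa".scan(/a|b/)
p letters    # prints ["a", "b", "a", "a", "a", "b", "a", "a"]
```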