I am parsing some text with Python and am running into an odd issue...
an example text that is being parsed:
msg:"ET WEB_SPECIFIC_APPS ClarkConnect Linux proxy.php XSS Attempt"; flow:established,to_server; content:"GET"; content:"script"; nocase; content:"/proxy.php?"; nocase; content:"url="; nocase; pcre:"//proxy.php(?|.[\x26\x3B])url=[^&;\x0D\x0A][<>"']/i"; reference:url,www.securityfocus.com/bid/37446/info; reference:url,doc.emergingthreats.net/2010602; classtype:web-application-attack; sid:2010602; rev:4; metadata:created_at 2010_07_30, updated_at 2010_07_30;
my regex:
msgSearch = re.search(r'msg:"(.+)";",line)
actual result:
ET WEB_SPECIFIC_APPS ClarkConnect Linux proxy.php XSS Attempt"; flow:established,to_server; content:"GET"; content:"script"; nocase; content:"/proxy.php?"; nocase; content:"url="; nocase; pcre:"//proxy.php(?|.[\x26\x3B])url=[^&;\x0D\x0A][<>"']/i
expected result:
ET WEB_SPECIFIC_APPS ClarkConnect Linux proxy.php XSS Attempt
There are 10s of thousands of lines of text that I am parsing that are all giving me similar results. Any reason regex is picking a (seemingly) random "; to stop at? I can fix the example above by making the regex more specific, eg. r'msg:"([\w\s\.]+)";" but other lines have different characters included. I guess I could just include every special character in my regex, but I'm trying to understand why my wildcard isn't working properly.
Any help would be appreciated!
\bmsg:\"([^\"]+)\"regex Online demo is regex101.com/r/U6ASno/1