I have an XML document with text in attribute values. I can't change how the the XML file is generated, but need to extract the attribute values without loosing \r\n. The XML parser of course strips them out.
So I'm trying to replace \r\n in attribute values with entity references I'm using perl to do this because of it's non-greedy matching. But I need help getting the replace to happen only within the match. Or I need an easier way to do this :)
Here's is what I have so far:
perl -i -pe 'BEGIN{undef $/;} s/m_description="(.*?)"/m_description="$1"/smg' tmp.xml
This matches what I need to work with: (.*?). But I don't know to expand that pattern to match \r\n inside it, and do the replacement in the results. If I knew how many \r\n I have I could do it, but it seems I need a variable number of capture groups or something like that? There's a lot to regex I don't understand and it seems like there should be something do do this.
Example:
preceding lines
stuff m_description="Over
any number
of lines" other stuff
more lines
Should go to:
preceding lines
stuff m_description="Over any number of lines" other stuff
more lines
Solution
Thanks to Ikegam and ysth for the solution I used, which for 5.14+ is:
perl -i -0777 -pe's/m_description="\K(.*?)(?=")/ $1 =~ s!\n! !gr =~ s!\r! !gr /sge' tmp.xml
perl -i -p0e 's/m_description="\K([^"]*)/$1=~s%\r\n% %gr/ge'(-0is roughly the same asBEGIN{undef $/}).