0

I'm having some trouble with a regexp. When I load my XML file into ActionScript, the whitespace in the text nodes is removed (the text is trimmed automatically). So I want to replace every space with a placeholder word that I can convert back later in my own parsing.

Here are examples of the tags I want to adjust.

<w:t> </w:t>
<w:t> Test</w:t>
<w:t>Test </w:t>

This is the result I want.

<w:t>%SPACE%</w:t>
<w:t>%SPACE%Test</w:t>
<w:t>Test%SPACE%</w:t>

The closest I've got is the pattern <w:t>\s|\s</w:t>

The biggest problem is that it changes every space in the XML file, which corrupts everything. I want to change only the spaces inside w:t nodes, without destroying the text.

1
  • How about <w:t> Test Test </w:t>? Should all three spaces be replaced? Commented Feb 3, 2011 at 16:18

4 Answers

1

When parsing XML with the standard XML class in ActionScript, you can tell the parser not to ignore whitespace by setting the static ignoreWhitespace property to false before you parse. It is true by default. This ensures that whitespace in XML text nodes is preserved, and you can then do whatever you want with it.

XML.ignoreWhitespace = false;
/* parse your XML here */

That way you don't have to muck around with regular expressions and can use the standard XML ActionScript parsing.

1
var reg1 : RegExp = /((?:<w:t>|\G)[^<\s]*+)\s/g;
data = data.replace(reg1, "$1%SPACE%");

(?:<w:t>|\G) means every match starts at a <w:t> tag, or immediately after the previous match. Since [^<\s] can't match the closing </w:t> tag (or any other tag), every match is guaranteed to be inside a <w:t> element.

To do this properly, you would need to deal with some more questions, like:

  • \s matches several other kinds of whitespace, not just ' '. Do you want to replace any whitespace character with %SPACE%? Or do you know that ' ' will be the only kind of whitespace in those elements?

  • Will there be other elements inside the <w:t> elements (for example, <w:t> test <xyz> test </xyz> </w:t>)? If so, the regex becomes more complicated, but it's still doable.

I'm not set up to test ActionScript, but here's a demo in PHP, which uses the PCRE library under the hood, like AS3:
test it on ideone.com

EDIT: In addition to matching where the last match left off, \G matches the beginning of the input, just like \A. That's not a problem with the regex given here, but in the ideone demo it is. That regex should be

((?:<w:t>|\G(?!\A))(?:[^<\s]++|<(?!/w:t>))*+)\s
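If \G or possessive quantifiers turn out to be unavailable in your regex engine, a two-step replace is another option: match each <w:t>...</w:t> element as a whole, then replace the spaces in its text inside a callback. Here's a minimal sketch in TypeScript rather than ActionScript, so treat it as an illustration only. AS3's String.replace also accepts a function as the replacement, but I haven't run this there. The name markAllSpaces is made up for the example, and the sketch assumes that <w:t> has no attributes, that it contains only text (no child elements), and that only ' ' should be replaced, not every kind of whitespace.

```typescript
// Hypothetical helper: replaces every ' ' inside <w:t>...</w:t> with %SPACE%.
// Assumes <w:t> carries no attributes and contains only text (no child tags).
function markAllSpaces(xml: string): string {
  // Group 1: opening tag, group 2: text content, group 3: closing tag.
  const element = /(<w:t>)([^<]*)(<\/w:t>)/g;
  return xml.replace(
    element,
    (_match: string, open: string, text: string, close: string): string =>
      // Only the captured text is touched; whitespace between tags is left alone.
      open + text.replace(/ /g, "%SPACE%") + close
  );
}

console.log(markAllSpaces("<w:p> <w:t> Test Test </w:t> </w:p>"));
// prints: <w:p> <w:t>%SPACE%Test%SPACE%Test%SPACE%</w:t> </w:p>
```

Because the inner replace only ever sees the captured text, spaces elsewhere in the file can't be changed by accident. Swapping / /g for /\s/g would cover tabs and newlines as well.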

0

I made a workaround that isn't very nice, but that's what happens when you work against the clock.

I run the replace 3 times instead.

var reg1 : RegExp = /<w:t>\s/gm;
data = data.replace(reg1, "<w:t>%DEADSPACE%");

var reg2 :RegExp = /\s<\/w:t>/gm;
data = data.replace(reg2, "%DEADSPACE%</w:t>");

var reg3 :RegExp = /<w:t>\s<\/w:t>/gm;
data = data.replace(reg3, "<w:t>%DEADSPACE%</w:t>");

RegExp, what is it good for? Absolutely nothing (singing) ;)
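For what it's worth, the third pass above can never match: once the first replace has run, no <w:t> is directly followed by whitespace any more. The whole thing can also be folded into a single pass with a callback. This is a rough sketch in TypeScript, not ActionScript, so it's untested in AS3. markEdgeSpaces is a made-up name, and it assumes <w:t> has no attributes and no child elements. Unlike the three replaces above, it marks every whitespace character in a leading or trailing run, not just one per side.

```typescript
const MARK = "%DEADSPACE%";

// Hypothetical helper: marks leading and trailing whitespace inside
// <w:t>...</w:t>, leaving whitespace between words untouched.
function markEdgeSpaces(xml: string): string {
  return xml.replace(/<w:t>([^<]*)<\/w:t>/g, (_match: string, text: string): string => {
    // ^\s+ matches a leading run, \s+$ a trailing run; each whitespace
    // character in either run becomes one marker.
    const marked = text.replace(/^\s+|\s+$/g, (run: string): string =>
      run.replace(/\s/g, MARK)
    );
    return "<w:t>" + marked + "</w:t>";
  });
}

console.log(markEdgeSpaces("<w:t> Test Test </w:t>"));
// prints: <w:t>%DEADSPACE%Test Test%DEADSPACE%</w:t>
```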

1 Comment

Regex is good for quite a lot if you use it properly. It just takes practice. (I'm not trying to come off as condescending, just stating a fact. Took me months to really get good at them!)

0

there's also another way
