Divide a string with RegExp in Groovy

Question

I have an XML response with a lot of symbols, like this:

def xmlString = "<TAG1>1239071ABCDEFGH</TAG1><TAG2>1239071ABCDEFGH</TAG2>"

I use XmlSlurper to keep only the digits:

def node = new XmlSlurper().parseText(xmlString)
def nodelist = [node.tag1.tag2]

After this, "node" has a value like "1239071123907112390711239071", and I try to use a Java RegExp to split the digits into groups of 7:

System.out.println(java.util.Arrays.toString( nodelist.node.split("(?<=\G.{7})") ))

What did I do wrong? It doesn't work.

How does <TAG1>1239071ABCDEFGH</TAG1><TAG2>1239071ABCDEFGH</TAG2> give you 1239071123907112390711239071? And why are you splitting by 7 chars?
– tim_yates, Commented Aug 14, 2013 at 9:31
Also, XmlSlurper won't slurp that XML as it's not valid (no root node).
– tim_yates, Commented Aug 14, 2013 at 9:32
Also, I believe def nodelist = [node.tag1.tag2] would return [ null ].
– tim_yates, Commented Aug 14, 2013 at 9:32
There are a lot of <TAG1><TAG2>...<TAGX> tags with the same type of content, but I need only the digits, separated into groups of 7 chars, to get [1239071 1239071 1239071] etc.
– user2652936, Commented Aug 14, 2013 at 9:35
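
On the split itself, here is a minimal sketch, assuming the digits have already been joined into a single String (the `digits` variable and both result names are illustrative, not from the question). One likely problem with the original call is the double-quoted pattern: `\G` is not a valid escape sequence in a Groovy double-quoted string, so the backslash has to be doubled or the pattern written as a slashy string. A regex-free variant using `collate` is shown alongside for comparison.

```groovy
// Assumption: the digits are already concatenated into one String
def digits = '1239071123907112390711239071'

// Slashy string: the backslash in \G reaches the regex engine untouched.
// \G anchors at the end of the previous match, so the lookbehind
// matches the empty position after every 7th character.
def viaRegex = digits.split(/(?<=\G.{7})/).toList()

// Alternative without a regex: group the characters 7 at a time
def viaCollate = digits.toList().collate(7)*.join('')

assert viaRegex == ['1239071', '1239071', '1239071', '1239071']
assert viaCollate == viaRegex
```

If the string length is not a multiple of 7, both variants leave a shorter final chunk rather than failing.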

tim_yates · Accepted Answer · 2013-08-14 09:51:41Z

Assuming you have some valid xml like:

def xmlString = """<document>
                  |    <TAG1>1239071ABCDEFGH</TAG1>
                  |    <TAG2>1239071ABCDEFGH</TAG2>
                  |</document>""".stripMargin()

Then you can get all elements whose names start with TAG, and for each of these keep only the leading digits (dropping everything from the first non-digit character onwards):

def nodeList = new XmlSlurper().parseText( xmlString )
                               .'**'
                               .findAll { node ->
                                   node.name().startsWith( 'TAG' )
                               }
                               .collect { node ->
                                   node.text().takeWhile { ch ->
                                       Character.isDigit( ch )
                                   }
                               }

nodeList in this example would then equal:

assert nodeList == ['1239071', '1239071']

If you want to keep these numbers associated with the TAG that contained them (assuming TAGn tags are unique), then you can change to collectEntries:

def nodeList = new XmlSlurper().parseText( xmlString )
                               .'**'
                               .findAll { node ->
                                   node.name().startsWith( 'TAG' )
                               }    
                               .collectEntries { node ->
                                   [ node.name(), node.text().takeWhile { Character.isDigit( it ) } ]
                               }


assert nodeList == [TAG1:'1239071', TAG2:'1239071']

edited Aug 14, 2013 at 9:51

answered Aug 14, 2013 at 9:40

tim_yates


2 Comments

user2652936 Over a year ago

Thank you, that's very helpful! One more question, please: what should I do if not only the TAGn names are unique, but the values too: '1239071', '1239082', etc.?

tim_yates Over a year ago

The Map variant should be fine. Values in a map can be unique or duplicated, it doesn't matter.
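
To illustrate the point about values, here is a small sketch using the same pipeline as the Map variant in the answer. The three-tag input and the `byTag` name are made up for the example, and the `import` line assumes Groovy 3 or later (on 2.x, `XmlSlurper` lives in `groovy.util` and needs no import).

```groovy
import groovy.xml.XmlSlurper  // Groovy 3+; on 2.x remove this line

// Hypothetical input: TAG2 has its own value, TAG3 repeats TAG1's value
def xml = '''<document>
    <TAG1>1239071ABCDEFGH</TAG1>
    <TAG2>1239082ABCDEFGH</TAG2>
    <TAG3>1239071ABCDEFGH</TAG3>
</document>'''

def byTag = new XmlSlurper().parseText(xml)
        .'**'
        .findAll { node -> node.name().startsWith('TAG') }
        .collectEntries { node ->
            // Key: tag name (must be unique); value: leading digits of the text
            [node.name(), node.text().takeWhile { Character.isDigit(it) }]
        }

// Only keys have to be unique; equal values under different keys are kept
assert byTag == [TAG1: '1239071', TAG2: '1239082', TAG3: '1239071']
```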
