For some reason I cannot use SAX or DOM parsers and have to parse the file with regex.
I want to extract the values as key-value pairs (the key being the content of tag1, the value being the content of tag3). But some of the keys have no tag3 value after them, and I have to ignore those keys.
XML file
<Main Tag><element><tag1>Key1</tag1><tag2>Not interested</tag2><tag3>Value1</tag3></element><element><tag1>Key2</tag1><tag2>Not interested</tag2></element><element><tag1>Key3</tag1><tag2>Not interested</tag2><tag3>Value3</tag3></element></Main Tag>
The same XML file with indentation:
<Main Tag>
  <element>
    <tag1>Key1</tag1>
    <tag2>Not interested</tag2>
    <tag3>Value1</tag3>
  </element>
  <element>
    <tag1>Key2</tag1>
    <tag2>Not interested</tag2>
  </element>
  <element>
    <tag1>Key3</tag1>
    <tag2>Not interested</tag2>
    <tag3>Value3</tag3>
  </element>
</Main Tag>
So from the above file I need to extract Key1-Value1 and Key3-Value3, ignoring Key2 because it doesn't have a value.
Using this matcher:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final Pattern pattern = Pattern.compile("<tag1>(.+?)</tag1>.*<tag3>(.+?)</tag3>");
final Matcher matcher = pattern.matcher(xml); // xml holds the one-line XML string above
if (matcher.find()) {
    System.out.println(matcher.group(1)); // gives Key1
    System.out.println(matcher.group(2)); // gives Value3 instead of Value1
}
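The `.*` in the middle is greedy, so it runs on to the last `<tag3>` in the whole string, which is why Key1 gets paired with Value3. Making it reluctant (`.*?`) fixes the first match but then pairs Key2 with Value3 on the next `find()`, because nothing stops the match from crossing `</element>`. A sketch that keeps each match inside one element uses a "tempered" token that refuses to step over `</element>`. It assumes every pair sits inside its own `<element>...</element>`, that keys and values are plain text with no `<`, and that there are no comments, CDATA sections or nested elements; `KeyValueExtractor` and `extract` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeyValueExtractor {
    // [^<]* keeps each captured key/value from spanning other tags.
    // (?:(?!</element>).)*? consumes characters only while the upcoming text is not
    // "</element>", so a <tag1> with no <tag3> in its element cannot borrow a later one.
    // DOTALL lets '.' match newlines, so the indented form of the file works too.
    private static final Pattern PAIR = Pattern.compile(
            "<tag1>([^<]*)</tag1>(?:(?!</element>).)*?<tag3>([^<]*)</tag3>",
            Pattern.DOTALL);

    public static Map<String, String> extract(String xml) {
        Map<String, String> pairs = new LinkedHashMap<>(); // keeps document order
        Matcher m = PAIR.matcher(xml);
        while (m.find()) {
            pairs.put(m.group(1), m.group(2));
        }
        return pairs;
    }

    public static void main(String[] args) {
        String xml = "<Main Tag><element><tag1>Key1</tag1><tag2>Not interested</tag2><tag3>Value1</tag3></element>"
                + "<element><tag1>Key2</tag1><tag2>Not interested</tag2></element>"
                + "<element><tag1>Key3</tag1><tag2>Not interested</tag2><tag3>Value3</tag3></element></Main Tag>";
        System.out.println(extract(xml)); // prints {Key1=Value1, Key3=Value3}
    }
}
```

For Key2 the tempered token stops at `</element>` before any `<tag3>` is found, so that attempt fails and the engine moves on to Key3. This is still regex scraping, so any deviation from the assumed layout can break it.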
<Main Tag> is not allowed. Valid XML tag names cannot contain whitespace, and attributes must always use the long form name="value" (unlike HTML), so <Main Tag> is read as an element named Main with a malformed attribute Tag. In other words, the example source you posted is not XML.
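On any machine where the JDK's built-in DOM parser is available (even if it can't be used in the final solution), a quick sketch like the following could confirm that a conforming parser rejects `<Main Tag>`. `WellFormednessCheck` and `isWellFormed` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormednessCheck {
    // Returns true if the JDK's built-in DOM parser accepts the string as well-formed XML.
    public static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler ignores warnings and recoverable errors and rethrows fatal
            // errors, which also stops the parser printing "[Fatal Error]" to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            return false; // a well-formedness error was reported
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<Main Tag><element/></Main Tag>")); // false
        System.out.println(isWellFormed("<MainTag><element/></MainTag>"));   // true
    }
}
```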