1

I have a string that I need to split by another string, which is a substring of the original one. Let's say I have the following text:

string s = "<DOC>something here <TEXT> and some stuff here </TEXT></DOC>";

And I want to retrieve:

"and some stuff here"

I need to get the string between "<TEXT>" and its closing tag "</TEXT>".

I can't manage to do this with the usual Split method of string, even though one of its overloads takes a parameter of type string[]. What I am trying is:

Console.Write(s.Split("<TEXT>")); // Which doesn't compile

Thanks in advance for your kind help.

5
  • Is the last tag </TEXT> or </DOC>? Commented Dec 3, 2011 at 18:30
  • You are right, it's </DOC>. I will edit it. Commented Dec 3, 2011 at 18:31
  • Your example suggests that you are not splitting, but extracting. Commented Dec 3, 2011 at 18:32
  • Seems a lot like XML. If it is, load it up in XDocument and do a xpath select on the XML DOM. Commented Dec 3, 2011 at 18:36
  • It's not XML, unfortunately. Commented Dec 3, 2011 at 18:39

5 Answers

2
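const string open = "<TEXT>";
const string close = "</TEXT>";
var start = s.IndexOf(open);
// Look for the closing tag only after the opening tag has been found.
var end = start >= 0 ? s.IndexOf(close, start + open.Length) : -1;
string res;
if (end >= 0) {
    start += open.Length; // skip past the opening tag so it is not part of the result
    res = s.Substring(start, end - start).Trim();
} else {
    res = "NOT FOUND";
}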

1 Comment

The IndexOf call for end should start the search from the value of start.
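
Since the asker mentions further down that the real string contains many <TEXT> blocks, the IndexOf approach can be extended to a loop. The following is only a sketch: the class name TagExtractor and the method ExtractBetween are hypothetical names made up for illustration, and it assumes tags are not nested and are matched case-sensitively.

```csharp
using System;
using System.Collections.Generic;

public static class TagExtractor
{
    // Hypothetical helper: returns the trimmed text between every
    // openTag ... closeTag pair found in input, in order of appearance.
    public static List<string> ExtractBetween(string input, string openTag, string closeTag)
    {
        var results = new List<string>();
        int position = 0;
        while (true)
        {
            int start = input.IndexOf(openTag, position, StringComparison.Ordinal);
            if (start < 0) break;              // no more opening tags
            start += openTag.Length;           // skip past the opening tag itself
            int end = input.IndexOf(closeTag, start, StringComparison.Ordinal);
            if (end < 0) break;                // unmatched opening tag: stop
            results.Add(input.Substring(start, end - start).Trim());
            position = end + closeTag.Length;  // continue after the closing tag
        }
        return results;
    }
}
```

Calling TagExtractor.ExtractBetween(s, "<TEXT>", "</TEXT>") on the string from the question should give a list with the single entry "and some stuff here".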
1

Splitting on "<TEXT>" isn't going to help you in this case anyway, since the close tag is "</TEXT>".
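
For completeness, here is a sketch of how the string[] overload of Split is called. On the .NET Framework versions current when this was asked, there is no Split overload taking a single string, and the string[] overload also requires a StringSplitOptions argument, which is why the call in the question does not compile. Passing both tags as separators works for the simple case of exactly one pair. The names SplitExample and InnerBySplit are hypothetical.

```csharp
using System;

public static class SplitExample
{
    // Splits on both the opening and the closing tag. With one
    // <TEXT>...</TEXT> pair the input falls into three pieces,
    // and the middle piece is the text between the tags.
    public static string InnerBySplit(string input)
    {
        string[] parts = input.Split(new[] { "<TEXT>", "</TEXT>" }, StringSplitOptions.None);
        return parts.Length >= 3 ? parts[1].Trim() : null;
    }
}
```

This relies on the tags strictly alternating, so it is fragile compared with searching for the tags explicitly.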

The most robust solution would be to parse it properly as XML. C# provides functionality for doing that. The second example at http://msdn.microsoft.com/en-us/library/cc189056%28v=vs.95%29.aspx should put you on the right track.

However, if you're just looking for a quick-and-dirty one-time solution, your best bet is to hand-code something, such as dasblinkenlight's solution above.


1
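// Requires: using System.Collections.Generic; using System.Text.RegularExpressions;
var output = new List<string>();
// Group 1 lazily captures the text between each <TEXT>...</TEXT> pair in source.
foreach (Match match in Regex.Matches(source, "<TEXT>(.*?)</TEXT>")) {
    output.Add(match.Groups[1].Value);
}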

1 Comment

The output list contains nothing (thanks a lot for trying to help).
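
One possible cause of an empty result, and this is only a guess since the thread does not show the real input, is that the text between the tags spans several lines. By default '.' does not match a newline, so the pattern fails on such input. A sketch using RegexOptions.Singleline, with the hypothetical names RegexExtractor and ExtractAll:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RegexExtractor
{
    // Returns the trimmed content of every <TEXT>...</TEXT> pair.
    public static List<string> ExtractAll(string input)
    {
        var results = new List<string>();
        // Singleline makes '.' match '\n' as well, so content that
        // spans multiple lines is still captured by group 1.
        foreach (Match m in Regex.Matches(input, "<TEXT>(.*?)</TEXT>", RegexOptions.Singleline))
        {
            results.Add(m.Groups[1].Value.Trim());
        }
        return results;
    }
}
```

If the input is on a single line, this behaves the same as the pattern without the option.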
1
string s = "<DOC>something here <TEXT> and some stuff here </TEXT></DOC>";
string result = Regex.Match(s, "(?<=<TEXT>).*?(?=</TEXT>)").Value;

EDIT: I am using the regex pattern (?<=prefix)find(?=suffix), which matches the text between a prefix and a suffix without including either of them in the match.

EDIT 2: Find several results:

MatchCollection matches = Regex.Matches(s, "(?<=<TEXT>).*?(?=</TEXT>)");
foreach (Match match in matches) {
    Console.WriteLine(match.Value);
}

1 Comment

I will need to get several results. Is there a way to get many results this way? My string includes a lot of <TEXT></TEXT> pairs.
0

If the last tag is </DOC>, then you could use XElement.Parse to load the XML string (XElement.Load reads from a file or stream rather than from a string) and then go through it to find the wanted element (you could also use LINQ to XML queries).

If this is not necessarily a well-formed XML string, you could always use regular expressions to find the desired part of the text. In this case the expression should not be too hard to write yourself.
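
The XML route mentioned above can be sketched as follows. It only applies if the input is well-formed XML, which the asker says is not guaranteed for the real data. The names XmlExtractor and ExtractText are hypothetical; XElement.Parse and Descendants are part of System.Xml.Linq.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class XmlExtractor
{
    // Parses the string as XML and returns the trimmed text of every
    // <TEXT> element. Element names are matched case-sensitively.
    public static List<string> ExtractText(string xml)
    {
        // Throws System.Xml.XmlException if the input is not well-formed.
        XElement root = XElement.Parse(xml);
        return root.Descendants("TEXT").Select(e => e.Value.Trim()).ToList();
    }
}
```

A caller that cannot trust the input would wrap the call in a try/catch and fall back to one of the string-based approaches.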

