I am hopeless with regex (C#), so I would appreciate some help:

Basically, I need to parse a text and find the following information inside it:

Sample text:

KeywordB:***TextToFind* the rest is not relevant but **KeywordB: Text ToFindB and then some more text.

I need to find the word(s) after a certain keyword which may end with a “:”.

[UPDATE]

Thanks, Andrew and Alan. Sorry for reopening the question, but there is quite an important thing missing in that regex. As I wrote in my last comment: is it possible to have a variable (how many words to look for, depending on the keyword) as part of the regex?

Or I could have a different regex for each keyword (there will only be a handful). But I still don't know how to put the "words to look for" constant inside the regex.

  • Regular expression syntax is slightly different depending on whether you're using a Linux-oriented or a Microsoft-oriented technology, so you might want to tag which one you're working with. Commented Jan 18, 2009 at 1:03

3 Answers

The basic regex is this:

var pattern = @"KeywordB:\s*(\w*)";
    \s* = any amount of whitespace (including none)
    \w* = 0 or more word characters (letters, digits and underscore)
    ()  = make a group, so you can extract the part that matched

var pattern = @"KeywordB:\s*(\w*)";
var test = @"KeywordB: TextToFind";
var match = Regex.Match(test, pattern);
if (match.Success) {
    Console.Write("Value found = {0}", match.Groups[1]);
}

If you have more than one of these on a line, you can use this:

var test = @"KeywordB: TextToFind KeyWordF: MoreText";
var matches = Regex.Matches(test, @"(?:\s*(?<key>\w*):\s?(?<value>\w*))");
foreach (Match f in matches ) {
    Console.WriteLine("Keyword '{0}' = '{1}'", f.Groups["key"], f.Groups["value"]);
}

Also, check out the regex designer here: http://www.radsoftware.com.au/. It is free, and I use it constantly. It works great to prototype expressions. You need to rearrange the UI for basic work, but after that it's easy.

(fyi) The "@" before a string makes it a verbatim string, in which \ is no longer an escape character, so you can type @"c:\fun.txt" instead of "c:\\fun.txt"

8 Comments

Just one more thing: in some cases the value can be two words rather than one. Any suggestions?
How is the regex supposed to know it should match two words instead of one?
@Andrew, do you realize almost everything in that regex is optional? It could legally match just a colon. You should replace \w* with \w+. Also, I don't see any need to enclose the whole thing in parens, nor for that \s* at the beginning.
@Alan So there is no way to tell the regex to get not just the first word but also the second, when both are separated by a space?
Yes, it could be more complete, more robust, etc, but this wasn't quite production code :) I'll update it. Also, the only really good way to match more than one word is to ensure that the ":" is right after the keyword.
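For the follow-up about values that span more than one word, here is a minimal sketch, assuming the number of words is known per keyword and words are separated by whitespace. The class name KeywordWords and the method FindWords are hypothetical, made up for this illustration; the count is simply spliced into a {n} quantifier.

```csharp
using System;
using System.Text.RegularExpressions;

public static class KeywordWords
{
    // Hypothetical helper: returns the first `wordCount` words after "keyword:",
    // or null when the keyword is not followed by that many words.
    public static string FindWords(string text, string keyword, int wordCount)
    {
        if (wordCount < 1) throw new ArgumentOutOfRangeException(nameof(wordCount));

        // One word, then (wordCount - 1) repetitions of "whitespace + word".
        string pattern = Regex.Escape(keyword)
            + @":\s*(\w+(?:\s+\w+){" + (wordCount - 1) + "})";

        Match m = Regex.Match(text, pattern);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

With a handful of keywords, the word count per keyword could live in a Dictionary<string, int> and be passed in for each lookup, e.g. FindWords(text, "KeywordB", 2).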
Let me know if I should delete the old post, but perhaps someone wants to read it.

The way to do a "words to look for" inside the regex is like this:

regex = @"(Key1|Key2|Key3|LastName|FirstName|Etc):"

What you are doing probably isn't worth the effort in a regex, though it can probably be done the way you want (still not 100% clear on requirements, though). It involves looking ahead to the next match, and stopping at that point.

Here is a re-write as a regex + regular functional code that should do the trick. It doesn't care about spaces, so if you ask for "Key2" like below, it will separate it from the value.

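// Requires: using System; using System.Collections.Generic;
//           using System.Linq; using System.Text.RegularExpressions;
string[] keys = {"Key1", "Key2", "Key3"};
string source = "Key1:Value1Key2: ValueAnd A: To Test Key3:   Something";
FindKeys(keys, source);

private void FindKeys(IEnumerable<string> keywords, string source) {
    var found = new Dictionary<string, string>(10);
    // Build an alternation of the known keywords, escaped in case
    // one of them contains regex metacharacters.
    var keys = string.Join("|", keywords.Select(k => Regex.Escape(k)).ToArray());
    var matches = Regex.Matches(source, @"(?<key>" + keys + "):",
                          RegexOptions.IgnoreCase);

    foreach (Match m in matches) {
        var key = m.Groups["key"].ToString();
        // The value runs from the end of this "Key:" match to the start
        // of the next one, or to the end of the input for the last key.
        var start = m.Index + m.Length;
        var nx = m.NextMatch();
        var end = (nx.Success ? nx.Index : source.Length);
        // Indexer instead of Add: a repeated key overwrites rather than throws.
        found[key] = source.Substring(start, end - start);
    }

    foreach (var n in found) {
        Console.WriteLine("Key={0}, Value={1}", n.Key, n.Value);
    }
}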

And the output from this is:

Key=Key1, Value=Value1
Key=Key2, Value= ValueAnd A: To Test 
Key=Key3, Value=   Something
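The look-ahead idea mentioned earlier (stop at the next keyword) can also be sketched as a single pattern. This is only an illustration under the same assumptions as the code above: the keywords are known in advance and each one is followed by ":". The names LookaheadKeys and Parse are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class LookaheadKeys
{
    // Hypothetical helper: maps each "Key:" found in source to the text that
    // follows it, up to the next known key or the end of the input.
    public static Dictionary<string, string> Parse(IEnumerable<string> keywords, string source)
    {
        string keys = string.Join("|", keywords.Select(k => Regex.Escape(k)).ToArray());

        // (?<value>.*?) is lazy, so it stops as soon as the look-ahead
        // sees the next "Key:" or the end of the input (\z).
        string pattern = "(?<key>" + keys + "):(?<value>.*?)(?=(?:" + keys + @"):|\z)";

        var found = new Dictionary<string, string>();
        var options = RegexOptions.IgnoreCase | RegexOptions.Singleline;
        foreach (Match m in Regex.Matches(source, pattern, options))
        {
            found[m.Groups["key"].Value] = m.Groups["value"].Value;
        }
        return found;
    }
}
```

Called with the same keys and source as above, this should yield the same three key/value pairs, whitespace included; trimming the values is left to the caller.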

4 Comments

@Andrew, nice one! The "Key2: ValueAnd A:" Space in the Value was exactly what the problem was. thanks!
Glad I could help. I'm still trying to figure out a good clean regex way to do this, maybe with a simple loop, but I can only get 70% "rightness" so far.
Looking forward to the 100% :-)
Thanks for the nice answer, it is almost what I'm looking for. However, how would I modify the solution to return 1.1, 1.2, 1.3 if the original string is "Key1:(1.1)Key2:(1.2)And A: To Test Key3:(1.3)blahblahblah"? After applying this solution to my string, I grab each value and split it on the parentheses. But isn't there a nicer pure regex solution?
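One possible pure-regex take on the parenthesised values, as a rough sketch. It assumes every key looks like "Key" followed by digits and that the value sits in parentheses directly after the colon, with no nested parentheses. ParenValues and Extract are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ParenValues
{
    // Hypothetical helper: returns the text inside "(...)" that directly
    // follows each "KeyN:" in the source string.
    public static List<string> Extract(string source)
    {
        var values = new List<string>();
        // [^)]* takes everything up to the closing parenthesis;
        // nested parentheses are not handled.
        foreach (Match m in Regex.Matches(source, @"Key\d+:\((?<value>[^)]*)\)"))
        {
            values.Add(m.Groups["value"].Value);
        }
        return values;
    }
}
```

For the string in the comment above, Extract should return the three values 1.1, 1.2 and 1.3, skipping "And A: To Test" and "blahblahblah".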
/KeywordB\: (\w)/

This matches any word that comes after your keyword. As you didn't mention any terminator, I assumed that you wanted only the word next to the keyword.

3 Comments

That doesn't appear to be a C# regular expression, but looks suspiciously like Perl.
@Andrew, do you mean because it's enclosed in slashes? That's no big deal; just replace them with quotes. There's nothing in the regex itself that would cause C# to barf.
@Tiago, the real problem is that \w only matches one character; you should change it to \w+. Also, I believe Flo only used "KeywordB" as an example, and you should replace that with \w+ as well.
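To make the \w versus \w+ point concrete, here is a small sketch. The helper WordQuantifier.FirstGroup is hypothetical and only wraps Regex.Match so the two patterns can be compared side by side.

```csharp
using System.Text.RegularExpressions;

public static class WordQuantifier
{
    // Returns the first capture group of `pattern` in `text`,
    // or an empty string when there is no match.
    public static string FirstGroup(string text, string pattern)
    {
        Match m = Regex.Match(text, pattern);
        return m.Success ? m.Groups[1].Value : "";
    }
}
```

For the input "KeywordB: TextToFind", the pattern @"KeywordB: (\w)" captures only "T", while @"KeywordB: (\w+)" captures "TextToFind". Replacing the literal keyword with \w+, as in @"\w+: (\w+)", captures the same word for any keyword.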
