2

I am new to regex in C#. I read whatever I could get my hands on and tried to come up with a regex for extracting the date-time value from my log. This is what I am using:

value = Regex.Match("abc 2012‎-‎12‎-‎23 01:13:51.253", 
                   @"\b20[0-9][0-9]‎-[0-1][0-9]‎-‎[0-3][0-9] [0-2][0-9]:[0-5][0-9]:[0-5][0-9].\d+")
             .Value;

But every time I run this, I get "" in value. Can someone please help me see what I am doing wrong?

Thanks in advance.

3
  • Are you trying to match a date string? Why not use DateTime.ParseExact? Commented Apr 25, 2013 at 15:20
  • Yes, I am trying to do that. The problem is that the date-time does not start at a fixed position in each line, so I can't pass a well-defined input to DateTime.ParseExact. Commented Apr 25, 2013 at 15:38
  • See my answer below. I've illustrated how to use the format strings. Commented Apr 25, 2013 at 15:40

3 Answers

5

The problem is very subtle. You have a hidden control character in your search string. Just before and after each hyphen there is a U+200E LEFT-TO-RIGHT MARK character. I confirmed this by copying your code and inspecting the bytes. You can also test this by placing your cursor before the - and pressing backspace.

Your pattern string contains these hidden control characters too: before the first hyphen, and before and after the second hyphen.

Once I removed all instances of this character from the search string and the pattern string, the pattern matched correctly.
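One way to reproduce that inspection in code, rather than in a hex editor, is to list every character outside printable ASCII together with its code point. This is only a minimal sketch: `HiddenCharInspector` and `Describe` are hypothetical names chosen for illustration, and the method works per UTF-16 `char`, so a character outside the Basic Multilingual Plane would show up as two surrogate entries.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class HiddenCharInspector
{
    // Returns one entry per character outside printable ASCII (space through '~'),
    // giving its index, code point, and Unicode category.
    public static List<string> Describe(string text)
    {
        var findings = new List<string>();
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (c < ' ' || c > '~')
            {
                UnicodeCategory category = CharUnicodeInfo.GetUnicodeCategory(c);
                findings.Add($"index {i}: U+{(int)c:X4} ({category})");
            }
        }
        return findings;
    }
}
```

Calling `HiddenCharInspector.Describe(...)` on both the input string and the pattern string, and printing each entry, should make any invisible character show up with its position.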

Your best bet is to strip these characters out of your input before you try to do anything else. This applies whether you use a regex or conventional DateTime parsing, as others have suggested. This is the easiest way to remove those characters from your string:

input = input.Replace(char.ConvertFromUtf32(0x200e), string.Empty);
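If other invisible characters might turn up as well, a broader filter is one option. The sketch below removes every character in the Unicode "Format" (Cf) category, which includes U+200E, before matching. `LogCleaner`, `StripFormatChars`, and `ExtractTimestamp` are hypothetical names, and the pattern is a cleaned-up variant of the one in the question (no hidden characters, and the dot before the fraction escaped), not the original. Note that Cf also covers characters such as the soft hyphen and zero-width joiner, so this assumes none of them carry meaning in the log.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LogCleaner
{
    // Removes all characters in Unicode general category Cf (Format),
    // e.g. U+200E LEFT-TO-RIGHT MARK and U+200F RIGHT-TO-LEFT MARK.
    public static string StripFormatChars(string input)
    {
        return Regex.Replace(input, @"\p{Cf}+", string.Empty);
    }

    // Cleans the line, then returns the first timestamp found,
    // or an empty string when there is no match.
    public static string ExtractTimestamp(string line)
    {
        string cleaned = StripFormatChars(line);
        Match match = Regex.Match(
            cleaned,
            @"\b20[0-9][0-9]-[0-1][0-9]-[0-3][0-9] [0-2][0-9]:[0-5][0-9]:[0-5][0-9]\.[0-9]+");
        return match.Value;
    }
}
```

With this in place, the regex itself never needs to know about the hidden characters, which keeps the pattern readable.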

3 Comments

Thanks, p.s.w.g. This looks like the issue. In such a situation, how can I generalize the regex to avoid such characters? Or should I follow some other approach?
Well, you probably want to filter out those characters from your input beforehand (see my updated answer) or carefully construct your pattern to expect them. This applies regardless of how you choose to parse the date.
@AlexG I first suspected OP was using mismatched dash characters (e.g. hyphen-minus vs. figure dash) -- I've seen that before, and it's almost impossible to notice in a monospaced font. I didn't see the left-to-right mark until I inspected the bytes.
0

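If you just want to learn regexes, here is a pattern that validates a whole string as a day/month/year date separated by slashes (note that this is not the format used in your log):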

@"(^((((0[1-9])|([1-2][0-9])|(3[0-1]))|([1-9]))\x2F(((0[1-9])|(1[0-2]))|([1-9]))\x2F(([0-9]{2})|(((19)|([2]([0]{1})))([0-9]{2}))))$)"

If it's a valid use case for an application, then you should be parsing it into a DateTime object, using something like DateTime.ParseExact.

Comments

0

Here's how you can match the string using DateTime.ParseExact:

string dateString = "2012-12-23 01:13:51.253";
string format = "yyyy-MM-dd HH:mm:ss.fff"; // HH is the 24-hour clock; hh would reject hours above 12

DateTime dateTime = DateTime.ParseExact(dateString, format, CultureInfo.InvariantCulture);

I'm not sure what the 'abc' part of your string is, but if that's the three letter abbreviation of a month, you can change your format string to this:

string format = "MMM yyyy-MM-dd HH:mm:ss.fff";

Here's the complete list of custom format codes for date time parsing: http://msdn.microsoft.com/en-us/library/8kb3ddd4.aspx
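Since the timestamp can start anywhere in the line, and ParseExact needs the whole input to match the format, one possible combination is to locate the timestamp with a regex and then hand only the matched text to DateTime.TryParseExact. This is a rough sketch under a few assumptions: the line has already had any hidden characters removed, the fraction always has exactly three digits, and hours use the 24-hour clock ("HH"). `LogTimestampParser` and `TryParse` are hypothetical names.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class LogTimestampParser
{
    // Finds the shape of a timestamp only; range checks (month 1-12, etc.)
    // are left to TryParseExact.
    private static readonly Regex TimestampPattern = new Regex(
        @"\b[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{3}");

    // Returns true and sets timestamp when the line contains a valid date-time.
    public static bool TryParse(string line, out DateTime timestamp)
    {
        timestamp = default(DateTime);
        Match match = TimestampPattern.Match(line);
        return match.Success
            && DateTime.TryParseExact(
                match.Value,
                "yyyy-MM-dd HH:mm:ss.fff",
                CultureInfo.InvariantCulture,
                DateTimeStyles.None,
                out timestamp);
    }
}
```

The regex stays simple because it only has to find candidate text; TryParseExact then rejects impossible values such as month 13 without throwing.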
