1

I have this

var regex = new Regex(@"StartDate:(.*)EndDate:(.*)W.*Status:(.*)");

So this captures values until it hits a W in the string, correct? I need it to stop at a W OR an S. I have tried a few different ways but cannot get it to work. Does anyone have any pointers?

More info:

            record = record.Replace(" ", "").Replace("\r\n", "").Replace("-", "/");
            var regex = new Regex(@"StartDate:(.*)EndDate:(.*)W.*Status:(.*)");
            string strStartDate = regex.Match(record).Groups[1].ToString();
            string strEndDate = regex.Match(record).Groups[2].ToString();
            string Status = regex.Match(record).Groups[3].ToString().ToUpper().StartsWith("IN") ? "Inactive" : "Active"; // compare against upper case, since ToUpper() was applied

I am trying to parse a big string of values, but I only want 3 things: Start Date, End Date, and Status (active/inactive). However, there are 3 different values for each (3 start dates, 3 end dates, 3 statuses).

The first 2 strings look like this:

"Start Date: 

 2014-09-08 



End Date: 

 2017-09-07 



Warranty Type: 

 XXX 



Status: 

 Active 



Serial Number/IMEI: 

 XXXXXXXXXXX









Description:



XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"

The 3rd string looks like this:

"Start Date: 

 2014-09-08 



End Date: 

 2017-09-07 



Status: 

 Active 



Warranty Upgrade Code:



SVC_PRIORITY"

On the last string I am not getting the 2 dates, I am guessing because of the W.* after the end date.

17
  • What is/are the input string(s)? Commented Mar 7, 2016 at 14:17
  • updated my original post @WiktorStribiżew Commented Mar 7, 2016 at 14:21
  • if you need it to end at W or S then you'd do something like (.*)[WS] Commented Mar 7, 2016 at 14:22
  • @BugFinder I am not getting the dates if I do it like this: var regex = new Regex(@"StartDate:(.*)EndDate:(.*)[WS]Status:(.*)"); Commented Mar 7, 2016 at 14:24
  • 1
    Why not split with newlines first, and then split with :+space? Commented Mar 7, 2016 at 14:28

4 Answers

1

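EDIT: Please try this function, which parses the string using a regex:

using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Windows.Forms;

// Returns one { start date, end date, status } array per record found in the input.
private static List<string[]> parseString(string input)
{
    // The "Warranty Type" block sits in an optional non-capturing group, so records without it still match.
    var pattern = @"Start\s+Date:\s+([0-9-]+)\s+End\s+Date:\s+([0-9-]+)\s+(?:Warranty\s+Type:\s+\w+\s+)?Status:\s+(\w+)\s*";
    return Regex.Matches(input, pattern)
        .Cast<Match>()
        .Select(m => new[] { m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value })
        .ToList();
}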

// To show the result string
var result1 = parseString(str1);
string result_string = string.Join("\n", result1.ConvertAll(r => string.Format("Start Date: {0}\nEnd Date: {1}\nStatus: {2}", r)).ToArray());
MessageBox.Show(result_string);


EDIT2 For OP's situation, you could call the function from inside the foreach loop like this:

foreach (HtmlElement el in webBrowser1.Document.GetElementsByTagName("div"))
{
    if (el.GetAttribute("className") == "fluid-row Borderfluid")
    {
        string record = el.InnerText;
        //if record is the string to parse
        var result = parseString(record);
        var result_string = string.Join("\n", result.ConvertAll(r => string.Format("Start Date: {0}\nEnd Date: {1}\nStatus: {2}", r)).ToArray());
        MessageBox.Show(result_string);
    }
}

9 Comments

It won't let me call MessageBox.Show(result); the error is that it cannot convert List<string[]> to string. Is there any other way to display the results besides MessageBox.Show?
Once you've got List<string[]> as result, you could do what you like with it: Join them as string; use Messagebox.Show() to display it; or Console.WriteLine() to output to your console; or further process the data. To use MessageBox.Show(), you have to convert the result to a string. Did you do that?
I am not sure I can use a function in my situation because this code is inside a foreach loop. Would it still be alright?
If you prefer not to use a function, just take the code inside { }. Remember to replace the input variable with your string variable, and assign the result to another variable. It should be alright
foreach (HtmlElement el in webBrowser1.Document.GetElementsByTagName("div")) if (el.GetAttribute("className") == "fluid-row Borderfluid") { string record = el.InnerText;} -- This is what im putting the code inside, I am not sure if I can add a function within this? I just want the values of those items
1

No need to replace the new lines in your example

List<string> resultList = new List<string>();

var subjectString = @"Start Date: xxxxx
End Date: yyyy
Warranty Type: zzzz
Status: uuuu
Start Date: aaaa
End Date: bbbb
Status: cccc";

Regex regexObj = new Regex(@"Start Date: (.*?)\nEnd Date: (.*?)\n(.|\n)*?Status: (.*)");
Match matchResult = regexObj.Match(subjectString);
while (matchResult.Success) {
    resultList.Add(matchResult.Groups[1].Value);
    resultList.Add(matchResult.Groups[2].Value);
    resultList.Add(matchResult.Groups[4].Value);
    matchResult = matchResult.NextMatch();
} 

2 Comments

Is resultList an array ? should I create an ArrayList ?
You can make it a List or something else that supports the Add. My assumption here is that the subject string can contain multiple occurrences of the thing you want to match. I updated the answer.
0

You may replace your code with the following one (see IDEONE demo):

var s = @"Start Date: xxxxx
End Date: xxxx
Warranty Type: xxxx
Status: xxxx";
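var res = Regex.Replace(s, @":\s+", ": ")            // Collapse the whitespace after each colon
        .Split(new[] { "\r", "\n" }, StringSplitOptions.RemoveEmptyEntries) // Split into lines
        .Select(line => line.Split(new[] { ": " }, 2, StringSplitOptions.None)) // Split each line at the first `:`+space
        .Where(n => n.Length == 2)                        // Skip lines that are not key/value pairs
        .ToDictionary(n => n[0], n => n[1]);              // Create a dictionary (needs System.Linq)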
string strStartDate = string.Empty;
string strEndDate = string.Empty;
string Status = string.Empty;
string Warranty = string.Empty;
// Demo & variable assignment
if (res.ContainsKey("Start Date")) {
    Console.WriteLine(res["Start Date"]);
    strStartDate = res["Start Date"];
}
if (res.ContainsKey("Warranty Type")) {
    Console.WriteLine(res["Warranty Type"]);
    Warranty = res["Warranty Type"];
}
if (res.ContainsKey("End Date")) {
    Console.WriteLine(res["End Date"]);
    strEndDate = res["End Date"];
}
if (res.ContainsKey("Status")) {
    Console.WriteLine(res["Status"]);
    Status = res["Status"];
}

Note that the best approach is to declare your own class with the fields like WarrantyType, StartDate, etc. and initialize that right in the LINQ code.
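
A minimal sketch of that idea, not taken from the original answer: the WarrantyRecord class, its property names, and the Parse helper are all hypothetical. It assumes the text holds a single record and that dates are in yyyy-MM-dd form.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical container for one parsed record; all names are illustrative.
public class WarrantyRecord
{
    public DateTime StartDate { get; set; }
    public DateTime EndDate { get; set; }
    public string WarrantyType { get; set; }   // null when there is no "Warranty Type" line
    public string Status { get; set; }

    // Builds a record from text that holds a single block of "Key: value" pairs.
    // Assumption: each key appears once; duplicate keys make ToDictionary throw.
    public static WarrantyRecord Parse(string text)
    {
        Dictionary<string, string> fields = Regex.Replace(text, @":\s+", ": ")
            .Split(new[] { "\r", "\n" }, StringSplitOptions.RemoveEmptyEntries)
            .Select(line => line.Split(new[] { ": " }, 2, StringSplitOptions.None))
            .Where(parts => parts.Length == 2)
            .ToDictionary(parts => parts[0].Trim(), parts => parts[1].Trim());

        string warrantyType;
        fields.TryGetValue("Warranty Type", out warrantyType);

        return new WarrantyRecord
        {
            StartDate = DateTime.ParseExact(fields["Start Date"], "yyyy-MM-dd", CultureInfo.InvariantCulture),
            EndDate = DateTime.ParseExact(fields["End Date"], "yyyy-MM-dd", CultureInfo.InvariantCulture),
            WarrantyType = warrantyType,
            Status = fields["Status"]
        };
    }
}
```

Having the dates as DateTime values on one object may make a later step such as a database insert simpler, but whether that is worth the extra class depends on what you do with the values next.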

2 Comments

hmm why is it important to declare the classes? - My next step with this is to convert the dates to DateTime and input them into a database table - Would this be helpful for that next step?
No, it is not important, but it could be useful. It is up to you what you do with this further. Also, you can .Trim() the values when assigning to the variables. E.g. Status = res["Status"].Trim() (since the trailing whitespace is still there).
0

Avoid .* because it is a catch-all that gets regex pattern creators in trouble. Instead, write the pattern to match a specific shape that always occurs in the data.

The shapes to match here are the two dates, \d\d\d\d-\d\d-\d\d; the rest is fixed text, which should be used as static anchors that can be skipped over.

Here is an example that looks for the date patterns. Once found, the regex puts each value into a named capture group, (?<GroupNameHere>...), and LINQ projects each match into an anonymous object and parses the dates.

Data

Note the first date is reversed as per your example

var data = @"Start Date:

 2014-09-08

End Date:

 2017-09-07

Status:

 Active

Start Date:

 2014-09-09

End Date:

 2017-09-10

Status:

 In-Active
 ";

Pattern

string pattern = @"
^Start\sDate:\s+                     # An anchor of start date that always starts at the BOL
(?<Start>\d\d\d\d-\d\d-\d\d)         # actual start date pattern
\s+                                  # a lot of space including \r\n
^End\sDate:\s+                       # End date anchor and space
(?<End>\d\d\d\d-\d\d-\d\d)           # pattern of the end date.
\s+                                  # Same pattern as above for Status
^Status:\s+
(?<Status>[^\s]+)
 ";

Processing

// ExplicitCapture: only named groups (?<Name>...) are captured; plain parentheses (...) are not.
// Multiline: ^ and $ match at the beginning and end of each line, not only of the whole buffer.
// IgnorePatternWhitespace: whitespace and # comments in the pattern are ignored, so it can be laid out and annotated.
// Requires System, System.Linq and System.Text.RegularExpressions.
var results = Regex.Matches(data, pattern, RegexOptions.ExplicitCapture |
                                           RegexOptions.Multiline       |
                                           RegexOptions.IgnorePatternWhitespace)
     .OfType<Match>()
     .Select(mt => new
            {
                Status    = mt.Groups["Status"].Value,
                StartDate = DateTime.Parse(mt.Groups["Start"].Value),
                EndDate   = DateTime.Parse(mt.Groups["End"].Value)
            });
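
The pattern above expects Status to follow End Date directly, so the records in the question that carry a Warranty Type line in between would not match it. One possible way to allow for that is sketched here, assuming the optional block is always "Warranty Type:" followed by a single token without whitespace. The WarrantyPattern class and Extract method are hypothetical names, and DateTime.ParseExact is swapped in for DateTime.Parse to avoid depending on the current culture.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative only.
public static class WarrantyPattern
{
    private const string Pattern = @"
^Start\sDate:\s+
(?<Start>\d{4}-\d\d-\d\d)
\s+
^End\sDate:\s+
(?<End>\d{4}-\d\d-\d\d)
\s+
(^Warranty\sType:\s+\S+\s+)?     # optional block; unnamed, so not captured under ExplicitCapture
^Status:\s+
(?<Status>\S+)
";

    // Returns (start date, end date, status) for every record found in the data.
    public static List<Tuple<DateTime, DateTime, string>> Extract(string data)
    {
        var options = RegexOptions.ExplicitCapture |
                      RegexOptions.Multiline |
                      RegexOptions.IgnorePatternWhitespace;

        return Regex.Matches(data, Pattern, options)
            .Cast<Match>()
            .Select(m => Tuple.Create(
                DateTime.ParseExact(m.Groups["Start"].Value, "yyyy-MM-dd", CultureInfo.InvariantCulture),
                DateTime.ParseExact(m.Groups["End"].Value, "yyyy-MM-dd", CultureInfo.InvariantCulture),
                m.Groups["Status"].Value))
            .ToList();
    }
}
```

Because the Warranty Type block is optional, the same pattern should cover both record shapes shown in the question, as long as each label starts at the beginning of a line.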

