
I'm currently struggling to parse a C output .map file using regex. I'm treating each line separately. A single line could look like this:

__func_name     |00010d88|   T  |              FUNC|00000010|     |.text

Expected Output:
1) "__func_name"
2) "00010d88"
3) "T"
4) "FUNC"
5) "00000010"
6) (empty string)
7) ".text"
8) (empty string)

However, the amount of whitespace between the fields varies. Another line could look like this:

__func_name2|0007bb7c|   T  |              FUNC|00000034|     |.text    sourcefile.c:49

1) "__func_name2"
2) "0007bb7c"
3) "T"
4) "FUNC"
5) "00000034"
6) (empty string)
7) ".text"
8) "sourcefile.c:49"

As you can see, not only does the amount of whitespace vary, but the source file is also listed. I tried to solve this problem using RegExr. My regex basically needs to match the following groups:

  1. Alphanumeric string

  2. A (hex)Number

  3. A single letter

  4. A String

  5. A (hex)number

  6. An optional string

  7. Another optional string

Each group is separated by a | character. I tried the regex below. Although it is incomplete, RegExr tells me that I'm only matching the first group.

Could you help me to figure out what's wrong with my regex?

([__A-Za-z0-9])\w+|((([\|]{1})&[0-9a-h]&([\|]{1})))\w+|([A-Z])\w+

You can try a live demo here: https://regexr.com/4gpvf

Edit: Expected Outputs added

  • It seems rather obvious that | is being used as a delimiter. Wouldn't it be far simpler to split by that, then trim each resulting string? The last segment would be .text sourcefile.c:49, and that can be easily parsed with a much simpler regex. Commented Jul 2, 2019 at 14:22
  • What output would you expect in your second example: would you expect the source file to be part of the final string, two separate strings, or the source file omitted? Commented Jul 2, 2019 at 14:25
  • Do you mean like this? regex101.com/r/BFDygW/1 Commented Jul 2, 2019 at 14:34
  • Hm, split is a good idea. Do you mean like this? string[] single_element = single_line.Split((char)('|')); ? Commented Jul 2, 2019 at 14:40
  • Just single_line.Split('|'). I would not remove empty columns if you want to preserve the column indices. Commented Jul 2, 2019 at 14:43
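
The split-and-trim approach suggested in these comments could look roughly like the sketch below. The class and method names (MapLineSplitter, ParseLine) are hypothetical, and the sketch assumes every line has exactly seven |-separated columns and that the source location, if present, contains no whitespace.

```csharp
using System;
using System.Linq;

public static class MapLineSplitter
{
    // Hypothetical helper, not from the original post.
    // Returns eight fields: six trimmed columns, the section name,
    // and the source location ("" when the line has none).
    public static string[] ParseLine(string line)
    {
        // Keep empty columns so the column indices stay stable.
        string[] columns = line.Split('|').Select(c => c.Trim()).ToArray();
        if (columns.Length != 7)
            throw new FormatException($"Expected 7 columns, got {columns.Length}.");

        // Last column: section name, optionally followed by "file:line".
        // A null separator array means "split on whitespace".
        string[] tail = columns[6].Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        string section = tail.Length > 0 ? tail[0] : "";
        string source = tail.Length > 1 ? tail[1] : "";

        return columns.Take(6).Concat(new[] { section, source }).ToArray();
    }
}
```

No regex is needed in this variant; the only assumption about the last column is that whitespace separates the section name from the source location.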

3 Answers

A rather simple match pattern could be this:

@"\s*(\S*)\s*\|\s*([a-f0-9]+)\s*\|\s*(\S)\s*\|\s*(\S*)\s*\|\s*([a-f0-9]+)\s*\|\s*(\S*)\s*\|\s*(\S*)\s*(\S*).*"

Executed this way:

  // Requires: using System; using System.Linq; using System.Text.RegularExpressions;
  string[] data =
  {
    "__func_name   | 00010d88 | T | FUNC | 00000010 |     |.text",
    "__func_name2 | 0007bb7c | T | FUNC | 00000034 |     |.text    sourcefile.c:49"
  };

  const string pattern = @"\s*(\S*)\s*\|\s*([a-f0-9]+)\s*\|\s*(\S)\s*\|\s*(\S*)\s*\|\s*([a-f0-9]+)\s*\|\s*(\S*)\s*\|\s*(\S*)\s*(\S*).*";
  var matchCollections = data.Select(s => Regex.Matches(s, pattern, RegexOptions.IgnoreCase));

  foreach (MatchCollection matches in matchCollections)
  {
    foreach (Match match in matches)
    {
      // Groups[0] is the whole match; the captured columns start at index 1.
      foreach (Group group in match.Groups)
      {
        Console.WriteLine(group.Value);
      }
    }
  }
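
As a possible variation on the pattern above, named groups and ^/$ anchors can make the fields easier to read back. This is only a sketch: the class name, group names, and the exact pattern are assumptions made for illustration, and it assumes the symbol name and kind consist of word characters only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MapLineRegex
{
    // Hypothetical pattern and group names, for illustration only.
    private static readonly Regex LinePattern = new Regex(
        @"^\s*(?<name>\w+)\s*\|\s*(?<address>[0-9a-f]+)\s*\|\s*(?<type>[a-z])\s*\|" +
        @"\s*(?<kind>\w+)\s*\|\s*(?<size>[0-9a-f]+)\s*\|\s*(?<other>[^|\s]*)\s*\|" +
        @"\s*(?<section>\S*)\s*(?<source>\S*)\s*$",
        RegexOptions.IgnoreCase);

    // Returns the eight fields in order, or null when the line does not match.
    public static string[] Parse(string line)
    {
        Match m = LinePattern.Match(line);
        if (!m.Success)
        {
            return null;
        }

        return new[]
        {
            m.Groups["name"].Value,
            m.Groups["address"].Value,
            m.Groups["type"].Value,
            m.Groups["kind"].Value,
            m.Groups["size"].Value,
            m.Groups["other"].Value,   // "" when the column is empty
            m.Groups["section"].Value,
            m.Groups["source"].Value   // "" when no source location is listed
        };
    }
}
```

Because the pattern is anchored at both ends, a line with a missing or extra column yields null instead of a partial match.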

// Requires: using System.Text.RegularExpressions; using static System.Console;
static void Main()
{
    var x = @"__func_name2|0007bb7c|   T  |              FUNC|00000034|     |.text    sourcefile.c:49";
    // Split on '|' and drop the whitespace around each delimiter.
    var fields = Regex.Split(x, @"\s*\|\s*");
    int last = fields.Length - 1;
    for (int z = 0; z < last; ++z)
    {
        WriteLine($"{z + 1}) {fields[z]}");
    }

    // The last field holds the section name and, optionally, a source location.
    var match = Regex.Match(fields[last], @"^(?i)(?'text'\.[a-z]+)(\s+(?'file'[a-z]+\.[a-z]+:[0-9]+))?$");
    WriteLine($"{last + 1}) {match.Groups["text"].Value}");
    // An unmatched optional group has an empty Value.
    WriteLine($"{last + 2}) {match.Groups["file"].Value}");
}

/* Output:
    1) __func_name2
    2) 0007bb7c
    3) T
    4) FUNC
    5) 00000034
    6)
    7) .text
    8) sourcefile.c:49
*/


Regular expressions seem unnecessary here, but if there is no other option, this expression:

(__[^\|\s]+)\s*\|([^\|\s]+)\s*\|\s*([A-Z]+)\s*\|\s*([A-Z]+)\s*\|([^\|\s]+)\s*\|\s*\|([^\|\s]+)\s*(?:([^:]+)?\s*:\s*?([0-9]+))?

might collect the desired values while ignoring the spaces and pipes. It ends with an optional group for the source file:

(?:([^:]+)?\s*:\s*?([0-9]+))?


Example

using System;
using System.Text.RegularExpressions;

public class Example
{
    public static void Main()
    {
        string pattern = @"(__[^\|\s]+)\s*\|([^\|\s]+)\s*\|\s*([A-Z]+)\s*\|\s*([A-Z]+)\s*\|([^\|\s]+)\s*\|\s*\|([^\|\s]+)\s*(?:([^:]+)?\s*:\s*?([0-9]+))?";
        string input = @"__func_name2|0007bb7c|   T  |              FUNC|00000034|     |.text    sourcefile.c:49

__func_name     |00010d88|   T  |              FUNC|00000010|     |.text";
        RegexOptions options = RegexOptions.Multiline;

        foreach (Match m in Regex.Matches(input, pattern, options))
        {
            Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
        }
    }
}
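
The example above prints only the whole match. A minimal sketch of reading the individual groups, including a check of whether the optional source-file group took part in the match, might look like this. The pattern is copied unchanged from this answer; the class and method names (OptionalGroupDemo, Describe) are hypothetical, and the sketch assumes each line is matched on its own.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OptionalGroupDemo
{
    // Pattern copied from the answer above.
    private const string Pattern =
        @"(__[^\|\s]+)\s*\|([^\|\s]+)\s*\|\s*([A-Z]+)\s*\|\s*([A-Z]+)\s*\|([^\|\s]+)\s*\|\s*\|([^\|\s]+)\s*(?:([^:]+)?\s*:\s*?([0-9]+))?";

    // Hypothetical helper: summarizes the section and the optional source location.
    public static string Describe(string line)
    {
        Match m = Regex.Match(line, Pattern);
        if (!m.Success)
        {
            return "no match";
        }

        string section = m.Groups[6].Value;

        // Group.Success is false when the optional group did not participate.
        if (m.Groups[7].Success)
        {
            return $"{section} from {m.Groups[7].Value} line {m.Groups[8].Value}";
        }

        return $"{section} without source";
    }
}
```

Note that this pattern has no capture group for the empty sixth column, so the section name lands in group 6 and the file name and line number in groups 7 and 8.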

1 Comment

Very nice work, thanks a lot! (I'd vote for you, but it looks like I don't have enough reputation.)
