Parse a C Output map using regex

Question

I'm currently struggling to parse a C output .map file using regex. I'm treating each line separately. A single line could look like this:

__func_name     |00010d88|   T  |              FUNC|00000010|     |.text

Expected Output:
1) "__func_name"
2) "00010d88"
3) "T"
4) "FUNC"
5) "00000010"
6) (empty string)
7) ".text"
8) (empty string)

However, the amount of whitespace between the fields varies. Another line could look like this:

__func_name2|0007bb7c|   T  |              FUNC|00000034|     |.text    sourcefile.c:49

Expected Output:
1) "__func_name2"
2) "0007bb7c"
3) "T"
4) "FUNC"
5) "00000034"
6) (empty string)
7) ".text"
8) "sourcefile.c:49"

As you can see, not only does the amount of whitespace vary, but the source file is also listed. I tried to solve this problem using RegExr. My regex needs to match the following fields:

Alphanumeric string
A (hex)Number
A single letter
A String
A (hex)number
An optional string
Another optional string

Each group is separated by a | character. I tried the regex below. Although it is incomplete, RegExr tells me that I'm only matching the first group.

Could you help me to figure out what's wrong with my regex?

([__A-Za-z0-9])\w+|((([\|]{1})&[0-9a-h]&([\|]{1})))\w+|([A-Z])\w+

You can try a live demo here: https://regexr.com/4gpvf

Edit: Expected Outputs added

It seems rather obvious that | is being used as a delimiter. Wouldn't it be far simpler to split by that, then trim each resulting string? The last segment would be .text sourcefile.c:49, and that can be easily parsed with a much simpler regex.
– user47589, Commented Jul 2, 2019 at 14:22
What output would you expect in your second example: the source file as part of the final string, two separate strings, or the source file omitted?
– PaulF, Commented Jul 2, 2019 at 14:25
Hm, Split is a good idea. Do you mean like this: string[] single_element = single_line.Split((char)('|')); ?
– JuliusCaesar, Commented Jul 2, 2019 at 14:40
Just single_line.Split('|'). I would not remove empty columns if you want to preserve the column indices.
– user47589, Commented Jul 2, 2019 at 14:43
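
The split-and-trim approach suggested in these comments could look roughly like the sketch below. The class and method names (`MapLineSplitter`, `SplitMapLine`) are hypothetical and not from the thread, and the code assumes that the last column holds a section name optionally followed by a `file:line` token, neither of which contains whitespace.

```csharp
using System;
using System.Linq;

public static class MapLineSplitter
{
    // Hypothetical helper (not from the thread): returns the eight fields
    // listed in the question, with "" for anything that is missing.
    public static string[] SplitMapLine(string line)
    {
        // Split on '|' and trim; empty columns are kept so indices stay stable.
        string[] columns = line.Split('|').Select(c => c.Trim()).ToArray();

        // Assumption: the last column is a section name, optionally followed
        // by whitespace and "file:line"; neither part contains whitespace.
        string[] tail = columns[columns.Length - 1]
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries);

        string section = tail.Length > 0 ? tail[0] : "";
        string source = tail.Length > 1 ? tail[1] : "";

        // All columns except the last, followed by the two parts of the last one.
        return columns.Take(columns.Length - 1)
                      .Concat(new[] { section, source })
                      .ToArray();
    }
}
```

For the second sample line from the question, `SplitMapLine` returns eight strings, with an empty string at index 5 and `sourcefile.c:49` at index 7. For the first sample line, the last entry is an empty string.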

user1562155 · Accepted Answer · 2019-07-02 15:19:52Z

1

A rather simple match pattern could be this:

@"\s*(\S*)\s*\|\s*([a-f0-9]+)\s*\|\s*(\S)\s*\|\s*(\S*)\s*\|\s*([a-f0-9]+)\s*\|\s*(\S*)\s*\|\s*(\S*)\s*(\S*).*"

Executed this way:

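  // Requires: using System; using System.Text.RegularExpressions;
  string[] data =
  {
    "__func_name   | 00010d88 | T | FUNC | 00000010 |     |.text",
    "__func_name2 | 0007bb7c | T | FUNC | 00000034 |     |.text    sourcefile.c:49"
  };

  // Eight capture groups: the first six '|'-separated columns, then the last
  // column split into the section name and an optional source location.
  const string pattern = @"\s*(\S*)\s*\|\s*([a-f0-9]+)\s*\|\s*(\S)\s*\|\s*(\S*)\s*\|\s*([a-f0-9]+)\s*\|\s*(\S*)\s*\|\s*(\S*)\s*(\S*).*";

  foreach (string line in data)
  {
    Match match = Regex.Match(line, pattern, RegexOptions.IgnoreCase);

    // Groups[0] is the whole match; the captured fields start at index 1.
    for (int i = 1; i < match.Groups.Count; i++)
    {
      Console.WriteLine(match.Groups[i].Value);
    }
  }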


JohnyL · Accepted Answer · 2019-07-02 19:17:20Z

// Requires: using System.Text.RegularExpressions; using static System.Console;
static void Main()
{
    var x = @"__func_name2|0007bb7c|   T  |              FUNC|00000034|     |.text    sourcefile.c:49";
    // Split on '|' and swallow the whitespace around each delimiter.
    var columns = Regex.Split(x, @"\s*\|\s*");
    int len = columns.Length;
    for (int z = 0; z < len; ++z)
    {
        if (z == len - 1)
        {
            // The last column holds the section name and, optionally, "file:line".
            var match = Regex.Match(columns[z], @"^(?i)(?'text'\.[a-z]+)(\s+(?'file'[a-z]+\.[a-z]+:[0-9]+))?$");
            WriteLine($"{z + 1}) {match.Groups["text"].Value}");
            // An unmatched group has an empty Value, so no extra check is needed.
            WriteLine($"{z + 2}) {match.Groups["file"].Value}");
        }
        else
        {
            WriteLine($"{z + 1}) {columns[z]}");
        }
    }
}

/* Output:
    1) __func_name2
    2) 0007bb7c
    3) T
    4) FUNC
    5) 00000034
    6)
    7) .text
    8) sourcefile.c:49
*/

Emma Marcier · Accepted Answer · 2019-07-02 14:44:50Z

0

Regular expressions seem unnecessary here, but if there is no other option, this expression:

(__[^\|\s]+)\s*\|([^\|\s]+)\s*\|\s*([A-Z]+)\s*\|\s*([A-Z]+)\s*\|([^\|\s]+)\s*\|\s*\|([^\|\s]+)\s*(?:([^:]+)?\s*:\s*?([0-9]+))?

might collect the desired values while ignoring the spaces and pipes. It includes an optional group for the source file, which captures the file name and the line number separately:

(?:([^:]+)?\s*:\s*?([0-9]+))?

Example

using System;
using System.Text.RegularExpressions;

public class Example
{
    public static void Main()
    {
        string pattern = @"(__[^\|\s]+)\s*\|([^\|\s]+)\s*\|\s*([A-Z]+)\s*\|\s*([A-Z]+)\s*\|([^\|\s]+)\s*\|\s*\|([^\|\s]+)\s*(?:([^:]+)?\s*:\s*?([0-9]+))?";
        string input = @"__func_name2|0007bb7c|   T  |              FUNC|00000034|     |.text    sourcefile.c:49

__func_name     |00010d88|   T  |              FUNC|00000010|     |.text";
        RegexOptions options = RegexOptions.Multiline;

        foreach (Match m in Regex.Matches(input, pattern, options))
        {
            Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
        }
    }
}
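
The example above prints only the whole match. A minimal sketch of reading the individual capture groups, assuming each line is matched on its own as described in the question, might look like this. `GroupReader` and `ReadGroups` are hypothetical names introduced only for illustration; the pattern is the one from this answer, unchanged.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupReader
{
    // Same pattern as in the answer above.
    private const string Pattern =
        @"(__[^\|\s]+)\s*\|([^\|\s]+)\s*\|\s*([A-Z]+)\s*\|\s*([A-Z]+)\s*\|([^\|\s]+)\s*\|\s*\|([^\|\s]+)\s*(?:([^:]+)?\s*:\s*?([0-9]+))?";

    // Hypothetical helper: returns capture groups 1..8 of the first match,
    // or an empty array if the line does not match at all.
    public static string[] ReadGroups(string line)
    {
        Match m = Regex.Match(line, Pattern);
        if (!m.Success)
        {
            return new string[0];
        }

        // Groups[0] is the whole match, so the fields start at index 1.
        // Groups that did not participate in the match have an empty Value.
        var values = new string[m.Groups.Count - 1];
        for (int i = 1; i < m.Groups.Count; i++)
        {
            values[i - 1] = m.Groups[i].Value;
        }
        return values;
    }
}
```

For the line ending in `.text    sourcefile.c:49`, this yields `__func_name2`, `0007bb7c`, `T`, `FUNC`, `00000034`, `.text`, `sourcefile.c` and `49`. The empty sixth column is skipped by the pattern rather than captured, so the group numbering differs from the expected output in the question.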

Very nice work, thanks a lot! (I'd vote for you, but it looks like I do not have enough reputation.)
– JuliusCaesar, over a year ago
