
I have a bunch of strings in a text that look something like this:

h1. this is the Header
h3. this one the header too
h111. and this

And I have a function that is supposed to process this text depending on the level (let's say the iteration) it is called with:

public void ProcessHeadersInText(string inputText, int atLevel = 1)

So, if it is called as

ProcessHeadersInText(inputText, 2)

Output should be:

<h3>this is the Header</h3>
<h5>this one the header too</h5>
<h9>and this</h9>

(The last one looks like this because if the value after the h is more than 9, it is supposed to be 9 in the output.)

So, I started to think about using regex.

Here's an example: https://regex101.com/r/spb3Af/1/

As you can see, I came up with this regex: (^(h([\d]+)\.+?)(.+?)$), and tried to use the substitution <h$3>$4</h$3> on it.

It's almost what I'm looking for, but I need to add some logic to work with the heading level.
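For reference, here is a minimal sketch of that attempt as a C# call, run on the first sample line (it assumes C# 9 top-level statements). A plain replacement string can only copy captured text, so $3 comes out unchanged and there is no way to add atLevel or cap the result at 9:

```csharp
using System;
using System.Text.RegularExpressions;

// The pattern and substitution from the question, applied as a static replacement string.
string pattern = @"(^(h([\d]+)\.+?)(.+?)$)";
string result = Regex.Replace("h1. this is the Header", pattern, "<h$3>$4</h$3>", RegexOptions.Multiline);

// Group 3 ("1") is copied verbatim, and group 4 keeps the leading space after the dot.
Console.WriteLine(result);
```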

Is it possible to use variables in the substitution?

Or do I need to find another way? (Extract all headings first, replace them considering the function's variables and the header's value, and only after that use the regex I wrote?)

  • You can just use a delegate instead of just a replacement string. Commented Apr 6, 2017 at 15:12
  • You can use MatchEvaluator msdn.microsoft.com/en-us/library/… (probably what @Joey is saying) Commented Apr 6, 2017 at 15:16
  • Oh, that's a great idea! Commented Apr 6, 2017 at 15:21
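As a minimal illustration of what the comments suggest: Regex.Replace also accepts a MatchEvaluator, a function that receives each Match and returns its replacement, so the replacement can be computed instead of copied. This toy example (hypothetical input, unrelated to headers, written with C# 9 top-level statements) doubles every number:

```csharp
using System;
using System.Text.RegularExpressions;

// A MatchEvaluator is any function Match -> string; here it is written as a lambda.
MatchEvaluator doubler = m => (int.Parse(m.Value) * 2).ToString();

// Each run of digits is replaced by the value the evaluator computes for it.
string doubled = Regex.Replace("a1 b22 c333", @"\d+", doubler);
Console.WriteLine(doubled); // a2 b44 c666
```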

4 Answers

2

The regex you may use is

^h(\d+)\.+\s*(.+)

If you need to make sure the match does not span across lines, replace \s with [^\S\r\n] (any whitespace except CR and LF). See the regex demo.
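A small sketch comparing both variants, assuming an input where the header marker is followed directly by a line break (C# 9 top-level statements). With \s*, the match jumps to the next line and takes it as the header text; with [^\S\r\n]*, nothing matches:

```csharp
using System;
using System.Text.RegularExpressions;

string input = "h1.\nnext line";

// \s also matches '\n', so (.+) picks up the following line.
string spanning = Regex.Replace(input, @"^h(\d+)\.+\s*(.+)", "[$2]", RegexOptions.Multiline);

// [^\S\r\n] means "whitespace other than CR/LF", so the match cannot leave the line.
string singleLine = Regex.Replace(input, @"^h(\d+)\.+[^\S\r\n]*(.+)", "[$2]", RegexOptions.Multiline);

Console.WriteLine(spanning);   // [next line]
Console.WriteLine(singleLine); // prints the input unchanged
```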

When replacing in C#, parse the Group 1 value to an int and add atLevel to it inside a match evaluator passed to the Regex.Replace method.

Here is the example code that will help you:

using System;
using System.Linq;
using System.Text.RegularExpressions;
using System.IO;
public class Test
{
    // Demo: https://regex101.com/r/M9iGUO/2
    public static readonly Regex reg = new Regex(@"^h(\d+)\.+\s*(.+)", RegexOptions.Compiled | RegexOptions.Multiline); 

    public static void Main()
    {
        var inputText = "h1. Topic 1\r\nblah blah blah, because of bla bla bla\r\nh2. PartA\r\nblah blah blah\r\nh3. Part a\r\nblah blah blah\r\nh2. Part B\r\nblah blah blah\r\nh1. Topic 2\r\nand its cuz blah blah\r\nFIN";
        var res = ProcessHeadersInText(inputText, 2);
        Console.WriteLine(res);
    }
    public static string ProcessHeadersInText(string inputText, int atLevel = 1) 
    {
        // Add atLevel first, then cap the resulting level at 9
        return reg.Replace(inputText, m =>
            string.Format("<h{0}>{1}</h{0}>",
                Math.Min(int.Parse(m.Groups[1].Value) + atLevel, 9),
                m.Groups[2].Value.Trim()));
    }
}

See the C# online demo

Note that I am using .Trim() on m.Groups[2].Value because . matches \r. You may use TrimEnd('\r') instead to remove just that character.
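A short sketch of that \r effect, assuming Windows-style \r\n line endings as in the demo input (C# 9 top-level statements): group 2 ends with the carriage return, and TrimEnd('\r') strips only that character, whereas Trim() would also remove any other surrounding whitespace:

```csharp
using System;
using System.Text.RegularExpressions;

var regex = new Regex(@"^h(\d+)\.+\s*(.+)", RegexOptions.Multiline);
Match m = regex.Match("h1. Topic 1\r\nbody text");

string raw = m.Groups[2].Value;     // "Topic 1\r": '.' matches '\r' but not '\n'
string cleaned = raw.TrimEnd('\r'); // "Topic 1"

Console.WriteLine(raw.EndsWith("\r")); // True
Console.WriteLine(cleaned);
```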


1 Comment

Thanks for such a full answer, Wiktor!
2

You can use a Regex like the one used below to fix your issues.

Regex.Replace(s, @"^(h\d+)\.(.*)$", @"<$1>$2</$1>", RegexOptions.Multiline)

Let me explain what I am doing:

// This captures the header tag ('h' plus its number), which is followed
// by a '.', but leaves the '.' out of the capture
(h\d+)\. 

// This captures the rest of the line up to its end
// (note the Multiline option being used, so $ matches at each line end)
(.*)$    

The parentheses capture each part into a group that can be referenced as "$1" for the first capture and "$2" for the second.
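Running this pattern (written here with </$1> as the closing tag) on the three lines from the question makes its limits easy to check. This sketch assumes \n line endings and C# 9 top-level statements: the number after h is copied as-is, so h111 stays h111, and the space after the dot ends up inside the tag:

```csharp
using System;
using System.Text.RegularExpressions;

string s = "h1. this is the Header\nh3. this one the header too\nh111. and this";

// Static replacement: $1 is "h1", "h3", "h111"; $2 is everything after the dot.
string html = Regex.Replace(s, @"^(h\d+)\.(.*)$", @"<$1>$2</$1>", RegexOptions.Multiline);

Console.WriteLine(html);
```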

1 Comment

I don't think this fully answers the OP's question. The h number values need to be manipulated to ensure they are no more than 9 and can be increased by a set amount.
1

Try this:

private static string ProcessHeadersInText(string inputText, int atLevel = 1)
{
    // Group 1 = value after 'h'
    // Group 2 = Content of header without leading whitespace
    string pattern = @"^h(\d+)\.\s*(.*?)\r?$";
    return Regex.Replace(inputText, pattern, match => EvaluateHeaderMatch(match, atLevel), RegexOptions.Multiline);
}

private static string EvaluateHeaderMatch(Match m, int atLevel)
{
    int hVal = int.Parse(m.Groups[1].Value) + atLevel;
    if (hVal > 9) { hVal = 9; }
    return $"<h{hVal}>{m.Groups[2].Value}</h{hVal}>";
}

Then just call

ProcessHeadersInText(input, 2);


This uses the Regex.Replace(string, string, MatchEvaluator, RegexOptions) overload with a custom evaluator function.

You could of course streamline this solution into a single function with an inline lambda expression:

public static string ProcessHeadersInText(string inputText, int atLevel = 1)
{
    string pattern = @"^h(\d+)\.\s*(.*?)\r?$";
    return Regex.Replace(inputText, pattern,
        match =>
        {
            int hVal = int.Parse(match.Groups[1].Value) + atLevel;
            if (hVal > 9) { hVal = 9; }
            return $"<h{hVal}>{match.Groups[2].Value}</h{hVal}>";
        },
        RegexOptions.Multiline);
}
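A quick usage sketch of the lambda version, assuming Windows-style \r\n line endings; the function is repeated as a local function so the snippet runs on its own (C# 9 top-level statements). The \r? before $ keeps the carriage return out of group 2, but because it is part of the match, the \r is dropped from the output and the result lines end in \n only:

```csharp
using System;
using System.Text.RegularExpressions;

// Same logic as the lambda version, written as a local function.
static string ProcessHeadersInText(string inputText, int atLevel = 1)
{
    string pattern = @"^h(\d+)\.\s*(.*?)\r?$";
    return Regex.Replace(inputText, pattern,
        match =>
        {
            int hVal = int.Parse(match.Groups[1].Value) + atLevel;
            if (hVal > 9) { hVal = 9; }
            return $"<h{hVal}>{match.Groups[2].Value}</h{hVal}>";
        },
        RegexOptions.Multiline);
}

string input = "h1. this is the Header\r\nh3. this one the header too\r\nh111. and this";
string output = ProcessHeadersInText(input, 2);
Console.WriteLine(output);
```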

1 Comment

Oh wow, interesting, I've never used it that way before (like in EvaluateHeaderMatch)!
1

There are a lot of good solutions in this thread, but I don't think you really need regex for your problem. For fun and as a challenge, here is a non-regex solution:

Try it online!

using System;
using System.Linq;

public class Program
{
    public static void Main()
    {
        string extractTitle(string x) => x.Substring(x.IndexOf(". ") + 2);
        string extractNumber(string x) => x.Remove(x.IndexOf(". ")).Substring(1);
        string build(string n, string t) => $"<h{n}>{t}</h{n}>";

        var inputs = new [] {
            "h1. this is the Header",
            "h3. this one the header too",
            "h111. and this" };

        foreach (var line in inputs.Select(x => build(extractNumber(x), extractTitle(x))))
        {
            Console.WriteLine(line);
        }
    }
}

I use C# 7 local (nested) functions and C# 6 interpolated strings. If you want, I can use older C#. The code should be easy to read; I can add comments if needed.


C#5 version

using System;
using System.Linq;

public class Program
{
    static string extractTitle(string x)
    {
        return x.Substring(x.IndexOf(". ") + 2);
    }

    static string extractNumber(string x)
    {
        return x.Remove(x.IndexOf(". ")).Substring(1);
    }

    static string build(string n, string t)
    {
        return string.Format("<h{0}>{1}</h{0}>", n, t);
    }

    public static void Main()
    {
        var inputs = new []{
            "h1. this is the Header",
            "h3. this one the header too",
            "h111. and this"
        };

        foreach (var line in inputs.Select(x => build(extractNumber(x), extractTitle(x))))
        {
            Console.WriteLine(line);
        }
    }
}
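Both versions print the number after h unchanged (h111 stays h111) and do not use atLevel. A minimal sketch of adding that to the same IndexOf/Substring approach, assuming every line is well-formed ("hN. " followed by the title) and that the cap of 9 applies after adding atLevel; the helper name shiftLevel is hypothetical, and the snippet uses C# 9 top-level statements:

```csharp
using System;
using System.Linq;

// Hypothetical helper: parse the digits between 'h' and ". ", add the offset, cap at 9.
static string shiftLevel(string header, int atLevel)
{
    int dot = header.IndexOf(". ");
    int level = Math.Min(int.Parse(header.Substring(1, dot - 1)) + atLevel, 9);
    string title = header.Substring(dot + 2);
    return $"<h{level}>{title}</h{level}>";
}

var inputs = new[] {
    "h1. this is the Header",
    "h3. this one the header too",
    "h111. and this" };

var lines = inputs.Select(x => shiftLevel(x, 2)).ToArray();
foreach (var html in lines)
{
    Console.WriteLine(html);
}
```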

4 Comments

Latest C# features, huh? Still can't force myself to use 'em.
@DanilGholtsman It is just syntactic sugar, like a lambda instead of a delegate.
Yeah, I know, it's just, you know, hard to get used to it.
@DanilGholtsman Added a C# 5 version for comparison (it has free-floating static methods and more indentation).
