
I have some idea of how to use a regex with the string.replace method to match values, but not how to manipulate them. I am trying to rename files by finding the common patterns below in filenames and replacing them with a more standardized naming convention.

This is an example of what I am trying to replace:

"1x01" "01x01" "101" "S01E" "S01 E" "S1E" "S1 E" "S1x"

and replace it with S01xE01, where S01 represents Season 1 and E01 represents Episode 1, so the numeric values would of course vary. My thought was to use a regex, but after scouring the net I have not found an example specific enough to help me.

Part of the reason I am stuck is that I do not know how to do the replacement even once I have a matching expression. For example, I might write something like string.replace("S\d*E\d*","what do i put here?").

Is there a simple regex that would be able to accomplish this task?

Edit: I've looked at Regex Tutorial and 30 Minute Regex Tutorial.


3 Answers

1

In this case, I'd probably use the overload of Regex.Replace that takes a MatchEvaluator. This allows you to pass a function that takes a Match object and returns the replacement string.

Here's an example that uses that overload and matches all of your examples. I've also embedded your strings inside filenames to show how they are replaced there (which seems to be your goal).

I've used a lambda expression here for the MatchEvaluator. If you want more complex logic, you can use a method on your class instead.
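As a rough sketch of the named-method alternative mentioned above: the helper name FormatEpisode is hypothetical, and it is written as a static local function in a top-level program (which assumes C# 9 or later) rather than as a class member. The pattern and the "missing episode means 01" rule are taken from the example in this answer.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: builds "SxxxEyy" from the named groups "season" and "episode".
static string FormatEpisode(Match match)
{
    string season = match.Groups["season"].Value.PadLeft(2, '0');
    string episode = match.Groups["episode"].Value;
    // Assumption carried over from the answer: a missing episode number means episode 01.
    episode = episode.Length > 0 ? episode.PadLeft(2, '0') : "01";
    return "S" + season + "xE" + episode;
}

// A method group converts implicitly to the MatchEvaluator delegate.
string result = Regex.Replace(
    "somefile1x01description.ext",
    @"[sS]?(?<season>\d{1,2})[ xXeE]+(?<episode>\d{1,2})",
    FormatEpisode);

Console.WriteLine(result); // somefileS01xE01description.ext
```

A named method is easier to unit test and to extend than a long lambda, and it behaves the same way when passed to Regex.Replace.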

I used two regular expressions: one to match the numbers-only case and one to match everything else. I often find that using several simple regular expressions is far more maintainable than trying to use one complex one.

EDIT: Updated to use a priority list of regular expressions to try. It stops checking after the first match found in the list.

You'll have to determine what rules (regexes) you want to use in what order to fit your data.

using System;
using System.Text.RegularExpressions;

string[] filenames = {
"1000 Ways to Die S01E01 Life Will Kill You",
"somefile1x01description.ext",
"sometext01x01description.ext",
"sometext101description.ext",
"sometextS01Edescription.ext",
"sometextS01 Edescription.ext",
"sometextS1Edescription.ext",
"sometextS1 Edescription.ext",
"sometextS1xdescription.ext",
"24 S01xE01 12 AM"
};

string[] res = {
    @"[sS]?(?<season>\d{1,2})[ xXeE]+(?<episode>\d{1,2})", // Handles the cases where you have a delimiter and a digit on both sides, optional S
    @"[sS](?<season>\d{1,2})[ xXeE]+(?<episode>\d{0,2})", // Handles the cases where you have a delimiter, a required S, but optional episode number
    @"(?<season>\d{1,2})(?<episode>\d{2})"  // Handles the case where you just have a 3 or 4 digit number
};

MatchEvaluator reFunc = match =>                        // Given a Regex Match object,
    // return the replacement string
    "S" +                                                // Start with the S
    match.Groups["season"].Value.PadLeft(2, '0') +       // Season group, zero-padded
    "xE" +                                               // Add the xE separator
    (match.Groups["episode"].Value.Length > 0            // Is there an episode number?
        ? match.Groups["episode"].Value.PadLeft(2, '0')  // If so, zero-pad it
        : "01");                                         // Otherwise assume episode 01

foreach(string name in filenames)
{
    Console.WriteLine("Orig: {0}",name);
    string replaced = name;

    foreach (string re in res)
    {
        Console.WriteLine("Trying:" + re);
        if(Regex.IsMatch(name,re))
        {
            Console.WriteLine("Matched");
            replaced = Regex.Replace(name,re,reFunc);
            break;
        }
    }
    Console.WriteLine("Replaced: {0}\n\n",replaced);
}

Output:

Orig: 1000 Ways to Die S01E01 Life Will Kill You
Trying:[sS]?(?<season>\d{1,2})[ xXeE]+(?<episode>\d{1,2})
Matched
Replaced: 1000 Ways to Die S01xE01 Life Will Kill You


Orig: somefile1x01description.ext
Trying:[sS]?(?<season>\d{1,2})[ xXeE]+(?<episode>\d{1,2})
Matched
Replaced: somefileS01xE01description.ext


Orig: sometext01x01description.ext
Trying:[sS]?(?<season>\d{1,2})[ xXeE]+(?<episode>\d{1,2})
Matched
Replaced: sometextS01xE01description.ext


Orig: sometext101description.ext
Trying:[sS]?(?<season>\d{1,2})[ xXeE]+(?<episode>\d{1,2})
Trying:[sS](?<season>\d{1,2})[ xXeE]+(?<episode>\d{0,2})
Trying:(?<season>\d{1,2})(?<episode>\d{2})
Matched
Replaced: sometextS01xE01description.ext


Orig: sometextS01Edescription.ext
Trying:[sS]?(?<season>\d{1,2})[ xXeE]+(?<episode>\d{1,2})
Trying:[sS](?<season>\d{1,2})[ xXeE]+(?<episode>\d{0,2})
Matched
Replaced: sometextS01xE01description.ext


Orig: sometextS01 Edescription.ext
Trying:[sS]?(?<season>\d{1,2})[ xXeE]+(?<episode>\d{1,2})
Trying:[sS](?<season>\d{1,2})[ xXeE]+(?<episode>\d{0,2})
Matched
Replaced: sometextS01xE01description.ext


Orig: sometextS1Edescription.ext
Trying:[sS]?(?<season>\d{1,2})[ xXeE]+(?<episode>\d{1,2})
Trying:[sS](?<season>\d{1,2})[ xXeE]+(?<episode>\d{0,2})
Matched
Replaced: sometextS01xE01description.ext


Orig: sometextS1 Edescription.ext
Trying:[sS]?(?<season>\d{1,2})[ xXeE]+(?<episode>\d{1,2})
Trying:[sS](?<season>\d{1,2})[ xXeE]+(?<episode>\d{0,2})
Matched
Replaced: sometextS01xE01description.ext


Orig: sometextS1xdescription.ext
Trying:[sS]?(?<season>\d{1,2})[ xXeE]+(?<episode>\d{1,2})
Trying:[sS](?<season>\d{1,2})[ xXeE]+(?<episode>\d{0,2})
Matched
Replaced: sometextS01xE01description.ext


Orig: 24 S01xE01 12 AM
Trying:[sS]?(?<season>\d{1,2})[ xXeE]+(?<episode>\d{1,2})
Matched
Replaced: 24 S01xE01 12 AM

7 Comments

That looks like what I need (not that I understand how it works lol) but - where do I stick in the "x" between the Season and the EP... so the output replaced is: "S01xE01" ?
Updated to add the x in the output
OK thanks a lot, this really saves me a lot of for loops with crazy logic to do!
I also tried to comment it well to explain what it is doing. I recommend doing the same if you ever need to maintain code like this :)
The comments are great. How could I handle a show that has a number in the series name, like this:
Original: 1000 Ways to Die S01E01 Life Will Kill You
Replaced: S10xE00 Ways to Die S01E01 Life Will Kill You
1

The string.replace(pattern, replacement) method takes all the portions of the string that match the given pattern and replaces them with the given replacement, then returns a new string with the result.

In your case you need to reuse parts of the matched text in the replacement. To do that you can use groups: you create a group by putting parentheses () inside a pattern. Groups let you capture parts of the matched string and then refer to them in the replacement.

For example, if you want to change "S01E02" to Season-01-Episode-02, you will need a pattern like "S(\d+)E(\d+)" with two groups. Then you could do something like:

"blabla S01E02 asdasd S01E05 erterert S04E07".replace("S(\d+)E(\d+)", "Season-$1-Episode-$2")

The result would be something like:

"blabla Season-01-Episode-02 asdasd Season-01-Episode-05 erterert Season-04-Episode-07"

$1 and $2 are the way you reference the groups in the replacement.
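In .NET specifically, String.Replace only does literal replacement, so the group references above have to go through Regex.Replace. The following is a minimal sketch of that, assuming a C# 9 or later top-level program. The named-group variant and the group names season and episode are illustrative additions, not part of the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "blabla S01E02 asdasd S01E05 erterert S04E07";

// Numbered groups: $1 is the first parenthesized group, $2 the second.
string numbered = Regex.Replace(input, @"S(\d+)E(\d+)", "Season-$1-Episode-$2");

// Named groups are referenced as ${name} in the replacement string.
string named = Regex.Replace(
    input,
    @"S(?<season>\d+)E(?<episode>\d+)",
    "Season-${season}-Episode-${episode}");

Console.WriteLine(numbered);
Console.WriteLine(named);
```

Both calls should produce the same string. A replacement string like this can only rearrange captured text, so zero-padding or default values still need the MatchEvaluator overload.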


0

You can try something like this:

string s=@"Dr Who 101";

s = Regex.Replace(s,
    @"(?i)S?(?<!\d)(?<sa>\d{1,2})[x ]?E?(?<ep>\d{2})?(?!\d)",
    delegate(Match match) {

    return "S"
         + ((match.Groups["sa"].ToString().Length<2)?"0":"")
         + match.Groups["sa"].ToString()
         + "xE" + match.Groups["ep"].ToString();

});
