How can I use Regex.Replace to clean up the following texts?

1) "Name's",     "Sex", "Age", "Height_(in)", "Weight (lbs)"
2) " LatD", "LatM ", 'LatS', "NS", "LonD", "LonM", "LonS", "EW", "City", "State"

Result:

1) Name's, Sex, Age, Height (in), Weight (lbs)
2) LatD, LatM, LatS, NS, LonD, LonM, LonS, EW, City, State

The whitespace between the quoted fields can be any length (Example 1). There may also be stray spaces inside the quotes (Example 2). An underscore can be used instead of a space (Example 1), and single quotes can be used instead of double quotes (Example 2).

As a result, words must be separated with a comma and a space.

A snippet of my code:

StreamReader fileReader = new StreamReader(...);
var fileRow = fileReader.ReadLine();
fileRow = Regex.Replace(fileRow, "_", " ");
fileRow = Regex.Replace(fileRow, "\"", "");
var fileDataField = fileRow.Split(',');
  • Yes, I don't know much about regular expressions... Commented Jun 12, 2019 at 11:25
  • Not to do with the question, but it's Names - no apostrophe. Commented Jun 12, 2019 at 11:25
  • What do you have in mind? Commented Jun 12, 2019 at 11:27
  • Is that what you want? Commented Jun 12, 2019 at 11:27
  • Probably, fileRow = Regex.Replace(fileRow, @"(?:""([^""]*)""|'([^']*)')(?:(,)(\s)*|\s*$)", m => $"{m.Groups[1].Value.Trim()}{m.Groups[2].Value.Trim()}{m.Groups[3].Value}{m.Groups[4].Value}") Commented Jun 12, 2019 at 11:27

2 Answers

I don't know C# syntax well, but this regex does the job:

  • Find: (?:_|^["']\h*|\h*["']$|\h*["']\h*,\h*["']\h*)
  • Replace: A space

Explanation:

(?:                         # non capture group
    _                       # underscore
  |                         # OR
    ^["']\h*                # beginning of line, quote or apostrophe, 0 or more horizontal spaces
  |                         # OR
    \h*["']$                # 0 or more horizontal spaces, quote or apostrophe, end of line
  |                         # OR
    \h*["']\h*              # 0 or more horizontal spaces, quote or apostrophe, 0 or more horizontal spaces
    ,                       # a comma
    \h*["']\h*              # 0 or more horizontal spaces, quote or apostrophe, 0 or more horizontal spaces
)                           # end group
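
For C# specifically, here is a minimal sketch of how the pattern above might be adapted. It rests on two assumptions that are not part of the answer itself: .NET's regex engine does not support \h, so [ \t] stands in for it, and, as far as I can tell, replacing every match with a single space would also turn the comma separators into spaces, so the sketch picks the replacement per alternative with a match evaluator. The class name HeaderCleaner, the method Clean, and the group names sep and us are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class HeaderCleaner
{
    // Same alternatives as the pattern above, with [ \t] in place of \h.
    // The separator alternative is named so the evaluator can tell it apart.
    private static readonly Regex Junk = new Regex(
        @"(?<sep>[ \t]*[""'][ \t]*,[ \t]*[""'][ \t]*)" +  // quote , quote between fields
        @"|(?<us>_)" +                                     // underscore
        @"|^[""'][ \t]*" +                                 // opening quote at line start
        @"|[ \t]*[""']$");                                 // closing quote at line end

    public static string Clean(string row)
    {
        return Junk.Replace(row, m =>
            m.Groups["sep"].Success ? ", " :   // field separator becomes ", "
            m.Groups["us"].Success ? " " :     // underscore becomes a space
            "");                               // leading/trailing quote is dropped
    }
}
```

Called on the two sample rows from the question, HeaderCleaner.Clean should return the two expected result lines. Like the original pattern, it assumes every field is quoted and that no field contains a quote directly followed by a comma.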

Demo

2 Comments

\h - [[:blank:]], ok)
@MiT: OK, thank you, maybe \s can be used instead of \h
How about plain string manipulation instead?

using System;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        string dirty1 = "\"Name's\",     \"Sex\", \"Age\", \"Height_(in)\", \"Weight (lbs)\"";
        string dirty2 = "\" LatD\", \"LatM \", 'LatS', \"NS\", \"LonD\", \"LonM\", \"LonS\", \"EW\", \"City\", \"State\"";
        Console.WriteLine(Clean(dirty1));
        Console.WriteLine(Clean(dirty2));

        Console.ReadKey();
    }

    // Split on commas, strip spaces and quotes from each field,
    // turn underscores into spaces, then rejoin with ", ".
    private static string Clean(string dirty)
    {
        return string.Join(", ", dirty.Split(',').Select(item => item.Trim(' ', '"', '\'').Replace('_', ' ')));
    }

    // Same logic without LINQ.
    private static string CleanNoLinQ(string dirty)
    {
        string[] items = dirty.Split(',');
        for (int i = 0; i < items.Length; i++)
        {
            items[i] = items[i].Trim(' ', '"', '\'').Replace('_', ' ');
        }
        return string.Join(", ", items);
    }
}

You can even replace the LINQ with a plain loop followed by string.Join(), as CleanNoLinQ does.

Easier to understand - easier to maintain.
