C# regex to parse columns in a txt file

Question

I have a text file that looks like this:

FieldA    FieldB    FieldC    FieldD  FieldE
  001       中文                15%     语言
  002       法文      20        12%     外文 
  003       英文      21                外文
  004     西班牙语               10%     外文

So basically I have read the file in and split it into lines. Now I would like to use a regex to split each line into fields. As you can see, some fields in a column are empty. The fields may not be fixed-width, but they are separated by at least one whitespace character. Some fields contain Chinese characters.

May I know how to do this? Thanks.

How do you know that 外文 goes in column FieldE and not FieldD?
– Uriil, Commented Aug 22, 2015 at 9:42
That is the thing: I need the regex to know that there are 5 fields. The last field, FieldE, is always Chinese, while FieldD is a percentage or empty.
– tesla1060, Commented Aug 22, 2015 at 9:43

Douglas · Accepted Answer · 2015-08-22 09:58:44Z

using System.Text.RegularExpressions;

string s = "001       中文                15%     语言";
Match m = Regex.Match(s,
    @"^\s*" +               // Skip leading indentation so an indented line still starts at Field A
    @"(?<A>\d*)\s*" +       // Field A: any number of digits
    @"(?<B>\p{L}*)\s*" +    // Field B: any number of letters
    @"(?<C>\d*)\s+" +       // Field C: any number of digits, then at least one whitespace
    @"(?<D>(\d+%)?)\s*" +   // Field D: one or more digits followed by '%', or nothing
    @"(?<E>\p{L}*)");       // Field E: any number of letters
string fieldA = m.Groups["A"].Value;    // "001"
string fieldB = m.Groups["B"].Value;    // "中文"
string fieldC = m.Groups["C"].Value;    // ""
string fieldD = m.Groups["D"].Value;    // "15%"
string fieldE = m.Groups["E"].Value;    // "语言"

All fields are optional. If a field is not present, it will be captured as an empty string, like in fieldC above.
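
To apply the same idea to every line of the file, the pattern can be wrapped in a small helper. This is only a sketch: the class name `ColumnParser` and the method `TryParseLine` are hypothetical, and the `^\s*` and `\s*$` anchors are an assumption added here so that indented lines and trailing spaces are tolerated and lines that do not fit the layout (such as the header row) are rejected.

```csharp
using System;
using System.Text.RegularExpressions;

static class ColumnParser
{
    // The five named groups from the answer, wrapped in ^\s* ... \s*$.
    // The anchors are an assumption of this sketch, not part of the answer.
    static readonly Regex Row = new Regex(
        @"^\s*(?<A>\d*)\s*" +   // Field A: digits
        @"(?<B>\p{L}*)\s*" +    // Field B: letters
        @"(?<C>\d*)\s+" +       // Field C: digits, then at least one whitespace
        @"(?<D>(\d+%)?)\s*" +   // Field D: digits followed by '%', or nothing
        @"(?<E>\p{L}*)\s*$");   // Field E: letters, then optional trailing whitespace

    // Returns true and fills 'fields' with A..E when the line fits the layout.
    // Missing fields come back as empty strings.
    public static bool TryParseLine(string line, out string[] fields)
    {
        Match m = Row.Match(line);
        if (!m.Success)
        {
            fields = new string[0];
            return false;
        }
        fields = new[]
        {
            m.Groups["A"].Value,
            m.Groups["B"].Value,
            m.Groups["C"].Value,
            m.Groups["D"].Value,
            m.Groups["E"].Value
        };
        return true;
    }
}
```

Calling `ColumnParser.TryParseLine` on each line from the question fills the array with five strings per data row. The header row contains five letter-only tokens, which cannot fit into the two letter fields, so it is reported as a non-match and can be skipped.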

edited Aug 22, 2015 at 9:58

answered Aug 22, 2015 at 9:52


2 Comments

Nikolay Over a year ago

This will fail if some fields are missing.

Douglas Over a year ago

@Nikolay: The original version would only have failed if Field A was missing. In the updated version, all fields are optional.

score 1 · Accepted Answer · 2015-08-22 10:24:32Z

/\s*(\d*)\s*([^\d\s]*)\s*(\d*)\s\s*(\d*%?)\s*([^\d\s]*)/

Here is a regex that captures all of the content you want. Apply it to each line.

\s*         //any number of whitespace
(\d*)       //any number of digits
\s*         //any number of whitespace
([^\d\s]*)  //any number of characters that aren't whitespace or digits
\s*         //any number of whitespace
(\d*)\s     //any number of digits followed by one whitespace character
\s*         //any number of whitespace
(\d*%?)     //any number of digits with an optional %
\s*         //any number of whitespace
([^\d\s]*)  //any number of characters that aren't whitespace or digits
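
In C# the pattern is passed as a string, so the surrounding `/` delimiters are dropped and the capture groups are read by position. A minimal sketch of that, where the class name `PositionalColumns` and the method `Split` are hypothetical names chosen for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

static class PositionalColumns
{
    // The pattern from this answer without the / delimiters.
    static readonly Regex Row = new Regex(
        @"\s*(\d*)\s*([^\d\s]*)\s*(\d*)\s\s*(\d*%?)\s*([^\d\s]*)");

    // Returns the five captured fields in column order.
    // If the line does not match at all, every entry is an empty string.
    public static string[] Split(string line)
    {
        Match m = Row.Match(line);
        var fields = new string[5];
        for (int i = 0; i < fields.Length; i++)
        {
            // Groups[0] is the whole match, so the fields start at index 1.
            fields[i] = m.Success ? m.Groups[i + 1].Value : "";
        }
        return fields;
    }
}
```

Because `%` is optional in the fourth group, a bare number with no `%` is captured by the third group whenever that one is free, so this version relies on FieldD always carrying its percent sign.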

edited Aug 22, 2015 at 10:24

answered Aug 22, 2015 at 9:58

user3496378
