Split a string into sub strings by fixed width

Question

I've got data which looks like this...

   1 TESTAAA      SERNUM    A DESCRIPTION
   2 TESTBBB      ANOTHR    ANOTHER DESCRIPTION
   3 TESTXXX      BLAHBL

My question is, what is the most efficient way to split this data into its smaller substrings, given that there will be hundreds of lines? Also, some of the lines will be missing the last column. I tried a regex but wasn't successful with the pattern I used for the widths. The data above should break down into these fields (the length of each column is listed below):

{id} {firsttext} {serialhere} {description}
 4    22          6            30+

Can anyone lend a hand or suggest a good regex matching pattern to extract the information?

Thanks, Simon

See my solution at the following posting: stackoverflow.com/questions/65038691/…
– jdweng, Commented Dec 18, 2020 at 12:44

Hew Wolff · Accepted Answer · 2012-07-06 15:54:07Z

8

Try the following regex:

(.{4})(.{22})(.{6})(.+)?
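
As a rough illustration of how that pattern might be applied in C#, here is a minimal sketch. The class name `FixedWidthRegex`, the method name `Split`, the added `^`/`$` anchors, and the trimming of each field are assumptions made for this example and are not part of the answer. Note that a line shorter than 32 characters (4 + 22 + 6) does not match the pattern at all, so the sketch returns an empty array in that case.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FixedWidthRegex
{
    // Widths 4, 22 and 6 come from the question; the trailing group is optional.
    private static readonly Regex Pattern =
        new Regex(@"^(.{4})(.{22})(.{6})(.+)?$", RegexOptions.Compiled);

    // Returns the four trimmed fields, or an empty array when the line
    // is too short to fill the three fixed-width columns.
    public static string[] Split(string line)
    {
        Match m = Pattern.Match(line);
        if (!m.Success)
        {
            return Array.Empty<string>();
        }

        return new[]
        {
            m.Groups[1].Value.Trim(),
            m.Groups[2].Value.Trim(),
            m.Groups[3].Value.Trim(),
            m.Groups[4].Value.Trim()   // empty string when the description is missing
        };
    }
}
```

Calling `FixedWidthRegex.Split(line)` on each input line yields one array per row; whether to trim the padding is a design choice and can be dropped if the raw column text is needed.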

If the values are always nonempty, contain no spaces themselves, and are separated by whitespace (that is, they don't run into each other), then try something simpler like

line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)

answered Jul 6, 2012 at 15:54

Reed Copsey · Accepted Answer · 2012-07-06 16:01:44Z

6

I would actually recommend writing a method to do this via String.Substring directly. This will likely be more efficient at giving you the exact required widths.

This would likely work (though it's untested, and purposefully does not strip the string padding):

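// Splits 'original' into consecutive fields of the given widths.
// Fields keep their padding; fields that start past the end of the string come back empty.
// Set spaceBetweenItems when a single separator character sits between adjacent fields.
public static string[] SplitFixedWidth(string original, bool spaceBetweenItems, params int[] widths)
{
    string[] results = new string[widths.Length];
    int current = 0;   // start index of the next field

    for (int i = 0; i < widths.Length; ++i)
    {
        if (current < original.Length)
        {
            // Clamp the length so a short line does not overrun the string.
            int len = Math.Min(original.Length - current, widths[i]);
            results[i] = original.Substring(current, len);
            current += widths[i] + (spaceBetweenItems ? 1 : 0);
        }
        else results[i] = string.Empty;
    }

    return results;
}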

That being said, if you're reading this from a Stream or text file directly, using TextFieldParser will allow you to read the data directly as fixed width data.
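
If the lines are already in memory as strings, the Substring idea can also be specialised to the question's layout. The following is only a sketch under stated assumptions: the class name `FixedWidthSubstring` and the method `ParseLine` are hypothetical, the widths 4, 22 and 6 are taken from the question, and, unlike the method above, this version trims the padding. Short lines are padded first so the fixed-width `Substring` calls cannot run past the end.

```csharp
using System;

public static class FixedWidthSubstring
{
    // Hypothetical helper: column widths are taken from the question.
    public static string[] ParseLine(string line)
    {
        const int IdWidth = 4, TextWidth = 22, SerialWidth = 6;
        const int FixedTotal = IdWidth + TextWidth + SerialWidth;

        // Pad short lines so the three fixed-width Substring calls stay in range.
        string padded = line.PadRight(FixedTotal);

        return new[]
        {
            padded.Substring(0, IdWidth).Trim(),
            padded.Substring(IdWidth, TextWidth).Trim(),
            padded.Substring(IdWidth + TextWidth, SerialWidth).Trim(),
            padded.Substring(FixedTotal).Trim()   // rest of the line; empty when absent
        };
    }
}
```

Hard-coding the widths keeps the example short; a general-purpose version would take them as parameters, as the method above does.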

edited Jul 6, 2012 at 16:01

answered Jul 6, 2012 at 15:46

5 Comments

Steven Doggart Over a year ago

TextFieldParser can read from any Stream or TextReader, so it doesn't have to be going to a physical file.

Reed Copsey Over a year ago

@SteveDog Yeah - the data does need to be in a Stream. This is rarely an issue, though, but I wouldn't necessarily put it there if you already (for some reason) had a string array you were processing, or something like that. That being said, I edited to include that info

Steven Doggart Over a year ago

No, I'm saying it will take a TextReader too, so you can just instantiate it like new TextFieldParser(new StringReader("the data")). No stream necessary.

Reed Copsey Over a year ago

@SteveDog Yeah - but if you already had string[] data, where each "line" was a separate string (which can happen if this is a result of a WCF call, or something), I'd do the parsing manually instead of creating a new StringReader per line... That's what I was trying to say

c4da Over a year ago

The problem with this is that you probably would not know the allowed string length. Usually the line width is in points.

Steven Doggart · Accepted Answer · 2012-07-06 16:09:28Z

6

Check out this link on the MSDN:

http://msdn.microsoft.com/en-us/library/zezabash.aspx

Basically, the TextFieldParser class does exactly this kind of thing. It's also a great way to read delimited data, like CSV files. For whatever reason Microsoft chose to put it under the Microsoft.VisualBasic.FileIO namespace, which is annoying because it doesn't really have anything to do with VB.

For example, you could use it like this:

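// Requires: using System.IO; using Microsoft.VisualBasic.FileIO;
// (and a reference to the Microsoft.VisualBasic assembly)
using (TextFieldParser parser = new TextFieldParser(new StringReader(fixedWidthData)))
{
    parser.TextFieldType = FieldType.FixedWidth;
    parser.SetFieldWidths(4, 22, 6, -1);   // -1: the last field is variable width
    while (!parser.EndOfData)
    {
        string[] row = parser.ReadFields();   // one string per field on the current line
    }
}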

edited Jul 6, 2012 at 16:09

answered Jul 6, 2012 at 15:54
