I've got data which looks like this...

   1 TESTAAA      SERNUM    A DESCRIPTION
   2 TESTBBB      ANOTHR    ANOTHER DESCRIPTION
   3 TESTXXX      BLAHBL

My question is, what is the most efficient way to split this data into its smaller substrings, given that there will be hundreds of lines? Also, some of the lines will be missing the last column. I tried a regex but wasn't successful with the pattern I used for the widths. The data above should break down into these fields (the length of each column is listed below):

{id} {firsttext} {serialhere} {description}
 4    22          6            30+

Can anyone lend a hand or suggest a good regex matching pattern to extract the information?

Thanks, Simon

3 Answers

Try the following regex:

(.{4})(.{22})(.{6})(.+)?
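As a rough sketch of how that pattern might be applied in C#: the class name FixedWidthRegexDemo and the method Parse are made up for illustration, the ^ and $ anchors are an addition, and the code assumes the fields are padded to exactly 4, 22 and 6 characters with no separator between them. Lines shorter than 32 characters will not match at all.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FixedWidthRegexDemo
{
    // Same pattern as above, anchored (an assumption) so it must cover the whole line.
    private static readonly Regex LinePattern =
        new Regex(@"^(.{4})(.{22})(.{6})(.+)?$", RegexOptions.Compiled);

    // Hypothetical helper: returns the four trimmed fields,
    // or null when the line is shorter than 4 + 22 + 6 = 32 characters.
    public static string[] Parse(string line)
    {
        Match m = LinePattern.Match(line);
        if (!m.Success) return null;

        return new[]
        {
            m.Groups[1].Value.Trim(),
            m.Groups[2].Value.Trim(),
            m.Groups[3].Value.Trim(),
            m.Groups[4].Value.Trim() // empty string when the description is missing
        };
    }
}
```

Because the last group is optional, a line that ends right after the serial column still matches and simply yields an empty description.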

If the values are always nonempty and separated with whitespace (that is, they don't run into each other), then try something simpler like

line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)

I would actually recommend writing a method to do this via String.Substring directly. This will likely be more efficient at giving you the exact required widths.

This would likely work (though it's untested, and purposefully does not strip the string padding):

public static string[] SplitFixedWidth(string original, bool spaceBetweenItems, params int[] widths)
{
    string[] results = new string[widths.Length];
    int current = 0;

    for (int i = 0; i < widths.Length; ++i)
    {
        if (current < original.Length)
        {
            int len = Math.Min(original.Length - current, widths[i]);
            results[i] = original.Substring(current, len);
            current += widths[i] + (spaceBetweenItems ? 1 : 0);
        }
        else results[i] = string.Empty;
    }

    return results;
}
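For what it's worth, here is one way the SplitFixedWidth method above might be called with the widths from the question. This is only a sketch: FixedWidthDemo and ParseLine are hypothetical names, the method is repeated in compact form so the snippet compiles on its own, and passing line.Length as the last width is just one assumed way of saying "the rest of the line" for the 30+ description column.

```csharp
using System;
using System.Linq;

public static class FixedWidthDemo
{
    // Compact copy of the SplitFixedWidth method above, repeated so this sketch is self-contained.
    public static string[] SplitFixedWidth(string original, bool spaceBetweenItems, params int[] widths)
    {
        string[] results = new string[widths.Length];
        int current = 0;

        for (int i = 0; i < widths.Length; ++i)
        {
            if (current < original.Length)
            {
                // Clamp to the remaining characters so short lines do not throw.
                int len = Math.Min(original.Length - current, widths[i]);
                results[i] = original.Substring(current, len);
                current += widths[i] + (spaceBetweenItems ? 1 : 0);
            }
            else results[i] = string.Empty;
        }

        return results;
    }

    // Hypothetical helper: split with the question's widths (4, 22, 6, rest of line)
    // and strip the padding that SplitFixedWidth deliberately leaves in place.
    public static string[] ParseLine(string line)
    {
        return SplitFixedWidth(line, false, 4, 22, 6, line.Length)
            .Select(field => field.Trim())
            .ToArray();
    }
}
```

Lines that stop early simply produce empty strings for the missing columns, which covers the case where the description is absent.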

That being said, if you're reading this from a Stream or text file directly, using TextFieldParser will allow you to read the data directly as fixed width data.

5 Comments

TextFieldParser can read from any Stream or TextReader, so it doesn't have to be going to a physical file.
@SteveDog Yeah - the data does need to be in a Stream. This is rarely an issue, though, but I wouldn't necessarily put it there if you already (for some reason) had a string array you were processing, or something like that. That being said, I edited to include that info
No, I'm saying it will take a TextReader too, so you can just instantiate it like new TextFieldParser(new StringReader("the data")). No stream necessary.
@SteveDog Yeah - but if you already had string[] data, where each "line" was a separate string (which can happen if this is a result of a WCF call, or something), I'd do the parsing manually instead of creating a new StringReader per line... That's what I was trying to say
The problem with this is that you probably would not know the allowed string length. Usually the line width is in points.

Check out this link on the MSDN:

http://msdn.microsoft.com/en-us/library/zezabash.aspx

Basically, the TextFieldParser class does exactly this kind of thing. It's also a great way to read delimited data, like CSV files. For whatever reason Microsoft chose to put it under the Microsoft.VisualBasic.FileIO namespace, which is annoying because it doesn't really have anything to do with VB.

For example, you could use it like this:

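// Needs a reference to the Microsoft.VisualBasic assembly, plus:
// using System.IO;
// using Microsoft.VisualBasic.FileIO;
using (TextFieldParser parser = new TextFieldParser(new StringReader(fixedWidthData)))
{
    parser.TextFieldType = FieldType.FixedWidth;
    // A last width of -1 marks the final field as variable width (the rest of the line).
    parser.SetFieldWidths(4, 22, 6, -1);
    while (!parser.EndOfData)
    {
        string[] row = parser.ReadFields(); // one string per column
    }
}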
