I have the following text, with line breaks, in a text file:

Card No.      Seq     Account 1   Account 2  Account 3  Account 4   Customer Name          Expiry   Status

0100000184998  1   2500855884500                 -          -       /NIRMAL PRADHAN          1302     Cold
0100000186936  1                      -          -          -       /RITA SHRESTHA           1302     Cold
0100000238562  1   2500211214500                 -          -       /HARRY SHARMA            1301     Cold
0100000270755  0   1820823730100      -          -                  /EXPRESS ACCOUNT         9999     Cold
0100000272629  0   1820833290100      -          -          -       /ROMA MAHARJAN           1208     Cold
0100000272637  0   2510171014500      -                     -       /NITIN KUMAR SHRESTHA    1208     Cold
0100000272645  0   1800505550100      -          -          -       /DR HARI BHATTA          1208     Cold

Here:

  • Card No. and Seq have a fixed number of digits.
  • Account 1, Account 2, Account 3 and Account 4 can each hold a fixed-length number, a -, or nothing at all.
  • Customer Name can contain a first name, a last name, a middle name, etc.

My desired result is:

array[0][0] = "0100000184998"
array[0][1] = "1"
array[0][2] = "2500855884500"
array[0][3] = " "
array[0][4] = "-"
array[0][6] = "NIRMAL PRADHAN "

array[1][0] = "0100000186936"
array[1][1] = "1"
array[1][3] = " "
array[1][4] = "-"

Here is what I tried:

var sourceFile = txtProcessingFile.Text;
string contents = System.IO.File.ReadAllText(sourceFile);

// Split the file into lines, then split every line on spaces.
var newarr = contents.Split(new char[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
    .Select(x =>
        x.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).ToArray()
    ).ToArray();

DataTable dt = new DataTable("NewDataTable");

dt.Columns.Add("CardNo");
dt.Columns.Add("SNo");
dt.Columns.Add("Account1");
// and so on...

// Add one DataTable row per parsed line.
for (int row = 0; row < newarr.Length; row++)
{
    dt.Rows.Add(newarr[row]);
}

This works as long as no field is empty and the customer name is a single word (or is otherwise delimited).

But what I am trying to achieve is:

  • The first, middle and last names must be stored in the same array element.
  • The array element for an account number must be left blank if the account number is blank.

How can I store the data correctly in the DataTable?

  • Expiry and Status will always have non-empty values? Commented Jun 7, 2013 at 11:43
  • Yes (but my assumption is that if it can be handled for the account numbers, it won't be a problem for Expiry and Status). Commented Jun 7, 2013 at 11:45
  • 1. Since you are using only spaces to separate data entities in the file, use a special character sequence to join the first name and last name, e.g. Harry<?/>SHARMA. You can remove <?/> while reading. 2. In the same way, use a special symbol to designate a blank value, say <?#?> if the account no. is blank, and remove it while reading the file. Commented Jun 7, 2013 at 11:48
  • @GrantWinney I have no control over the text data. Commented Jun 7, 2013 at 11:50

3 Answers


I suggest that you learn to use the TextFieldParser class. Yes, it's in the Microsoft.VisualBasic.FileIO namespace, but you can use it from C#. That class lets you easily parse text files that have fixed field widths. See the article How to: Read From Fixed-width Text Files in Visual Basic for an example. Again, the sample is in Visual Basic, but it should be very easy to convert to C#.
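
As a rough illustration of that suggestion, here is a minimal C# sketch. The field widths are assumptions read off the sample rows in the question (no file specification confirms them), and FixedWidthReader and ParseRows are hypothetical names used only for this example. It also assumes the first non-blank line is the header.

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class FixedWidthReader
{
    // ASSUMPTION: widths inferred from the sample data in the question.
    // The final -1 lets the last field (Status) take the rest of the line.
    private static readonly int[] FieldWidths = { 15, 4, 14, 11, 11, 13, 25, 9, -1 };

    public static List<string[]> ParseRows(TextReader reader)
    {
        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(reader))
        {
            parser.TextFieldType = FieldType.FixedWidth;
            parser.SetFieldWidths(FieldWidths);
            parser.TrimWhiteSpace = true; // blank columns come back as ""

            bool headerSkipped = false;
            while (!parser.EndOfData)
            {
                // ReadFields skips blank lines. A line shorter than the
                // fixed-width part throws MalformedLineException.
                string[] fields = parser.ReadFields();
                if (!headerSkipped)
                {
                    headerSkipped = true; // ASSUMPTION: first line is the header
                    continue;
                }
                fields[6] = fields[6].TrimStart('/'); // drop the "/" before the name
                rows.Add(fields);
            }
        }
        return rows;
    }
}
```

Each returned string[] has one element per column, so it could be passed straight to dt.Rows.Add(fields), for example after calling FixedWidthReader.ParseRows(new StreamReader(sourceFile)). If the real file uses different column positions, only the FieldWidths array should need adjusting.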


If you can accept not distinguishing between - and empty values in the account columns, you could try this:

// Requires: using System.Data; using System.Linq; using System.Text.RegularExpressions;
var sourceFile = txtProcessingFile.Text;
string[] contents = System.IO.File.ReadAllLines(sourceFile);
DataTable dt = new DataTable("NewDataTable");

dt.Columns.Add("CardNo");
dt.Columns.Add("SNo");
dt.Columns.Add("Account1");
dt.Columns.Add("Account2");
dt.Columns.Add("Account3");
dt.Columns.Add("Account4");
dt.Columns.Add("CustomerName");
dt.Columns.Add("Expiry");
dt.Columns.Add("Status");

// Matches runs of word characters only, so "-" placeholders and "/" are ignored.
var wordRegex = new Regex(@"\w+");

// Start at index 2 to skip the header line and the blank line after it.
for (int row = 2; row < contents.Length; row++)
{
    var tokens = wordRegex.Matches(contents[row])
        .Cast<Match>()
        .Select(m => m.Value)
        .ToList();
    if (tokens.Count == 0) continue; // skip blank lines

    var numbers = tokens.Where(t => Regex.IsMatch(t, @"^\d+$")).ToList();
    var names = tokens.Where(t => !Regex.IsMatch(t, @"^\d+$")).ToList();

    var newRow = dt.NewRow();
    // Card No., Seq and any account numbers fill the leading columns in order,
    // so a blank account column does not keep its position.
    for (int i = 0; i < numbers.Count - 1; i++)
    {
        newRow[i] = numbers[i];
    }
    newRow[dt.Columns.Count - 2] = numbers.Last();  // Expiry is the last number
    newRow[dt.Columns.Count - 1] = names.Last();    // Status is the last word
    newRow[dt.Columns.Count - 3] = string.Join(" ", names.Take(names.Count - 1)); // Customer name
    dt.Rows.Add(newRow);
}


To get around the names with a single space in them, you could try splitting on a double-space instead of a single space:

x.Split(new string[] { "  " }, StringSplitOptions.RemoveEmptyEntries)

This still won't fix the issue with columns that have no value in them. It appears that your text file has everything in a specific position. Seq is in position 16, Account 1 is in position 20, etc.

Once you have the individual lines (before they are split on spaces), you may just want to use String.Substring() with .Trim() to get the value in each column.
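
A minimal sketch of that Substring() approach follows. The zero-based start offsets are assumptions inferred from the sample rows (they agree with Seq at position 16 and Account 1 at position 20 when counting from 1), and SubstringParser, SplitLine and ToTable are hypothetical names. The sketch also assumes the first non-blank line is the header.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class SubstringParser
{
    // ASSUMPTION: zero-based start offset of each column, read off the sample data.
    private static readonly int[] Starts = { 0, 15, 19, 33, 44, 55, 68, 93, 102 };

    private static readonly string[] ColumnNames =
    {
        "CardNo", "SNo", "Account1", "Account2", "Account3",
        "Account4", "CustomerName", "Expiry", "Status"
    };

    // Cuts one line into nine trimmed fields; blank columns become "".
    public static string[] SplitLine(string line)
    {
        var fields = new string[Starts.Length];
        for (int i = 0; i < Starts.Length; i++)
        {
            int start = Starts[i];
            int end = (i + 1 < Starts.Length) ? Starts[i + 1] : line.Length;
            end = Math.Min(end, line.Length); // tolerate short lines
            fields[i] = start < end ? line.Substring(start, end - start).Trim() : "";
        }
        fields[6] = fields[6].TrimStart('/'); // drop the "/" before the name
        return fields;
    }

    public static DataTable ToTable(IEnumerable<string> lines)
    {
        var dt = new DataTable("NewDataTable");
        foreach (var name in ColumnNames)
        {
            dt.Columns.Add(name);
        }

        bool headerSkipped = false;
        foreach (var line in lines)
        {
            if (string.IsNullOrWhiteSpace(line)) continue;                 // skip blank lines
            if (!headerSkipped) { headerSkipped = true; continue; }        // skip the header
            dt.Rows.Add(SplitLine(line));
        }
        return dt;
    }
}
```

With this, SubstringParser.ToTable(System.IO.File.ReadAllLines(sourceFile)) would keep multi-word names in a single column and leave empty account columns empty, while a - stays a -.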
