I have a CSV file with two columns: Student Names, Address.

However, the Student Names column can contain duplicates. When it does, I need to move the duplicated names and their addresses into a new file, and keep going until no file contains the same student name twice.

For example:

Student Names   Address
John            5 West st.
David           42 Alan st.
John            22 Dees st.
Smith           2 King st.
David           77 Jack st.
John            33 King st.

This should be divided into 3 files, like so:

1st File:

Student Names   Address
John            5 West st.
David           42 Alan st.
Smith           2 King st.

2nd File:

Student Names   Address
John            22 Dees st.
David           77 Jack st.

3rd File:

Student Names   Address
John            33 King st.

My first idea was to load the file into a DataTable and build a dictionary of Student Names -> Address. However, a Dictionary will not work because the keys are NOT unique. My next idea was to build a list of student names, find the duplicates from there, create a DataTable for them, and write a file from that. I feel like this is more complicated than it needs to be. I'm pretty sure there must be an easier way with LINQ. Could you help me out or give me some pointers?

Thanks.

  • You're looking for a Lookup<TKey, TElement>. Commented Sep 18, 2015 at 15:24

5 Answers

The Dictionary approach is actually quite good, and I would stick with it. Make the student name the key and the list of that student's addresses the value. You then know how many files to create: it equals the largest number of addresses held by any single name.

Then go through the names and write each name's addresses to separate files in sequence: the first address to the first file, the second to the second file, and so on. Once every name's addresses have been exhausted, you are done.

In your example above you will have a Dictionary like this:

John -> 5 West st., 22 Dees st., 33 King st.
David -> 42 Alan st., 77 Jack st.    
Smith -> 2 King st.

As @ric said, this will be a Dictionary<string, List<string>>.
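
The steps above can be sketched roughly as follows. This is only an illustration, assuming the rows have already been parsed into (name, address) pairs; `AddressSplitter` and `SplitIntoFiles` are hypothetical names, not something from the question. File number i receives the i-th address of every name that has at least that many addresses.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper illustrating the Dictionary<string, List<string>> idea.
public static class AddressSplitter
{
    // Returns one list of (name, address) rows per output file.
    public static List<List<(string Name, string Address)>> SplitIntoFiles(
        IEnumerable<(string Name, string Address)> rows)
    {
        // Name -> addresses; 'order' remembers the first-seen order of names.
        var byName = new Dictionary<string, List<string>>();
        var order = new List<string>();
        foreach (var (name, address) in rows)
        {
            if (!byName.TryGetValue(name, out var addresses))
            {
                addresses = new List<string>();
                byName[name] = addresses;
                order.Add(name);
            }
            addresses.Add(address);
        }

        // The name with the most addresses determines the number of files.
        int fileCount = byName.Count == 0 ? 0 : byName.Values.Max(a => a.Count);

        var files = new List<List<(string Name, string Address)>>();
        for (int i = 0; i < fileCount; i++)
        {
            // File i holds the i-th address of each name that has one.
            files.Add(order
                .Where(n => byName[n].Count > i)
                .Select(n => (Name: n, Address: byName[n][i]))
                .ToList());
        }
        return files;
    }
}
```

Each inner list can then be written out with something like File.WriteAllLines, one file per list.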

1 Comment

You mean Dictionary<string, List<string>>? I suppose you would have to take the maximum list count to work out the number of files you need.

Assuming that you have a class like this:

public class Student
{
    public string Name { get; set; }
    public string Address { get; set; }
}

With LINQ, you can group the students by name:

 var students = LoadStudentsFromFile();
 var studentsByName = students.GroupBy(st => st.Name).ToDictionary(g => g.Key, g => g.ToList());

At this point you have a Dictionary with student names as keys and lists of students as values:

John ->  [{Name: John, Address: 5 West st.}, {Name: John, Address: 22 Dees st.}, {Name: John, Address: 33 King st.}]
David -> [{Name: David, Address: 42 Alan st.}, {Name: David, Address: 77 Jack st.}]
...

Then you can iterate over the keys and take one student from the end of each list until the lists and the dictionary are empty. Taking from the end avoids shifting the remaining list elements. Note that a name's later addresses therefore land in the earlier files, unlike the layout shown in the question.

 while (studentsByName.Any())
 {
     var uniqueStudents = new List<Student>();
     // Snapshot the keys so entries can be removed while looping.
     foreach (var name in studentsByName.Keys.ToList())
     {
         var list = studentsByName[name];
         uniqueStudents.Add(list[list.Count - 1]);
         list.RemoveAt(list.Count - 1);
         if (list.Count == 0)
         {
             studentsByName.Remove(name);
         }
     }

     SaveListOfUniqueStudents(uniqueStudents);
 }

4 Comments

That's pretty clever, but I'm a little confused about what this method returns: LoadStudentsFromFile(). Is it a DataTable?
@civic.sir It would be an IEnumerable<Student> (e.g. Student[], List<Student>, IQueryable<Student>, etc.)
Here is a primitive example: IEnumerable<Student> LoadStudentsFromFile(string path) { return File.ReadLines(path).Select(x => { var fields = x.Split(','); return new Student { Name = fields[0], Address = fields[1] }; }); }
Great, thanks a lot Limo and Robert. I implemented this algorithm with a few changes and it works now. Thanks again.

Simple version, assuming the CSV is simplistic: comma separated, with no strings enclosed in double quotes. It can be extended if you need it to be:

IEnumerable<Student> LoadStudentsFromFile(string path)
{
  // Naive parsing: assumes no header row and no quoted fields.
  return File.ReadLines(path).Select(x => {
    var fields = x.Split(',');
    return new Student { Name = fields[0], Address = fields[1] };
  });
}
void SaveStudentsToFile(string path, IEnumerable<string> lines)
{
  File.WriteAllLines(path, lines);
}
var students = LoadStudentsFromFile("inputfile.csv");
var studentsByName = students.GroupBy(st => st.Name)
  .ToDictionary(g => g.Key, g => g.ToList());

// The name with the most rows determines how many files are needed.
var max = studentsByName.Max(g => g.Value.Count);
for (var x = 0; x < max; x++)
  SaveStudentsToFile("outfile" + x + ".csv",
    studentsByName.Where(s => s.Value.Count > x)
      .Select(s => string.Format("{0},{1}", s.Key, s.Value[x].Address)));

I'd go with something like this: create a class (StudentFileWriter) that holds a writer for a CSV file and a list of the names already in that file. Whenever you write to the file, add the name to the list.

Create a List of StudentFileWriters.

Then read your file one line at a time and check whether the first StudentFileWriter's ListOfNames.Contains(newNameToInsert). If true, move on to the next writer; if there is no next writer, create one and write to its new file. If false, just write to that writer's file.

You could probably also write it as one big, complex piece of LINQ with groupings, rankings, etc., but this way it should be easy to debug and see what's going on.
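
A rough sketch of that approach might look like the following. The member names (`Contains`, `Write`), the `StudentSplitter.Split` helper, and the `createWriter` callback are assumptions made for illustration; a HashSet stands in for the list of names, and the input lines are assumed to be plain "name,address" rows with the header already skipped.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical sketch: one writer per output file, tracking the names it holds.
public sealed class StudentFileWriter : IDisposable
{
    private readonly TextWriter _writer;
    private readonly HashSet<string> _names = new HashSet<string>();

    public StudentFileWriter(TextWriter writer)
    {
        _writer = writer;
        _writer.WriteLine("Student Names,Address");
    }

    public bool Contains(string name) => _names.Contains(name);

    public void Write(string name, string address)
    {
        _writer.WriteLine($"{name},{address}");
        _names.Add(name);
    }

    public void Dispose() => _writer.Dispose();
}

public static class StudentSplitter
{
    // Routes each line to the first writer that has not seen the name yet.
    // createWriter(index) is expected to open the index-th output.
    // Returns the number of outputs created.
    public static int Split(IEnumerable<string> lines, Func<int, TextWriter> createWriter)
    {
        var writers = new List<StudentFileWriter>();
        try
        {
            foreach (var line in lines)
            {
                // Split on the first comma only; skip malformed or blank lines.
                var fields = line.Split(new[] { ',' }, 2);
                if (fields.Length < 2) continue;
                var name = fields[0].Trim();
                var address = fields[1].Trim();

                var target = writers.Find(w => !w.Contains(name));
                if (target == null)
                {
                    target = new StudentFileWriter(createWriter(writers.Count));
                    writers.Add(target);
                }
                target.Write(name, address);
            }
        }
        finally
        {
            foreach (var w in writers) w.Dispose();
        }
        return writers.Count;
    }
}
```

With real files, the callback could be something like i => new StreamWriter("outfile" + i + ".csv"), and the lines could come from File.ReadLines(path).Skip(1). Passing the writer in as a TextWriter is a design choice that keeps the routing logic testable without touching the disk.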

My idea is to create a list of dictionaries. We have the Student class (thanks @LimoWanKenobi):

public class Student
{
    public string Name { get; set; }
    public string Address { get; set; }
}

Here is my method:

    IEnumerable<IEnumerable<Student>> Process(IEnumerable<Student> students)
    {
        var files = new List<Dictionary<string, Student>>();

        foreach (var student in students)
        {
            var isAdded = false;
            foreach (var file in files)
            {
                if (!file.ContainsKey(student.Name))
                {
                    file.Add(student.Name, student);
                    isAdded = true;
                    break;
                }
            }

            if (!isAdded)
            {
                files.Add(new Dictionary<string, Student>
                {
                    { student.Name, student }
                });
            }
        }

        return files.Select(file => file.Values);
    }
