I have a CSV file with data points like this:

student, year, subject, score1, score2, score3, ..., score100
Alex, 2010, Math, 23, 56, 43, ..., 89
Alex, 2011, Science, 45, 32, 45, ..., 65
Matt, 2009, Art, 34, 56, 75, ..., 43
Matt, 2010, Math, 43, 54, 54, ..., 32

What would be the best way to load such a CSV file into a Map in Java? The data is used for a lookup service, hence the choice of a map. The key would be the Tuple (student, year), which maps to a list of subjects with their scores (SubjectScore.class). The idea is that, given a student's name and a year, I get all subjects and scores.

While searching, I did not find an elegant way to read the CSV file into a Map of user-defined classes such as Map<Tuple, List<SubjectScore>>.

class Tuple {
  private String student;
  private int year;
}

class SubjectScore {
  private String subject;
  private int score1;
  private int score2;
  private int score3;
  // more fields here
  private int score100;
}

Additional details: The CSV file is large (~2 GB) but static, so I decided to load it into memory.

  • What if a student has attended several subjects in one year? In that case isn't a Map<Tuple,List<SubjectScore>> more suitable? Commented Jun 12, 2020 at 19:42
  • Oh yes! Thanks, I'll edit the post. Commented Jun 12, 2020 at 20:37

2 Answers

Please find below a first example, which may serve as a starting point. I have removed the dots in your example input data and assume a simplified example with 4 scores.

student, year, subject, score1, score2, score3, score4
Alex, 2010, Math, 23, 56, 43, 89
Alex, 2011, Science, 45, 32, 45, 65
Matt, 2009, Art, 34, 56, 75, 43
Matt, 2010, Math, 43, 54, 54, 32
Alex, 2010, Art, 43, 54, 54, 32

I also assume that you have overridden the equals and hashCode methods in your Tuple class and implemented a suitable constructor:

class Tuple {
    private String student;
    private int year;

    public Tuple(String student, int year) {
        this.student = student;
        this.year = year;
    }

    @Override
    public int hashCode() {
        int hash = 7;
        hash = 79 * hash + Objects.hashCode(this.student);
        hash = 79 * hash + this.year;
        return hash;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (obj == null) {
            return false;
        }
        if (getClass() != obj.getClass()) {
            return false;
        }
        final Tuple other = (Tuple) obj;
        if (this.year != other.year) {
            return false;
        }
        return Objects.equals(this.student, other.student);
    }   

    @Override
    public String toString() {
        return "Tuple{" + "student=" + student + ", year=" + year + '}';
    }
}

and a SubjectScore class with a suitable constructor

class SubjectScore {

    private String subject;
    private int score1;
    private int score2;
    private int score3;
    // more fields here
    private int score4;

    public SubjectScore(String row) {
        String[] data = row.split(",");
        this.subject = data[0].trim(); // trim: the row remainder starts with a space
        this.score1 = Integer.parseInt(data[1].trim());
        this.score2 = Integer.parseInt(data[2].trim());
        this.score3 = Integer.parseInt(data[3].trim());
        this.score4 = Integer.parseInt(data[4].trim());
    }        
}

Then you can create the desired map as follows:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.Objects;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class Example {

    public static void main(String[] args)  {
        Map<Tuple, List<SubjectScore>> map = new HashMap<>();
        try (Stream<String> content = Files.lines(Paths.get("path to your csv file"))) {
            map = content.skip(1).map(line -> lineToEntry(line)) //skip header and map each line to a map entry
                    .collect(Collectors.groupingBy(
                            Map.Entry::getKey, 
                            Collectors.mapping(Map.Entry::getValue, Collectors.toList()))
                    );
        } catch (IOException ex) {
            ex.printStackTrace();
        }

        map.forEach((k,v) -> {System.out.println(k + " : " + v);});
    }

    static Entry<Tuple, SubjectScore> lineToEntry(String line) {
        // split each line at the first and second comma, producing an array with 3 elements:
        // the first (name) and second (year) are used to create a Tuple object,
        // everything after the second comma is kept as one element to create a SubjectScore object
        String[] data = line.split(",", 3);
        Tuple t = new Tuple(data[0].trim(), Integer.parseInt(data[1].trim()));
        SubjectScore s = new SubjectScore(data[2]);
        return new AbstractMap.SimpleEntry<>(t, s);
    }
}
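Once the map is built, a lookup is a single get call with a newly constructed key. That only works because the key class has value-based equals and hashCode methods. Here is a minimal sketch of such a lookup; LookupSketch, Key, sampleMap and subjectsFor are hypothetical names of mine, and plain subject names stand in for the SubjectScore objects to keep it short. On Java 16 or later a record could replace the hand-written key class.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

class LookupSketch {

    // Hypothetical stand-in for the Tuple key class, with value-based equals/hashCode.
    static final class Key {
        final String student;
        final int year;

        Key(String student, int year) {
            this.student = student;
            this.year = year;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Key)) return false;
            Key other = (Key) o;
            return year == other.year && Objects.equals(student, other.student);
        }

        @Override
        public int hashCode() {
            return Objects.hash(student, year);
        }
    }

    // Small hand-built map; the subject names stand in for SubjectScore objects.
    static Map<Key, List<String>> sampleMap() {
        Map<Key, List<String>> map = new HashMap<>();
        map.computeIfAbsent(new Key("Alex", 2010), k -> new ArrayList<>()).add("Math");
        map.computeIfAbsent(new Key("Alex", 2010), k -> new ArrayList<>()).add("Art");
        map.computeIfAbsent(new Key("Matt", 2009), k -> new ArrayList<>()).add("Art");
        return map;
    }

    // Returns the entries for a student and year, or an empty list if there are none.
    static List<String> subjectsFor(Map<Key, List<String>> map, String student, int year) {
        return map.getOrDefault(new Key(student, year), Collections.emptyList());
    }

    public static void main(String[] args) {
        System.out.println(subjectsFor(sampleMap(), "Alex", 2010)); // prints [Math, Art]
    }
}
```

Using getOrDefault avoids a null check when a student/year combination is not in the file.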

I don't know whether you really need an individual field for each score in your SubjectScore class. If I were you, I would prefer a list of integers. To do so, change your class to something like this:

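import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class SubjectScore {

    private String subject;
    private List<Integer> scores;

    public SubjectScore(String row) {
        String[] data = row.split(",");
        this.subject = data[0].trim(); // trim: the row remainder may start with a space
        // every element after the subject is a score
        this.scores = Arrays.stream(data, 1, data.length)
                .map(item -> Integer.parseInt(item.trim()))
                .collect(Collectors.toList());
    }
}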
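
Note that neither version of SubjectScore above overrides toString, so printing the map shows default object identifiers rather than subjects and scores. Here is a minimal sketch of how the list-based class could be extended; SubjectScoreSketch and its accessors are hypothetical names of mine, not part of the original classes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical variant of SubjectScore with accessors and a readable toString().
class SubjectScoreSketch {

    private final String subject;
    private final List<Integer> scores;

    SubjectScoreSketch(String row) {
        String[] data = row.split(",");
        this.subject = data[0].trim();
        // every element after the subject is parsed as a score
        this.scores = Arrays.stream(data, 1, data.length)
                .map(item -> Integer.parseInt(item.trim()))
                .collect(Collectors.toList());
    }

    String getSubject() {
        return subject;
    }

    List<Integer> getScores() {
        return scores;
    }

    @Override
    public String toString() {
        return subject + " " + scores;
    }

    public static void main(String[] args) {
        // prints: Math [23, 56, 43, 89]
        System.out.println(new SubjectScoreSketch(" Math, 23, 56, 43, 89"));
    }
}
```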

1 Comment

Thank you. I learned a lot from this template. I was wondering how to take the same approach but convert it into Map<String, Map<Integer, List<SubjectScore>>>. Where the outer Map has key as the Student Name, the inner map has key as year.

I was wondering how to take the same approach but convert it into Map<String, Map<Integer, List<SubjectScore>>>.

I have decided to add another answer because your requirements regarding the data type have changed. Assuming you still have the same SubjectScore class:

class SubjectScore {

    private String subject;
    private List<Integer> scores;

    public SubjectScore(String row) {
        String[] data = row.split(",");
        this.subject = data[0];
        this.scores = Arrays.stream(data, 1, data.length)
                .map(item -> Integer.parseInt(item.trim()))
                .collect(Collectors.toList());
    }
}

The old-fashioned way, with if-else blocks that check whether a key-value pair already exists:

public static void main(String[] args) throws IOException {

    List<String> allLines = Files.readAllLines(Paths.get("path to your file"));

    Map<String,Map<String, List<SubjectScore>>> mapOldWay = new HashMap<>();

    for(String line : allLines.subList(1, allLines.size())){
        // split each line into 3 parts: 1st column, 2nd column, and everything after the second comma
        String[] data = line.split("\\s*,\\s*", 3);
        if(mapOldWay.containsKey(data[0])){
            if(mapOldWay.get(data[0]).containsKey(data[1])){
                mapOldWay.get(data[0]).get(data[1]).add(new SubjectScore(data[2]));
            }
            else{
                mapOldWay.get(data[0]).put(data[1], new ArrayList<>());
                mapOldWay.get(data[0]).get(data[1]).add(new SubjectScore(data[2]));
            }
        }
        else{
            mapOldWay.put(data[0], new HashMap<>());
            mapOldWay.get(data[0]).put(data[1], new ArrayList<>());
            mapOldWay.get(data[0]).get(data[1]).add(new SubjectScore(data[2]));
        }
    }

    printMap(mapOldWay);
}

public static void printMap(Map<String, Map<String, List<SubjectScore>>> map) {
    map.forEach((outerkey,outervalue) -> {
        System.out.println(outerkey);
        outervalue.forEach((innerkey,innervalue)-> {
            System.out.println("\t" + innerkey + " : " + innervalue);
        });
    });
}

The same logic, but shorter, using Java 8 features (Map#computeIfAbsent):

public static void main(String[] args) throws IOException {

    List<String> allLines = Files.readAllLines(Paths.get("path to your file"));

    Map<String,Map<String, List<SubjectScore>>> mapJ8Features = new HashMap<>();
    for(String line : allLines.subList(1, allLines.size())){
        String[] data = line.split("\\s*,\\s*", 3);
        mapJ8Features.computeIfAbsent(data[0], k -> new HashMap<>())
                .computeIfAbsent(data[1], k -> new ArrayList<>())
                .add(new SubjectScore(data[2]));
    }
}

Another approach, using streams and nested Collectors#groupingBy:

public static void main(String[] args) throws IOException {
    Map<String,Map<String, List<SubjectScore>>> mapStreams = new HashMap<>();        
    try (Stream<String> content = Files.lines(Paths.get("path to your file"))) {
        mapStreams = content.skip(1).map(line -> line.split("\\s*,\\s*",3))
                .collect(Collectors.groupingBy(splited -> splited[0],
                         Collectors.groupingBy(splited -> splited[1], 
                         Collectors.mapping(splited -> new SubjectScore(splited[2]),Collectors.toList()))));
    } catch (IOException ex) {
        ex.printStackTrace();
    }
}

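Note: I only realized afterwards that you wanted to represent the year as an Integer; I left it as a String. To change it, replace every data[1] with Integer.parseInt(data[1]) (or every splited[1] with Integer.parseInt(splited[1]) in the stream version) and change the declared map types to Map<String, Map<Integer, List<SubjectScore>>>.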
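
As a rough illustration of that change, here is a minimal sketch of the computeIfAbsent version with an Integer year key. IntegerYearSketch and group are hypothetical names of mine. To keep the example self-contained it takes the rows as a list instead of reading a file, and it stores the raw "subject, scores" remainder as a String where the code above would call new SubjectScore(data[2]).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class IntegerYearSketch {

    // Groups the remainder of each row by student and then by year (as Integer).
    static Map<String, Map<Integer, List<String>>> group(List<String> rows) {
        Map<String, Map<Integer, List<String>>> map = new HashMap<>();
        for (String row : rows) {
            String[] data = row.trim().split("\\s*,\\s*", 3);
            int year = Integer.parseInt(data[1]); // autoboxed to Integer when used as a key
            map.computeIfAbsent(data[0], k -> new HashMap<>())
               .computeIfAbsent(year, k -> new ArrayList<>())
               .add(data[2]); // in the real code: new SubjectScore(data[2])
        }
        return map;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
                "Alex, 2010, Math, 23, 56, 43, 89",
                "Alex, 2011, Science, 45, 32, 45, 65",
                "Alex, 2010, Art, 43, 54, 54, 32");
        // look up all entries for Alex in 2010
        System.out.println(group(rows).get("Alex").get(2010));
    }
}
```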

1 Comment

Appreciate it. Thanks. Given that student Name and year, is it good practise to generate a map with type Map<String, Map<String, List<SubjectScore>>> OR Map<Tuple, List<SubjectScore>> OR Map<String, SubjectScoreWithYear> where I declare a new class with SubjectScoreWithYear with instance variables of string year and SubjectScore subjectScore? Which of these would be most right in the object-oriented world?
