Converting List<Map<String, List<String>>> to String[][]

Question

I have a use case where I scrape some data, and for some records some keys have multiple values. The final output I want is CSV, which I have a library for, and it expects a 2-dimensional array.

So my input structure looks like List<TreeMap<String, List<String>>> (I use TreeMap to ensure stable key order), and my output needs to be String[][].

I wrote a generic transformation which calculates the number of columns for each key as the maximum number of values for that key across all records, and leaves empty cells for records that have fewer values than that maximum. It turned out more complex than expected.

My question is: can it be written in a more concise/effective (but still generic) way? Especially using Java 8 streams/lambdas etc.?

Sample data and my algorithm follow below (not tested beyond the sample data yet):

package org.example.importer; // "import" is a reserved word and cannot appear in a package name

import java.util.*;
import java.util.stream.Collectors;

public class Main {

    public static void main(String[] args) {
        List<TreeMap<String, List<String>>> rows = new ArrayList<>();
        TreeMap<String, List<String>> row1 = new TreeMap<>();
        row1.put("Title", Arrays.asList("Product 1"));
        row1.put("Category", Arrays.asList("Wireless", "Sensor"));
        row1.put("Price",Arrays.asList("20"));
        rows.add(row1);
        TreeMap<String, List<String>> row2 = new TreeMap<>();
        row2.put("Title", Arrays.asList("Product 2"));
        row2.put("Category", Arrays.asList("Sensor"));
        row2.put("Price",Arrays.asList("35"));
        rows.add(row2);
        TreeMap<String, List<String>> row3 = new TreeMap<>();
        row3.put("Title", Arrays.asList("Product 3"));
        row3.put("Price",Arrays.asList("15"));
        rows.add(row3);

        System.out.println("Input:");
        System.out.println(rows);
        System.out.println("Output:");
        System.out.println(Arrays.deepToString(multiValueListsToArray(rows)));
    }

    public static String[][] multiValueListsToArray(List<TreeMap<String, List<String>>> rows)
    {
        Map<String, IntSummaryStatistics> colWidths = rows.
                stream().
                flatMap(m -> m.entrySet().stream()).
                collect(Collectors.groupingBy(e -> e.getKey(), Collectors.summarizingInt(e -> e.getValue().size())));
        Long tableWidth = colWidths.values().stream().mapToLong(IntSummaryStatistics::getMax).sum();
        String[][] array = new String[rows.size()][tableWidth.intValue()];
        Iterator<TreeMap<String, List<String>>> rowIt = rows.iterator(); // iterate rows
        int rowIdx = 0;
        while (rowIt.hasNext())
        {
            TreeMap<String, List<String>> row = rowIt.next();
            Iterator<String> colIt = colWidths.keySet().iterator(); // iterate columns
            int cellIdx = 0;
            while (colIt.hasNext())
            {
                String col = colIt.next();
                long colWidth = colWidths.get(col).getMax();
                for (int i = 0; i < colWidth; i++) // iterate cells within column
                    if (row.containsKey(col) && row.get(col).size() > i)
                        array[rowIdx][cellIdx + i] = row.get(col).get(i);
                cellIdx += colWidth;
            }
            rowIdx++;
        }
        return array;
    }

}

Program output:

Input:
[{Category=[Wireless, Sensor], Price=[20], Title=[Product 1]}, {Category=[Sensor], Price=[35], Title=[Product 2]}, {Price=[15], Title=[Product 3]}]
Output:
[[Wireless, Sensor, 20, Product 1], [Sensor, null, 35, Product 2], [null, null, 15, Product 3]]

It might be possible to write it in a more concise way, although I have the feeling it wouldn't be much shorter or more readable. If your code is correct and doesn't suffer from unnecessary performance issues (I must admit I didn't read it thoroughly), then you might just keep it that way.
– Thomas, Commented Dec 7, 2017 at 11:58
May I ask ... Why exactly do you want to convert to a String[][]?
– MC Emperor, Commented Dec 7, 2017 at 13:20
@MCEmperor because CSV is a tabular format, and the CSV writer accepts Object[][].
– Martynas Jusevičius, Commented Dec 7, 2017 at 13:37
One thing I forgot to add is printing the header row, but that shouldn't be hard.
– Martynas Jusevičius, Commented Dec 7, 2017 at 13:49

Holger · Accepted Answer · 2017-12-08 11:09:43Z

7

As a first step, I wouldn’t focus on new Java 8 features, but rather Java 5+ features. Don’t deal with Iterators when you can use for-each. Generally, don’t iterate over a keySet() to perform a map lookup for every key, as you can iterate over the entrySet() not requiring any lookup. Also, don’t ask for an IntSummaryStatistics when you’re only interested in the maximum value. And don’t iterate over the bigger of two data structures, just to recheck that you’re not beyond the smaller one in each iteration.

Map<String, Integer> colWidths = rows.
        stream().
        flatMap(m -> m.entrySet().stream()).
        collect(Collectors.toMap(e -> e.getKey(), e -> e.getValue().size(), Integer::max));
int tableWidth = colWidths.values().stream().mapToInt(Integer::intValue).sum();
String[][] array = new String[rows.size()][tableWidth];

int rowIdx = 0;
for(TreeMap<String, List<String>> row: rows) {
    int cellIdx = 0;
    for(Map.Entry<String,Integer> e: colWidths.entrySet()) {
        String col = e.getKey();
        List<String> cells = row.get(col);
        int index = cellIdx;
        if(cells != null) for(String s: cells) array[rowIdx][index++]=s;
        cellIdx += e.getValue(); // width is already in the entry, no second lookup needed
    }
    rowIdx++;
}
return array;

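We can simplify the loop further by mapping each key to its column position rather than its width. Map.Entry.setValue returns the previous value, so the short loop below replaces each width with the column's start offset while adding up the total table width: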

Map<String, Integer> colPositions = rows.
        stream().
        flatMap(m -> m.entrySet().stream()).
        collect(Collectors.toMap(e -> e.getKey(),
                                 e -> e.getValue().size(), Integer::max, TreeMap::new));
int tableWidth = 0;
for(Map.Entry<String,Integer> e: colPositions.entrySet())
    tableWidth += e.setValue(tableWidth);

String[][] array = new String[rows.size()][tableWidth];

int rowIdx = 0;
for(Map<String, List<String>> row: rows) {
    for(Map.Entry<String,List<String>> e: row.entrySet()) {
        int index = colPositions.get(e.getKey());
        for(String s: e.getValue()) array[rowIdx][index++]=s;
    }
    rowIdx++;
}
return array;
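
The width-to-position step can be tried in isolation. The following is only a minimal sketch: the class name ColumnPositionsSketch and the method widthsToPositions are hypothetical names chosen for illustration, and the widths are the ones implied by the question's sample data.

```java
import java.util.Map;
import java.util.TreeMap;

class ColumnPositionsSketch {

    // Replaces each column width with that column's start index (in map
    // iteration order) and returns the total table width.
    static int widthsToPositions(Map<String, Integer> widths) {
        int tableWidth = 0;
        for (Map.Entry<String, Integer> e : widths.entrySet()) {
            // setValue stores the start index and returns the old value (the width)
            tableWidth += e.setValue(tableWidth);
        }
        return tableWidth;
    }

    public static void main(String[] args) {
        Map<String, Integer> cols = new TreeMap<>();
        cols.put("Category", 2);
        cols.put("Price", 1);
        cols.put("Title", 1);

        int tableWidth = widthsToPositions(cols);
        System.out.println(cols);       // {Category=0, Price=2, Title=3}
        System.out.println(tableWidth); // 4
    }
}
```

Because the map ends up holding start positions, the fill loop can jump straight to the right cell for every key a row actually contains and never has to visit columns the row lacks.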

A header array can be prepended with the following change:

Map<String, Integer> colPositions = rows.stream()
    .flatMap(m -> m.entrySet().stream())
    .collect(Collectors.toMap(e -> e.getKey(), e -> e.getValue().size(),
                              Integer::max, TreeMap::new));
String[] header = colPositions.entrySet().stream()
    .flatMap(e -> Collections.nCopies(e.getValue(), e.getKey()).stream())
    .toArray(String[]::new);
int tableWidth = 0;
for(Map.Entry<String,Integer> e: colPositions.entrySet())
    tableWidth += e.setValue(tableWidth);

String[][] array = new String[rows.size()+1][tableWidth];
array[0] = header;

int rowIdx = 1;
for(Map<String, List<String>> row: rows) {
    for(Map.Entry<String,List<String>> e: row.entrySet()) {
        int index = colPositions.get(e.getKey());
        for(String s: e.getValue()) array[rowIdx][index++]=s;
    }
    rowIdx++;
}
return array;
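
For anyone who wants to run the header variant against the question's sample data, here is one possible self-contained arrangement. The class HeaderTableDemo, the method toTable and the helper record are hypothetical names, not part of the original code; the body of toTable is the snippet above.

```java
import java.util.*;
import java.util.stream.Collectors;

class HeaderTableDemo {

    // Header row first, then one row per record.
    static String[][] toTable(List<Map<String, List<String>>> rows) {
        Map<String, Integer> colPositions = rows.stream()
            .flatMap(m -> m.entrySet().stream())
            .collect(Collectors.toMap(e -> e.getKey(), e -> e.getValue().size(),
                                      Integer::max, TreeMap::new));
        // While the map still holds widths, repeat each key once per column it spans.
        String[] header = colPositions.entrySet().stream()
            .flatMap(e -> Collections.nCopies(e.getValue(), e.getKey()).stream())
            .toArray(String[]::new);
        // Turn widths into start positions; setValue returns the old width.
        int tableWidth = 0;
        for (Map.Entry<String, Integer> e : colPositions.entrySet())
            tableWidth += e.setValue(tableWidth);

        String[][] array = new String[rows.size() + 1][tableWidth];
        array[0] = header;
        int rowIdx = 1;
        for (Map<String, List<String>> row : rows) {
            for (Map.Entry<String, List<String>> e : row.entrySet()) {
                int index = colPositions.get(e.getKey());
                for (String s : e.getValue()) array[rowIdx][index++] = s;
            }
            rowIdx++;
        }
        return array;
    }

    // Hypothetical helper that builds one record; input maps need not be sorted.
    static Map<String, List<String>> record(String title, String price, String... categories) {
        Map<String, List<String>> r = new HashMap<>();
        r.put("Title", Arrays.asList(title));
        r.put("Price", Arrays.asList(price));
        if (categories.length > 0) r.put("Category", Arrays.asList(categories));
        return r;
    }

    public static void main(String[] args) {
        List<Map<String, List<String>>> rows = Arrays.asList(
            record("Product 1", "20", "Wireless", "Sensor"),
            record("Product 2", "35", "Sensor"),
            record("Product 3", "15"));
        for (String[] line : toTable(rows)) System.out.println(Arrays.toString(line));
        // [Category, Category, Price, Title]
        // [Wireless, Sensor, 20, Product 1]
        // [Sensor, null, 35, Product 2]
        // [null, null, 15, Product 3]
    }
}
```

Cells that a record does not fill stay null, as in the question's original output; whether the CSV writer renders null as an empty cell depends on the library.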

edited Dec 8, 2017 at 11:09

answered Dec 7, 2017 at 13:22

Holger


6 Comments

Martynas Jusevičius Over a year ago

This is great! Much shorter and cleaner. Thank you.

Martynas Jusevičius Over a year ago

Could you update with a version that prints headers as well? :) For each column.

Holger Over a year ago

I suppose, the header should be the map keys? How to deal with the multiple columns associated with a single key? null, repeated keys, or add numbers to the key?

Martynas Jusevičius Over a year ago

Yes, the map keys. I would just print repeated keys for every column. This can be viewed as a form of graph flattening where the keys are item properties, so it should be the same property for every value.

Holger Over a year ago

The output order is purely determined by the colPositions map, whether the inputs are TreeMaps or not. So you can change the input to List<Map<String, List<String>>> and don’t even need to do something like new TreeMap<>(row) in the loop. To get a guaranteed sorted column order, all you have to do, is to change HashMap::new to TreeMap::new in the above solution, which you must do, even if the input maps are tree maps. With the HashMap, it’s a pure coincidence if the current test data looked sorted in the output. I updated the answer accordingly.
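
A small sketch of the point about column order, assuming nothing beyond the standard library. The class ColumnOrderSketch and the method columnWidths are hypothetical names; the input row is a LinkedHashMap filled in non-alphabetical order on purpose.

```java
import java.util.*;
import java.util.stream.Collectors;

class ColumnOrderSketch {

    // Collects the maximum width per key into a TreeMap, so keys come out sorted
    // regardless of how the input maps order them.
    static Map<String, Integer> columnWidths(List<Map<String, List<String>>> rows) {
        return rows.stream()
            .flatMap(m -> m.entrySet().stream())
            .collect(Collectors.toMap(e -> e.getKey(), e -> e.getValue().size(),
                                      Integer::max, TreeMap::new));
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, deliberately not alphabetical here.
        Map<String, List<String>> row = new LinkedHashMap<>();
        row.put("Title", Arrays.asList("Product 1"));
        row.put("Price", Arrays.asList("20"));
        row.put("Category", Arrays.asList("Wireless", "Sensor"));

        System.out.println(row.keySet());                                 // [Title, Price, Category]
        System.out.println(columnWidths(Collections.singletonList(row))); // {Category=2, Price=1, Title=1}
    }
}
```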


payloc91 · Answer · 2017-12-07 15:03:55Z

1

This is a fairly concise way to do it using some Java 8 features.

This solution assumes that only the Category data is dynamic, whereas there will always be exactly one price and one product name.

Considering you have the initial data

// your initial complex data list 
List<Map<String, List<String>>> initialList = new ArrayList<>();

you can do

// values holder before final conversion
final List<List<String>> tempValues = new ArrayList<>();
initialList.forEach( map -> {
    // discard the keys, we do not need them... so only pack the data and put in a temporary array
    tempValues.add(new ArrayList<String>() {{
        map.forEach((key, value) -> addAll(value));          // foreach (string, list) : Map<String, List<String>>
    }});
});
// get the biggest data list; in our case, the one that contains most categories...
// this is going to be the final data size
final int maxSize = tempValues.stream().max(Comparator.comparingInt(List::size)).get().size();
// now we finally know the data size
final String[][] finalValues = new String[initialList.size()][maxSize];
// now it's time to uniform the bundle data size and shift the elements if necessary

// can't use streams/lambda as I need to keep an iteration counter
for (int i = 0; i < tempValues.size(); i++) {
    final List<String> tempEntry = tempValues.get(i);
    if (tempEntry.size() == maxSize) {
        finalValues[i] = tempEntry.toArray(finalValues[i]);
        continue;
    }
    final String[] s = new String[maxSize];
    // same shifting game as before
    final int delta = maxSize - tempEntry.size();
    for (int j = 0; j < maxSize; j++) {
        if (j < delta) continue;
        s[j] = tempEntry.get(j - delta);
    }
    finalValues[i] = s;
}

and that's it...

You can fill and test the data with this method below (I have added some more categories...)

static void initData(List<Map<String, List<String>>> l) {
    l.add(new TreeMap<String, List<String>>() {{
        put("Category", new ArrayList<String>() {{ add("Wireless"); add("Sensor"); }});
        put("Price", new ArrayList<String>() {{ add("20"); }});
        put("Title", new ArrayList<String>() {{ add("Product 1"); }});
    }});
    l.add(new TreeMap<String, List<String>>() {{
        put("Category", new ArrayList<String>() {{ add("Sensor"); }});
        put("Price", new ArrayList<String>() {{ add("35"); }});
        put("Title", new ArrayList<String>() {{ add("Product 2"); }});
    }});
    l.add(new TreeMap<String, List<String>>() {{
        put("Price", new ArrayList<String>() {{ add("15"); }});
        put("Title", new ArrayList<String>() {{ add("Product 3"); }});
    }});
    l.add(new TreeMap<String, List<String>>() {{
        put("Category", new ArrayList<String>() {{ add("Wireless"); add("Sensor"); add("Category14"); }});
        put("Price", new ArrayList<String>() {{ add("15"); }});
        put("Title", new ArrayList<String>() {{ add("Product 3"); }});
    }});
    l.add(new TreeMap<String, List<String>>() {{
        put("Category", new ArrayList<String>() {{ add("Wireless"); add("Sensor"); add("Category541"); add("SomeCategory");}});
        put("Price", new ArrayList<String>() {{ add("15"); }});
        put("Title", new ArrayList<String>() {{ add("Product 3"); }});
    }});
}

I'd still say the accepted answer looks less computationally expensive, but you wanted to see some Java 8...

edited Dec 7, 2017 at 15:03

answered Dec 7, 2017 at 13:55

payloc91


7 Comments

Martynas Jusevičius Over a year ago

FINAL_DATA_BUNDLE_SIZE needs to be set in advance though, not calculated dynamically?

payloc91 Over a year ago

@MartynasJusevičius ughhh sorry, I'll update as soon as i can

payloc91 Over a year ago

@MartynasJusevičius did it

Holger Over a year ago

Are you aware, how many classes your code creates? And compared to Arrays.asList(…), using the double curly brace antipattern doesn’t even make the code more concise…

Holger Over a year ago

I don’t see, where that is “a very powerful tool”. Your initData creates nineteen classes, all of the ArrayList subclasses storing unintended references to their outer TreeMap instances, while the entire code is bigger and harder to read, compared to the OP’s code provided in the question. There wasn’t even any need to rewrite the code instead of simply copying it. Since I didn’t rewrite that code, there is no “extremely expensive” variant on my side.
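
The difference between the two initialization styles can be observed directly. This is only an illustrative sketch; the class DoubleBraceSketch and its two factory methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DoubleBraceSketch {

    // Double-brace initialization: an anonymous ArrayList subclass with an
    // instance initializer block. Each such expression compiles to its own class.
    static List<String> doubleBrace() {
        return new ArrayList<String>() {{ add("Wireless"); add("Sensor"); }};
    }

    // Same contents without any extra class; note the result is fixed-size.
    static List<String> asList() {
        return Arrays.asList("Wireless", "Sensor");
    }

    public static void main(String[] args) {
        List<String> a = doubleBrace();
        List<String> b = asList();
        System.out.println(a.equals(b));                     // true
        System.out.println(a.getClass() == ArrayList.class); // false
        System.out.println(a.getClass().isAnonymousClass()); // true
    }
}
```

If a growable list is needed, wrapping the result as new ArrayList<>(Arrays.asList(...)) keeps the plain ArrayList class.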
