I scrape some data, and in some records a key can have multiple values. The final output I want is CSV; the library I use for that expects a two-dimensional array.
So my input structure looks like List<TreeMap<String, List<String>>> (I use TreeMap to ensure stable key order), and my output needs to be String[][].
I wrote a generic transformation that gives each key as many columns as the maximum number of values it has in any record, and leaves empty cells for records that have fewer values than that maximum. It turned out more complex than I expected.
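To make the column-width rule concrete, here is a minimal sketch of just that step. The names ColumnWidths and maxWidths are hypothetical and used only for illustration; the sketch assumes the keys should end up in sorted order, which is why it collects into a TreeMap.

```java
import java.util.*;
import java.util.stream.Collectors;

// Hypothetical helper, for illustration only.
final class ColumnWidths {
    // For each key, the longest value list seen in any record
    // decides how many columns that key occupies.
    static SortedMap<String, Integer> maxWidths(List<? extends Map<String, List<String>>> rows) {
        return rows.stream()
                .flatMap(row -> row.entrySet().stream())
                .collect(Collectors.toMap(
                        Map.Entry::getKey,          // column name
                        e -> e.getValue().size(),   // width needed by this record
                        Integer::max,               // keep the largest width per key
                        TreeMap::new));             // sorted, stable key order
    }

    public static void main(String[] args) {
        Map<String, List<String>> a = new TreeMap<>();
        a.put("Category", Arrays.asList("Wireless", "Sensor"));
        a.put("Title", Arrays.asList("Product 1"));
        Map<String, List<String>> b = new TreeMap<>();
        b.put("Title", Arrays.asList("Product 3"));
        System.out.println(maxWidths(Arrays.asList(a, b))); // prints {Category=2, Title=1}
    }
}
```

A record that lacks a key simply contributes nothing to that key's width, so the padding for missing values has to happen later, when the rows are written out.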
My question is: can it be written in a more concise/effective (but still generic) way? Especially using Java 8 streams/lambdas etc.?
Sample data and my algorithm follow below (not tested beyond the sample data yet):
package org.example.importer;

import java.util.*;
import java.util.stream.Collectors;

public class Main {

    public static void main(String[] args) {
        List<TreeMap<String, List<String>>> rows = new ArrayList<>();

        TreeMap<String, List<String>> row1 = new TreeMap<>();
        row1.put("Title", Arrays.asList("Product 1"));
        row1.put("Category", Arrays.asList("Wireless", "Sensor"));
        row1.put("Price", Arrays.asList("20"));
        rows.add(row1);

        TreeMap<String, List<String>> row2 = new TreeMap<>();
        row2.put("Title", Arrays.asList("Product 2"));
        row2.put("Category", Arrays.asList("Sensor"));
        row2.put("Price", Arrays.asList("35"));
        rows.add(row2);

        TreeMap<String, List<String>> row3 = new TreeMap<>();
        row3.put("Title", Arrays.asList("Product 3"));
        row3.put("Price", Arrays.asList("15"));
        rows.add(row3);

        System.out.println("Input:");
        System.out.println(rows);
        System.out.println("Output:");
        System.out.println(Arrays.deepToString(multiValueListsToArray(rows)));
    }

    public static String[][] multiValueListsToArray(List<TreeMap<String, List<String>>> rows)
    {
        // Per-key statistics of value counts; getMax() is the column width for that key.
        // A TreeMap keeps the columns in sorted key order (the default HashMap gives no ordering guarantee).
        Map<String, IntSummaryStatistics> colWidths = rows.
                stream().
                flatMap(m -> m.entrySet().stream()).
                collect(Collectors.groupingBy(e -> e.getKey(), TreeMap::new, Collectors.summarizingInt(e -> e.getValue().size())));

        // Total number of cells per row is the sum of all column widths.
        int tableWidth = colWidths.values().stream().mapToInt(IntSummaryStatistics::getMax).sum();
        String[][] array = new String[rows.size()][tableWidth];

        Iterator<TreeMap<String, List<String>>> rowIt = rows.iterator(); // iterate rows
        int rowIdx = 0;
        while (rowIt.hasNext())
        {
            TreeMap<String, List<String>> row = rowIt.next();
            Iterator<String> colIt = colWidths.keySet().iterator(); // iterate columns
            int cellIdx = 0;
            while (colIt.hasNext())
            {
                String col = colIt.next();
                int colWidth = colWidths.get(col).getMax();
                for (int i = 0; i < colWidth; i++) // iterate cells within column
                    if (row.containsKey(col) && row.get(col).size() > i)
                        array[rowIdx][cellIdx + i] = row.get(col).get(i); // cells without a value stay null
                cellIdx += colWidth;
            }
            rowIdx++;
        }
        return array;
    }
}
Program output:
Input:
[{Category=[Wireless, Sensor], Price=[20], Title=[Product 1]}, {Category=[Sensor], Price=[35], Title=[Product 2]}, {Price=[15], Title=[Product 3]}]
Output:
[[Wireless, Sensor, 20, Product 1], [Sensor, null, 35, Product 2], [null, null, 15, Product 3]]
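For comparison, one possible stream-only shape for the same transformation is sketched below. It is a rough sketch under assumptions, not a tested solution: StreamTable and toArray are hypothetical names, columns are assumed to be ordered by sorted key, and empty cells are assumed to be null as in the output above.

```java
import java.util.*;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical class name; a sketch of a stream-based variant.
final class StreamTable {
    static String[][] toArray(List<? extends Map<String, List<String>>> rows) {
        // Widest value list per key, kept in sorted key order.
        Map<String, Integer> widths = rows.stream()
                .flatMap(row -> row.entrySet().stream())
                .collect(Collectors.toMap(Map.Entry::getKey, e -> e.getValue().size(),
                        Integer::max, TreeMap::new));

        return rows.stream()
                .map(row -> widths.entrySet().stream()
                        .flatMap(col -> {
                            // A record without this key is treated as having no values.
                            List<String> values = row.getOrDefault(col.getKey(), Collections.emptyList());
                            // Pad short or missing value lists with null up to the column width.
                            return IntStream.range(0, col.getValue())
                                    .mapToObj(i -> i < values.size() ? values.get(i) : null);
                        })
                        .toArray(String[]::new))
                .toArray(String[][]::new);
    }
}
```

Calling StreamTable.toArray(rows) on the sample rows should produce the same array as the output shown above. Whether this reads better than the explicit loops is a matter of taste; it mainly removes the manual index bookkeeping.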
String[][]?Object[][].