I have a use case where I scrape some data, and for some records some keys have multiple values. The final output I want is CSV, which I have a library for, and it expects a 2-dimensional array.

So my input structure looks like List<TreeMap<String, List<String>>> (I use TreeMap to ensure stable key order), and my output needs to be String[][].

I wrote a generic transformation that calculates the number of columns for each key from the maximum number of values that key has in any record, and leaves empty cells for records that have fewer values than that maximum. It turned out more complex than expected.

My question is: can it be written in a more concise/effective (but still generic) way? Especially using Java 8 streams/lambdas etc.?

Sample data and my algorithm follow below (not yet tested beyond the sample data):

package org.example.csvimport; // "import" is a reserved word and cannot be used as a package segment

import java.util.*;
import java.util.stream.Collectors;

public class Main {

    public static void main(String[] args) {
        List<TreeMap<String, List<String>>> rows = new ArrayList<>();
        TreeMap<String, List<String>> row1 = new TreeMap<>();
        row1.put("Title", Arrays.asList("Product 1"));
        row1.put("Category", Arrays.asList("Wireless", "Sensor"));
        row1.put("Price",Arrays.asList("20"));
        rows.add(row1);
        TreeMap<String, List<String>> row2 = new TreeMap<>();
        row2.put("Title", Arrays.asList("Product 2"));
        row2.put("Category", Arrays.asList("Sensor"));
        row2.put("Price",Arrays.asList("35"));
        rows.add(row2);
        TreeMap<String, List<String>> row3 = new TreeMap<>();
        row3.put("Title", Arrays.asList("Product 3"));
        row3.put("Price",Arrays.asList("15"));
        rows.add(row3);

        System.out.println("Input:");
        System.out.println(rows);
        System.out.println("Output:");
        System.out.println(Arrays.deepToString(multiValueListsToArray(rows)));
    }

    public static String[][] multiValueListsToArray(List<TreeMap<String, List<String>>> rows)
    {
        Map<String, IntSummaryStatistics> colWidths = rows.
                stream().
                flatMap(m -> m.entrySet().stream()).
                collect(Collectors.groupingBy(e -> e.getKey(), Collectors.summarizingInt(e -> e.getValue().size())));
        Long tableWidth = colWidths.values().stream().mapToLong(IntSummaryStatistics::getMax).sum();
        String[][] array = new String[rows.size()][tableWidth.intValue()];
        Iterator<TreeMap<String, List<String>>> rowIt = rows.iterator(); // iterate rows
        int rowIdx = 0;
        while (rowIt.hasNext())
        {
            TreeMap<String, List<String>> row = rowIt.next();
            Iterator<String> colIt = colWidths.keySet().iterator(); // iterate columns
            int cellIdx = 0;
            while (colIt.hasNext())
            {
                String col = colIt.next();
                long colWidth = colWidths.get(col).getMax();
                for (int i = 0; i < colWidth; i++) // iterate cells within column
                    if (row.containsKey(col) && row.get(col).size() > i)
                        array[rowIdx][cellIdx + i] = row.get(col).get(i);
                cellIdx += colWidth;
            }
            rowIdx++;
        }
        return array;
    }

}

Program output:

Input:
[{Category=[Wireless, Sensor], Price=[20], Title=[Product 1]}, {Category=[Sensor], Price=[35], Title=[Product 2]}, {Price=[15], Title=[Product 3]}]
Output:
[[Wireless, Sensor, 20, Product 1], [Sensor, null, 35, Product 2], [null, null, 15, Product 3]]
  • It might be possible to write it in a more concise way, although I have the feeling it wouldn't be much shorter or more readable. If your code is correct and doesn't suffer from unnecessary performance issues (I must admit I didn't read it thoroughly), then you might just keep it that way. Commented Dec 7, 2017 at 11:58
  • May I ask ... Why exactly do you want to convert to a String[][]? Commented Dec 7, 2017 at 13:20
  • @MCEmperor because CSV is a tabular format, and the CSV writer accepts Object[][]. Commented Dec 7, 2017 at 13:37
  • One thing I forgot to add is printing the header row, but that shouldn't be hard. Commented Dec 7, 2017 at 13:49

2 Answers

As a first step, I wouldn’t focus on new Java 8 features, but rather Java 5+ features. Don’t deal with Iterators when you can use for-each. Generally, don’t iterate over a keySet() to perform a map lookup for every key, as you can iterate over the entrySet() not requiring any lookup. Also, don’t ask for an IntSummaryStatistics when you’re only interested in the maximum value. And don’t iterate over the bigger of two data structures, just to recheck that you’re not beyond the smaller one in each iteration.

Map<String, Integer> colWidths = rows.
        stream().
        flatMap(m -> m.entrySet().stream()).
        collect(Collectors.toMap(e -> e.getKey(), e -> e.getValue().size(), Integer::max));
int tableWidth = colWidths.values().stream().mapToInt(Integer::intValue).sum();
String[][] array = new String[rows.size()][tableWidth];

int rowIdx = 0;
for(TreeMap<String, List<String>> row: rows) {
    int cellIdx = 0;
    for(Map.Entry<String,Integer> e: colWidths.entrySet()) {
        String col = e.getKey();
        List<String> cells = row.get(col);
        int index = cellIdx;
        if(cells != null) for(String s: cells) array[rowIdx][index++]=s;
        cellIdx += e.getValue(); // the entry already holds the width, so no second lookup is needed
    }
    rowIdx++;
}
return array;

We can simplify the loop further by using a map from each key to its starting column position, rather than to its width:

Map<String, Integer> colPositions = rows.
        stream().
        flatMap(m -> m.entrySet().stream()).
        collect(Collectors.toMap(e -> e.getKey(),
                                 e -> e.getValue().size(), Integer::max, TreeMap::new));
int tableWidth = 0;
for(Map.Entry<String,Integer> e: colPositions.entrySet())
    tableWidth += e.setValue(tableWidth);

String[][] array = new String[rows.size()][tableWidth];

int rowIdx = 0;
for(Map<String, List<String>> row: rows) {
    for(Map.Entry<String,List<String>> e: row.entrySet()) {
        int index = colPositions.get(e.getKey());
        for(String s: e.getValue()) array[rowIdx][index++]=s;
    }
    rowIdx++;
}
return array;
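
To see what the `e.setValue(tableWidth)` loop does, here is a minimal sketch using the column widths from the question's sample data. The class name `PositionsDemo` and the helper `toPositions` are only illustrative and are not part of the original code. `Map.Entry.setValue` returns the value it replaces, so each width is added to the running total while the entry itself is overwritten with its starting column.

```java
import java.util.Map;
import java.util.TreeMap;

public class PositionsDemo {

    // Replaces each width in the map with the column's start index
    // and returns the total table width.
    public static int toPositions(Map<String, Integer> widths) {
        int tableWidth = 0;
        for (Map.Entry<String, Integer> e : widths.entrySet()) {
            // setValue stores the start position and returns the old value (the width)
            tableWidth += e.setValue(tableWidth);
        }
        return tableWidth;
    }

    public static void main(String[] args) {
        Map<String, Integer> cols = new TreeMap<>();
        cols.put("Category", 2);
        cols.put("Price", 1);
        cols.put("Title", 1);
        int total = toPositions(cols);
        System.out.println(cols + " total=" + total);
        // prints {Category=0, Price=2, Title=3} total=4
    }
}
```

Because the map is a TreeMap, the positions are assigned in sorted key order, which is what fixes the column order of the output.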

A header array can be prepended with the following change:

Map<String, Integer> colPositions = rows.stream()
    .flatMap(m -> m.entrySet().stream())
    .collect(Collectors.toMap(e -> e.getKey(), e -> e.getValue().size(),
                              Integer::max, TreeMap::new));
String[] header = colPositions.entrySet().stream()
    .flatMap(e -> Collections.nCopies(e.getValue(), e.getKey()).stream())
    .toArray(String[]::new);
int tableWidth = 0;
for(Map.Entry<String,Integer> e: colPositions.entrySet())
    tableWidth += e.setValue(tableWidth);

String[][] array = new String[rows.size()+1][tableWidth];
array[0] = header;

int rowIdx = 1;
for(Map<String, List<String>> row: rows) {
    for(Map.Entry<String,List<String>> e: row.entrySet()) {
        int index = colPositions.get(e.getKey());
        for(String s: e.getValue()) array[rowIdx][index++]=s;
    }
    rowIdx++;
}
return array;
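
For reference, the header variant can be wrapped in a method and run against the question's sample data. This is a sketch, not a tested library routine: the class name `HeaderDemo` and the helper `sampleRow` are made up for illustration, and the rows are deliberately built as HashMaps to show that the column order comes from `colPositions` alone.

```java
import java.util.*;
import java.util.stream.Collectors;

public class HeaderDemo {

    // Same logic as the snippet above; row 0 of the result is the header.
    public static String[][] toArrayWithHeader(List<Map<String, List<String>>> rows) {
        Map<String, Integer> colPositions = rows.stream()
            .flatMap(m -> m.entrySet().stream())
            .collect(Collectors.toMap(e -> e.getKey(), e -> e.getValue().size(),
                                      Integer::max, TreeMap::new));
        // Repeat each key once per column it occupies
        String[] header = colPositions.entrySet().stream()
            .flatMap(e -> Collections.nCopies(e.getValue(), e.getKey()).stream())
            .toArray(String[]::new);
        // Turn widths into start positions while summing the total width
        int tableWidth = 0;
        for (Map.Entry<String, Integer> e : colPositions.entrySet())
            tableWidth += e.setValue(tableWidth);

        String[][] array = new String[rows.size() + 1][tableWidth];
        array[0] = header;
        int rowIdx = 1;
        for (Map<String, List<String>> row : rows) {
            for (Map.Entry<String, List<String>> e : row.entrySet()) {
                int index = colPositions.get(e.getKey());
                for (String s : e.getValue()) array[rowIdx][index++] = s;
            }
            rowIdx++;
        }
        return array;
    }

    // Hypothetical helper: one record; the Category key is omitted when none are given.
    public static Map<String, List<String>> sampleRow(String title, String price, String... categories) {
        Map<String, List<String>> m = new HashMap<>();
        m.put("Title", Arrays.asList(title));
        m.put("Price", Arrays.asList(price));
        if (categories.length > 0) m.put("Category", Arrays.asList(categories));
        return m;
    }

    public static void main(String[] args) {
        List<Map<String, List<String>>> rows = Arrays.asList(
            sampleRow("Product 1", "20", "Wireless", "Sensor"),
            sampleRow("Product 2", "35", "Sensor"),
            sampleRow("Product 3", "15"));
        System.out.println(Arrays.deepToString(toArrayWithHeader(rows)));
        // prints [[Category, Category, Price, Title], [Wireless, Sensor, 20, Product 1],
        //         [Sensor, null, 35, Product 2], [null, null, 15, Product 3]] (on one line)
    }
}
```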
6 Comments

This is great! Much shorter and cleaner. Thank you.
Could you update with a version that prints headers as well? :) For each column.
I suppose, the header should be the map keys? How to deal with the multiple columns associated with a single key? null, repeated keys, or add numbers to the key?
Yes, the map keys. I would just print repeated keys for every column. This can be viewed as a form of graph flattening where the keys are item properties, so it should be the same property for every value.
The output order is purely determined by the colPositions map, whether the inputs are TreeMaps or not. So you can change the input to List<Map<String, List<String>>> and don’t even need to do something like new TreeMap<>(row) in the loop. To get a guaranteed sorted column order, all you have to do, is to change HashMap::new to TreeMap::new in the above solution, which you must do, even if the input maps are tree maps. With the HashMap, it’s a pure coincidence if the current test data looked sorted in the output. I updated the answer accordingly.
This is a fairly concise way to do it using some Java 8 features.

This solution assumes that only the Category data is dynamic, whereas you will have always only one price and one product name.

Considering you have the initial data

// your initial complex data list 
List<Map<String, List<String>>> initialList = new ArrayList<>();

you can do

// values holder before final conversion
final List<List<String>> tempValues = new ArrayList<>();
initialList.forEach( map -> {
    // discard the keys, we do not need them... so only pack the data and put in a temporary array
    tempValues.add(new ArrayList<String>() {{
        map.forEach((key, value) -> addAll(value));          // foreach (string, list) : Map<String, List<String>>
    }});
});
// get the biggest data list; in our case, the one that contains most categories...
// this is going to be the final data size
final int maxSize = tempValues.stream().max(Comparator.comparingInt(List::size)).get().size();
// now we finally know the data size
final String[][] finalValues = new String[initialList.size()][maxSize];
// now it's time to uniform the bundle data size and shift the elements if necessary

// can't use streams/lambda as I need to keep an iteration counter
for (int i = 0; i < tempValues.size(); i++) {
    final List<String> tempEntry = tempValues.get(i);
    if (tempEntry.size() == maxSize) {
        finalValues[i] = tempEntry.toArray(finalValues[i]);
        continue;
    }
    final String[] s = new String[maxSize];
    // same shifting game as before
    final int delta = maxSize - tempEntry.size();
    for (int j = 0; j < maxSize; j++) {
        if (j < delta) continue;
        s[j] = tempEntry.get(j - delta);
    }
    finalValues[i] = s;
}

and that's it...


You can fill and test the data with this method below (I have added some more categories...)

static void initData(List<Map<String, List<String>>> l) {
    l.add(new TreeMap<String, List<String>>() {{
        put("Category", new ArrayList<String>() {{ add("Wireless"); add("Sensor"); }});
        put("Price", new ArrayList<String>() {{ add("20"); }});
        put("Title", new ArrayList<String>() {{ add("Product 1"); }});
    }});
    l.add(new TreeMap<String, List<String>>() {{
        put("Category", new ArrayList<String>() {{ add("Sensor"); }});
        put("Price", new ArrayList<String>() {{ add("35"); }});
        put("Title", new ArrayList<String>() {{ add("Product 2"); }});
    }});
    l.add(new TreeMap<String, List<String>>() {{
        put("Price", new ArrayList<String>() {{ add("15"); }});
        put("Title", new ArrayList<String>() {{ add("Product 3"); }});
    }});
    l.add(new TreeMap<String, List<String>>() {{
        put("Category", new ArrayList<String>() {{ add("Wireless"); add("Sensor"); add("Category14"); }});
        put("Price", new ArrayList<String>() {{ add("15"); }});
        put("Title", new ArrayList<String>() {{ add("Product 3"); }});
    }});
    l.add(new TreeMap<String, List<String>>() {{
        put("Category", new ArrayList<String>() {{ add("Wireless"); add("Sensor"); add("Category541"); add("SomeCategory");}});
        put("Price", new ArrayList<String>() {{ add("15"); }});
        put("Title", new ArrayList<String>() {{ add("Product 3"); }});
    }});
}
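
For comparison, the same five records can be built without anonymous subclasses. This is only a sketch: the class name `SampleData` and the helper `product` are hypothetical, and it assumes, as the data above does, that every record has exactly one title and one price.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SampleData {

    // Hypothetical helper: builds one record; the Category key is left out
    // when no categories are passed.
    public static Map<String, List<String>> product(String title, String price, String... categories) {
        Map<String, List<String>> m = new TreeMap<>();
        if (categories.length > 0) m.put("Category", new ArrayList<>(Arrays.asList(categories)));
        m.put("Price", new ArrayList<>(Arrays.asList(price)));
        m.put("Title", new ArrayList<>(Arrays.asList(title)));
        return m;
    }

    // Fills the list with the same records as the double-brace version.
    public static void initData(List<Map<String, List<String>>> l) {
        l.add(product("Product 1", "20", "Wireless", "Sensor"));
        l.add(product("Product 2", "35", "Sensor"));
        l.add(product("Product 3", "15"));
        l.add(product("Product 3", "15", "Wireless", "Sensor", "Category14"));
        l.add(product("Product 3", "15", "Wireless", "Sensor", "Category541", "SomeCategory"));
    }
}
```

Each call creates plain TreeMap and ArrayList instances, so no extra classes are generated and the lists hold no reference to an enclosing object.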

I'd still say the accepted answer looks less computationally expensive, but you wanted to see some Java 8...

7 Comments

FINAL_DATA_BUNDLE_SIZE needs to be set in advance though, not calculated dynamically?
@MartynasJusevičius ughhh sorry, I'll update as soon as i can
@MartynasJusevičius did it
Are you aware, how many classes your code creates? And compared to Arrays.asList(…), using the double curly brace antipattern doesn’t even make the code more concise…
I don’t see, where that is “a very powerful tool”. Your initData creates nineteen classes, all of the ArrayList subclasses storing unintended references to their outer TreeMap instances, while the entire code is bigger and harder to read, compared to the OP’s code provided in the question. There wasn’t even any need to rewrite the code instead of simply copying it. Since I didn’t rewrite that code, there is no “extremely expensive” variant on my side.