I'm trying to develop a function that reads an ArrayList of strings and determines whether at least two tuples have the same values at a given set of indices but differ at a supplementary index. I've developed a version of this function using a regex comparison, as follows:

    public boolean checkMatching(){
        ArrayList<String> rows = new ArrayList<String>();
        rows.add("7,2,2,1,1");
        rows.add("7,3,2,1,1");
        rows.add("7,8,1,1,1");
        rows.add("8,2,1,3,1");
        rows.add("8,2,1,4,1");
        rows.add("8,4,5,1,1");

        int[] indices = new int[] {2,3};
        int supplementaryIndex = 1;

        String regex = "";
        for(String r : rows){
            String[] rt = r.split(",");
            regex = "[a-zA-Z0-9,-.]*[,][a-zA-Z0-9,-.]*[,][" + rt[indices[0]] + "][,][" + rt[indices[1]] + "][,][a-zA-Z0-9,-.]*";

            for(String r2 : rows){
                if(r.equals(r2) == false){              
                    if(Pattern.matches(regex, r2)){
                        String[] rt2 = r2.split(","); // split the other row (r2), not r again
                        if(rt[supplementaryIndex].equals(rt2[supplementaryIndex]) == false){
                            return true;
                        }
                    }
                }
            }

        }   
        return false;
    }

However, it is very expensive, especially when there are many rows. I thought about building a more complex regex that covers multiple alternatives (using the '|' operator), as follows:

    public boolean checkMatching(){
        ArrayList<String> rows = new ArrayList<String>();
        rows.add("7,2,2,1,1");
        rows.add("7,3,2,1,1");
        rows.add("7,8,1,1,1");
        rows.add("8,2,1,3,1");
        rows.add("8,2,1,4,1");
        rows.add("8,4,5,1,1");

        int[] indices = new int[] {2,3};
        int supplementaryIndex = 1;

        String regex = "";
        for(String r : rows){
            String[] rt = r.split(",");
            regex += "[a-zA-Z0-9,-.]*[,][a-zA-Z0-9,-.]*[,][" + rt[indices[0]] + "][,][" + rt[indices[1]] + "][,][a-zA-Z0-9,-.]*"; 
            regex += "|"; //or
        }   

        for(String r2 : rows){
            if(Pattern.matches(regex, r2)){
                //String rt2 = r.split(",");
                //if(rt[supplementaryIndex].equals(rt2[supplementaryIndex]) == false){
                    return true;
                //}
            }
        }

        return false;
    }

But the problem is that this way I can't compare the supplementary index values. Do you have any suggestions on how to define a regex that can directly satisfy this condition? Or is it possible to leverage Java streams to do this efficiently?

  • @Holger I would like to check whether there exists at least one pair of tuples that violates the main condition. If so, the function should return false. Note that the second snippet doesn't solve the problem; it is a simulation that shows how I had thought of the code. Commented Apr 19, 2021 at 14:28
  • OK, I've fixed it now. Nothing changes between what you say and the return values of the proposed function. Just handle the output correctly in the calling function. Commented Apr 19, 2021 at 14:37

1 Answer

The main problem of your first approach is that you have two nested loops over the same list, which gives you quadratic time complexity. As a reminder, that means the inner loop’s body gets executed 10,000 times for a list of 100 elements, 1,000,000 times for a list of 1,000 elements, and so on.

Calling Pattern.matches(regex, r2) in the inner loop’s body doesn’t help either. That method exists only to support (as the delegation target) the String operation r2.matches(regex), a convenience method that performs Pattern.compile(regex).matcher(input).matches() in one go. If you have to apply the same regex multiple times, you should keep and reuse the result of Pattern.compile(regex).
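As a minimal sketch of that reuse, assuming a hypothetical helper named countMatches and a simplified example regex (neither is from the question), the pattern can be compiled once before the loop and only a matcher created per row:

```java
import java.util.List;
import java.util.regex.Pattern;

final class PatternReuseSketch {
    // Hypothetical helper: counts the rows that fully match the regex.
    static long countMatches(String regex, List<String> rows) {
        Pattern compiled = Pattern.compile(regex); // compile once, outside the loop
        long count = 0;
        for (String row : rows) {
            // Same result as Pattern.matches(regex, row), without recompiling each time
            if (compiled.matcher(row).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("7,2,2,1,1", "7,3,2,1,1", "7,8,1,1,1");
        // Example regex (an assumption): third field is 2 and fourth field is 1
        System.out.println(countMatches("[^,]*,[^,]*,2,1,.*", rows)); // prints 2
    }
}
```

This only removes the repeated compilation cost; the nested loops would still be quadratic.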

But here, there is no point in using a regex at all. You have already decomposed the string using split and can access each component via a plain array access. Composing a regex from these components, only to apply it to the string again, is both complicated and expensive.

Just use something like

// return true when at least one string has the same values for indices
// but different value for supplementaryIndex

Map<List<String>,String> map = new HashMap<>();

for(String r : rows) {
    String[] rt = r.split(",");
    List<String> key = List.of(rt[indices[0]], rt[indices[1]]);
    String old = map.putIfAbsent(key, rt[supplementaryIndex]);
    if(old != null && !old.equals(rt[supplementaryIndex])) return true;
}
return false;

This loops over the list a single time, extracts the key elements from the array and composes a key for a HashMap. There are various ways to do this. But while it’s tempting to just concatenate these elements like rt[indices[0]] + "," + rt[indices[1]], which would work, using a List is preferable, as it avoids expensive string concatenation.

The code puts the value to check into the map which will return a previous value if this key has been encountered before. If so, the old and new values can be compared and the method can return immediately if they don’t match.
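For reference, here is the same logic wrapped into a self-contained class and run against the rows from the question. The class name SupplementaryCheck, the method name hasConflict and the parameter list are assumptions made for this sketch; like the snippet above, it uses List.of and handles exactly two indices.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SupplementaryCheck {
    // Returns true when two rows agree on the two key indices
    // but differ at supplementaryIndex.
    static boolean hasConflict(List<String> rows, int[] indices, int supplementaryIndex) {
        Map<List<String>, String> seen = new HashMap<>();
        for (String r : rows) {
            String[] rt = r.split(",");
            List<String> key = List.of(rt[indices[0]], rt[indices[1]]);
            // putIfAbsent returns null for a new key, otherwise the value stored earlier
            String old = seen.putIfAbsent(key, rt[supplementaryIndex]);
            if (old != null && !old.equals(rt[supplementaryIndex])) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> rows = List.of(
            "7,2,2,1,1", "7,3,2,1,1", "7,8,1,1,1",
            "8,2,1,3,1", "8,2,1,4,1", "8,4,5,1,1");
        // The first two rows share the key (2,1) but have 2 and 3 at index 1
        System.out.println(hasConflict(rows, new int[] {2, 3}, 1)); // prints true
    }
}
```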

If you are using Java 8, you have to use Arrays.asList(rt[indices[0]], rt[indices[1]]) instead of List.of(rt[indices[0]], rt[indices[1]]), as List.of was only added in Java 9.

This can be easily expanded to support variable lengths for indices, by changing

List<String> key = List.of(rt[indices[0]], rt[indices[1]]);

to

List<String> key = Arrays.stream(indices).mapToObj(i -> rt[i]).toList();

or, if you are using a Java version older than 16:

List<String> key
    = Arrays.stream(indices).mapToObj(i -> rt[i]).collect(Collectors.toList());
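
Put together, a variant that accepts any number of key indices might look like the following sketch. The class name VariableIndicesCheck and the method name hasConflict are assumptions; Collectors.toList() is used so that it also compiles on versions before Java 16.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class VariableIndicesCheck {
    // Returns true when two rows agree on all positions listed in indices
    // but differ at supplementaryIndex.
    static boolean hasConflict(List<String> rows, int[] indices, int supplementaryIndex) {
        Map<List<String>, String> seen = new HashMap<>();
        for (String r : rows) {
            String[] rt = r.split(",");
            // Build the key from however many indices were supplied
            List<String> key = Arrays.stream(indices)
                .mapToObj(i -> rt[i])
                .collect(Collectors.toList());
            String value = rt[supplementaryIndex];
            String old = seen.putIfAbsent(key, value);
            if (old != null && !old.equals(value)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
            "7,2,2,1,1", "7,3,2,1,1", "7,8,1,1,1",
            "8,2,1,3,1", "8,2,1,4,1", "8,4,5,1,1");
        System.out.println(hasConflict(rows, new int[] {0, 2, 3}, 1));    // prints true
        System.out.println(hasConflict(rows, new int[] {0, 1, 2, 3}, 4)); // prints false
    }
}
```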