1

Consider the case of a list of strings example : list=['apple','bat','cow,'dog','applebat','cowbat','dogbark','help']

The java code must check if any element of string is a subset of another element and if it is then larger string element must be removed.

so in this case strings 'applebat','cowbat','dogbark, are removed.

The approach I have taken was to take two lists and iterate over them in the following way,

ArrayList<String> list1 = new ArrayList<String>(strings);
ArrayList<String> list2 = new ArrayList<String>(strings);
for(int i = 0; i<list1.size();i++)
    {
        String curr1 = list1.get(i);

        for(int j = 0;j<list2.size();j++)
        {
            String curr2 = list2.get(j);

            if(curr2.contains(curr1)&&!curr2.equals(curr1))
            {
                list2.remove(j);
                j--;
        }
        }
    }

IMPORTANT I have lists with the sizes of 200K to 400K elements.I would like to find a way to improve performance. I even tried hashsets but they were not much help.I am facing issues with the time taken by the program.

Can any one suggest any improvements to my code or any other approaches in java to improve performance??

7
  • I think you can not simply remove them while iterating. What you can do is take a negative condition and then add it to another arrayList Commented Jul 12, 2016 at 5:21
  • 1
    Try HashSet and check with contains and to remove use remove methods.. Set will give us O(1) (hypothetical) runtime. Commented Jul 12, 2016 at 5:26
  • i have tried on a small set of strings and the code was fine, but when the list gets very big its taking a lot of time to process, I do not want iterate it n power n times on two lists. I would like to dynamically reduce the two lists so that i can remove a lot of comparisons. Commented Jul 12, 2016 at 5:26
  • check this, programmers.stackexchange.com/questions/280361/… Commented Jul 12, 2016 at 5:27
  • stackoverflow.com/questions/7940337/… Commented Jul 12, 2016 at 5:30

5 Answers 5

2
import java.util.ArrayList;
import java.util.*;
// our main class becomes a file but the main method is still found
public class HelloWorld
{
  public static void main(String[] args)
  {
    String[] strings = {"apple","bat","cow","dog","applebat","cowbat","dogbark","help"};
    ArrayList<String> list1 = new ArrayList<String>(Arrays.asList(strings));
ArrayList<String> list2 = new ArrayList<String>(Arrays.asList(strings));
ArrayList<String> result = new ArrayList<String>(Arrays.asList(strings));
for(int i = 0; i<8;i++)
{

    String curr1 = list1.get(i);
    System.out.println(curr1);
    int flag = 0;
    for(int j = i+1;j<8;j++)
    {
        String curr2 = list2.get(j);

        if((curr2.contains(curr1)&&!curr2.equals(curr1)))
        {

            result.remove(curr2);
        }
    }

}
 System.out.println(result);

  }
}
Sign up to request clarification or add additional context in comments.

2 Comments

And explanation would be nice
I have checked the negative condition i.e if it is equals or if it not contains any substring flag will become 1. after one iteration we are adding it to result. at the end you can delete list1 and list2 objects
0

For full performance boost of huge list of words, I would think a combination of sort and a string searching algorithm, such as the Aho–Corasick algorithm, is what you need, assuming you're willing to implement such complex logic.

First, sort the words by length.

Then build up the Aho–Corasick Dictionary, in word length order. For each word, first check if a substring exists in the dictionary. If it does, skip the word, otherwise add the word to the dictionary.

When done, dump the dictionary, or the parallel-maintained list if dictionary is not easy/possible to dump.

Comments

0

I suppose set will be faster here. You can easy do that with java8 stream api.

Try that:

private Set<String> delete() {
        Set<String> startSet = new HashSet<>(Arrays.asList("a", "b", "c", "d", "ab", "bc", "ce", "fg"));
        Set<String> helperSet = new HashSet<>(startSet);

        helperSet.forEach(s1 -> helperSet.forEach(s2 -> {
            if (s2.contains(s1) && !s1.equals(s2)) {
                startSet.remove(s2);
            }
        }));

        return startSet;
    }

Do not delete any elements from set you are iterating for or you will have ConcurrentModificationException.

Comments

0
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Date;
import java.util.List;
import java.util.Random;

public class SubStrRmove {
    public static List<String> randomList(int size) {
        final String BASE = "abcdefghijklmnopqrstuvwxyz";
        Random random = new Random();
        List<String> list = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            int length = random.nextInt(3) + 2;
            StringBuffer sb = new StringBuffer();
            for (int j = 0; j < length; j++) {
                int number = random.nextInt(BASE.length());
                sb.append(BASE.charAt(number));
            }
            list.add(sb.toString());
            sb.delete(0, sb.length());
        }
        return list;
    }

    public static List<String> removeListSubStr(List<String> args) {
        String[] input = args.toArray(new String[args.size()]);
        Arrays.parallelSort(input, (s1, s2) -> s1.length() - s2.length());
        List<String> result = new ArrayList<>(args.size());
        for (int i = 0; i < input.length; i++) {
            String temp = input[i];
            if (!result.stream().filter(s -> temp.indexOf(s) >= 0).findFirst().isPresent()) {
                result.add(input[i]);
            }
        }
        return result;
    }

    public static List<String> removeListSubStr2(List<String> args) {
        String[] input = args.toArray(new String[args.size()]);
        Arrays.parallelSort(input, (s1, s2) -> s1.length() - s2.length());
        List<String> result = new ArrayList<>(args.size());
        for (int i = 0; i < input.length; i++) {
            boolean isDiff = true;
            for (int j = 0; j < result.size(); j++) {
                if (input[i].indexOf(result.get(j)) >= 0) {
                    isDiff = false;
                    break;
                }
            }
            if (isDiff) {
                result.add(input[i]);
            }
        }
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> list = randomList(20000);
        Long start1 = new Date().getTime();
        List<String> listLambda = removeListSubStr(list);
        Long end1 = new Date().getTime();
        Long start2 = new Date().getTime();
        List<String> listFor = removeListSubStr2(list);
        Long end2 = new Date().getTime();
        System.out.println("mothod Labbda:" + (end1 - start1) + "ms");
        System.out.println("mothod simple:" + (end2 - start2) + "ms");
        System.out.println("" + listLambda.size() + listLambda);
        System.out.println("" + listFor.size() + listFor);

    }

}

Comments

0

I have tested it on small data and hope it helps you to find solution...

import java.util.ArrayList;
import java.util.Arrays;

public class Main {
    public static void main(String[] args){
        String []list = {"apple","bat","cow","dog","applebat","cowbat","dogbark","help","helpless","cows"};
        System.out.println(Arrays.toString(list));
        int prelenght = 0;
        int prolenght = 0;
        long pretime = System.nanoTime();
        for(int i=0;i<list.length;i++){
            String x = list[i];
            prelenght = list[i].length();
            for(int j=i+1;j<list.length;j++){               
                String y = list[j];
                if(y.equals(x)){
                    list[j] = "0";
                }else if(y.contains(x)||x.contains(y)){
                    prolenght = list[j].length();                   
                    if(prelenght<prolenght){
                        list[j] = "0";
                    }                       
                    if(prelenght>prolenght){
                        list[i] = "0";
                        break;
                    }
                }
            }
        }       
        long protime = System.nanoTime();
        long time = (protime - pretime);
        System.out.println(time + "ns");
        UpdateArray(list);      
    }

    public static void UpdateArray(String[] list){
        ArrayList<String> arrayList = new ArrayList<>();
        for(int i=0;i<list.length;i++){
            if(!list[i].equals("0")){
                arrayList.add(list[i]);
            }
        }
        System.out.println(arrayList.toString());
    }
}

Output :

[apple, bat, cow, dog, applebat, cowbat, dogbark, help, helpless, cows]
time elapsed : 47393ns
[apple, bat, cow, dog, help]

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.